`'icu.BreakIterator' object has no attribute 'getRuleStatus'` when using locale like `en@ss=standard`
Created by: qtdaniel
I would like to use the sentence break filters, i.e. lists of common abbreviations, that adjust the behaviour of the sentence tokeniser as documented here: http://userguide.icu-project.org/boundaryanalysis#TOC-Sentence-Break-Filter
I expected to achieve this by simply changing my locale string from `"en"` to `"en@ss=standard"`, but this causes the error `'icu.BreakIterator' object has no attribute 'getRuleStatus'` when `getRuleStatus` is called on the resulting break iterator.
I've tried this via `icu.BreakIterator.createSentenceInstance` and via `icu.RuleBasedBreakIterator.createSentenceInstance`, but both exhibit the same error.
Here is a minimal reproduction:
import icu

# Plain "en" locale: getRuleStatus() works with both factories.
bi = icu.BreakIterator.createSentenceInstance(icu.Locale("en"))
print(bi.getRuleStatus())
bi = icu.RuleBasedBreakIterator.createSentenceInstance(icu.Locale("en"))
print(bi.getRuleStatus())

# "en@ss=standard" locale: getRuleStatus() is missing with both factories.
bi = icu.BreakIterator.createSentenceInstance(icu.Locale("en@ss=standard"))
try:
    print(bi.getRuleStatus())
except Exception as exception:
    print(
        "Failed to get rule status when using en@ss=standard locale with BreakIterator",
        exception
    )
bi = icu.RuleBasedBreakIterator.createSentenceInstance(icu.Locale("en@ss=standard"))
try:
    print(bi.getRuleStatus())
except Exception as exception:
    print(
        "Failed to get rule status when using en@ss=standard locale with"
        " RuleBasedBreakIterator",
        exception
    )
When run, that code emits:
0
0
Failed to get rule status when using en@ss=standard locale with BreakIterator 'icu.BreakIterator' object has no attribute 'getRuleStatus'
Failed to get rule status when using en@ss=standard locale with RuleBasedBreakIterator 'icu.BreakIterator' object has no attribute 'getRuleStatus'
I am using icu and pyicu via conda-forge:
icu 64.2 he1b5a44_1 conda-forge
pyicu 2.4.2 py37h8412b87_0 conda-forge
python 3.7.6 cpython_h8356626_6 conda-forge
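In the meantime, a defensive helper along these lines might avoid the crash. This is only a sketch: `rule_status_or_default` is a hypothetical name of my own, not part of PyICU, and it merely hides the missing method instead of recovering the real rule status. The demo at the bottom assumes PyICU is installed and is skipped otherwise.

```python
def rule_status_or_default(bi, default=None):
    """Return bi.getRuleStatus() when the object exposes it, else `default`.

    Hypothetical helper (not part of PyICU): it only guards against the
    missing attribute and does not recover the real rule status.
    """
    get_status = getattr(bi, "getRuleStatus", None)
    if get_status is None:
        return default
    return get_status()


if __name__ == "__main__":
    try:
        import icu
    except ImportError:
        icu = None  # PyICU not installed, so there is nothing to inspect

    if icu is not None:
        for locale_id in ("en", "en@ss=standard"):
            bi = icu.BreakIterator.createSentenceInstance(icu.Locale(locale_id))
            # Show which Python type the factory returned for each locale.
            print(locale_id, type(bi).__name__, rule_status_or_default(bi))
```

Printing the type name alongside the status should also show which wrapper class the factory hands back for each locale, which may help narrow down where the method is lost.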