Package org.apache.lucene.analysis.br
Class BrazilianStemmer
java.lang.Object
org.apache.lucene.analysis.br.BrazilianStemmer
A stemmer for Brazilian Portuguese words.
-
Field Summary
Fields -
Constructor Summary
Constructors -
Method Summary
Modifier and Type | Method | Description
private String | changeTerm(String value) | 1) Turn to lowercase 2) Remove accents 3) ã -> a ; õ -> o 4) ç -> c
private void | createCT | Creates CT (changed term), substituting 'ã' and 'õ' for 'a~' and 'o~'.
private String | getR1 | Gets R1.
private String | getRV | Gets RV.
private boolean | isIndexable(String term) | Checks whether a term can be indexed.
private boolean | isStemmable(String term) | Checks whether a term can be processed correctly.
private boolean | isVowel(char value) | Checks whether the character is one of 'a', 'e', 'i', 'o', 'u'.
(package private) String | log() | For logging and debugging purposes.
private String | removeSuffix(String value, String toRemove) | Removes a string suffix.
private String | replaceSuffix(String value, String toReplace, String changeTo) | Replaces a string suffix with another.
protected String | stem | Stems the given term to a unique discriminator.
private boolean | step1() | Standard suffix removal.
private boolean | step2() | Verb suffixes.
private void | step3() | Delete suffix 'i' if in RV and preceded by 'c'.
private void | step4() | Residual suffix.
private void | step5() | If the word ends with one of (e é ê) in RV, delete it, and if preceded by 'gu' (or 'ci') with the 'u' (or 'i') in RV, delete the 'u' (or 'i').
private boolean | suffix | Checks whether a string ends with a suffix.
private boolean | suffixPreceded(String value, String suffix, String preceded) | Checks whether a suffix is preceded by a String.
-
Field Details
-
locale
-
TERM
Changed term
-
CT
-
R1
-
R2
-
RV
-
-
Constructor Details
-
BrazilianStemmer
public BrazilianStemmer()
-
-
Method Details
-
stem
Stems the given term to a unique discriminator.
- Parameters:
term - The term that should be stemmed.
- Returns:
- Discriminator for term
-
isStemmable
Checks whether a term can be processed correctly.
- Returns:
- true if, and only if, the given term consists of letters.
-
isIndexable
Checks whether a term can be indexed.
- Returns:
- true if it can be indexed
-
isVowel
private boolean isVowel(char value)
Checks whether the character is one of 'a', 'e', 'i', 'o', 'u'.
- Returns:
- true if the character is a vowel
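
The check described above can be sketched as a plain comparison against the five listed letters. This is a minimal illustration, not the class's actual source; the class name VowelSketch is hypothetical, and the sketch assumes the input is already lowercase and unaccented.

```java
// Hypothetical sketch of the vowel test described above.
final class VowelSketch {
    // Returns true only for the five unaccented lowercase vowels.
    static boolean isVowel(char value) {
        return value == 'a' || value == 'e' || value == 'i'
            || value == 'o' || value == 'u';
    }
}
```

Under this reading, accented or uppercase vowels are not matched, so callers would need to normalize the term first.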
-
getR1
Gets R1.
R1 is the region after the first non-vowel following a vowel, or the null region at the end of the word if there is no such non-vowel.
- Returns:
- null or a string representing R1
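
The R1 definition above can be sketched as two scans over the word. This is an illustration under stated assumptions, not the class's actual code: the names R1Sketch and r1 are hypothetical, the input is assumed to be lowercase and unaccented, and an empty region is assumed to be reported as null.

```java
// Hypothetical sketch of the R1 region described above.
final class R1Sketch {
    static boolean isVowel(char c) {
        return c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u';
    }

    // Returns the text after the first non-vowel that follows a vowel,
    // or null when no such non-vowel exists or the region would be empty.
    static String r1(String value) {
        if (value == null) {
            return null;
        }
        int i = 0;
        // Skip leading non-vowels to reach the first vowel.
        while (i < value.length() && !isVowel(value.charAt(i))) {
            i++;
        }
        // Skip the run of vowels to reach the first non-vowel after it.
        while (i < value.length() && isVowel(value.charAt(i))) {
            i++;
        }
        // Assumption: an empty region is treated the same as "not found".
        if (i + 1 >= value.length()) {
            return null;
        }
        return value.substring(i + 1);
    }
}
```

For example, with the unaccented word "quilometricas" the first vowel run is "ui" and the following non-vowel is "l", so the sketch yields "ometricas".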
-
getRV
Gets RV.
IF the second letter is a consonant, RV is the region after the next following vowel,
OR if the first two letters are vowels, RV is the region after the next consonant,
AND otherwise (consonant-vowel case) RV is the region after the third letter.
BUT RV is the end of the word if these positions cannot be found.
- Returns:
- null or a string representing RV
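
The three RV cases above can be sketched as follows. This is a hedged illustration rather than the class's actual code: RvSketch, rv and tail are hypothetical names, the input is assumed to be lowercase and unaccented, and "the end of the word" is assumed to be reported as null.

```java
// Hypothetical sketch of the RV region described above.
final class RvSketch {
    static boolean isVowel(char c) {
        return c == 'a' || c == 'e' || c == 'i' || c == 'o' || c == 'u';
    }

    // Returns the non-empty tail starting at 'from', or null (assumption).
    static String tail(String value, int from) {
        return from < value.length() ? value.substring(from) : null;
    }

    static String rv(String value) {
        if (value == null || value.length() < 2) {
            return null;
        }
        if (!isVowel(value.charAt(1))) {
            // Second letter is a consonant: region after the next vowel.
            for (int j = 2; j < value.length(); j++) {
                if (isVowel(value.charAt(j))) {
                    return tail(value, j + 1);
                }
            }
            return null;
        }
        if (isVowel(value.charAt(0))) {
            // First two letters are vowels: region after the next consonant.
            for (int j = 2; j < value.length(); j++) {
                if (!isVowel(value.charAt(j))) {
                    return tail(value, j + 1);
                }
            }
            return null;
        }
        // Consonant-vowel case: region after the third letter.
        return tail(value, 3);
    }
}
```

With this sketch, "trabajo" falls in the first case, "aureo" in the second, and "macho" in the third.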
-
changeTerm
1) Turn to lowercase 2) Remove accents 3) ã -> a ; õ -> o 4) ç -> c
- Returns:
- null or a string transformed
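
The four normalization steps above could be approximated with the JDK's java.text.Normalizer. Note that this is a swapped-in technique, not necessarily how the class does it: decomposing to NFD and stripping combining marks removes every diacritic, which covers the listed mappings but also more. The name ChangeTermSketch and the use of Locale.ROOT are assumptions.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical sketch of the term normalization described above.
final class ChangeTermSketch {
    static String changeTerm(String value) {
        if (value == null) {
            return null;
        }
        // 1) Lowercase (Locale.ROOT is an assumption for this sketch).
        String lower = value.toLowerCase(Locale.ROOT);
        // 2-4) Decompose accented letters, then drop the combining marks,
        // so that a-tilde becomes a, o-tilde becomes o, c-cedilla becomes c.
        String decomposed = Normalizer.normalize(lower, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }
}
```

Under these assumptions, "Ação" would become "acao".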
-
suffix
Checks whether a string ends with a suffix.
- Returns:
- true if the string ends with the specified suffix
-
replaceSuffix
Replaces a string suffix with another.
- Returns:
- the replaced String
-
removeSuffix
Removes a string suffix.
- Returns:
- the String without the suffix
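
The three suffix helpers (suffix, replaceSuffix, removeSuffix) can be sketched with String.endsWith and substring. This is an illustration only: the class name SuffixSketch is hypothetical, and returning the input unchanged when the suffix is absent is an assumption.

```java
// Hypothetical sketch of the suffix helpers described above.
final class SuffixSketch {
    // True if value ends with the given suffix.
    static boolean suffix(String value, String suffix) {
        return value != null && suffix != null && value.endsWith(suffix);
    }

    // Drops the suffix when present; otherwise returns value unchanged.
    static String removeSuffix(String value, String toRemove) {
        if (!suffix(value, toRemove)) {
            return value;
        }
        return value.substring(0, value.length() - toRemove.length());
    }

    // Swaps one suffix for another when present; otherwise returns value.
    static String replaceSuffix(String value, String toReplace, String changeTo) {
        if (changeTo == null || !suffix(value, toReplace)) {
            return value;
        }
        return removeSuffix(value, toReplace) + changeTo;
    }
}
```

For instance, replaceSuffix("biologia", "logia", "log") would give "biolog" in this sketch.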
-
suffixPreceded
Checks whether a suffix is preceded by a String.
- Returns:
- true if the suffix is preceded
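
One plausible reading of this check is: the value ends with the suffix, and the text immediately before that suffix ends with the preceding string. The sketch below follows that reading; the class name PrecededSketch is hypothetical and the null handling is an assumption.

```java
// Hypothetical sketch of the "suffix preceded by" check described above.
final class PrecededSketch {
    static boolean suffixPreceded(String value, String suffix, String preceded) {
        if (value == null || suffix == null || preceded == null
                || !value.endsWith(suffix)) {
            return false;
        }
        // Text that remains once the suffix is cut off.
        String head = value.substring(0, value.length() - suffix.length());
        return head.endsWith(preceded);
    }
}
```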
-
createCT
Creates CT (changed term), substituting 'ã' and 'õ' for 'a~' and 'o~'.
-
step1
private boolean step1()
Standard suffix removal. Search for the longest among the following suffixes, and perform the following actions:
- Returns:
- false if no ending was removed
-
step2
private boolean step2()
Verb suffixes. Search for the longest among the following suffixes in RV, and if found, delete.
- Returns:
- false if no ending was removed
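
The "longest suffix in RV" search can be sketched as below. The suffix list in this page is not reproduced, so SUFFIXES here is a small illustrative subset chosen for the example, not the class's actual list. The names Step2Sketch and removeLongestInRv are hypothetical, and the sketch returns the resulting word rather than a boolean.

```java
// Hypothetical sketch of longest-suffix removal restricted to RV.
final class Step2Sketch {
    // Illustrative subset only; the real list is not given in this page.
    static final String[] SUFFIXES = {"ia", "aria", "ando", "ar"};

    // Removes the longest listed suffix that lies inside rv, if any.
    static String removeLongestInRv(String ct, String rv) {
        if (ct == null || rv == null) {
            return ct;
        }
        String best = null;
        for (String s : SUFFIXES) {
            boolean inRv = rv.endsWith(s) && ct.endsWith(s);
            if (inRv && (best == null || s.length() > best.length())) {
                best = s;
            }
        }
        return best == null ? ct : ct.substring(0, ct.length() - best.length());
    }
}
```

A caller could compare the result with the input to derive the "false if no ending was removed" return value.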
-
step3
private void step3()
Delete suffix 'i' if in RV and preceded by 'c'.
-
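
The step3 rule can be sketched as a function of the changed term and its RV region. This is an illustration, not the class's code: Step3Sketch is a hypothetical name, and the sketch returns the new word instead of mutating fields.

```java
// Hypothetical sketch of step3: drop a final 'i' in RV when preceded by 'c'.
final class Step3Sketch {
    static String step3(String ct, String rv) {
        if (ct == null || rv == null) {
            return ct;
        }
        // The 'i' must lie in RV; the preceding 'c' only needs to be in the word.
        if (rv.endsWith("i") && ct.endsWith("ci")) {
            return ct.substring(0, ct.length() - 1);
        }
        return ct;
    }
}
```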
step4
private void step4()
Residual suffix. If the word ends with one of the suffixes (os a i o á í ó) in RV, delete it.
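
The residual-suffix rule can be sketched as follows, using the seven suffixes listed above. Step4Sketch is a hypothetical name, the accented letters are written as Unicode escapes, and the sketch returns the new word rather than mutating fields.

```java
// Hypothetical sketch of step4: delete a residual suffix found in RV.
final class Step4Sketch {
    // os a i o, plus a-acute, i-acute, o-acute.
    static final String[] RESIDUAL = {"os", "a", "i", "o", "\u00e1", "\u00ed", "\u00f3"};

    static String step4(String ct, String rv) {
        if (ct == null || rv == null) {
            return ct;
        }
        for (String s : RESIDUAL) {
            // The suffix must end both the word and its RV region.
            if (rv.endsWith(s) && ct.endsWith(s)) {
                return ct.substring(0, ct.length() - s.length());
            }
        }
        return ct;
    }
}
```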
-
step5
private void step5()
If the word ends with one of (e é ê) in RV, delete it, and if preceded by 'gu' (or 'ci') with the 'u' (or 'i') in RV, delete the 'u' (or 'i'). Or, if the word ends with 'ç', remove the cedilla.
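
One way to read the step5 rule is sketched below. This is an interpretation under stated assumptions, not the class's actual code: Step5Sketch is a hypothetical name, rv is assumed to be a tail of ct, and the sketch returns the new word instead of mutating fields.

```java
// Hypothetical sketch of step5 as described above.
final class Step5Sketch {
    static String step5(String ct, String rv) {
        if (ct == null) {
            return null;
        }
        if (rv != null && ct.endsWith(rv)) {
            // e, e-acute, e-circumflex
            for (String e : new String[] {"e", "\u00e9", "\u00ea"}) {
                if (rv.endsWith(e)) {
                    String word = ct.substring(0, ct.length() - 1);
                    String region = rv.substring(0, rv.length() - 1);
                    // Also drop the 'u' of "gu" or the 'i' of "ci" if it is in RV.
                    boolean gu = word.endsWith("gu") && region.endsWith("u");
                    boolean ci = word.endsWith("ci") && region.endsWith("i");
                    return (gu || ci) ? word.substring(0, word.length() - 1) : word;
                }
            }
        }
        // Otherwise, a final c-cedilla loses its cedilla.
        if (ct.endsWith("\u00e7")) {
            return ct.substring(0, ct.length() - 1) + "c";
        }
        return ct;
    }
}
```

In this sketch, "segue" with RV "ue" loses the final "e" and then the "u", leaving "seg".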
-
log
String log()
For logging and debugging purposes.
- Returns:
- TERM, CT, RV, R1 and R2
-