gse
Chinese word segmentation splits text into single characters
Execute the following code (`tabooSegmentCustomDicList` contains more than 2000 words):

```go
for _, tabooSegmentCustomDic := range tabooSegmentCustomDicList {
	lowerCaseWord := strings.ToLower(tabooSegmentCustomDic.Word)
	segmentutil.AddWord(lowerCaseWord)
}

// AddWord registers a custom word in the segmenter dictionary with frequency 100.
func AddWord(word string) bool {
	defer recoverPanic(word)
	err := seg.AddToken(word, 100)
	if err != nil {
		logger.Errorf("Error when AddWord, word=%s: %v", word, err)
		return false
	}
	return true
}

// TextSegment splits text into words using the shared segmenter.
func TextSegment(text string) []string {
	defer recoverPanic(text)
	return seg.Cut(text)
}
```
Then call `TextSegment("api发送文本loumès 𝘾𝘼𝙍𝙏𝙄𝙀𝙍")`.
The result is `["api","发","送","文","本","lou","mès"," ","𝘾𝘼𝙍𝙏𝙄𝙀𝙍"]`.
Please set `DefaultAnalyzer` to `cjk.AnalyzerName`; that will resolve the issue.
How do I set `DefaultAnalyzer`? I searched all files in the repo and could not find this keyword or setting.