riot
TokenLoc out of length
- Riot version (or commit ref): 20201013133145-f4c30acb3704
- Go version: go version go1.14.5 darwin/amd64
- Operating system and bit: macOS 10.15.6
- Can you reproduce the bug at Examples:
- [x] Yes (provide example code)
- [ ] No
- [ ] Not relevant
- Provide example code:
package main

import (
	"log"

	"github.com/go-ego/riot"
	"github.com/go-ego/riot/types"
)

var (
	searcher = riot.Engine{}
)

func init() {
	initSearcher()
	initIndex()
}

func initSearcher() {
	searcher.Init(types.EngineOpts{
		Using:   3,
		GseDict: "zh",
		IndexerOpts: &types.IndexerOpts{
			IndexType: types.LocsIndex,
		},
	})
}

func initIndex() {
	docID := "1"
	content := "验证账户权限 运行一些简单的指令来验证账户的有效性 > show dbs admin 0.000GB config 0.000GB local 0.000GB > show users { \"_id\" : \"admin.admin\", \"userId\" : UUID(\"dc5760ea-c8c1-4f40-af5b-7d9d53779842\"), \"user\" : \"admin\", \"db\" : \"admin\", \"roles\" : [ { \"role\" : \"userAdminAnyDatabase\", \"db\" : \"admin\" } ], \"mechanisms\" : [ \"SCRAM-SHA-1\", \"SCRAM-SHA-256\" ] } "
	searcher.Index(docID,
		types.DocData{Content: content},
	)
	searcher.Flush()
}

func main() {
	keyword := "t"
	res := searcher.SearchDoc(types.SearchReq{Text: keyword})
	log.Println("TokenLocs = ", res.Docs[0].TokenLocs)
	log.Println("len(content) = ", len(res.Docs[0].Content))
}
- Log gist:
    2020/12/20 13:39:33 Load the gse dictionary: "/Users/jabin/go/pkg/mod/github.com/go-ego/[email protected]/data/dict/dictionary.txt"
    2020/12/20 13:39:34 Gse dictionary loaded finished.
    2020/12/20 13:39:34 Check virtualMemory...
    2020/12/20 13:39:34 Total: 17179869184, Free: 15147008, UsedPercent: 64.184594%
    2020/12/20 13:39:34 TokenLocs = [[495]]
    2020/12/20 13:39:34 len(content) = 376
Description
The first TokenLoc is 495, which is greater than len(content) (376).
This may be caused by the difference between Chinese characters and English letters (multi-byte vs. single-byte in UTF-8).
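
To help check the hypothesis above, here is a minimal standalone sketch (it does not use riot, and `lengths` is a hypothetical helper, not part of any library) showing how byte length and character count diverge for Chinese text in Go. Go's built-in `len` on a string returns the number of bytes, while `utf8.RuneCountInString` returns the number of characters. Which of the two units riot uses for `TokenLocs` is not confirmed here.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// lengths is a hypothetical helper for illustration only.
// It returns the UTF-8 byte length and the rune (character) count of s.
func lengths(s string) (byteLen, runeLen int) {
	return len(s), utf8.RuneCountInString(s)
}

func main() {
	// An ASCII word and the first Chinese phrase from the reported content.
	for _, s := range []string{"admin", "验证账户权限"} {
		b, r := lengths(s)
		fmt.Printf("%q: %d bytes, %d runes\n", s, b, r)
	}
}
```

Each of the Chinese characters in this phrase takes 3 bytes in UTF-8, so an offset computed in one unit and compared against a length measured in the other will not line up for mixed Chinese and English content.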