TokenLoc exceeds content length

Open · JabinGP opened this issue on Dec 20, 2020 · 1 comment

  • Riot version (or commit ref): 20201013133145-f4c30acb3704
  • Go version: go version go1.14.5 darwin/amd64
  • Operating system and bit: macOS 10.15.6
  • Can you reproduce the bug at Examples:
    • [x] Yes (provide example code)
    • [ ] No
    • [ ] Not relevant
  • Provide example code:
```go
package main

import (
	"log"

	"github.com/go-ego/riot"
	"github.com/go-ego/riot/types"
)

var (
	searcher = riot.Engine{}
)

func init() {
	initSearcher()
	initIndex()
}

// initSearcher loads the Chinese gse dictionary and uses a location index,
// so that search results carry TokenLocs.
func initSearcher() {
	searcher.Init(types.EngineOpts{
		Using:   3,
		GseDict: "zh",
		IndexerOpts: &types.IndexerOpts{
			IndexType: types.LocsIndex,
		},
	})
}

// initIndex indexes one document mixing Chinese and ASCII text, then flushes
// so the document is searchable immediately.
func initIndex() {
	docID := "1"
	content := "验证账户权限 运行一些简单的指令来验证账户的有效性 > show dbs admin 0.000GB config 0.000GB local 0.000GB > show users { \"_id\" : \"admin.admin\", \"userId\" : UUID(\"dc5760ea-c8c1-4f40-af5b-7d9d53779842\"), \"user\" : \"admin\", \"db\" : \"admin\", \"roles\" : [ { \"role\" : \"userAdminAnyDatabase\", \"db\" : \"admin\" } ], \"mechanisms\" : [ \"SCRAM-SHA-1\", \"SCRAM-SHA-256\" ] } "
	searcher.Index(docID,
		types.DocData{Content: content},
	)
	searcher.Flush()
}

func main() {
	keyword := "t"

	res := searcher.SearchDoc(types.SearchReq{Text: keyword})
	if len(res.Docs) == 0 {
		log.Fatal("no documents matched")
	}

	// Every location is expected to be an offset into Content,
	// i.e. smaller than len(Content).
	log.Println("TokenLocs = ", res.Docs[0].TokenLocs)
	log.Println("len(content) = ", len(res.Docs[0].Content))
}
```
  • Log gist:
```
2020/12/20 13:39:33 Load the gse dictionary: "/Users/jabin/go/pkg/mod/github.com/go-ego/…/data/dict/dictionary.txt"
2020/12/20 13:39:34 Gse dictionary loaded finished.
2020/12/20 13:39:34 Check virtualMemory...
2020/12/20 13:39:34 Total: 17179869184, Free: 15147008, UsedPercent: 64.184594%
2020/12/20 13:39:34 TokenLocs =  [[495]]
2020/12/20 13:39:34 len(content) =  376
```

Description

The first TokenLoc is 495, which is greater than len(content) (376), so it cannot be a valid offset into the document content.

JabinGP · Dec 20 '20 05:12

Maybe this is caused by the difference between Chinese characters and English letters: in UTF-8 a Chinese character takes several bytes, while an English letter takes one.

stuchilde · Jun 26 '21 09:06