
regexp doesn't support (?!


panic: regexp: Compile("(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}{1}| ?[^\s\p{L}\p{N}\r\n]+|\s*[\r\n]+|\s+(?!\S)|\s+"): error parsing regexp: invalid or unsupported Perl syntax: (?! [recovered]
	panic: regexp: Compile("(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}{1}| ?[^\s\p{L}\p{N}\r\n]+|\s*[\r\n]+|\s+(?!\S)|\s+"): error parsing regexp: invalid or unsupported Perl syntax: (?!
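For context: Go's standard regexp package is based on RE2, which intentionally does not support lookahead assertions such as the `\s+(?!\S)` alternative in this pretokenizer pattern. A minimal sketch reproducing the compile error (names and text here are illustrative only):

```go
package main

import (
	"fmt"
	"regexp"
)

func main() {
	// `(?!\S)` is a negative lookahead; RE2 (and therefore Go's regexp) rejects it.
	_, err := regexp.Compile(`\s+(?!\S)|\s+`)
	fmt.Println(err)
	// prints: error parsing regexp: invalid or unsupported Perl syntax: `(?!`
}
```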

xuxiaoxia96 avatar Aug 06 '24 12:08 xuxiaoxia96

Is anyone watching this? I got the same problem.

cckate avatar Aug 18 '24 10:08 cckate

@shibingli I have merged your repo and fixed some import/go.mod errors. Now it works: https://github.com/whitezhang/tokenizer

whitezhang avatar Dec 04 '24 08:12 whitezhang

Not able to count tokens for gpt-4 and gpt-3.5-turbo; I'm getting this same error:

--- FAIL: TestGetTokenCountSugarMe (0.10s)
panic: regexp: Compile(`(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}{1,3}| ?[^\s\p{L}\p{N}]+[\r\n]*|\s*[\r\n]+|\s+(?!\S)|\s+`): error parsing regexp: invalid or unsupported Perl syntax: `(?!` [recovered]
	panic: regexp: Compile(`(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}{1,3}| ?[^\s\p{L}\p{N}]+[\r\n]*|\s*[\r\n]+|\s+(?!\S)|\s+`): error parsing regexp: invalid or unsupported Perl syntax: `(?!`

Models used: "Xenova/gpt-3.5-turbo", "Xenova/gpt-4".
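As a hedged sketch of a local workaround (assuming a third-party dependency is acceptable; this is not necessarily the approach taken elsewhere in this thread), the same pattern compiles and matches with github.com/dlclark/regexp2, a regex engine that supports lookahead:

```go
package main

import (
	"fmt"

	"github.com/dlclark/regexp2"
)

func main() {
	// The pretokenizer pattern from the panic above, including the `(?!\S)`
	// lookahead that Go's standard regexp cannot compile.
	pattern := `(?i:'s|'t|'re|'ve|'m|'ll|'d)|[^\r\n\p{L}\p{N}]?\p{L}+|\p{N}{1,3}| ?[^\s\p{L}\p{N}]+[\r\n]*|\s*[\r\n]+|\s+(?!\S)|\s+`
	re := regexp2.MustCompile(pattern, regexp2.None)

	// Iterate over all matches in a sample string.
	text := "Hello, world! It's 2024."
	m, err := re.FindStringMatch(text)
	for m != nil && err == nil {
		fmt.Printf("%q\n", m.String())
		m, err = re.FindNextMatch(m)
	}
}
```

Note that regexp2 does not give RE2's linear-time guarantee, which is presumably why the library authors may prefer a different fix.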

shishpalvishnoi avatar Dec 11 '24 10:12 shishpalvishnoi

Can anyone help? I have no bandwidth at the moment. Thanks.

sugarme avatar Dec 13 '24 02:12 sugarme

Hi @sugarme, I have drafted PR #60 to fix this. It would be awesome if you could give it a look.

Thanks for the fascinating project!


For package users, if you want to test the fix locally:

go mod edit -replace=github.com/sugarme/tokenizer=github.com/nanmu42/go-tokenizer@master
go mod tidy
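
For reference, the go mod edit command above adds a replace directive like the following to go.mod; go mod tidy then resolves master to a pseudo-version (the exact version depends on the fork's current commit):

```
replace github.com/sugarme/tokenizer => github.com/nanmu42/go-tokenizer master
```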

Cheers.

nanmu42 avatar Jul 12 '25 07:07 nanmu42

Hi @sugarme, thanks for your lib. I've encountered the same issue. Could you please let me know when it will be fixed? @nanmu42, this repo does not exist (github.com/nanmu42/go-tokenizer@master). Could anyone solve this problem?

duke-git avatar Dec 01 '25 08:12 duke-git

this repo does not exist (github.com/nanmu42/go-tokenizer@master). Could anyone solve this problem?

It's still at https://github.com/nanmu42/go-tokenizer; no changes were made.

Try:

go mod edit -replace=github.com/sugarme/tokenizer=github.com/nanmu42/go-tokenizer@master
go mod tidy

nanmu42 avatar Dec 01 '25 08:12 nanmu42