When I run main of privateGPT.py, I get this error: gpt_tokenize: unknown token 'Γ' gpt_tokenize: unknown token 'Ç' gpt_tokenize: unknown token 'Ö' gpt_tokenize: unknown token 'Γ' gpt_tokenize: unknown token 'Ç' gpt_tokenize: unknown token 'Ö' gpt_tokenize: unknown token 'Γ' gpt_tokenize: unknown token 'Ç' gpt_tokenize: unknown token 'Ö' ...

Open Amarbo opened this issue 2 years ago • 2 comments

Amarbo, May 12 '23 14:05

I see those with some of my training files, too - I just ignore them for now and the model still seems to answer inquiries.

mmike87, May 12 '23 16:05

I followed your instructions exactly, so what can I do? Should I install another model? If yes, which one please? I have a Windows 11 PC with 16 GB of RAM.

Amarbo, May 12 '23 16:05

That's just a warning. There are some unsupported characters in your source files. But that won't prevent the model from working. You'll see the answer right after those warnings.
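For anyone who would rather clean the source files than ignore the warnings, here is a minimal sketch of one possible workaround. The helper names `find_non_ascii` and `to_plain_quotes` are hypothetical and not part of privateGPT, and whether this removes the warnings in a given setup is an assumption that has not been verified here.

```python
# Hypothetical pre-processing helpers; not part of privateGPT.

# Map the four common typographic quotes to their ASCII equivalents.
TYPOGRAPHIC_TO_ASCII = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark / apostrophe
    "\u201c": '"',   # left double quotation mark
    "\u201d": '"',   # right double quotation mark
})


def find_non_ascii(text: str) -> list[str]:
    """Return the distinct non-ASCII characters in text, sorted by code point."""
    return sorted({ch for ch in text if ord(ch) > 127})


def to_plain_quotes(text: str) -> str:
    """Replace typographic quotes with plain ASCII quotes."""
    return text.translate(TYPOGRAPHIC_TO_ASCII)


sample = "\u201cIt\u2019s fine,\u201d she said."
cleaned = to_plain_quotes(sample)
remaining = find_non_ascii(cleaned)  # empty when only quotes were non-ASCII
```

Note that this only covers quotes. Text in other languages contains many more non-ASCII characters, and replacing those would destroy content, so this is a stopgap for English documents at best.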

imartinez, May 13 '23 08:05

In the sample text, it seems that everything is encoded properly as UTF-8 text. However, there are "fancy quotes" at several places in the document, and somewhere along the toolchain this isn't being parsed properly.

If I open the sample text in an editor such as Geany, I cannot convert it to ISO-8859-1 without errors; however, it is possible to convert it to Windows 1252 and back to Unicode (i.e. UTF-8).

So which element in the toolchain doesn't understand UTF-8?
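One hint may be in the warning itself. The three characters 'Γ', 'Ç', 'Ö' are what the UTF-8 bytes of a right single quotation mark (U+2019, bytes E2 80 99) look like when they are read one byte at a time through code page 437, a legacy code page that some Windows consoles use. The sketch below only demonstrates that byte pattern. It does not show which component does the misreading, and the function name `as_seen_in_cp437` is made up for illustration.

```python
def as_seen_in_cp437(text: str) -> str:
    """Encode text as UTF-8, then decode each byte as a code page 437 character."""
    return text.encode("utf-8").decode("cp437")


fancy_apostrophe = "\u2019"                 # right single quotation mark
raw_bytes = fancy_apostrophe.encode("utf-8")  # three bytes: E2 80 99
mojibake = as_seen_in_cp437(fancy_apostrophe)

# Gamma, C-cedilla, O-diaeresis: the same three characters as in the warning.
matches_warning = mojibake == "\u0393\u00c7\u00d6"
```

If this reading is right, the file itself is valid UTF-8 and the tokenizer is simply reporting the individual bytes of a character it has no token for, displayed in the console's code page.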

bobhairgrove, May 13 '23 20:05

BTW, after getting several of these errors, the script is killed on my system, as others have also reported.

bobhairgrove, May 13 '23 20:05

This needs to be reopened, IMHO -- how are we going to do other languages besides English if the program cannot handle Unicode?

bobhairgrove, May 13 '23 20:05

I get the same error with the example text.

hodanli, May 14 '23 13:05

So, as of today, is English the only language supported by privateGPT?

thvi, May 15 '23 11:05