When I run the main of privateGPT.py, I get this error:

gpt_tokenize: unknown token 'Γ'
gpt_tokenize: unknown token 'Ç'
gpt_tokenize: unknown token 'Ö'
gpt_tokenize: unknown token 'Γ'
gpt_tokenize: unknown token 'Ç'
gpt_tokenize: unknown token 'Ö'
gpt_tokenize: unknown token 'Γ'
gpt_tokenize: unknown token 'Ç'
gpt_tokenize: unknown token 'Ö'
...
I see those with some of my training files, too - I just ignore them for now and the model still seems to answer inquiries.
I followed your instructions exactly, so what can I do? Should I install another model? If yes, which one, please? I have a Windows 11 PC with 16 GB of RAM.
Those are just warnings. There are some unsupported characters in your source files, but that won't prevent the model from working. You'll see the answer right after those warnings.
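If you want to see exactly which characters are involved, a quick check like the sketch below works. It assumes plain-text documents in the default source_documents folder; adjust the path and glob for your setup.

```python
# Rough sketch: list the non-ASCII characters in each document so you can see
# which ones trigger the gpt_tokenize warnings. The folder name and the *.txt
# glob are assumptions; change them to match your setup.
from pathlib import Path

for path in Path("source_documents").rglob("*.txt"):
    text = path.read_text(encoding="utf-8", errors="replace")
    offenders = sorted({ch for ch in text if ord(ch) > 127})
    if offenders:
        print(path, " ".join(f"U+{ord(ch):04X} {ch!r}" for ch in offenders))
```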
In the sample text, everything seems to be encoded properly as UTF-8. However, there are "fancy quotes" in several places in the document, and somewhere along the toolchain they aren't being handled properly.
If I open the sample text in an editor such as Geany, I cannot convert it to ISO-8859-1 without errors; however, it is possible to convert it to Windows 1252 and back to Unicode (i.e. UTF-8).
So which element in the toolchain doesn't understand UTF-8?
BTW, after getting several of these errors, the script is killed on my system, as others have also reported.
This needs to be reopened, IMHO: how are we going to support languages other than English if the program cannot handle Unicode?
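In the meantime, replacing the "fancy" punctuation with plain ASCII before ingestion seems to avoid most of these warnings. Here is a minimal workaround sketch; the folder name, glob, and character map are assumptions, and a proper fix would still be for the toolchain to handle UTF-8 directly.

```python
# Minimal workaround sketch: map common "fancy" punctuation to plain ASCII
# before running ingestion. Folder name, glob, and character map are
# assumptions; extend or adjust them for your documents.
from pathlib import Path

REPLACEMENTS = {
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # ellipsis
}

for path in Path("source_documents").rglob("*.txt"):
    text = path.read_text(encoding="utf-8")
    for fancy, plain in REPLACEMENTS.items():
        text = text.replace(fancy, plain)
    path.write_text(text, encoding="utf-8")
```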
I get the same error with the example text.
So is English currently the only language supported by privateGPT?