When I run the main of privateGPT.py, I get this error:

gpt_tokenize: unknown token 'Γ'
gpt_tokenize: unknown token 'Ç'
gpt_tokenize: unknown token 'Ö'
gpt_tokenize: unknown token 'Γ'
gpt_tokenize: unknown token 'Ç'
gpt_tokenize: unknown token 'Ö'
gpt_tokenize: unknown token 'Γ'
gpt_tokenize: unknown token 'Ç'
gpt_tokenize: unknown token 'Ö'
...
I see those with some of my training files, too - I just ignore them for now and the model still seems to answer inquiries.
I followed your instructions exactly, so what can I do? Should I install another model? If yes, which one, please? I have a Windows 11 PC with 16 GB of RAM.
Those are just warnings. There are some unsupported characters in your source files, but that won't prevent the model from working. You'll see the answer right after those warnings.
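If you want to see exactly which characters are involved, a quick check like the sketch below works. It assumes plain-text documents in the default source_documents folder; adjust the path and glob for your setup.

```python
# Rough sketch: list the non-ASCII characters in each document so you can see
# which ones trigger the gpt_tokenize warnings. The folder name and the *.txt
# glob are assumptions; change them to match your setup.
from pathlib import Path

for path in Path("source_documents").rglob("*.txt"):
    text = path.read_text(encoding="utf-8", errors="replace")
    offenders = sorted({ch for ch in text if ord(ch) > 127})
    if offenders:
        print(path, " ".join(f"U+{ord(ch):04X} {ch!r}" for ch in offenders))
```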
In the sample text, everything seems to be encoded properly as UTF-8. However, there are "fancy quotes" in several places in the document, and somewhere along the toolchain they aren't being handled properly.
If I open the sample text in an editor such as Geany, I cannot convert it to ISO-8859-1 without errors; however, it is possible to convert it to Windows 1252 and back to Unicode (i.e. UTF-8).
So which element in the toolchain doesn't understand UTF-8?
BTW, after getting several of these errors, the script is killed on my system, as others have also reported.
This needs to be reopened, IMHO: how are we going to support languages other than English if the program cannot handle Unicode?
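In the meantime, replacing the "fancy" punctuation with plain ASCII before ingestion seems to avoid most of these warnings. Here is a minimal workaround sketch; the folder name, glob, and character map are assumptions, and a proper fix would still be for the toolchain to handle UTF-8 directly.

```python
# Minimal workaround sketch: map common "fancy" punctuation to plain ASCII
# before running ingestion. Folder name, glob, and character map are
# assumptions; extend or adjust them for your documents.
from pathlib import Path

REPLACEMENTS = {
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # ellipsis
}

for path in Path("source_documents").rglob("*.txt"):
    text = path.read_text(encoding="utf-8")
    for fancy, plain in REPLACEMENTS.items():
        text = text.replace(fancy, plain)
    path.write_text(text, encoding="utf-8")
```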
I get the same error with the example text.
So is English currently the only language supported by privateGPT?