captions Incorrect newline characters breaking JSON parsing

Open vincerubinetti opened this issue 11 months ago • 0 comments

See https://github.com/3b1b/captions/blob/main/2023/gaussian-integral/hebrew/sentence_translations.json#L774

I think the AI model tried to translate the \n newline escape and replaced the n with a Hebrew letter, producing a sequence that is not a valid JSON escape. Parsing therefore fails, and the lesson page reports the captions file as missing. (I could improve that message to distinguish loading errors from parsing errors.)
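To illustrate the failure and the loading-versus-parsing distinction, here is a minimal Python sketch. It is not the app's actual code: the exception classes and the `parse_captions` / `load_captions` helpers are hypothetical names, and it assumes the captions file is read from local disk rather than fetched over the network.

```python
import json


class CaptionsLoadError(Exception):
    """Hypothetical: the captions file could not be read at all."""


class CaptionsParseError(Exception):
    """Hypothetical: the file was read, but its contents are not valid JSON."""


def parse_captions(text: str):
    """Parse captions JSON, reporting where parsing failed."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        # err.lineno and err.colno locate the offending character,
        # e.g. a backslash followed by a Hebrew letter.
        raise CaptionsParseError(
            f"line {err.lineno}, column {err.colno}: {err.msg}"
        ) from err


def load_captions(path: str):
    """Read and parse a captions file, keeping the two failure modes apart."""
    try:
        with open(path, encoding="utf-8") as f:
            text = f.read()
    except OSError as err:
        raise CaptionsLoadError(str(err)) from err
    return parse_captions(text)
```

With this split, a backslash followed by a non-escape character surfaces as a parse error with a position, while a missing file surfaces as a load error, so the page could show a different message for each.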

It would be hard to make the app recover from this type of parsing error, though. I could replace every \מ with \n, but what about other languages and other escape characters? A better solution might be to remove these characters from the English input before passing it to the models. That would make it easier to catch every escape character.
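A rough Python sketch of that idea, under stated assumptions: `clean_source_text` and `find_invalid_escapes` are hypothetical helpers, not part of the captions repository, and collapsing whitespace to single spaces is one possible reading of "removed". The second helper only flags suspicious escapes in an already generated file; it does not try to repair them.

```python
import re


def clean_source_text(text: str) -> str:
    """Collapse newlines, tabs, and other whitespace runs into single spaces.

    Assumption: this runs on the English text before it is sent to the
    translation model, so the model never sees an escape it could mangle.
    """
    return " ".join(text.split())


# First alternative: escapes that JSON allows (RFC 8259), consumed and ignored.
# Second alternative (captured): a backslash followed by anything else.
_ESCAPE = re.compile(r'\\(?:["\\/bfnrt]|u[0-9a-fA-F]{4})|(\\.)', re.DOTALL)


def find_invalid_escapes(raw_json: str) -> list[str]:
    """Return every backslash sequence in the raw file that JSON would reject."""
    return [m.group(1) for m in _ESCAPE.finditer(raw_json) if m.group(1)]
```

Matching valid escapes first means an escaped backslash followed by a Hebrew letter is not misreported, since the escaped backslash is consumed as a pair before the letter is examined.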

Mar 11 '24 18:03 vincerubinetti
