Incorrect newline characters breaking JSON parsing
See https://github.com/3b1b/captions/blob/main/2023/gaussian-integral/hebrew/sentence_translations.json#L774
I think this is the AI model trying to translate a `\n` newline character and using a Hebrew "n" instead, which is not a valid JSON escape sequence. So parsing fails, and going to that lesson page shows that the captions file is missing (I could improve the message to distinguish between loading errors and parsing errors).
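To illustrate the failure and the load-versus-parse distinction mentioned above, here is a minimal Python sketch. It is not the app's actual code: the function names `parse_captions` and `load_captions`, the `translatedText` key, and the error wording are all assumptions made for the example.

```python
import json


def parse_captions(raw_text):
    """Parse captions JSON; return (data, error_message)."""
    try:
        return json.loads(raw_text), None
    except json.JSONDecodeError as exc:
        # The file was read, but its contents are not valid JSON.
        return None, f"parse error at line {exc.lineno}, column {exc.colno}: {exc.msg}"


def load_captions(path):
    """Read and parse a captions file; return (data, error_message)."""
    try:
        with open(path, encoding="utf-8") as f:
            raw_text = f.read()
    except OSError as exc:
        # The file is missing or unreadable.
        return None, f"load error: {exc}"
    return parse_captions(raw_text)


# Hypothetical entries: a valid "\n" escape, and the same entry with the
# "n" replaced by a Hebrew letter, which JSON does not accept after "\".
good = r'{"translatedText": "line one\nline two"}'
bad = r'{"translatedText": "line one\מline two"}'

print(parse_captions(good))
print(parse_captions(bad))
```

With the two cases reported separately, the lesson page could say "captions file could not be parsed" instead of "captions file is missing".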
It'd be hard to make the app recover from this type of parsing error, though. I could replace every `\מ` with `\n`, but what about other languages and other escape characters? Perhaps a better solution would be to remove these characters from the English input before passing it to the models. That way it would be easier to make sure all escape characters are caught.
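A rough sketch of that preprocessing idea, assuming the English text is available as a plain Python string before it is sent for translation. The helper name `strip_escapes` and the choice to replace each escape with a single space are assumptions, not part of the existing pipeline.

```python
import re

# Matches literal control characters (newline, tab, ...) as well as
# spelled-out escape sequences such as a backslash followed by "n".
_ESCAPES = re.compile(r"\\[nrtbf]|[\x00-\x1f]")


def strip_escapes(english):
    """Replace escape characters with spaces and collapse repeated spaces."""
    cleaned = _ESCAPES.sub(" ", english)
    return re.sub(r" {2,}", " ", cleaned).strip()


print(strip_escapes("first line\nsecond line"))
print(strip_escapes(r"first line\nsecond line"))
```

Because the model never sees a backslash sequence, it has nothing to mistranslate, whatever the target language. One caveat: the spelled-out pattern would also alter text that legitimately contains a backslash followed by one of those letters, so it may need narrowing if the captions ever include such text.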