Jaume Zaragoza

124 comments by Jaume Zaragoza

Neither the password nor the keyboard seems to be the problem. I have a Spanish keyboard, but the main password does not contain any non-ASCII characters. Furthermore, I added vconsole...

> I believe you need to use `cryptsetup convert` tool for format conversion.

I didn't remember that command; I suppose I used it.

> Could you please provide the...

Or maybe take this into account for #246.

I had similar issues with special Unicode symbols. It may not be suitable for every scenario, but I used `--tok spm` for that, since SentencePiece already applies NFKC normalization by default. The...
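To illustrate what NFKC normalization does to such symbols, here is a small sketch using Python's standard `unicodedata` module rather than SentencePiece itself. SentencePiece's default normalization rule (`nmt_nfkc`) adds some extra handling on top of plain NFKC, so its output may differ in edge cases; the helper name `nfkc` is just for illustration.

```python
import unicodedata


def nfkc(text: str) -> str:
    """Apply Unicode NFKC normalization (compatibility decomposition, then composition)."""
    return unicodedata.normalize("NFKC", text)


# Compatibility characters that NFKC folds into their plain equivalents.
examples = {
    "ｆｕｌｌ－ｗｉｄｔｈ": "full-width",  # fullwidth Latin letters and fullwidth hyphen-minus
    "ﬁle": "file",  # 'ﬁ' ligature (U+FB01)
    "①②③": "123",  # circled digits
    "x²": "x2",  # superscript two
}

for raw, expected in examples.items():
    print(repr(raw), "->", repr(nfkc(raw)))
    assert nfkc(raw) == expected
```

Because NFKC maps many visually distinct code points onto one canonical form, two corpora that differ only in such symbols end up with the same tokens.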

Hi @radinplaid, I agree, and I've been thinking about it since I wrote the tool. Unfortunately, TensorFlow does not support it natively, so it would require us to replace the...

Is this planned to be merged at any point? I'm interested.

This also seems to be removing Chinese sentences when the numbers are not separated from the Chinese characters by a space or other punctuation. ![image](https://github.com/user-attachments/assets/c8b7f50a-459f-42c8-bd53-d6275eb9eed1) ![image](https://github.com/user-attachments/assets/a46920b2-547a-4a3a-8764-85ee06727734)
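The filter's implementation isn't shown here, so this is only a hypothesis about one possible mechanism: a regex that relies on word boundaries. In Python's `re`, Han characters count as word characters just like digits, so `\b` never fires between them. The sketch below, with hypothetical patterns and helper names, shows that effect and a simple check for digits glued to Han characters (basic CJK Unified Ideographs block only).

```python
import re

# Hypothetical patterns for illustration; NOT taken from the tool's code.
# A run of ASCII digits delimited by regex word boundaries (\b).
STANDALONE_NUMBER = re.compile(r"\b[0-9]+\b")
# A digit directly next to a character in U+4E00..U+9FFF
# (basic CJK Unified Ideographs only; extension blocks are not covered).
HAN_DIGIT_ADJACENT = re.compile(r"[\u4e00-\u9fff][0-9]|[0-9][\u4e00-\u9fff]")


def standalone_numbers(text: str) -> list:
    """Return digit runs that word boundaries treat as separate tokens."""
    return STANDALONE_NUMBER.findall(text)


def has_glued_digits(text: str) -> bool:
    """True if some digit touches a Han character with no separator in between."""
    return HAN_DIGIT_ADJACENT.search(text) is not None


for sample in ["共有3个", "共有 3 个"]:
    print(sample, standalone_numbers(sample), has_glued_digits(sample))
# 共有3个 [] True
# 共有 3 个 ['3'] False
```

If the real filter behaves like `STANDALONE_NUMBER`, numbers attached to Chinese characters would be invisible to it, which would match the difference between spaced and unspaced sentences.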

Should we create something like blocklists that we simply add to the repository, so that the UI shows a warning if a corpus is in one of them?
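A minimal sketch of the blocklist idea, assuming a hypothetical plain-text file format with one corpus name per line and `#` comments. The function names, the format and the example corpus names are all illustrative, not an existing part of the tool.

```python
from typing import Optional, Set


def parse_blocklist(text: str) -> Set[str]:
    """Parse a hypothetical blocklist: one corpus name per line, '#' starts a comment."""
    names = set()
    for line in text.splitlines():
        name = line.split("#", 1)[0].strip()  # drop trailing comments and whitespace
        if name:
            names.add(name)
    return names


def blocklist_warning(corpus: str, blocklist: Set[str]) -> Optional[str]:
    """Return a warning message the UI could display, or None if the corpus is not listed."""
    if corpus in blocklist:
        return f"Warning: corpus '{corpus}' is in the blocklist."
    return None


# Example contents of the (hypothetical) blocklist file kept in the repository.
BLOCKLIST = parse_blocklist("""
# corpora with known quality problems (example entries)
corpus-a
corpus-b  # e.g. duplicated data
""")

print(blocklist_warning("corpus-a", BLOCKLIST))
print(blocklist_warning("corpus-c", BLOCKLIST))
```

Keeping the list as a plain file in the repository means entries can be reviewed and updated through ordinary pull requests. Note that this simple format cannot represent corpus names that contain `#`.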

I'm also leaving here this writing system detector, which could be useful in the future: https://pypi.org/project/hanzidentifier/, and this simple script to convert all the Chinese characters to Pinyin:

```python
from...
```