Translate all code comments to English and convert to UTF-8
Description
The source file encoding has been a nuisance from the beginning. We can fix this with an automated script and some regex magic.
Also, thanks to AI and ML, Google Translate has become quite advanced and accurate.
Tasks
The work should proceed in the following steps:
- [ ] Write a script that extracts comments from all source files using regex (make sure to also handle block comments /* */)
- [ ] Purchase (it shouldn't be expensive) or otherwise obtain an API key for Google Translate: https://cloud.google.com/translate/
- [ ] Once you're able to translate text with a curl request (REST API), iterate over all the collected comments and replace each comment with its translation
- [ ] Finally, convert all source files to UTF-8 and check that nothing broke
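A minimal Python sketch of steps 1, 3 and 4 could look like the following. Everything here is an assumption for illustration: the names `TOKEN_RE`, `extract_comments`, `replace_comments` and `convert_bytes` are made up, `translate` is a placeholder callback standing in for the actual Google Translate request, and `source_encoding` must be supplied by the caller because this issue does not state the current encoding. The pattern matches string and character literals first so that comment markers inside literals are left alone. It does not handle C++11 raw string literals or line comments continued with a trailing backslash.

```python
import re
from typing import Callable, List

# Literals are matched first so that "//" or "/*" inside a string
# (for example a URL) is not mistaken for the start of a comment.
TOKEN_RE = re.compile(
    r"""
    (?P<literal> "(?:\\.|[^"\\\n])*" | '(?:\\.|[^'\\\n])*' )
  | (?P<block>   /\*.*?\*/ )
  | (?P<line>    //[^\r\n]* )
    """,
    re.DOTALL | re.VERBOSE,
)


def extract_comments(source: str) -> List[str]:
    """Return every line and block comment, delimiters included."""
    return [
        m.group(0)
        for m in TOKEN_RE.finditer(source)
        if m.lastgroup != "literal"
    ]


def replace_comments(source: str, translate: Callable[[str], str]) -> str:
    """Replace the text of each comment with translate(text).

    `translate` is a hypothetical callback; in the real script it would
    wrap the call to the translation service.
    """

    def _sub(match: "re.Match[str]") -> str:
        token = match.group(0)
        if match.group("literal") is not None:
            return token  # leave string/char literals untouched
        if match.group("line") is not None:
            return "//" + translate(token[2:])
        return "/*" + translate(token[2:-2]) + "*/"

    return TOKEN_RE.sub(_sub, source)


def convert_bytes(
    raw: bytes, source_encoding: str, translate: Callable[[str], str]
) -> bytes:
    """Decode from the assumed source encoding, translate, encode as UTF-8.

    Working on bytes keeps the original line endings (LF or CRLF) intact.
    """
    text = raw.decode(source_encoding)
    return replace_comments(text, translate).encode("utf-8")
```

Passing an identity function such as `lambda s: s` as `translate` gives a dry run that only changes the encoding, which may be a convenient way to check that nothing broke before spending API quota.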
I'll pick this one.
Putting this task on hold for now, as I unfortunately did not make much progress. Leaving some notes in case someone else is going to dig into this. Parsing C/C++ comments using LLVM and replacing them with translated strings is a bit complicated, as there are many edge cases, such as block comments and comments that span multiple lines. It's probably easier to implement this in a Linux environment.
I might look into it again later.
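To illustrate the multi-line edge case, here is a rough Python sketch that translates the body of a block comment (the text between /* and */) one line at a time while keeping the indentation and the leading "*" gutter of each line. It uses a plain regex rather than LLVM, and the names `_DECORATION` and `translate_block_body` as well as the `translate` callback are hypothetical.

```python
import re
from typing import Callable

# Group 1: indentation plus an optional "*" gutter and the space after it.
# Group 2: the comment text. Group 3: trailing whitespace (including "\r").
_DECORATION = re.compile(r"^(\s*\*?\s*)(.*?)(\s*)$")


def translate_block_body(body: str, translate: Callable[[str], str]) -> str:
    """Translate a block comment body line by line, preserving its layout.

    `translate` is a hypothetical callback standing in for the real
    translation request. Empty lines are passed through unchanged.
    """
    out = []
    for line in body.split("\n"):
        prefix, text, suffix = _DECORATION.match(line).groups()
        out.append(prefix + (translate(text) if text else "") + suffix)
    return "\n".join(out)
```

The trade-off is that translating line by line splits sentences that wrap across lines, which can hurt translation quality. Joining the lines before translating and re-wrapping afterwards would avoid that, at the cost of changing the line layout.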