
Translate all code comments to English and convert to UTF8

Open · stevewgr opened this issue 3 years ago · 2 comments

Description

The source file encoding problem has been annoying from the very beginning. We can fix it with an automated script and some regex magic.
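A minimal sketch of the regex step, assuming Python 3.9+ and that each file has already been decoded to a `str`. The pattern matches string and character literals alongside comments, so a `//` or `/*` inside a literal is not picked up as a comment. `TOKEN_RE` and `extract_comments` are hypothetical names. This is not a full C/C++ lexer: it does not handle raw string literals (`R"(...)"`), a `//` comment continued onto the next line with a trailing backslash, or C++14 digit separators such as `1'000`.

```python
import re

# Matches a comment OR a string/char literal. Literals are matched only so
# that "//" or "/*" inside them is consumed and not mistaken for a comment.
TOKEN_RE = re.compile(
    r'//[^\n]*'                 # line comment (up to, not including, the newline)
    r'|/\*.*?\*/'               # block comment, non-greedy, may span lines
    r'|"(?:\\.|[^"\\\n])*"'     # double-quoted string literal
    r"|'(?:\\.|[^'\\\n])*'",    # character literal
    re.DOTALL,
)


def extract_comments(source: str) -> list[tuple[int, int, str]]:
    """Return (start, end, text) for every comment found in a C/C++ source string."""
    comments = []
    for match in TOKEN_RE.finditer(source):
        text = match.group(0)
        if text.startswith(("//", "/*")):  # skip string/char literals
            comments.append((match.start(), match.end(), text))
    return comments
```

For example, `extract_comments("int x; // hi\n")` returns `[(7, 12, '// hi')]`. Keeping the offsets makes it possible to splice translated text back into the file later.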

Also, thanks to advances in AI and ML, Google Translate has become quite accurate.

Tasks

The work breaks down into the following steps:

  • [ ] Write a script that extracts comments from all source files using regex (pay attention to block comments /* */ as well)
  • [ ] Purchase (it shouldn't be expensive) or otherwise obtain an API key for Google Translate: https://cloud.google.com/translate/
  • [ ] Once you can translate text with curl requests against the REST API, iterate through all the collected comments and replace each comment with its translation
  • [ ] Finally, convert all source files to UTF-8 and check that nothing broke
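Steps 3 and 4 might look roughly like the following, again only a Python sketch. `google_translate` POSTs to the Cloud Translation v2 REST endpoint (`https://translation.googleapis.com/language/translate/v2`) with an API key, which is the same request you would make with curl. `translate_comments`, `convert_source` and `convert_file` are hypothetical helpers. The issue doesn't say which encoding the legacy files use, so `cp949` below is only a placeholder assumption; swap in the real one. The translator is passed in as a function so the rewriting logic can be tested offline.

```python
import json
import re
import urllib.parse
import urllib.request
from pathlib import Path
from typing import Callable

# Comments plus string/char literals (literals are matched only to be skipped).
TOKEN_RE = re.compile(
    r'//[^\n]*|/\*.*?\*/|"(?:\\.|[^"\\\n])*"|' r"'(?:\\.|[^'\\\n])*'",
    re.DOTALL,
)

Translator = Callable[[list[str]], list[str]]


def google_translate(texts: list[str], api_key: str, target: str = "en") -> list[str]:
    """Translate a batch of strings via the Cloud Translation v2 REST API (needs network)."""
    url = ("https://translation.googleapis.com/language/translate/v2?"
           + urllib.parse.urlencode({"key": api_key}))
    # format="text": treat the input as plain text rather than HTML.
    body = json.dumps({"q": texts, "target": target, "format": "text"}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST")
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return [item["translatedText"] for item in payload["data"]["translations"]]


def _split_ws(body: str) -> tuple[str, str, str]:
    """Split a comment body into (leading whitespace, text, trailing whitespace)."""
    core = body.strip()
    if not core:
        return body, "", ""
    return body[: len(body) - len(body.lstrip())], core, body[len(body.rstrip()):]


def translate_comments(source: str, translate: Translator) -> str:
    """Return source with every comment body translated; code and literals untouched."""
    comments = [m for m in TOKEN_RE.finditer(source)
                if m.group(0).startswith(("//", "/*"))]
    parts = []
    for m in comments:
        text = m.group(0)
        parts.append(_split_ws(text[2:-2] if text.startswith("/*") else text[2:]))
    # Only non-empty comment bodies are sent to the translator, in one batch.
    todo = [core for _, core, _ in parts if core]
    done = iter(translate(todo) if todo else [])
    out, last = [], 0
    for m, (lead, core, trail) in zip(comments, parts):
        new = next(done) if core else ""
        if m.group(0).startswith("//"):
            # A line comment must stay on a single line.
            piece = "//" + lead + new.replace("\n", " ") + trail
        else:
            # Don't let translated text close the block comment early.
            piece = "/*" + lead + new.replace("*/", "* /") + trail + "*/"
        out.append(source[last:m.start()] + piece)
        last = m.end()
    out.append(source[last:])
    return "".join(out)


def convert_source(raw: bytes, translate: Translator, legacy_encoding: str = "cp949") -> bytes:
    """Decode (UTF-8 first, then the ASSUMED legacy encoding), translate, re-encode as UTF-8."""
    try:
        source = raw.decode("utf-8")  # file is already UTF-8
    except UnicodeDecodeError:
        source = raw.decode(legacy_encoding)  # placeholder; the real encoding may differ
    return translate_comments(source, translate).encode("utf-8")


def convert_file(path: Path, translate: Translator, legacy_encoding: str = "cp949") -> None:
    """Rewrite one file in place; byte-level I/O keeps CRLF line endings intact."""
    path.write_bytes(convert_source(path.read_bytes(), translate, legacy_encoding))
```

The real translator can be wired in with something like `lambda texts: google_translate(texts, api_key)`; very large batches may need to be split across several requests. Only comment bodies are rewritten, with their surrounding whitespace and delimiters kept. For "check that nothing broke": MSVC treats BOM-less source files as being in the current code page unless it is given the `/utf-8` option (or the files get a BOM).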

stevewgr · Sep 04 '22 03:09

I'll pick this one.

stevewgr · Nov 06 '22 23:11

I'm blocking this task for now, as unfortunately I haven't made much progress; noting that here in case someone else wants to dig into it. Parsing C/C++ comments with LLVM and replacing them with translated strings is a bit complicated because there are many edge cases, such as block comments and comments formatted across multiple lines. It's probably easier to implement this in a Linux environment.

I might look into it again later.
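The multi-line case is largely a layout problem. In a Doxygen/Javadoc-style `/* ... */` comment, the ` * ` prefix on each line shouldn't be sent to the translator, and the translated text won't line up with the original line breaks. Below is a minimal sketch of one way to handle that with plain text processing (not LLVM): strip the per-line decoration, join each paragraph into one string, translate it, and re-wrap it with `textwrap` under the original prefix. `translate_block_comment` is a hypothetical helper. It assumes the common layout where `/*` (or `/**`) and `*/` sit on their own lines and translates anything else verbatim. It also emits `\n` line endings only.

```python
import re
import textwrap
from typing import Callable

# Per-line decoration inside a block comment: indentation, an optional "*",
# and at most one space after it (the " * " of Doxygen/Javadoc-style comments).
DECORATION_RE = re.compile(r"^([ \t]*\*?[ \t]?)(.*)$")


def translate_block_comment(comment: str, translate: Callable[[str], str],
                            width: int = 80) -> str:
    """Translate a /* ... */ comment paragraph by paragraph, keeping its ' * ' layout."""
    lines = comment[2:-2].split("\n")
    if len(lines) < 3 or lines[0].strip() not in ("", "*") or lines[-1].strip():
        # Not the "/*\n * ...\n */" layout this sketch expects: translate verbatim.
        return "/*" + translate(comment[2:-2]) + "*/"
    opener = "/*" + lines[0].rstrip()  # "/*" or "/**"
    closing_indent = lines[-1]         # whitespace in front of the closing "*/"
    parsed = [DECORATION_RE.match(line).groups() for line in lines[1:-1]]
    # Reuse the decoration of the first line that has text (fallback: " * ").
    prefix = next((p for p, t in parsed if t.strip()), " * ")

    # Group lines into paragraphs; a line with no text (e.g. " *") ends one.
    paragraphs, current = [], []
    for _, text in parsed:
        if text.strip():
            current.append(text.strip())
        elif current:
            paragraphs.append(" ".join(current))
            current = []
    if current:
        paragraphs.append(" ".join(current))

    out = [opener]
    for i, paragraph in enumerate(paragraphs):
        if i:
            out.append(prefix.rstrip())  # blank " *" line between paragraphs
        wrapped = textwrap.wrap(translate(paragraph), width=max(20, width - len(prefix)))
        out.extend(prefix + line for line in wrapped)
    out.append(closing_indent + "*/")
    return "\n".join(out)
```

Translating whole paragraphs instead of individual lines also tends to give the translator full sentences to work with, which should help quality when one sentence was wrapped over several lines.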

stevewgr · Jan 07 '23 16:01