subclean [Feature Request] Support for other character encodings

Open DrKain opened this issue 3 years ago • 1 comment

Right now the tool fails when it tries to parse files in a character encoding other than plain UTF-8, such as the test files listed below (UCS-2 BE BOM and UTF-8-BOM). For a viable solution, the tool should detect the character encoding and convert to UTF-8 when required.
The converted data should be written even if no nodes were modified. This removes the need to convert the same file again on every run when subclean is run over an entire library as a scheduled task.
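
As a rough illustration of the detect-and-convert step, here is a minimal sketch that only looks at the byte order mark (BOM). It is not subclean's code: `detectBom`, `decodeSubtitle`, `shouldWrite`, and the `DecodedSubtitle` shape are hypothetical names, and it assumes a Node.js build with ICU so that the built-in `TextDecoder` accepts `utf-16be`. Files in a legacy encoding without a BOM would still need a real detection library.

```typescript
// Hypothetical helpers for illustration only; none of these exist in subclean.
type BomEncoding = "utf-8" | "utf-16le" | "utf-16be";

interface DecodedSubtitle {
    encoding: BomEncoding; // encoding used to decode the bytes
    hadBom: boolean;       // true when a byte order mark was found
    text: string;          // decoded text, BOM removed
}

// Inspect the first bytes for a known byte order mark.
function detectBom(raw: Uint8Array): BomEncoding | null {
    if (raw.length >= 3 && raw[0] === 0xef && raw[1] === 0xbb && raw[2] === 0xbf) return "utf-8";
    if (raw.length >= 2 && raw[0] === 0xfe && raw[1] === 0xff) return "utf-16be"; // UCS-2 BE BOM
    if (raw.length >= 2 && raw[0] === 0xff && raw[1] === 0xfe) return "utf-16le";
    return null; // no BOM: assumed to be plain UTF-8 in this sketch
}

function decodeSubtitle(raw: Uint8Array): DecodedSubtitle {
    const bom = detectBom(raw);
    const encoding: BomEncoding = bom ?? "utf-8";
    // TextDecoder drops a leading BOM by default (ignoreBOM is false).
    const text = new TextDecoder(encoding).decode(raw);
    return { encoding, hadBom: bom !== null, text };
}

// Write the file when nodes changed OR when the input was not plain UTF-8,
// so a scheduled run does not convert the same file again next time.
function shouldWrite(decoded: DecodedSubtitle, nodesModified: boolean): boolean {
    return nodesModified || decoded.hadBom || decoded.encoding !== "utf-8";
}
```

A caller would read the file as raw bytes (for example with `fs.readFileSync(path)` and no encoding argument), pass them to `decodeSubtitle`, clean the resulting text, and write it back as UTF-8 whenever `shouldWrite` returns true.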

See https://github.com/DrKain/subclean/issues/7#issuecomment-948572760 for a temporary workaround for the current problem.

Unfortunately this will require a dependency like utf8.

Test files:

UCS-2 BE BOM: subtitle.zip
UTF-8-BOM: subtitle.zip

Oct 21 '21 12:10 DrKain

If you're using Bazarr, you can avoid this issue with the setting:

Settings → Subtitles → Post-Processing → Encode Subtitles To UTF8

Apr 28 '22 17:04 DrKain
