bazarr
Bazarr embedded sub extractor removing words
Describe the bug
Hello, it seems that Bazarr's embedded subs extractor is removing some words.
We can see that it extracted the embedded English subs:
But it removed the word "SPAN" from the sub file:
I re-downloaded the original file outside of Bazarr and the word was not removed:
Here are my Post-Processing settings, I don't know if they are also used during embedded subs extraction:
To Reproduce
Steps to reproduce the behavior:
- Configure Bazarr to extract embedded subs
- Download a video that has embedded subs with the word 'SPAN' in them
- Let Bazarr extract the embedded subs
Software (please complete the following information):
- Bazarr: 1.1.0
- Radarr version: 4.1.0.6175
- Sonarr version: 3.0.8.1507
- OS: Linux-3.10.0-1160.11.1.el7.x86_64-x86_64-with
The embedded provider doesn't modify subtitles in any way. This issue is related to the post-processing logic.
Ok so the embedded provider just extracts the subs as-is, and then the post-processing logic runs:
- encode to UTF-8
- remove hearing impaired
- OCR fixes
- common fixes
- fix uppercase
So one of those steps is removing the word 'SPAN'?
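One way to narrow this down is to run the steps one at a time and check after each whether the word is still there. Below is a minimal sketch of that idea. The step functions (`strip_bracketed`, `drop_caps_words`) and the helper `first_step_dropping` are hypothetical stand-ins written for illustration; they are not Bazarr's actual post-processing code.

```python
import re
from typing import Callable, List, Optional, Tuple

Step = Tuple[str, Callable[[str], str]]


def first_step_dropping(text: str, word: str, steps: List[Step]) -> Optional[str]:
    """Return the name of the first step after which `word` is gone, or None."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    if not pattern.search(text):
        raise ValueError(f"{word!r} is not in the input to begin with")
    for name, step in steps:
        text = step(text)
        if not pattern.search(text):
            return name
    return None


# Hypothetical stand-ins for post-processing steps (assumptions, not Bazarr code).
def strip_bracketed(text: str) -> str:
    # Remove bracketed sound cues such as "[door slams]".
    return re.sub(r"\[[^\]]*\]", "", text)


def drop_caps_words(text: str) -> str:
    # Overly broad rule: remove any word made of two or more capital letters.
    return re.sub(r"\b[A-Z]{2,}\b", "", text)


steps = [
    ("strip bracketed cues", strip_bracketed),
    ("drop all-caps words", drop_caps_words),
]
sample = "[door slams] The SPAN of the bridge is huge."
culprit = first_step_dropping(sample, "SPAN", steps)
print(culprit)
```

With the real subtitle file, the same check can be approximated by toggling the post-processing options one by one in the settings and re-extracting.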
Yes, I assume the hearing impaired processor is making the change.
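To illustrate why a hearing-impaired filter could plausibly do this, here is a small sketch that contrasts an overly broad rule with a narrower one. Both patterns are assumptions made up for this example; the actual rule used by Bazarr and the actual fix are not shown in this thread.

```python
import re

# Assumed broad rule: delete any run of two or more capital letters.
BROAD = re.compile(r"\b[A-Z]{2,}\b\s*")

# Assumed narrower rule: delete only an all-caps speaker label
# at the start of a line, i.e. one that ends with a colon.
LABEL = re.compile(r"^[A-Z][A-Z .'-]*:\s*", re.MULTILINE)


def remove_hi_broad(text: str) -> str:
    return BROAD.sub("", text)


def remove_hi_narrow(text: str) -> str:
    return LABEL.sub("", text)


line = "JOHN: The SPAN is 40 meters."
print(remove_hi_broad(line))   # the word SPAN is lost along with the label
print(remove_hi_narrow(line))  # only the speaker label is removed
```

The broad rule treats every all-caps word as a speaker label or sound cue, so a legitimate uppercase word in the dialogue disappears too. Anchoring the pattern to the start of the line and requiring the trailing colon keeps such words intact.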
Will you be able to fix it?
I'll try.
Please send the untouched subtitle file if you can.
Here you go: https://demo.lufi.io/r/AS3jQ9Mc6z#V2fXl71NM5mEZTQYK/AJTRsBYIFIVHC/VBy5pcJaIrQ=
Should be fixed in upcoming beta. Thanks!