Subtitle Extraction Issue – HTML Entity Appears in Subtitles
Checklist
- [ ] I'm reporting a bug unrelated to a specific site.
- [x] I've verified that I'm running the latest version of yt-dlp.
- [x] I've verified that I'm running the latest stable version of Seal or any later preview versions.
- [x] I've read the Contributing guidelines and Code Of Conduct.
- [x] I've checked that the site I'm trying to download from is in the Supported Sites list from yt-dlp.
- [x] I understand that the issue will be (ignored/closed) if I intentionally remove or skip any mandatory field.
Describe the bug
When extracting subtitles, I noticed that HTML entities such as &nbsp; appear literally in the subtitle text instead of being decoded into regular spaces. This hurts readability and the overall user experience.
Example:
In the generated subtitles, I got:
subhanallah, es bonito ver como la huella morisca habita de un modo u otro en nuestros corazones
Expected output:
subhanallah, es bonito ver como la huella morisca habita de un modo u otro en nuestros corazones
To Reproduce
Steps to Reproduce:
1. Use Seal to extract subtitles from a video with embedded subtitles.
2. Open the generated subtitle file or view the on-screen text.
3. Observe the presence of &nbsp; and potentially other HTML entities.
Error reports
No crash or error message occurred during extraction. The process completes successfully, but the output is incorrectly formatted because the HTML entities are left unconverted.
Screenshots & Screen Records
Additional context
This issue seems to occur when the source subtitles contain HTML-encoded characters. A decoding step after subtitle extraction might help fix this. I’m using the latest version of SEAL as of [insert date/version if known].
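To illustrate the kind of decoding step I mean, here is a minimal sketch in Python using the standard library's `html.unescape`. The function name `decode_subtitle_entities` and the sample string are made up for illustration; this is not how Seal or yt-dlp currently handle subtitles, only an assumption about what a post-processing pass could look like.

```python
import html


def decode_subtitle_entities(text: str) -> str:
    """Decode HTML entities in subtitle text (hypothetical helper).

    html.unescape turns "&nbsp;" into U+00A0 (non-breaking space),
    so that character is replaced with a plain space afterwards.
    """
    decoded = html.unescape(text)
    return decoded.replace("\u00a0", " ")


# Illustrative input only; the entity positions are invented.
sample = "subhanallah,&nbsp;es bonito ver&nbsp;como la huella morisca"
print(decode_subtitle_entities(sample))
```

A pass like this is probably only appropriate for the cue text that is shown to the user. Formats such as WebVTT use escapes like `&amp;` and `&lt;` on purpose, so decoding them inside the file itself could change how it is parsed.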
Thank you for your efforts on this project.
This is a yt-dlp issue, not an issue with the app.
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 30 days.