
Subtitle Extraction Issue – HTML Entity &nbsp; Appears in Subtitles

Open: Hzifa33 opened this issue 8 months ago • 1 comment

Checklist

  • [ ] I'm reporting a bug unrelated to a specific site.
  • [x] I've verified that I'm running the latest version of yt-dlp.
  • [x] I've verified that I'm running the latest stable version of Seal or any later preview versions.
  • [x] I've read the Contributing guidelines and Code Of Conduct.
  • [x] I've checked that the site I'm trying to download from is in the Supported Sites list from yt-dlp
  • [x] I understand that the issue will be (ignored/closed) if I intentionally remove or skip any mandatory field.

Describe the bug

When extracting subtitles, I noticed that HTML entities such as &nbsp; appear directly in the subtitle text instead of being decoded into regular spaces. This hurts readability and the overall user experience.

Example:

In the generated subtitles, I got:

subhanallah, es bonito ver como la huella morisca habita de un modo u otro en nuestros corazones

Expected output:

subhanallah, es bonito ver como la huella morisca habita de un modo u otro en nuestros corazones

To Reproduce

Steps to Reproduce:

  1. Use SEAL to extract subtitles from a video with embedded subtitles.

  2. Open the generated subtitle file or view the on-screen text.

  3. Observe the presence of &nbsp; and possibly other HTML entities.
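Step 3 can be checked mechanically rather than by eye. The snippet below is a minimal sketch, assuming the subtitle file has already been read into a string; `ENTITY_RE` and `find_entities` are hypothetical names for illustration, not part of Seal or yt-dlp. It only checks entity syntax (named, decimal, or hex reference ending in `;`) and does not validate names against the HTML entity list.

```python
import re

# Matches named (&nbsp;), decimal (&#160;) and hexadecimal (&#xA0;) character references.
ENTITY_RE = re.compile(r"&(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);")


def find_entities(subtitle_text: str) -> list[tuple[int, str]]:
    """Return (line_number, entity) pairs for every HTML entity left in the text."""
    hits = []
    for lineno, line in enumerate(subtitle_text.splitlines(), start=1):
        for match in ENTITY_RE.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits


# A made-up SRT cue used only to exercise the function.
sample = "1\n00:00:01,000 --> 00:00:03,000\nHello&nbsp;world\n"
print(find_entities(sample))  # [(3, '&nbsp;')]
```

An empty list means no undecoded entities remain in the file.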

Error reports

No crash or error message occurred during extraction. The process completes successfully, but the output contains incorrect formatting due to unconverted HTML entities.

Screenshots & Screen Records

[Screenshot]

Additional context

This issue seems to occur when the source subtitles contain HTML-encoded characters. A decoding step after subtitle extraction might help fix this. I’m using the latest version of SEAL as of [insert date/version if known].
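A post-extraction decoding step like the one suggested could look roughly like this. It is a minimal sketch, not how Seal or yt-dlp actually process subtitles; `decode_subtitle_text` is a hypothetical helper. It uses Python's standard `html.unescape`, which turns `&nbsp;` into U+00A0 (a no-break space) rather than an ordinary space, so a second replacement is needed to get the "regular spaces" described above.

```python
import html


def decode_subtitle_text(text: str) -> str:
    """Decode HTML character references and turn no-break spaces into plain spaces."""
    # html.unescape maps &nbsp; to U+00A0 (NO-BREAK SPACE), not U+0020.
    decoded = html.unescape(text)
    # Normalize U+00A0 to an ordinary space for readability.
    return decoded.replace("\u00a0", " ")


print(decode_subtitle_text("Hello&nbsp;world &amp; friends"))  # Hello world & friends
```

This fits plain-text formats such as SRT. For WebVTT output, unescaping `&lt;` and `&amp;` would turn them into raw markup characters, so a safer variant would only replace `&nbsp;` there.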

Thank you for your efforts on this project.

Hzifa33 · Apr 14 '25 09:04


This is a yt-dlp issue, not the app.

error-reporting · May 02 '25 05:05

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 30 days.

github-actions[bot] · Aug 01 '25 00:08