yt-dlp
[ie/arte.tv] Extract accessible subtitles
IMPORTANT: PRs without the template will be CLOSED
Description of your pull request and other information
The arte.tv website can offer multiple subtitle tracks for the same language: regular subtitles and accessible subtitles for viewers who are deaf or hard of hearing.
For example, this video has two sets of French subtitles: forced ("subforced") ones and accessible ones. Listing the subtitles currently gives the following:
$ python -m yt_dlp 'https://www.arte.tv/fr/videos/104913-001-A/sous-controle-1-6/' --list-subs
[info] Available subtitles for 104913-001-A:
Language Formats
fr vtt, vtt
This PR identifies those accessible subtitles and adds an -acc suffix to their language code, so the same command now lists them separately:
$ python -m yt_dlp 'https://www.arte.tv/fr/videos/104913-001-A/sous-controle-1-6/' --list-subs
[info] Available subtitles for 104913-001-A:
Language Formats
fr vtt
fr-acc vtt
Implementation details:
I'm not familiar with this code base.
I tried to add a test to ArteTVIE._TESTS, but it seems that the test runner does not allow checking subtitle info (it is ignored here).
So I've added a new test case for this extractor. Since it's the first test written this way, I'm not sure this is something we want for yt_dlp.
If you have any other ideas for implementing this test, I would be happy to do so.
Template
Before submitting a pull request make sure you have:
- [x] At least skimmed through contributing guidelines including yt-dlp coding conventions
- [x] Searched the bugtracker for similar pull requests
- [x] Checked the code with flake8 and ran relevant tests
In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:
- [x] I am the original author of this code and I am willing to release it under Unlicense
- [ ] I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)
What is the purpose of your pull request?
- [x] Fix or improvement to an extractor (Make sure to add/update tests)
- [ ] New extractor (Piracy websites will not be accepted)
- [ ] Core bug fix/improvement
- [ ] New feature (It is strongly recommended to open an issue first)
Copilot Summary
🤖 Generated by Copilot at 47d2800
Summary
🎥🧪🔄
Add support for accessible subtitles in Arte videos and a test module for it. Modify yt_dlp/extractor/arte.py to identify and convert the accessible subtitle formats and return them with modified language codes. Add test/extractor/test_arte.py to test the new feature with pytest.
Sing, O Muse, of the skillful coder who devised
A way to extract the subtitles for the blind and deaf
From the videos of Arte, the splendid channel of the arts
And tested their work with pytest, the framework of the wise
Walkthrough
- Add a new feature to extract accessible subtitles from Arte videos (link, link, link)
  - Modify the ArteTVIE extractor class in yt_dlp/extractor/arte.py to check for subtitle formats with the suffix -MAL.m3u8 and append a -acc suffix to the language code (link, link)
  - Call the _contvert_accessible_subs_locale method in the _real_extract method to convert the subtitles before returning them (link)
- Add a test module test/extractor/test_arte.py to test the new feature (link)
  - Define a test function test_extract_accessible_subtitles that uses the pytest framework and the parametrize decorator to test two examples of accessible subtitles (link)
  - Create an instance of the ArteTVIE extractor class and call its _contvert_accessible_subs_locale method on the original_subs parameter (link)
  - Assert that the returned dictionary of subtitles has only one key, which is the expected_locale parameter, and that the value of that key is the same as the original subtitles for the French language (link)
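The conversion step in the walkthrough can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the helper name `split_accessible_subs` is hypothetical, the example URLs are made up, and it assumes the `-MAL.m3u8` suffix is checked on each track's `url` field.

```python
ACCESSIBLE_SUFFIX = '-MAL.m3u8'  # suffix named in the walkthrough


def split_accessible_subs(subtitles):
    """Hypothetical helper: file accessible tracks under '<lang>-acc'.

    `subtitles` maps a language code to a list of track dicts, each with
    at least a 'url' key (assumed layout for this sketch).
    """
    result = {}
    for lang, tracks in subtitles.items():
        for track in tracks:
            # Assumption: accessibility is detected from the URL suffix.
            is_accessible = track.get('url', '').endswith(ACCESSIBLE_SUFFIX)
            key = f'{lang}-acc' if is_accessible else lang
            result.setdefault(key, []).append(track)
    return result


# Made-up URLs, for illustration only.
subs = {
    'fr': [
        {'url': 'https://example.invalid/subs/VF-FRA.m3u8', 'ext': 'vtt'},
        {'url': 'https://example.invalid/subs/VF-MAL.m3u8', 'ext': 'vtt'},
    ],
}
converted = split_accessible_subs(subs)
print(sorted(converted))
```

With this input, the regular track stays under fr and the accessible one moves to fr-acc, which matches the --list-subs output shown in the description.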
You should be able to use this as a unit test:
{
    'url': 'https://www.arte.tv/fr/videos/104913-001-A/sous-controle-1-6/',
    'info_dict': {
        'id': '104913-001-A',
        'ext': 'mp4',
        'description': 'md5:ea65e21c4b9881b3ef1c333a914779da',
        'thumbnail': 'https://api-cdn.arte.tv/img/v2/image/BL5WhDp2pnXcYhQJz9A8be/940x530',
        'upload_date': '20230927',
        'timestamp': 1695783600,
        'duration': 1907,
        'title': 'Sous contrôle (1/6)',
        'subtitles': {
            'fr': 'mincount:1',
            'fr-acc': 'mincount:1',
        },
    },
}
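To illustrate what the `mincount:1` entries express, here is a simplified sketch of such a check. The function `check_mincount` is hypothetical and is not the test suite's real helper, which handles more cases; the sketch only shows the idea that each language must have at least N subtitle tracks.

```python
def check_mincount(expected, got):
    """Hypothetical, simplified check for a 'mincount:N' expectation."""
    prefix = 'mincount:'
    if not (isinstance(expected, str) and expected.startswith(prefix)):
        raise ValueError(f'not a mincount expectation: {expected!r}')
    minimum = int(expected[len(prefix):])
    # The extracted value must be a collection with at least `minimum` items.
    return isinstance(got, (list, dict)) and len(got) >= minimum


# Expectation copied from the test case; the extracted data is made up.
expected_subs = {'fr': 'mincount:1', 'fr-acc': 'mincount:1'}
extracted_subs = {
    'fr': [{'ext': 'vtt'}],
    'fr-acc': [{'ext': 'vtt'}],
}
all_ok = all(
    check_mincount(expectation, extracted_subs.get(lang))
    for lang, expectation in expected_subs.items()
)
print(all_ok)
```

Under this reading, the test case would fail if the extractor stopped returning a separate fr-acc entry.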