yt-dlp icon indicating copy to clipboard operation
yt-dlp copied to clipboard

[ie/arte.tv] Extract accessible subtitles

Open Nicals opened this issue 1 year ago • 1 comments

IMPORTANT: PRs without the template will be CLOSED

Description of your pull request and other information

The arte.tv website can propose multiple subtitles for the same language. Some normal subtitles and some accessible subtitles for people suffering from hearing loss.

For example, on this video, we have two sets of french subtitles: subforced and accessible ones. When listing the subtitles, we got the following list:

$ python -m yt_dlp 'https://www.arte.tv/fr/videos/104913-001-A/sous-controle-1-6/' --list-subs
[info] Available subtitles for 104913-001-A:
Language Formats
fr       vtt, vtt

This PR identifies those accessible subtitles and adds and -acc suffix to the language code.

$ python -m yt_dlp 'https://www.arte.tv/fr/videos/104913-001-A/sous-controle-1-6/' --list-subs
[info] Available subtitles for 104913-001-A:
Language Formats
fr       vtt
fr-acc   vtt

Implementation details:

I'm not familiar with this code base. I tried to add a test in the ArteTVIE._TESTS, but it seems that the test runner doesn't allow to access subtitles info (ignored here). So I've added a new test case for this extractor. Since its the first test done this way, I'm not sure this is something we want for yt_dlp.

If you have any other idea to implement this test, I would be happy to implement it.

Template

Before submitting a pull request make sure you have:

In order to be accepted and merged into yt-dlp each piece of code must be in public domain or released under Unlicense. Check all of the following options that apply:

  • [x] I am the original author of this code and I am willing to release it under Unlicense
  • [ ] I am not the original author of this code but it is in public domain or released under Unlicense (provide reliable evidence)

What is the purpose of your pull request?

Copilot Summary

🤖 Generated by Copilot at 47d2800

Summary

🎥🧪🔄

Add support for accessible subtitles in Arte videos and a test module for it. Modify yt_dlp/extractor/arte.py to identify and convert the accessible subtitle formats and return them with modified language codes. Add test/extractor/test_arte.py to test the new feature with pytest.

Sing, O Muse, of the skillful coder who devised A way to extract the subtitles for the blind and deaf From the videos of Arte, the splendid channel of the arts And tested their work with pytest, the framework of the wise

Walkthrough

  • Add a new feature to extract accessible subtitles from Arte videos (link, link, link)
  • Modify the ArteTVIE extractor class in yt_dlp/extractor/arte.py to check for subtitle formats with the suffix -MAL.m3u8 and append a -acc suffix to the language code (link, link)
  • Call the _contvert_accessible_subs_locale method in the _real_extract method to convert the subtitles before returning them (link)
  • Add a test module test/extractor/test_arte.py to test the new feature (link)
    • Define a test function test_extract_accessible_subtitles that uses the pytest framework and the parametrize decorator to test two examples of accessible subtitles (link)
    • Create an instance of the ArteTVIE extractor class and call its _contvert_accessible_subs_locale method on the original_subs parameter (link)
    • Assert that the returned dictionary of subtitles has only one key, which is the expected_locale parameter, and that the value of that key is the same as the original subtitles for the French language (link)

Nicals avatar Sep 29 '23 10:09 Nicals

You should be able to use this as a unit test:

{
    'url': 'https://www.arte.tv/fr/videos/104913-001-A/sous-controle-1-6/',
    'info_dict': {
        'id': '104913-001-A',
        'ext': 'mp4',
        'description': 'md5:ea65e21c4b9881b3ef1c333a914779da',
        'thumbnail': 'https://api-cdn.arte.tv/img/v2/image/BL5WhDp2pnXcYhQJz9A8be/940x530',
        'upload_date': '20230927',
        'timestamp': 1695783600,
        'duration': 1907,
        'title': 'Sous contrôle (1/6)',
        'subtitles': {
            'fr': 'mincount:1',
            'fr-acc': 'mincount:1',
        },
    },
}

seproDev avatar Nov 01 '23 09:11 seproDev