Hello, I noticed some issues with the media-rss implementation. Before trying to fix them, I would like to discuss them here.

media:group is ignored

According to the Media-RSS specification, the <media:group> tag is used to group several links/representations of the same media. However, my understanding is that feedparser just ignores this tag and considers every <media:content> a new media.

It allows grouping of media:content elements that are effectively the same content, yet different representations. For instance: the same song recorded in both the WAV and MP3 format. It's an optional element that must only be used for this purpose.

https://github.com/kurtmckee/feedparser/blob/d12d3bdd075bca71885ccb02e9b08ac04fcb8514/feedparser/namespaces/mediarss.py#L64-L66
https://github.com/kurtmckee/feedparser/blob/d12d3bdd075bca71885ccb02e9b08ac04fcb8514/feedparser/namespaces/mediarss.py#L119-L122
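
To make the grouping semantics concrete, here is a minimal sketch of a group-aware reading of an item. It uses only the standard library's xml.etree.ElementTree, not feedparser, and the sample feed, the example.com URLs and the collect_media helper are all made up for illustration. It is one possible way to model "one media, several representations", not a proposal for feedparser's actual output.

```python
import xml.etree.ElementTree as ET

MEDIA_NS = "http://search.yahoo.com/mrss/"
NS = {"media": MEDIA_NS}

# Hypothetical item: one song in two formats, plus one standalone media.
SAMPLE_ITEM = """
<item xmlns:media="http://search.yahoo.com/mrss/">
  <title>Song</title>
  <media:group>
    <media:content url="http://example.com/song.wav" type="audio/wav"/>
    <media:content url="http://example.com/song.mp3" type="audio/mpeg"/>
  </media:group>
  <media:content url="http://example.com/cover.png" type="image/png"/>
</item>
""".strip()


def collect_media(item):
    """Return a list of media; each media is a list of representations."""
    medias = []
    for child in item:
        if child.tag == f"{{{MEDIA_NS}}}group":
            # Every media:content inside one group describes the same media.
            medias.append(
                [dict(c.attrib) for c in child.findall("media:content", NS)]
            )
        elif child.tag == f"{{{MEDIA_NS}}}content":
            # A media:content outside any group is a media of its own.
            medias.append([dict(child.attrib)])
    return medias


item = ET.fromstring(SAMPLE_ITEM)
medias = collect_media(item)
for media in medias:
    print([representation["url"] for representation in media])
```

With this reading the sample item holds two media (the song with two representations, and the cover image), whereas a parser that ignores the group would report three unrelated ones.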

The description is set on the feed entry

The <media:description> tag belongs to the media, but feedparser updates the feed entry description.

https://github.com/kurtmckee/feedparser/blob/d12d3bdd075bca71885ccb02e9b08ac04fcb8514/feedparser/namespaces/mediarss.py#L91-L95

Some tags are missing

For instance, the <media:subtitle> tag is not handled by feedparser.

Attributes are ignored

When tags are handled, a lot of the attributes in the Media-RSS specification are just ignored. For instance, <media:description> can be either plain text or HTML, but feedparser does not distinguish between the two.
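
As an illustration of both points (the description belonging to the media, and its type attribute), here is a small sketch. Again it relies on xml.etree.ElementTree rather than feedparser, and the sample item and the media_descriptions helper are hypothetical. The only thing taken from the specification is that type is either "plain" or "html" and defaults to "plain".

```python
import xml.etree.ElementTree as ET

NS = {"media": "http://search.yahoo.com/mrss/"}

# Hypothetical item with an entry-level description and a media-level one.
SAMPLE = """
<item xmlns:media="http://search.yahoo.com/mrss/">
  <description>Entry-level description</description>
  <media:content url="http://example.com/video.mp4" type="video/mp4">
    <media:description type="html">&lt;b&gt;About&lt;/b&gt; this video</media:description>
  </media:content>
</item>
""".strip()


def media_descriptions(item):
    """Map each media:content URL to the (type, text) of its own description."""
    result = {}
    for content in item.findall("media:content", NS):
        desc = content.find("media:description", NS)
        if desc is not None:
            # The specification defaults the type attribute to "plain".
            text_type = desc.get("type", "plain")
            result[content.get("url")] = (text_type, (desc.text or "").strip())
    return result


item = ET.fromstring(SAMPLE)
descriptions = media_descriptions(item)
entry_description = item.findtext("description")
print(entry_description)
print(descriptions)
```

Here the entry keeps its own description, and the media description stays attached to the media together with its type, so a consumer knows whether to treat the text as HTML.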

So...

I would like to tackle these issues, but there could be some backward-compatibility problems. How should I handle this? I believe Media-RSS is not widely used, and the simplest option for me is to break compatibility so that feedparser correctly follows the specification. What do you think?

Nov 16 '19 20:11 azmeuk

Could you please give us a short description of what MediaRSS is for? Maybe a real use case would improve the understanding.

Dec 11 '19 16:12 buhtz

Of course. Media-RSS is used to describe media, such as audio or video files, and their metadata (thumbnails, description, number of views/listens, rating, links to the media in different formats, etc.).

It is used in every YouTube feed (example) and in PeerTube feeds (example, though support should improve in an upcoming version).

Dec 11 '19 16:12 azmeuk

I have the same issue. Did you solve it?

Jan 09 '20 08:01 chaimae26

Actually, this would take some time to fix. I am willing to write a patch, but I would like to be sure that it will be merged in the end before I start.

@kurtmckee What do you think?

Jan 09 '20 09:01 azmeuk

This is something we are very interested in as well, especially when it comes to children of media:content, such as media:title (for example, associating image titles with the images themselves).

I have started work on a patch but the changes are breaking at this time (see example below).

Main changes:

1. media:group (not part of the example below) and media:content are now containers, as expected. media:group may contain media:content elements.
2. media:{x} now generates media_{x} keys instead of {x} keys. The keys previously known as media_{x} are now known as media_{x}_details (this is mainly to make tags distinguishable from attributes of the parent media:{x}).
3. media:title is no longer used as a fallback for a missing title (a consequence of 2. above; fixable, but probably violating expectations?).

Any thoughts on these changes and how they affect the parsed data?

@azmeuk Is this in line with what you had in mind or were you planning on something different?

@kurtmckee Is this in line with the project as a whole?

Input file

<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/"
     xmlns:dcterms="http://purl.org/dc/terms/">

  <channel>
    <title>Music Videos 101</title>
    <link>http://www.foo.com</link>
    <description>Discussions of great videos</description>
    <item>
      <title>The latest video from an artist</title>
      <link>http://www.foo.com/item1.htm</link>
      <media:content url="http://www.foo.com/movie.mov" fileSize="12216320" type="video/quicktime" expression="full">
        <media:player url="http://www.foo.com/player?id=1111" height="200" width="400" />
        <media:hash algo="md5">dfdec888b72151965a34b4b59031290a</media:hash>
        <media:credit role="producer">producer's name</media:credit>
        <media:credit role="artist">artist's name</media:credit>
        <media:category scheme="http://blah.com/scheme">
          music/artistname/album/song
        </media:category>
        <media:text type="plain">
          Oh, say, can you see, by the dawn's early light
        </media:text>
        <media:rating>nonadult</media:rating>
        <dcterms:valid>
          start=2002-10-13T09:00+01:00;
          end=2002-10-17T17:00+01:00;
          scheme=W3C-DTF
        </dcterms:valid>
      </media:content>
    </item>
  </channel>
</rss>

Parsed data WITHOUT changes

[
  {
    "title": "The latest video from an artist",
    "title_detail": {
      "type": "text/plain",
      "language": null,
      "base": "",
      "value": "The latest video from an artist"
    },
    "links": [
      {
        "rel": "alternate",
        "type": "text/html",
        "href": "http://www.foo.com/item1.htm"
      }
    ],
    "link": "http://www.foo.com/item1.htm",
    "media_content": [
      {
        "url": "http://www.foo.com/movie.mov",
        "filesize": "12216320",
        "type": "video/quicktime",
        "expression": "full"
      }
    ],
    "media_player": {
      "url": "http://www.foo.com/player?id=1111",
      "height": "200",
      "width": "400",
      "content": ""
    },
    "media_hash": {
      "algo": "md5"
    },
    "media_credit": [
      {
        "role": "producer",
        "content": "producer's name"
      },
      {
        "role": "artist",
        "content": "artist's name"
      }
    ],
    "credit": "artist's name",
    "tags": [
      {
        "term": "music/artistname/album/song",
        "scheme": "http://blah.com/scheme",
        "label": null
      }
    ],
    "media_text": {
      "type": "plain"
    },
    "media_rating": {
      "content": "nonadult"
    },
    "rating": "nonadult",
    "validity": "start=2002-10-13T09:00+01:00;\n          end=2002-10-17T17:00+01:00;\n          scheme=W3C-DTF",
    "validity_start": "2002-10-13T09:00+01:00",
    "validity_start_parsed": [
      2002,
      10,
      13,
      8,
      0,
      0,
      6,
      286,
      0
    ]
  }
]

Parsed data WITH changes

[
  {
    "title": "The latest video from an artist",
    "title_detail": {
      "type": "text/plain",
      "language": null,
      "base": "",
      "value": "The latest video from an artist"
    },
    "links": [
      {
        "rel": "alternate",
        "type": "text/html",
        "href": "http://www.foo.com/item1.htm"
      }
    ],
    "link": "http://www.foo.com/item1.htm",
    "media_content": [
      {
        "url": "http://www.foo.com/movie.mov",
        "filesize": "12216320",
        "type": "video/quicktime",
        "expression": "full",
        "media_player": {
          "url": "http://www.foo.com/player?id=1111",
          "height": "200",
          "width": "400",
          "content": ""
        },
        "media_hash": {
          "algo": "md5"
        },
        "media_credit_details": [
          {
            "role": "producer",
            "content": "producer's name"
          },
          {
            "role": "artist",
            "content": "artist's name"
          }
        ],
        "media_credit": "artist's name",
        "tags": [
          {
            "term": "music/artistname/album/song",
            "scheme": "http://blah.com/scheme",
            "label": null
          }
        ],
        "media_text": {
          "type": "plain"
        },
        "media_rating_details": {
          "content": "nonadult"
        },
        "media_rating": "nonadult",
        "validity": "start=2002-10-13T09:00+01:00;\n          end=2002-10-17T17:00+01:00;\n          scheme=W3C-DTF",
        "validity_start": "2002-10-13T09:00+01:00",
        "validity_start_parsed": [
          2002,
          10,
          13,
          8,
          0,
          0,
          6,
          286,
          0
        ]
      }
    ]
  }
]

Output diff

...
    "media_content": [
      {
        "url": "http://www.foo.com/movie.mov",
        "filesize": "12216320",
        "type": "video/quicktime",
-        "expression": "full"
-      }
-    ],
+        "expression": "full",
        "media_player": {
          "url": "http://www.foo.com/player?id=1111",
          "height": "200",
...
-    "media_credit": [
+       "media_credit_details": [
          {
            "role": "producer",
            "content": "producer's name"
          },
          {
            "role": "artist",
            "content": "artist's name"
          }
        ],
-    "credit": "artist's name",
+        "media_credit": "artist's name",
...
        "media_text": {
          "type": "plain"
        },
-    "media_rating": {
+        "media_rating_details": {
          "content": "nonadult"
        },
-    "rating": "nonadult",
+        "media_rating": "nonadult",
        "validity": "start=2002-10-13T09:00+01:00;\n          end=2002-10-17T17:00+01:00;\n          scheme=W3C-DTF",
...
+  }
+]
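
For consumers that would need to cope with both layouts during a transition, a small accessor along these lines might help. This is only a sketch: media_ratings is a hypothetical helper, and the two dictionaries are trimmed copies of the parsed data above, not the output of any released feedparser version.

```python
def media_ratings(entry):
    """Collect rating strings from an entry in either layout shown above."""
    ratings = []
    # New layout: media:rating is nested in each media_content item as a string.
    for content in entry.get("media_content", []):
        rating = content.get("media_rating")
        if isinstance(rating, str):
            ratings.append(rating)
    # Old layout: a single entry-level "rating" key.
    if not ratings and "rating" in entry:
        ratings.append(entry["rating"])
    return ratings


# Trimmed copy of "Parsed data WITHOUT changes".
old_entry = {
    "media_content": [{"url": "http://www.foo.com/movie.mov"}],
    "media_rating": {"content": "nonadult"},
    "rating": "nonadult",
}

# Trimmed copy of "Parsed data WITH changes".
new_entry = {
    "media_content": [
        {
            "url": "http://www.foo.com/movie.mov",
            "media_rating_details": {"content": "nonadult"},
            "media_rating": "nonadult",
        }
    ],
}

print(media_ratings(old_entry))
print(media_ratings(new_entry))
```

Both calls yield the same list of ratings, which gives a rough idea of how much glue code the renamed keys would require downstream.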

Feb 21 '20 15:02 o-felixz
