lofty-rs
Support EBML (Matroska, WebM) files
- [x] Audio property reading
- [x] Tag parsing
- [x] Tag encoding
- [x] Writing tags to files
~~Everything's in place now. Just need to actually write the tag into the file.~~
Still need to figure out:
- How to handle cover art?
  - Pictures are just `AttachedFile`s, which is similar to APE, but there doesn't seem to be a "standard" for how to name the files to separate them from each other. For example, in APE there's `Cover Art (Front)`, `Cover Art (Back)`, etc. The only way I've seen covers disambiguated is with the name `cover.{ext}` (e.g. `cover.jpg`), which would limit us to only supporting front covers. `AttachedFile`s can carry more information with them (`uid`, `referral`) than any other format. May not be able to convert them to `Picture`s if any of those extra fields are used.
- How are tags even used in the wild??
  - I'm not sure how many applications actually write well-formed tags. See https://github.com/Serial-ATA/lofty-rs/pull/218#issuecomment-2444659483 for example.
  - I haven't seen many applications that use more than the basic title/artist tags for Matroska, need to do more research.
- Should track titles be pulled from the `\Segment` title?
  - Titles should be written as a `SimpleTag` of `Target = Track; TITLE = foo`, but it looks like ffmpeg doesn't do that, and instead writes it as the title of `\Segment`.
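For the cover art question, a minimal sketch of the `cover.{ext}` naming check mentioned above. The function name `looks_like_front_cover` is hypothetical and not part of Lofty; it only assumes that an attachment exposes its file name as a string, and it says nothing about back covers or other picture types.

```rust
use std::path::Path;

/// Returns true if an attachment's file name looks like `cover.{ext}`.
///
/// Hypothetical helper: this only inspects the file name, so it can
/// recognize front covers at best.
pub fn looks_like_front_cover(file_name: &str) -> bool {
    Path::new(file_name)
        .file_stem()
        .and_then(|stem| stem.to_str())
        .map_or(false, |stem| stem.eq_ignore_ascii_case("cover"))
}
```

With this check, `cover.jpg` would be treated as a front cover, while any other attachment name would have to stay a plain attached file.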
closes #141
Would you be interested in any help getting this one wrapped up?
This is useful because it looks like webm is actually now a recommended cross platform audio format: https://github.com/goldfire/howler.js?tab=readme-ov-file#format-recommendations
@milesegan I'll put out 0.21.0 today and pick this back up for 0.22.0. It is something I'd like to start working on again.
@Serial-ATA That would be fantastic. My Rust isn't so strong but happy to help out if I can.
@milesegan I finally got it to a state where it can read and convert Matroska files into TaggedFile. Still need to handle the generic -> concrete conversion as well as writing. If you have any Matroska files, could you try them out on this branch and let me know how it goes?
Currently trying e37fe441f6cafcd66dce1b1618106348deab3c21, and it instantly fails with an error:
Compiling lofty v0.21.1 (https://github.com/Serial-ATA/lofty-rs.git?branch=matroska#e37fe441)
error[E0308]: mismatched types
--> /home/hasezoey/.local/cargo/git/checkouts/lofty-rs-f5e48f8219b271cf/e37fe44/lofty/src/ebml/read/segment_tags.rs:218:3
|
218 | language,
| ^^^^^^^^ expected `Language`, found `Option<Language>`
|
= note: expected enum `ebml::tag::simple_tag::Language`
found enum `Option<ebml::tag::simple_tag::Language>`
help: consider using `Option::expect` to unwrap the `Option<ebml::tag::simple_tag::Language>` value, panicking if the value is an `Option::None`
|
218 | language: language.expect("REASON"),
| +++++++++++++++++++++++++++
For more information about this error, try `rustc --explain E0308`.
Thanks for trying it out, forgot to include a file in my last commit. Fixed.
aside from that error, when manually fixing it (using defaults), the current implementation only seems to be able to parse AlbumArtist:
TYPE: Matroska
TAGS: [
TagItem {
lang: [
117,
110,
100,
],
description: "",
item_key: AlbumArtist,
item_value: Text(
"Alan Walker",
),
},
]
ffmpeg reports a lot more:
Input #0, matroska,webm, from '01 Alan Walker - Jump Start.mka':
Metadata:
title : Jump Start
DATE : 2021-09-09
ARTIST : Alan Walker
track : 1/6
ALBUM : Walker Racing League
DISC : 1/1
PUBLISHER : MER
MUSICBRAINZ_RELEASE_GROUP_ID: 21b7c435-a015-4a0f-b89d-b9429cb8a39c
TORY : 2021
MUSICBRAINZ_RELEASE_TRACK_ID: 044c0619-c3e4-4e06-b87f-5a2503571d57
ALBUM_ARTIST : Alan Walker
TSO2 : Walker, Alan
ARTIST-SORT : Walker, Alan
TSRC : NOG842114010
SCRIPT : Latn
TMED : Digital Media
ORIGINALYEAR : 2021
ARTISTS : Alan Walker
BARCODE : 886449545476
MUSICBRAINZ_ALBUM_TYPE: ep
MUSICBRAINZ_ALBUM_STATUS: official
PURL : https://www.youtube.com/watch?v=3wIxNyH8NUA
COMMENT : https://www.youtube.com/watch?v=3wIxNyH8NUA
MUSICBRAINZ_ALBUM_ID: 9e65a340-778b-4e82-92d3-8dbc9d823d5b
MUSICBRAINZ_ARTIST_ID: b0e4bc50-3062-4d31-afad-def6a6b7a8e9
MUSICBRAINZ_ALBUM_ARTIST_ID: b0e4bc50-3062-4d31-afad-def6a6b7a8e9
ENCODER : Lavf60.16.100
Duration: 00:01:29.60, start: 0.000000, bitrate: 108 kb/s
Stream #0:0: Audio: vorbis, 48000 Hz, stereo, fltp
Metadata:
ENCODER : Lavc60.31.102 libvorbis
DURATION : 00:01:29.599000000
code used:
if let Ok(mut tagged_file) = probe.read() {
debug!("FILE: {:#?}", path);
for tag in tagged_file.tags() {
debug!("TYPE: {:#?}", tag.tag_type());
debug!("TAGS: {:#?}", tag.items().collect::<Vec<_>>());
}
}
Nice, thanks! Looks like I got the targets for some of these items wrong. Could you email the file to serial@[DOMAIN ON MY PROFILE]?
> Could you email the file to

Sent, hopefully I got the correct address. It should have been fine to be posted here (I modified it to not have any audio), but just to be safe I emailed it.
Yep, I got it.
> it should have been fine to be posted here (i modified it to not have any audio)
Yeah, that'd be fine to send here. Normally people don't remove the audio though, so email is the safe default.
Was this file made with FFmpeg converting an MP3 to MKA? It looks like it's not using tag targets correctly. The ISRC and MusicBrainz track ID are on the album level, for example.
> Was this file made with FFmpeg converting an MP3 to MKA? It looks like it's not using tag targets correctly. The ISRC and MusicBrainz track ID are on the album level, for example.
yes it was made from converting a mp3 to a mka via ffmpeg (-map_metadata), no other specific options, so could be a ffmpeg issue
I read a bit more about how Matroska wants tags to be handled: it wants `TargetType = ALBUM; TITLE = Album title` and `TargetType = Track; TITLE = Track title` instead of things like `TargetType = ALBUM; ALBUM = Album Title`, which ffmpeg currently outputs.
This seems to be an ffmpeg problem which currently does not have a fix, and the only related patch series I could find does not seem to be merged.
FFmpeg has its own internal tag names like `album` (similar to Lofty's `AlbumTitle`), which it currently always translates to the custom Matroska tag `ALBUM`, see `ffmpeg -f lavfi -i anullsrc -t 1 -metadata album="test album" test.mka` (ffmpeg n7.0.2).
Note: there seems to be a related discussion (the same issue as with the file i provided) in the mp3tag project Edit: maybe related ffmpeg issue
So maybe consider adding those custom tags as a "relaxed" rule and when writing do it like matroska actually wants?
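A rough sketch of what such a "relaxed" rule could look like for the `ALBUM` case. `TargetLevel`, `SimpleTag`, and `normalize_album_title` are hypothetical, simplified stand-ins and not Lofty's actual types; the sketch assumes an FFmpeg-style `ALBUM` tag always carries the album title.

```rust
/// Hypothetical, simplified stand-in for a Matroska target level.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TargetLevel {
    Album,
    Track,
}

/// Hypothetical, simplified stand-in for a `SimpleTag` plus its target.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct SimpleTag {
    pub target: TargetLevel,
    pub name: String,
    pub value: String,
}

/// Rewrites an FFmpeg-style `ALBUM` tag into the spec-style form
/// `Target = Album; TITLE = ...`. All other tags pass through unchanged.
pub fn normalize_album_title(tag: SimpleTag) -> SimpleTag {
    if tag.name.eq_ignore_ascii_case("ALBUM") {
        SimpleTag {
            target: TargetLevel::Album,
            name: "TITLE".to_string(),
            value: tag.value,
        }
    } else {
        tag
    }
}
```

Applied on read, this would accept the custom tag; a writer would then only ever emit the spec-style `TITLE` form.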
aside from that, i also just noticed that lofty did not parse the Segment information's title yet:
$ mkvinfo out.mka
...
+ Segment: size 5730
|+ Seek head (subentries will be skipped)
|+ EBML void: size 83
|+ Segment information
| + Timestamp scale: 1000000
| + Title: Jump Start
...
PS: ffmpeg will currently always translate a TITLE tag to the segment's title and not output a TITLE tag when muxing to matroska
Thanks a ton for looking into this.
> So maybe consider adding those custom tags as a "relaxed" rule and when writing do it like matroska actually wants?
Hm, that does really complicate things.
In the case of your file:
- It has both `Target = Album; ARTIST = Alan Walker` and `Target = Album; ALBUM_ARTIST = Alan Walker`. There's no way to safely know that `ARTIST` can be the track artist.
- There's a `DATE` tag, but what kind of date?
- The disc and track numbers are stored as `current/total`, which isn't right.
- It has a bunch of untranslated ID3v2 tags.
The only safe assumption to make in that case is that ALBUM_ARTIST should just be replaced with ARTIST at the album level.
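For the `current/total` values, a tolerant reader could split them before mapping them onto separate fields (Matroska has distinct `PART_NUMBER` and `TOTAL_PARTS` tags). A minimal sketch, where `split_number_pair` is a hypothetical helper and not an existing Lofty function:

```rust
/// Splits a `current/total` value such as "1/6" into its parts.
///
/// Returns `None` if the current number is missing or not a number.
/// A missing or unparsable total is reported as `None` in the second slot.
pub fn split_number_pair(raw: &str) -> Option<(u32, Option<u32>)> {
    let mut parts = raw.trim().splitn(2, '/');
    let current = parts.next()?.trim().parse::<u32>().ok()?;
    let total = parts.next().and_then(|t| t.trim().parse::<u32>().ok());
    Some((current, total))
}
```

This keeps plain values like `3` working while still recovering the total from the FFmpeg-style form.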
> PS: ffmpeg will currently always translate a TITLE tag to the segment's title and not output a TITLE tag when muxing to matroska
Odd. mp3tag just deletes the segment title and puts it in the tags, which is an option I guess. FFmpeg seems to be able to pick up that title. It does run the risk of deleting someone's segment name unexpectedly, though.
This is a pretty unfortunate situation since I imagine FFmpeg isn't the only software with these kinds of issues (I don't blame them, this format is super over-engineered). Trying to support even some of these weird cases will really complicate things. I wonder if it's even worth it, or if Lofty should just expect closer-to-spec-compliant inputs and ignore all these junk tags.
> Hm, that does really complicate things.
I dont know how much logic you want to put into this, but maybe if Album/ALBUM_ARTIST exists and Track/Artist does not exist, assume Album/Artist is the track artist? The same for title?
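Roughly what I mean, as a sketch only. `ArtistTags` and `resolve_track_artist` are made-up names for illustration, not anything that exists in Lofty:

```rust
/// Hypothetical, simplified view of the artist-related tags found in a file.
#[derive(Debug, Default)]
pub struct ArtistTags {
    /// `Target = Album; ARTIST`
    pub album_artist: Option<String>,
    /// `Target = Album; ALBUM_ARTIST`
    pub album_album_artist: Option<String>,
    /// `Target = Track; ARTIST`
    pub track_artist: Option<String>,
}

/// Picks the track artist, falling back to the album-level `ARTIST`
/// only when an album-level `ALBUM_ARTIST` is also present.
pub fn resolve_track_artist(tags: &ArtistTags) -> Option<&str> {
    // A real track-level artist always wins.
    if let Some(artist) = tags.track_artist.as_deref() {
        return Some(artist);
    }
    // FFmpeg-style layout: ALBUM_ARTIST exists at the album level,
    // so the album-level ARTIST is assumed to be the track artist.
    if tags.album_album_artist.is_some() {
        return tags.album_artist.as_deref();
    }
    None
}
```

The same shape would work for the title fallback.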
As for the other tags like DATE or disc and tracks, i dont know what the best option would be.
In any case, if this really is this complicated, just focus on spec compliance with the initial PR and let's put in issues / PRs in the future to add workarounds.
Regardless of workarounds applied, shouldn't those tags show up as custom tags (similar to ID3's TXXX) in Lofty?
> Regardless of workarounds applied, shouldnt those tags show up as custom tags (similar to id3's TXX) in Lofty?
Yep, Matroska lets you put whatever tags you want outside of their official ones. Any software outside of FFmpeg won't be able to understand it, though.