tinytag
Precedence with multiple tag headers
When a file has multiple sets of tags, say ID3 and FLAC, they are currently just merged on a first-come, first-served basis per tag (e.g. if the ID3 header comes first and has the artist tag set, the artist tag from the FLAC header is ignored). IMO, it would make more sense to use data from only one of them, choosing which one based either on the file format (for a FLAC file, prefer the FLAC metadata) or on completeness (use the one with the more complete information).
This is mainly inspired by one file I found which had an ID3 header with absolutely useless information first, followed by a FLAC header with the actual information. tinytag currently shows mostly the useless information, since that came first, but also includes some of the useful information (see the test case I added in #56).
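The two behaviours described above can be sketched roughly as follows. This is not tinytag's actual code: it assumes each tag header has already been parsed into a `(format name, dict of fields)` pair in file order, and `merge_first_come`, `completeness` and `pick_single` are hypothetical names. `pick_single` combines both suggested criteria (prefer the native format, otherwise the most complete header), which is only one possible way to combine them.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation: (format name, parsed fields), in file order.
Header = Tuple[str, Dict[str, Optional[str]]]


def merge_first_come(headers: List[Header]) -> Dict[str, Optional[str]]:
    """Current behaviour as described: per field, the first header that sets it wins."""
    merged: Dict[str, Optional[str]] = {}
    for _fmt, fields in headers:
        for key, value in fields.items():
            if value and key not in merged:
                merged[key] = value
    return merged


def completeness(header: Header) -> int:
    """Number of fields in a header that actually carry a value."""
    return sum(1 for value in header[1].values() if value)


def pick_single(headers: List[Header], native_format: str) -> Dict[str, Optional[str]]:
    """Proposed alternative: take every field from exactly one header."""
    if not headers:
        return {}
    for fmt, fields in headers:
        if fmt == native_format:  # e.g. prefer FLAC metadata in a .flac file
            return dict(fields)
    # No header in the native format: fall back to the most complete one.
    return dict(max(headers, key=completeness)[1])
```

With a useless ID3 header ahead of a good FLAC header, `merge_first_come` mixes the ID3 artist with the FLAC title and album, while `pick_single` returns the FLAC fields only.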
Hey @minus7
It's true that the precedence of data is something I have not thought about yet, so thank you for bringing it to my attention.
I think only taking one of the tags into consideration is not the best way to go though. I want tinytag to provide the best possible output, no matter how dirty the input. It should just work™ for any file, with no configuration required.
So instead I think it would be better to score the quality of the data and use the best data available.
E.g.
- a non-empty string is better than an empty string
- letters are better than mojibake
- track numbers should be numbers
Most of the time this should result in just one format being considered, but sometimes it could provide additional data (because some metadata formats do not support all the fields that tinytag delivers).
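A rough sketch of the scoring idea, assuming headers are plain dicts in file order. The field names in `NUMERIC_FIELDS`, the score values and the helper names are all hypothetical. The mojibake check is deliberately crude: it flags U+FFFD, control characters, and text that round-trips through Latin-1 into different valid UTF-8 (the classic "BjÃ¶rk" pattern); it misses cp1252-style garbage such as "â€™".

```python
import unicodedata
from typing import Dict, List, Optional, Tuple

Fields = Dict[str, Optional[str]]

# Hypothetical field names that are expected to hold numbers ("3" or "3/12").
NUMERIC_FIELDS = {"track", "track_total", "disc", "disc_total"}


def looks_like_mojibake(text: str) -> bool:
    """Crude garbage check; a dedicated library would do far better."""
    if "\ufffd" in text:  # U+FFFD REPLACEMENT CHARACTER: decoding already failed
        return True
    if any(unicodedata.category(ch) == "Cc" for ch in text):  # control chars
        return True
    try:
        # UTF-8 bytes mis-decoded as Latin-1 turn back into different valid UTF-8.
        return text.encode("latin-1").decode("utf-8") != text
    except UnicodeError:
        return False


def score_value(field: str, value: Optional[str]) -> int:
    """Higher is better; 0 means 'no usable data'."""
    if value is None or not value.strip():
        return 0  # a non-empty string beats an empty one
    text = value.strip()
    if field in NUMERIC_FIELDS:
        # Track numbers should be numbers.
        return 3 if text.split("/", 1)[0].isdecimal() else 1
    return 1 if looks_like_mojibake(text) else 3  # letters beat mojibake


def merge_best(headers: List[Fields]) -> Fields:
    """Per field, keep the highest-scoring value; ties go to the earlier header."""
    best: Dict[str, Tuple[int, Optional[str]]] = {}
    for fields in headers:
        for key, value in fields.items():
            s = score_value(key, value)
            if s > 0 and (key not in best or s > best[key][0]):
                best[key] = (s, value)
    return {key: value for key, (_s, value) in best.items()}
```

For an ID3 header `{"artist": "BjÃ¶rk", "title": "", "track": "x"}` followed by a FLAC header `{"artist": "Björk", "title": "Jóga", "track": "3/10"}`, every field ends up coming from the FLAC header. A field that only one header provides would still be filled in.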
> I think only taking one of the tags into consideration is not the best way to go though. I want tinytag to provide the best possible output, no matter how dirty the input. It should just work™ for any file, with no configuration required.
Yes, there definitely shouldn't be any configuration. I'd argue for using a single metadata format as the source of information when multiple are present. I wouldn't ever expect useful information to be spread across different metadata formats; if anything, I'd expect them to contain the same data. I wouldn't expect to see multiple formats in a single file in the first place, but I have come across it several times, and in each case one of them was garbage. That's why I think just picking one may be better.
Another option would be to detect garbage metadata and discard the entire metadata header containing it before trying to merge data.
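The filter-then-merge step could look roughly like this. It is a sketch under assumptions: headers are plain dicts in file order, `has_garbage` is a hypothetical and deliberately narrow check (U+FFFD replacement characters or non-whitespace control characters), and falling back to all headers when every one is flagged is a design choice made here, not something proposed in the thread.

```python
from typing import Callable, Dict, List, Optional

Fields = Dict[str, Optional[str]]


def has_garbage(value: str) -> bool:
    """Hypothetical check: replacement chars or non-whitespace control chars."""
    return "\ufffd" in value or any(
        ord(ch) < 32 and ch not in "\t\n\r" for ch in value
    )


def drop_garbage_headers(
    headers: List[Fields], is_garbage: Callable[[str], bool] = has_garbage
) -> List[Fields]:
    """Discard every header that contains at least one garbage value."""
    clean = [
        fields for fields in headers
        if not any(value and is_garbage(value) for value in fields.values())
    ]
    # If every header looks broken, keep them all rather than return nothing.
    return clean or headers


def merge_first_come(headers: List[Fields]) -> Fields:
    """Per field, the first header that sets a non-empty value wins."""
    merged: Fields = {}
    for fields in headers:
        for key, value in fields.items():
            if value and key not in merged:
                merged[key] = value
    return merged


id3 = {"artist": "\ufffd\ufffd", "title": "Track 1"}
flac = {"artist": "Real Artist", "title": "Real Title"}
print(merge_first_come(drop_garbage_headers([id3, flac])))
# {'artist': 'Real Artist', 'title': 'Real Title'}
```

Because the whole ID3 header is dropped, its plausible-looking "Track 1" title no longer shadows the FLAC title, which is the point of discarding the complete header rather than individual fields.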