taglib-rust

Fails with unicode in file paths.

Open mainrs opened this issue 5 years ago • 4 comments

I just ran a quick scan over my music library and noticed that two of my songs aren't picked up by taglib. It throws an error when calling the tag() method, with the message NoAvailableTag.

The files in question had the following Unicode characters in their file paths: Ä, ü and ß. After some googling I came across this SO post stating that taglib actually should support Unicode file paths. Maybe this line needs to actually be of type wchar_t (which is basically an i32 on Linux, but only 16 bits wide on Windows) or something similar. I do not work that often in C :smile:
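
To make the char versus wchar_t question concrete, here is a minimal sketch of how many code units the three characters occupy in UTF-8 and in UTF-16. The helper name encoded_lengths is made up for illustration and is not part of taglib-rust; the sketch only shows the encodings and does not claim which one taglib expects.

```rust
/// Hypothetical helper (not part of taglib-rust): returns the number of
/// UTF-8 bytes and the number of UTF-16 code units needed to encode `s`.
fn encoded_lengths(s: &str) -> (usize, usize) {
    // `str::len` counts UTF-8 bytes, `encode_utf16` yields 16-bit code units.
    (s.len(), s.encode_utf16().count())
}
```

For the precomposed forms of Ä (U+00C4), ü (U+00FC) and ß (U+00DF), this gives two UTF-8 bytes but a single UTF-16 code unit each, so a narrow char-based path and a wide wchar_t-based path carry different data for the same file name.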

Have a great weekend~

mainrs • Dec 01 '18 15:12

Hi there! Can you please give some more information about which errors you get? Testing with mack, which uses taglib-rust internally, it seems to work fine with Unicode in path and Unicode in tags:

mack % cargo run -- /tmp/宇宙コンビニ
/tmp/宇宙コンビニ/月の反射でみてた/07 足跡.mp3: renamed to /tmp/宇宙コンビニ/宇宙コンビニ/月の反射でみてた/07 足跡.mp3
/tmp/宇宙コンビニ/月の反射でみてた/06 成仏してしまった男.mp3: renamed to /tmp/宇宙コンビニ/宇宙コンビニ/月の反射でみてた/06 成仏してしまった男.mp3
/tmp/宇宙コンビニ/月の反射でみてた/05 闇には祝福を.mp3: renamed to /tmp/宇宙コンビニ/宇宙コンビニ/月の反射でみてた/05 闇には祝福を.mp3
/tmp/宇宙コンビニ/月の反射でみてた/04 光の加減で話した.mp3: renamed to /tmp/宇宙コンビニ/宇宙コンビニ/月の反射でみてた/04 光の加減で話した.mp3
/tmp/宇宙コンビニ/月の反射でみてた/03 セピア色の車窓から.mp3: renamed to /tmp/宇宙コンビニ/宇宙コンビニ/月の反射でみてた/03 セピア色の車窓から.mp3
/tmp/宇宙コンビニ/月の反射でみてた/02 EverythingChanges.mp3: renamed to /tmp/宇宙コンビニ/宇宙コンビニ/月の反射でみてた/02 EverythingChanges.mp3
/tmp/宇宙コンビニ/月の反射でみてた/01 origin.mp3: renamed to /tmp/宇宙コンビニ/宇宙コンビニ/月の反射でみてた/01 origin.mp3
/tmp/宇宙コンビニ/染まる音を確認したら/07 体温.mp3: renamed to /tmp/宇宙コンビニ/宇宙コンビニ/染まる音を確認したら/07 体温.mp3
/tmp/宇宙コンビニ/染まる音を確認したら/06 裁判にかけられた男.mp3: renamed to /tmp/宇宙コンビニ/宇宙コンビニ/染まる音を確認したら/06 裁判にかけられた男.mp3
/tmp/宇宙コンビニ/染まる音を確認したら/05 strings.mp3: renamed to /tmp/宇宙コンビニ/宇宙コンビニ/染まる音を確認したら/05 strings.mp3
/tmp/宇宙コンビニ/染まる音を確認したら/04 Compass.mp3: renamed to /tmp/宇宙コンビニ/宇宙コンビニ/染まる音を確認したら/04 Compass.mp3
/tmp/宇宙コンビニ/染まる音を確認したら/03 tobira.mp3: renamed to /tmp/宇宙コンビニ/宇宙コンビニ/染まる音を確認したら/03 tobira.mp3
/tmp/宇宙コンビニ/染まる音を確認したら/02 8films.mp3: renamed to /tmp/宇宙コンビニ/宇宙コンビニ/染まる音を確認したら/02 8films.mp3
/tmp/宇宙コンビニ/染まる音を確認したら/01 Pyramid.mp3: renamed to /tmp/宇宙コンビニ/宇宙コンビニ/染まる音を確認したら/01 Pyramid.mp3

My best guess is that there is genuinely a problem with your tags.

cdown • Dec 01 '18 16:12

Hey there, thanks for answering!

I made sure the tags are properly applied. I changed them once in iTunes and, after that didn't help, I updated them using Mp3Tag. The two files that do not work are the only ones that contain German characters. The paths are:

Z:\itunes\iTunes Media\Music\Itchy\All We Know Remixes\01 Stuck With The Devil (Großstadtge.m4a -> ß

Z:\itunes\iTunes Media\Music\Itchy\All We Know Remixes\04 Black (Äh Dings Remix).m4a -> Ä

After I replaced the two letters with plain ASCII characters, the files got recognized. Running ffmpeg -i on both files returns metadata entries:

Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/mnt/z/itunes/iTunes Media/Music/Itchy/All We Know Remixes/01 Stuck With The Devil (Großstadtge.m4a':
  Metadata:
    major_brand     : M4A
    minor_version   : 0
    compatible_brands: M4A mp42isom
    creation_time   : 2018-10-19T00:34:37.000000Z
    title           : Stuck With The Devil (Großstadtgefluester Remix)
    artist          : Itchy
    album           : All We Know Remixes
    genre           : Punk
    track           : 1/6
    disc            : 1/1
    date            : 2017
    compilation     : 0
    gapless_playback: 0
    encoder         : iTunes 12.9.0.167
    Encoding Params : vers
    iTunNORM        :  00000EBE 00000EE6 000068D7 00006BAD 0002A9FE 0002A9FE 00007BAA 00007BAA 0000488F 0000DBF4
    iTunes_CDDB_IDs : 6++
    UFIDhttp://www.cddb.com/id3/taginfo1.html: 䌳㍄ㅎ儷㈷㘸㔸㐷嘷㠴䐴㤱䉄ぁ䌶䑅㉅㤰ㅄ㈳䕄〲㉅䕁ぃぃ㕐
    lyrics          : Test
  Duration: 00:04:09.25, start: 0.000000, bitrate: 957 kb/s
    Stream #0:0(und): Audio: alac (alac / 0x63616C61), 44100 Hz, stereo, s16p, 953 kb/s (default)
    Metadata:
      creation_time   : 2018-10-19T00:34:37.000000Z
    Stream #0:1: Video: mjpeg, yuvj444p(pc, bt470bg/unknown/unknown), 600x600 [SAR 300:300 DAR 1:1], 90k tbr, 90k tbn, 90k tbc
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from '/mnt/z/itunes/iTunes Media/Music/Itchy/All We Know Remixes/04 Black (Äh Dings Remix).m4a':
  Metadata:
    major_brand     : M4A
    minor_version   : 0
    compatible_brands: M4A mp42isom
    creation_time   : 2018-10-19T00:35:47.000000Z
    title           : Black (Aeh Dings Remix)
    artist          : Itchy
    album           : All We Know Remixes
    genre           : Punk
    track           : 4/6
    disc            : 1/1
    date            : 2017
    compilation     : 0
    gapless_playback: 0
    encoder         : iTunes 12.9.0.167
    Encoding Params : vers
    iTunNORM        :  0000058D 000005B8 00003933 0000392D 000268DE 000268DE 00007BAA 00007BAA 0000C45E 000074D3
    iTunes_CDDB_IDs : 6++
    UFIDhttp://www.cddb.com/id3/taginfo1.html: 䌳㍄㉎儷㈷㘸㔸㔷嘰㠹㍅㈲䕁㠵䌷䕆㈸㈵〶䔴ぃ㠶㘵䌶ㅄぁ㍐
    lyrics          : Test
  Duration: 00:03:13.03, start: 0.000000, bitrate: 778 kb/s
    Stream #0:0(und): Audio: alac (alac / 0x63616C61), 44100 Hz, stereo, s16p, 774 kb/s (default)
    Metadata:
      creation_time   : 2018-10-19T00:35:47.000000Z
    Stream #0:1: Video: mjpeg, yuvj444p(pc, bt470bg/unknown/unknown), 600x600 [SAR 300:300 DAR 1:1], 90k tbr, 90k tbn, 90k tbc

As you seem to have developed mack, maybe I am just doing something wrong.

use std::path::{Path, PathBuf};
use walkdir::{WalkDir, DirEntry};

pub struct Track {
	pub name: String,
	pub artist: String,
	pub album: String,
	pub year: u32,

	pub path: PathBuf,
	pub audio_properties: Option<AudioProperties>,
}

pub struct AudioProperties {
	pub bitrate: i32,
	pub vbr: bool,
}

pub fn collect<P: AsRef<Path>>(path: P) -> Vec<Track> {
	let mut tracks = Vec::default();

	for entry in WalkDir::new(path).into_iter().filter_map(|e| e.ok()) {
		let entry: DirEntry = entry;
		if entry.file_type().is_dir() {
			continue;
		}

		let path = entry.path();

		let file = match taglib::File::new(&path) {
			Ok(f) => f,
			Err(e) => {
				eprintln!("Failed to open file {:?}", e);
				continue;
			},
		};

		match file.tag() {
			Ok(t) => {
				let track = Track {
					name: t.title().unwrap_or_default(),
					artist: t.artist().unwrap_or_default(),
					album: t.album().unwrap_or_default(),
					year: t.year().unwrap_or_default(),

					path: PathBuf::from(&path),
					audio_properties: None,
				};

				tracks.push(track);
			},
			Err(e) => {
				// This part is reached twice, once for each file. The error printed out is NoAvailableTag
				println!("No available tags for {:?}", &path);
				eprintln!("{:?}", e);
			}
		}
	}

	tracks
}

Edit 1: @cdown after some more testing (which means copying my Japanese songs into the folder and trying again) it seems that the library has no problem at all finding those, as it recognized titles like 第005回「ゲームの未来はWiiか?」その2.mp3. I am not sure if my tags are just wrong or somehow damaged, as I have edited them twice already. The fact that it does open a file like the one above suggests as much, to be honest. Do m4a containers support Unicode titles? I already replaced the special characters with the old spelling (for example, changing ß into ss and Ä into ae), but the files still return NoAvailableTag.

The thing that speaks against this is that just by changing the file name, the file gets opened, taglib recognizes the tags, and my Rust code above doesn't print the no available tags error.
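
Since renaming alone changes the outcome, one way to narrow things down is to list the non-ASCII characters in each failing path. This is only a sketch: the helper non_ascii_chars is a made-up name, it is not part of taglib-rust or mack, and it inspects the path string only, not the tags.

```rust
use std::path::Path;

/// Hypothetical diagnostic helper: collects every non-ASCII character
/// that appears in a path, in order of appearance.
fn non_ascii_chars(path: &Path) -> Vec<char> {
    // `to_string_lossy` substitutes U+FFFD for invalid sequences, which is
    // itself non-ASCII, so unrepresentable paths are flagged as well.
    path.to_string_lossy()
        .chars()
        .filter(|c| !c.is_ascii())
        .collect()
}
```

Calling it inside the Err branch of the collect function above would show whether every path that yields NoAvailableTag contains at least one such character.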

Edit 2: Just ran mack over the album that those two files are part of and even mack told me the following:

cannot fix D:/seiri/Itchy\All We Know Remixes\1-01 Stuck With The Devil (Großstadtgefluester Remix).m4a: Tag(NoAvailableTag)
cannot rename D:/seiri/Itchy\All We Know Remixes\1-01 Stuck With The Devil (Großstadtgefluester Remix).m4a: Tag(NoAvailableTag)

Edit 3: While I was trying to figure out how to determine the root cause of the error, I came across seiri, a music library manager written in Rust. The author uses custom bindings to taglib2 and not this crate. After running the program and adding my songs, seiri actually recognized them and added them to the library.

Based on that I would just assume that the bindings do something wrong. This is the crate that the author wrote himself. Looks like both libraries handle the path string creation differently. taglib-rust does it this way, while seiri is doing it like this, basically converting the strings to a c_str and then getting the raw bytes from it. Maybe using std::os::raw::c_char instead of libc::c_char does make a difference...
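
For reference, a minimal sketch of the general idea of turning a path into a C string before handing it to a C API. It does not reproduce the actual code of either crate; the helper name path_to_c_string is hypothetical, and whether the C side interprets the resulting bytes as UTF-8 depends on the platform and is not confirmed anywhere in this thread.

```rust
use std::ffi::CString;
use std::path::Path;

/// Hypothetical helper (not taken from taglib-rust or seiri): converts a
/// path into a NUL-terminated C string holding its UTF-8 bytes.
/// Returns None if the path is not valid Unicode or contains a NUL byte.
fn path_to_c_string(path: &Path) -> Option<CString> {
    // `to_str` fails for paths that cannot be represented as UTF-8.
    let utf8 = path.to_str()?;
    // `CString::new` rejects interior NUL bytes.
    CString::new(utf8).ok()
}
```

The pointer from as_ptr() on the result is what would be passed to a C function taking a const char*. With this approach, Ä reaches the C side as the two bytes 0xC3 0x84.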

mainrs • Dec 01 '18 17:12

Hello :) I was wondering if there is any news regarding this issue? I just ran into the same thing.

It works fine if I remove the offending character (Ä and some other accented ones) from the filename (while leaving them in the tags).

I also tried katatsuki, the library behind seiri which was mentioned above, and that one did work.

markvandieren • Dec 24 '18 16:12

As far as I am concerned, I am not that familiar with C/C++ and their internals, so I am not sure why this bug happens. I just switched to the library used by seiri, with which I have had no problems so far.

mainrs • Dec 24 '18 20:12