[Bug]: 🐛 Music information is garbled when ID3 tags use a non-UTF-8 encoding such as GB2312
Read the Troubleshooting guide.
- [X] I have read and followed the Troubleshooting guide
Reproduction steps
- Download id3_gb2312_encoding.tar.gz and extract it into the music folder to be scanned.
- Run `php artisan koel:scan -F -n` to scan the music.
- Check the music list; the music information is displayed as garbled text.
Expected behavior
Music information is displayed accurately.
Actual behavior
The ID3 tags of these files are encoded in GB2312, so the music information is displayed as garbled text.
Logs
null
Koel version
v7.0.11
How did you install Koel?
Compiled from source
Additional information
I found that users in other countries have run into the same problem, for example https://github.com/koel/koel/issues/646
Having users convert the ID3 encoding of their music manually is one possible solution, but I wonder whether we can offer something friendlier.
I looked through the upstream https://github.com/JamesHeinrich/getID3 code and the Koel logic that scans music. So far I have only made a simple adaptation for GB2312 that makes it work properly (WIP; I will push this code to my repository later). My original intention, though, was to support more encodings so that users all over the world benefit.
Please forgive my limited knowledge of text encodings. I would like to ask the maintainers whether they have better ideas, so that we can keep working on this.
Finally, I would like to thank @phanan for the continuous updates and maintenance work. It is you who make this thing (the world) interesting.
Indeed, text encoding is always a headache to deal with, partly because there are so many languages and edge cases. I think Koel attempts to detect the encoding and converts the tags to UTF-8, but I may be wrong (AFK right now).
@phanan Yes, I have noticed this problem. The encoding that getID3 reports for the tags currently always defaults to ISO-8859-1 (code location: <https://github.com/JamesHeinrich/getID3/blob/master/getid3/getid3.php#L96>). I also tried setting `$encoding_id3v1_autodetect = true` at https://github.com/JamesHeinrich/getID3/blob/master/getid3/getid3.php#L103, but for the music files above (id3_gb2312_encoding.tar.gz) the detected encoding is Windows-1251, whereas the correct result would be GB2312 or EUC-CN.
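To illustrate why an ISO-8859-1 default produces garbled text, here is a minimal sketch that does not touch getID3 at all. It assumes a hypothetical two-character title and simulates the tag bytes with `mb_convert_encoding`; the variable names are illustrative only. It also shows that, because ISO-8859-1 maps every byte to exactly one character, a wrong ISO-8859-1 decoding can be undone once the real encoding is known.

```php
<?php
declare(strict_types=1);

// Hypothetical sample title ("music" in Chinese), written with escapes so the
// example does not depend on the encoding of this source file.
$title = "\u{97F3}\u{4E50}";

// Simulate what a GB2312-era tagger would store: the same text as EUC-CN bytes.
$rawBytes = mb_convert_encoding($title, 'EUC-CN', 'UTF-8');

// Decoding those bytes as ISO-8859-1 turns each byte into one Latin-1
// character, which is the garbled text seen in the music list.
$garbled = mb_convert_encoding($rawBytes, 'UTF-8', 'ISO-8859-1');

// Decoding with the real source encoding recovers the title.
$correct = mb_convert_encoding($rawBytes, 'UTF-8', 'EUC-CN');

// The wrong decoding is reversible: go back to the original bytes first,
// then decode them with the right encoding.
$recoveredBytes = mb_convert_encoding($garbled, 'ISO-8859-1', 'UTF-8');
$repaired = mb_convert_encoding($recoveredBytes, 'UTF-8', 'EUC-CN');

printf("garbled:  %s\ncorrect:  %s\nrepaired: %s\n", $garbled, $correct, $repaired);
```

The repair step only helps when the source encoding is already known, which is exactly the part that is hard to determine automatically.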
Yesterday I took the raw string bytes from `Arr::get($info, 'id3v2.title', null)` ([code#72](https://github.com/PBK-B/koel/blob/f2c6bc6a98561dffcb5290c98127fc9dc72f94cd/app/Values/SongScanInformation.php#L72)) and ran `mb_detect_encoding($title, mb_list_encodings(), false)` on them. The result was GB18030. I'm not sure whether this approach works for other encodings, though (maybe we can collect music files tagged in other encodings for testing?). The test code is roughly as follows:
```php
// …
public static function fromGetId3Info(array $info, string $path): self
{
    // dealing with GB2312 character encoding problems
    // NOTE: $comments is not defined in this excerpt; it has to come from the elided code.
    $raw_tags = array_merge(
        Arr::get($info, 'id3v1', []),
        Arr::get($info, 'id3v2', []),
        Arr::get($comments, 'id3v2', [])
    );

    // Log the raw title bytes and the encoding detected for them (strict mode).
    Log::debug(var_export($raw_tags['title'], true));
    Log::debug(var_export(mb_detect_encoding($raw_tags['title'], mb_list_encodings(), true), true));
}
// …
```
I'm wondering if we should send an issue upstream.
References
- https://www.php.net/manual/zh/function.mb-detect-encoding.php
- https://www.php.net/manual/en/function.mb-list-encodings.php
What we can do without having to rely on getID3 is to check the encoding ourselves using PHP’s encoding detection functions (gotta admit I’m not sure how getID3 handles this on its side) and do the conversion when necessary.
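A rough sketch of that idea, under stated assumptions: `toUtf8()` is a hypothetical helper, not part of Koel or getID3, and the candidate list is only a guess that would need tuning against real files. It validates candidates in a fixed order with `mb_check_encoding` instead of calling `mb_detect_encoding`, since the latter's pick among several valid candidates may vary between PHP versions.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: normalize a raw tag value to UTF-8.
 *
 * @param string   $raw        Raw tag bytes as read from the file.
 * @param string[] $candidates Legacy encodings to try, in priority order
 *                             (an assumption; adjust for the music library).
 */
function toUtf8(string $raw, array $candidates = ['GB18030', 'ISO-8859-1']): string
{
    // Valid UTF-8 (which includes plain ASCII) needs no conversion.
    if (mb_check_encoding($raw, 'UTF-8')) {
        return $raw;
    }

    // Take the first candidate in which the whole byte string is valid.
    foreach ($candidates as $encoding) {
        if (mb_check_encoding($raw, $encoding)) {
            return mb_convert_encoding($raw, 'UTF-8', $encoding);
        }
    }

    // Last resort: ISO-8859-1 accepts every byte, so this never fails,
    // although the result may still be wrong for unknown encodings.
    return mb_convert_encoding($raw, 'UTF-8', 'ISO-8859-1');
}

// Usage: simulate a GB2312/GB18030-encoded title and normalize it.
$legacyTitle = mb_convert_encoding("\u{97F3}\u{4E50}", 'GB18030', 'UTF-8');
echo toUtf8($legacyTitle), PHP_EOL;
```

Validation alone cannot tell apart encodings that accept the same bytes (short Latin-1 strings can also be valid GB18030), so the order of the candidate list is effectively a per-installation policy, possibly something to expose as a configuration option.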
OK, maybe my description was unclear. In the Koel code, the `array $info` parameter of the `fromGetId3Info` function comes from `$this->getID3->analyze($this->filePath)`; the code is at [app/Services/FileScanner.php#L60](https://github.com/koel/koel/blob/master/app/Services/FileScanner.php#L60).
Please finish what you are working on first. When you have time, we can go through this part of the logic together. It's a pleasure to work with you.
I mean Koel uses getID3 to retrieve the tags, but we can take one further step when it comes to encoding detection and conversion.
Hi, the garbled music information is most likely caused by inconsistent character encodings in MP3 ID3 tags. ID3v1 has no field that declares the text encoding, and ID3v2 tags written by older tools often contain text in a legacy encoding that does not match what the tag declares, so the text metadata gets misinterpreted.
One possible workaround is to convert the MP3 files into a format with more consistent and standardized character encoding, such as M4A. This could help ensure that metadata is correctly read and displayed without garbling.
Just a suggestion—hope it helps!