
File_Mpeg_Descriptors::Get_DVB_Text does not support all possible encodings used in DVB

Open lighterowl opened this issue 2 years ago • 2 comments

File_Mpeg_Descriptors::Get_DVB_Text is the central point for converting a "DVB string" representation to the internal Ztring. Its implementation only supports ISO-8859-2; for every other value of the bytes that select the encoding, it calls Get_Local, which ends up using CP_ACP on Windows and ISO-8859-1 on other systems.

Furthermore, the function also processes the buffer with Get_Local if the first byte is greater than or equal to 0x20, which indicates that the "default encoding" should be used. Get_Local, as already noted, uses either CP_ACP or ISO-8859-1. This is also incorrect, as the "default encoding" for DVB strings is ISO/IEC 6937 with the euro sign (U+20AC) instead of $ at position 0xA4.

The current mapping is described in DVB BlueBook A038r15 :

[Screenshots: Screenshot_2023-02-17_21-42-34, Screenshot_2023-02-17_21-42-49]
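
The selection logic described above can be sketched as follows. This is a minimal Python sketch, not MediaInfoLib code: `select_dvb_encoding` and `DEFAULT_TABLE` are hypothetical names. Only the `>= 0x20` rule and the 0x09 selector come from this issue; the other selector values are recalled from ETSI EN 300 468 Annex A and should be checked against the BlueBook tables. Python has no built-in ISO/IEC 6937 codec, so the default table is returned as a label only.

```python
from typing import Tuple

# Placeholder label for the DVB default table (ISO/IEC 6937 plus euro sign).
# It is NOT a real Python codec name.
DEFAULT_TABLE = "dvb-iso6937"

# One-byte selectors mapped to Python codec names.
# Assumption: values recalled from EN 300 468 Annex A, to be verified.
SINGLE_BYTE_SELECTORS = {
    0x01: "iso8859_5",
    0x02: "iso8859_6",
    0x03: "iso8859_7",
    0x04: "iso8859_8",
    0x05: "iso8859_9",
    0x06: "iso8859_10",
    0x07: "iso8859_11",
    0x09: "iso8859_13",  # the selector used by the sample file in this issue
    0x0A: "iso8859_14",
    0x0B: "iso8859_15",
    0x11: "utf_16_be",   # ISO/IEC 10646 BMP, approximated as UTF-16 big-endian
    0x15: "utf_8",
}


def select_dvb_encoding(data: bytes) -> Tuple[str, int]:
    """Return (encoding label, offset of the first text byte)."""
    if not data or data[0] >= 0x20:
        # No selector byte: the whole buffer uses the default table.
        return DEFAULT_TABLE, 0
    first = data[0]
    if first == 0x10:
        # Three-byte selector: 0x10 followed by a 16-bit ISO 8859 part number.
        if len(data) < 3:
            raise ValueError("truncated 0x10 selector")
        part = int.from_bytes(data[1:3], "big")
        if 1 <= part <= 15 and part != 12:  # ISO 8859-12 does not exist
            return f"iso8859_{part}", 3
        raise ValueError(f"unsupported ISO 8859 part {part}")
    if first in SINGLE_BYTE_SELECTORS:
        return SINGLE_BYTE_SELECTORS[first], 1
    raise ValueError(f"unsupported selector 0x{first:02X}")
```

With this sketch, `select_dvb_encoding(b"\x09Okr")` returns `("iso8859_13", 1)`, while a buffer starting with a printable byte such as `b"Okr"` returns the default-table label with offset 0.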

lighterowl, Feb 17 '23 20:02

It was clearly a quick implementation, without support for everything. It is currently not a priority for us, but it could be prioritized on request. Would you mind sharing some sample files demonstrating this issue with MediaInfo?

JeromeMartinez, Feb 20 '23 08:02

Sure, here you go (gzipped so GitHub will accept it) : tvp_rozrywka.ts.gz

When MediaInfo is run on this file, some characters in the service information are incorrect :

Menu
ID                                       : 501 (0x1F5)
Menu ID                                  : 62 (0x3E)
Format                                   : HEVC / E-AC-3 / DVB Subtitle / E-AC-3 / 
Duration                                 : 15 s 344 ms
List                                     : 502 (0x1F6) (HEVC) / 503 (0x1F7) (E-AC-3, Polish) / 506 (0x1FA) (DVB Subtitle) / 508 (0x1FC) (E-AC-3, aux) / 8005 (0x1F45) ()
Language                                 :  / Polish /  / aux
Service name                             : TVP Rozrywka
Service provider                         : Emitel
Service type                             : reserved for future use
UTC 2023-02-22 21:10:00                  : pl:Wojciech Cejrowski- boso przez úwiat - (68) Wenezuela - Boso ale w ostrogach / pl: / foreign countries/expeditions /  / 00:35:00 / 
UTC 2023-02-22 21:45:00                  : pl:Rolnik szuka ýony seria 9 - /9/ / pl: / social/spiritual sciences /  / 01:00:00 / 
UTC 2023-02-22 22:45:00                  : pl:Szansa na sukces. Opole 2023 - odc. (8) Piotr Cugowski / pl: / music/ballet/dance /  / 01:10:00 / 
UTC 2023-02-22 23:55:00                  : pl:Koùo fortuny - odc. 1441 ed. 12 / pl: / game show/quiz/contest /  / 00:40:00 / 
UTC 2023-02-26 03:05:00                  : pl:Ýycie to Kabaret - Kabaretomaniacy - (1) / pl: / variety show /  / 00:50:00 / 
UTC 2023-02-26 03:55:00                  : pl:Zakoñczenie dnia / pl: / undefined /  / 01:40:00 / 
UTC 2023-02-26 05:35:00                  : pl:Okrasa ùamie przepisy - Lekko i dietetycznie z królikiem / pl: / cooking /  / 00:35:00 / 

For example, the last event, Okrasa ùamie przepisy, should be Okrasa łamie przepisy. The descriptor for this particular event starts at offset 0x6E1069 into the file :

$ xxd -s 0x6E1069 -l 10 tvp_rozrywka.ts
006e1069: 4d3e 706f 6c39 094f 6b72                 M>pol9.Okr

The bytes are, in order :

  • 4d is the identifier of a short_event_descriptor,
  • 3e is the descriptor's length, 62 bytes,
  • 706f6c is the ISO 639 language code : pol,
  • 39 is the length of the following event_name, 57 bytes,
  • 09 tells us that the following bytes are encoded as ISO 8859-13,
  • bytes with the actual name follow.
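
The byte breakdown above can be reproduced with a short Python sketch. `parse_short_event_header` is a hypothetical helper written for illustration, and it only looks at the ten bytes shown in the xxd dump, so the name text is truncated. The last lines assume the garbled output came from the ISO-8859-1 fallback described earlier: re-encoding the garbled string as ISO-8859-1 and decoding it as ISO 8859-13 recovers the expected text.

```python
# The ten bytes from the xxd dump at offset 0x6E1069.
raw = bytes.fromhex("4d3e706f6c39094f6b72")


def parse_short_event_header(buf: bytes) -> dict:
    """Split the start of a short_event_descriptor into its fields."""
    name_length = buf[5]
    # The name field starts with the encoding selector byte.
    name_field = buf[6:6 + name_length]
    return {
        "tag": buf[0],                           # 0x4D, short_event_descriptor
        "length": buf[1],                        # descriptor length in bytes
        "language": buf[2:5].decode("ascii"),    # ISO 639 language code
        "name_length": name_length,
        "selector": name_field[0],               # 0x09 selects ISO 8859-13
        "text_bytes": name_field[1:],            # truncated in this short dump
    }


hdr = parse_short_event_header(raw)
print(hdr)

# Byte 0xF9 is the one that differs between the two interpretations.
print(bytes([0xF9]).decode("iso8859_1"))   # ù
print(bytes([0xF9]).decode("iso8859_13"))  # ł

# Undo the wrong decoding seen in the MediaInfo output.
garbled = "Okrasa ùamie przepisy"
repaired = garbled.encode("iso8859_1").decode("iso8859_13")
print(repaired)  # Okrasa łamie przepisy
```

The same round trip turns "Zakoñczenie dnia" into "Zakończenie dnia", which is consistent with the bytes being valid ISO 8859-13 that was decoded with the wrong table.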

lighterowl, Feb 22 '23 14:02