e2openplugin-EnhancedMovieCenter

Garbage text shown in recording information

Open Tyri0n opened this issue 3 years ago • 1 comments

Just started using EMC and half of my recordings' info is displaying garbage. The text is displayed correctly in the default video playlist, and elsewhere before recording.

Looks like the issue is here:

20220113 23:25 [META] Detected short_event encoding-type: windows-1253 (0.330670671343)
20220113 23:25 [META] Exception in readEitFile: 'charmap' codec can't decode byte 0x9f in position 15: character maps to <undefined>
Traceback (most recent call last):
  File "/usr/lib/enigma2/python/Plugins/Extensions/EnhancedMovieCenter/EitSupport.py", line 416, in __readEitFile
  File "/usr/lib/python2.7/encodings/cp1253.py", line 15, in decode
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9f in position 15: character maps to <undefined>
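The failure in this log can be reproduced in isolation. The following is a minimal sketch written for Python 3 (the receiver in the log runs Python 2.7), and it assumes nothing about EMC's actual code. Byte 0x9f has no mapping in Python's cp1253 codec, so a strict decode raises, while `errors="replace"` substitutes U+FFFD and lets the rest of the text through. The sample bytes are made up for illustration.

```python
# Hypothetical sample: Greek text encoded as cp1253, followed by a stray 0x9f byte.
raw = "Καλημέρα".encode("cp1253") + b"\x9f"

try:
    raw.decode("cp1253")  # strict mode, the same kind of call that fails in the traceback
    strict_ok = True
except UnicodeDecodeError as exc:
    strict_ok = False
    print("strict decode failed:", exc)

# Tolerant mode: unmappable bytes become U+FFFD instead of raising.
text = raw.decode("cp1253", errors="replace")
print(text)
```

A tolerant decode only hides the symptom; if the guessed encoding is wrong in the first place, the output is still not the original text.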

Tyri0n avatar Jan 13 '22 23:01 Tyri0n

I think this is caused by EPGCache which provides descriptions for recordings.

But I also think that data from sources outside EMC should be sanitised by EMC upon ingestion.

wedebe avatar Jul 25 '22 22:07 wedebe

@Tyri0n Could you please attach the eit file here, if you still have this problem?

pjsharp avatar Dec 18 '22 19:12 pjsharp

Sample attached (if this is indeed the same problem): (see also https://github.com/openatv/enigma2/issues/1954)

Archive-with_movie.zip Archive-without_movie.zip

1_0_1_207A_808_2_11A0000_0_0_0_20221218194603

wedebe avatar Dec 18 '22 19:12 wedebe

I went back to the standard movie player soon after. The reply by @wedebe is what I was seeing, if I remember correctly.

Tyri0n avatar Dec 18 '22 20:12 Tyri0n

A few days ago I opened a pull request that should solve these problems in EMC. It's still waiting to be merged.

encoding2

pjsharp avatar Dec 18 '22 20:12 pjsharp

@pjsharp, I've manually uploaded the updated files in your PR to my receiver - looks good :) Thanks a mill!

Does this mean that EPGCache wasn't actually at fault or what exactly was going on?

4097_0_2_0_0_0_0_0_0_0_20221219135719

wedebe avatar Dec 19 '22 14:12 wedebe

@Tyri0n, I've manually uploaded the updated files in your PR to my receiver - looks good :) Thanks a mill!

Does this mean that EPGCache wasn't actually at fault or what exactly was going on?

4097_0_2_0_0_0_0_0_0_0_20221219135719

guessing this was meant for @pjsharp :)

Tyri0n avatar Dec 19 '22 14:12 Tyri0n

Sorry Tyri0n, yes it was 🙈

wedebe avatar Dec 19 '22 14:12 wedebe

The EIT parser in EMC did not support some encodings. I've added a few encodings and fixed some parsing errors I noticed.

#298

pjsharp avatar Dec 19 '22 15:12 pjsharp

@wedebe I'm pretty sure that the oatv EPGCache (or whatever code) is at fault. I have the same garbage (also recording from 28.2E). I can make 2 short recordings of the same program, and one has garbage chars in the eit file while the other is normal... It's great that pjsharp has added the decoding in EMC, but it will make it less likely that oatv will try fixing the underlying problem..

dazulrich avatar Dec 22 '22 22:12 dazulrich

@dazulrich wouldn't I see garbage in the standard oatv movie player, in that case?

Tyri0n avatar Dec 22 '22 22:12 Tyri0n

@dazulrich Do you still have these 2 recordings from the same program? Or would it be possible to reproduce it again? I would like to inspect the eit files.

pjsharp avatar Dec 22 '22 23:12 pjsharp

@dazulrich wouldn't I see garbage in the standard oatv movie player, in that case?

Not if the standard oatv movie player has a workaround in it.

wedebe avatar Dec 22 '22 23:12 wedebe

It's great that pjsharp has added the decoding in EMC, but it will make it less likely that oatv will try fixing the underlying problem..

I raised this exact same concern with a workaround that got added to a different plugin.

Although, self-conflictingly... I also believe that any data consumer should never blindly trust an external data provider, so any potential errors should be handled gracefully.

But, then againnnnnn... each individual workaround is another element added to a house of cards lined up for crapping itself if/when the original cause of such a problem is fixed at source 🙈

wedebe avatar Dec 22 '22 23:12 wedebe

@dazulrich Do you still have these 2 recordings from the same program? Or would it be possible to reproduce it again? I would like to inspect the eit files.

I'd observed this too, but could never figure out a pattern. I'm sure I've got plenty of samples, just not available to me this week.

wedebe avatar Dec 22 '22 23:12 wedebe

@wedebe Whatever problems still exist in the background, adding support for more encodings to the plugin is not a workaround. The information in an EIT file can be encoded with different encodings. Even within a single file, individual event parts can be encoded differently (e.g. name/title in UTF-8, short description in ISO-8859-1, etc.). This is defined in ETSI EN 300 468.
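To illustrate how a text field announces its own encoding, here is a rough sketch (Python 3) of selecting a codec from the leading selector byte described in Annex A of EN 300 468. It is not the code from the PR. The function name `decode_dvb_text`, the `default` parameter, and the table name are made up for this example, only a subset of selector values is covered, and `latin_1` is an assumed stand-in for the standard's default table (ISO 6937), which Python's standard library does not ship.

```python
# Subset of the EN 300 468 Annex A selector bytes, mapped to Python codec names.
SELECTOR_CODECS = {
    0x01: "iso8859_5", 0x02: "iso8859_6", 0x03: "iso8859_7",
    0x04: "iso8859_8", 0x05: "iso8859_9", 0x06: "iso8859_10",
    0x07: "iso8859_11", 0x09: "iso8859_13", 0x0A: "iso8859_14",
    0x0B: "iso8859_15", 0x11: "utf_16_be", 0x15: "utf_8",
}


def decode_dvb_text(field: bytes, default: str = "latin_1") -> str:
    """Decode one DVB text field according to its leading selector byte."""
    if not field:
        return ""
    first = field[0]
    if first >= 0x20:
        # No selector byte present. ASSUMPTION: latin_1 approximates the
        # default table (ISO 6937), which has no codec in the stdlib.
        codec, payload = default, field
    elif first == 0x10 and len(field) >= 3:
        # 0x10 0x00 n selects ISO 8859-n; decode() raises LookupError
        # if Python has no codec for that n.
        codec, payload = "iso8859_%d" % field[2], field[3:]
    elif first in SELECTOR_CODECS:
        codec, payload = SELECTOR_CODECS[first], field[1:]
    else:
        # e.g. 0x1F (encoding_type_id follows) is not handled in this sketch.
        raise ValueError("unsupported selector byte 0x%02x" % first)
    return payload.decode(codec, errors="replace")


print(decode_dvb_text(b"\x15" + "Blue Bloods".encode("utf_8")))
```

Because every field carries its own selector, the title and the description of the same event can be decoded independently of each other.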

The EIT parser in the EMC plugin did not fully evaluate the information about the encoding used. Therefore, the correct encoding was not always recognized, although it could have been. If the encoding was not recognized, the content was passed to automatic encoding detection. However, this could not always detect the encoding correctly either, and then some characters were not displayed correctly.

In your case it is a special encoding: Freesat Huffman. This can also be seen in the EIT file (0x1F 0x02). What you call "garbage characters" is just Huffman-compressed text. To bring it into readable form I added a decoder to the plugin, because Python can't decode it out of the box the way it can UTF-8 or the ISO encodings.
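Spotting such a field is straightforward even though decoding it is not. A small sketch (Python 3), assuming only the two marker bytes mentioned above: the helper name and constants are hypothetical, and the actual Huffman decoding, which needs the Freesat lookup tables, is deliberately left out.

```python
ENCODING_TYPE_ID_FOLLOWS = 0x1F  # selector byte: the next byte is an encoding_type_id
FREESAT_HUFFMAN_ID = 0x02        # the value seen in the EIT file from this thread


def is_freesat_huffman(field: bytes) -> bool:
    """Return True if a DVB text field starts with the 0x1F 0x02 marker."""
    return (
        len(field) >= 2
        and field[0] == ENCODING_TYPE_ID_FOLLOWS
        and field[1] == FREESAT_HUFFMAN_ID
    )


# Made-up payload bytes after the marker, for illustration only.
print(is_freesat_huffman(b"\x1f\x02\x8a\x41\x7c"))  # marker present
print(is_freesat_huffman(b"\x15Blue Bloods"))       # UTF-8 selector instead
```

A parser that checks for this marker before falling back to encoding detection never hands compressed bytes to a character-set guesser.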

I just extended the parser so that it can correctly recognize several encodings. No workaround!

pjsharp avatar Dec 23 '22 01:12 pjsharp

Hi, I did some recordings. The first one contains garbage, the second one is fine.

Between the recordings I opened the 2nd infobar to display the description. Have a look at the meta files: the description is slightly different, and the 1st one has an extra [S] at the end. I copied the epg.dat and saved it again when the recording was working. In the original EPG I can only find the Huffman-encoded text; in the second one I can find both the encoded and the clear-text description.

blue bloods.zip

dazulrich avatar Dec 23 '22 01:12 dazulrich

@dazulrich wouldn't I see garbage in the standard oatv movie player, in that case?

Maybe the standard movie player has access to the oatv Huffman decoder.. (clutching at straws.) It does not use the meta file.. just tried it without the meta file: the description is displaying ok.. the meta file is recreated (but without description). So I wonder whether it has access to the EPGCache and pulls the data from there.

dazulrich avatar Dec 23 '22 02:12 dazulrich

@dazulrich Thank you very much. I inspected the 2 files. In the first one, the event info is Huffman encoded; in the second one it is UTF-8 encoded (and therefore readable). The second file also contains additional information about subtitles etc. The event info is a little different and seems to come from different sources.

I have a guess. For the first recording, the event info can come from the EPGCache, probably older info (e.g. from the nightly EPG refresh) and is huffman coded. This is written to the EIT file at the start of the first recording. Meanwhile, the EPG is updated from the current stream, is utf-8 encoded and contains additional information such as subtitles etc. This updates the EPGCache. The updated information is already used for the second recording and written to the 2nd EIT file.

Both EIT files have the correct format, regardless of whether they are coded in UTF-8 or Huffman. And no matter what file the recorder creates, EMC must be able to decode and display that information correctly from both. And that's the case now, hence this extension and not a workaround.

pjsharp avatar Dec 23 '22 12:12 pjsharp

Thanks @pjsharp. I may have one or two other odd samples that might be good to have analysed as part of your work. Instead of having a whole bunch of encoded chars, it's only 1 or 2 at the end. And one eit actually crashes e2.

dazulrich avatar Dec 23 '22 13:12 dazulrich

@dazulrich Please attach the files here. Thank you!

pjsharp avatar Dec 23 '22 13:12 pjsharp

@pjsharp, here you go. With your new code those all display ok!!! No crash and no extra chars! They probably are nothing special.. but with the original code they were not ok.

The crash is similar to Tyri0n's, but a different byte.. log extract inside the zip.

crash.zip extra_chars.zip

dazulrich avatar Dec 23 '22 20:12 dazulrich

@dazulrich I analyzed your files. The errors with the original code were caused by:

  • using the wrong length when parsing (therefore more characters than necessary)
  • incorrect handling of the "default encoding", which could sometimes lead to a crash (if illegal characters occurred)

All these issues are already solved in my code (PR).
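To make the length problem concrete, here is a sketch (Python 3) of reading a short_event_descriptor strictly by its declared lengths, following the field layout in EN 300 468 (tag 0x4D, descriptor length, 3-byte language code, then length-prefixed name and text). This is an illustration, not the code from the PR; the function name `parse_short_event` and the sample bytes are made up.

```python
def parse_short_event(desc: bytes):
    """Split a short_event_descriptor into (language, raw_name, raw_text)."""
    if len(desc) < 2 or desc[0] != 0x4D:
        raise ValueError("not a short_event_descriptor")
    body = desc[2:2 + desc[1]]  # honour descriptor_length, ignore trailing bytes
    if len(body) != desc[1] or len(body) < 5:
        raise ValueError("truncated descriptor")
    lang = body[0:3].decode("ascii", errors="replace")
    name_end = 4 + body[3]  # body[3] is event_name_length
    if name_end + 1 > len(body):
        raise ValueError("event name runs past the descriptor")
    name = body[4:name_end]
    text_start = name_end + 1  # body[name_end] is text_length
    text_end = text_start + body[name_end]
    if text_end > len(body):
        raise ValueError("text runs past the descriptor")
    return lang, name, body[text_start:text_end]


# Hypothetical descriptor followed by two unrelated bytes.
name, text = b"\x15Blue Bloods", b"\x15Drama"
body = b"eng" + bytes([len(name)]) + name + bytes([len(text)]) + text
lang, raw_name, raw_text = parse_short_event(bytes([0x4D, len(body)]) + body + b"\xff\xff")
print(lang, raw_name, raw_text)
```

Slicing by the declared lengths means bytes belonging to the next descriptor never end up appended to the description. The returned fields are still raw bytes, each with its own encoding selector.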

When I read your post:

And one eit actually crashes e2.

I initially thought that the crash happened with my code. :)

pjsharp avatar Dec 24 '22 02:12 pjsharp

I initially thought that the crash happened with my code. :)

So did I. But then I realized I had reverted the code to find the garbage chars.... Sorry for the confusion. Happy Christmas...

dazulrich avatar Dec 24 '22 09:12 dazulrich

Thanks. Merry Christmas!

pjsharp avatar Dec 24 '22 10:12 pjsharp