e2openplugin-EnhancedMovieCenter
Garbage text shown in recording information
Just started using EMC and half of my recordings' info is displaying garbage. The text is displayed correctly in the default video playlist, and elsewhere before recording.
Looks like the issue is here:
20220113 23:25 [META] Detected short_event encoding-type: windows-1253 (0.330670671343)
20220113 23:25 [META] Exception in readEitFile: 'charmap' codec can't decode byte 0x9f in position 15: character maps to <undefined>
I think this is caused by EPGCache which provides descriptions for recordings.
But I also think that data from sources outside EMC should be sanitised by EMC upon ingestion.
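As an illustration of what "sanitised upon ingestion" could mean for the exception in the log above, here is a minimal Python sketch. `safe_decode` is a hypothetical helper, not a function from EMC; it only assumes that the caller has raw bytes plus a guessed codec name, as the `[META]` log line suggests.

```python
def safe_decode(raw: bytes, detected: str) -> str:
    """Decode raw bytes with a detected codec without ever raising.

    Hypothetical helper for illustration; not part of EMC.
    """
    try:
        # First attempt: strict decoding with the detected codec.
        return raw.decode(detected)
    except UnicodeDecodeError:
        # The codec has no mapping for some byte (e.g. 0x9f in
        # windows-1253): keep the rest, mark the bad bytes with U+FFFD.
        return raw.decode(detected, errors="replace")
    except LookupError:
        # The detector returned a codec name Python does not know.
        # latin-1 maps every byte, so this cannot fail.
        return raw.decode("latin-1")
```

With this shape, a wrongly detected encoding degrades to a few replacement characters instead of an exception in `readEitFile`. It does not make wrongly detected text readable.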
@Tyri0n Could you please attach the eit file here, if you still have this problem?
Sample attached (if this is indeed the same problem): (see also https://github.com/openatv/enigma2/issues/1954)
Archive-with_movie.zip Archive-without_movie.zip
I went back to the standard movie player soon after. The reply by @wedebe is what I was seeing, if I remember correctly.
A few days ago I opened a pull request that should solve these problems in EMC. It's still waiting to be merged.
@pjsharp, I've manually uploaded the updated files in your PR to my receiver - looks good :) Thanks a mill!
Does this mean that EPGCache wasn't actually at fault or what exactly was going on?
> @Tyri0n, I've manually uploaded the updated files in your PR to my receiver - looks good :) Thanks a mill!
> Does this mean that EPGCache wasn't actually at fault or what exactly was going on?
guessing this was meant for @pjsharp :)
Sorry Tyri0n, yes it was 🙈
The EIT parser of EMC did not support some encodings. I've added a few encodings and fixed some parsing errors I noticed.
#298
@wedebe I'm pretty sure that the oatv EPGCache (or whatever code) is at fault. I have the same garbage (also recording from 28.2E). I can make 2 short recordings of the same program, and one has garbage chars in the eit file while the other is normal. It's great that pjsharp has added the decoding in EMC, but it will make it less likely that oatv will try fixing the underlying problem.
@dazulrich wouldn't I see garbage in the standard oatv movie player, in that case?
@dazulrich Do you still have these 2 recordings from the same program? Or would it be possible to reproduce it again? I would like to inspect the eit files.
> @dazulrich wouldn't I see garbage in the standard oatv movie player, in that case?
Not if the standard oatv movie player has a workaround in it.
> It's great that pjsharp has added the decoding in EMC, but it will make it less likely that oatv will try fixing the underlying problem.
I raised this exact same concern with a workaround that got added to a different plugin.
Although, self-conflictingly... I also believe that any data consumer should never blindly trust an external data provider, so any potential errors should be handled gracefully.
But, then againnnnnn... each individual workaround is another element added to a house of cards lined up for crapping itself if/when the original cause of such a problem is fixed at source 🙈
> @dazulrich Do you still have these 2 recordings from the same program? Or would it be possible to reproduce it again? I would like to inspect the eit files.
I'd observed this too, but could never figure out a pattern. I'm sure I've got plenty of samples, just not available to me this week.
@wedebe Whatever problems still exist in the background, adding support for more encodings to the plugin is not a workaround. The information in an EIT file can be encoded with different encodings. Even within a file, individual event parts can be encoded differently (e.g. name/title in utf-8, short description in iso-8859-1, etc.). This is defined in ETSI EN 300 468.
The EIT parser in the EMC plugin did not fully evaluate the information about the encoding used. Therefore, the correct encoding was not always recognized even when the file declared it. If the encoding was not recognized, the content was passed to an automatic encoding detection, which could not always identify the encoding correctly either. In those cases some characters were displayed incorrectly.
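To make the "encoding information" concrete: in ETSI EN 300 468 (Annex A) a text field may start with a selector byte below 0x20 that names its character table. The Python sketch below shows that lookup for a subset of selectors. It is not the code from the PR; `SELECTOR_CODECS` and `decode_dvb_text` are hypothetical names, and the `latin-1` default is an approximation, because the real default table is ISO 6937, which Python's standard library does not ship.

```python
# Subset of the selector bytes from ETSI EN 300 468 Annex A.
# Names below are hypothetical; this is a sketch, not the EMC parser.
SELECTOR_CODECS = {
    0x01: "iso8859-5",
    0x02: "iso8859-6",
    0x03: "iso8859-7",
    0x04: "iso8859-8",
    0x05: "iso8859-9",
    0x06: "iso8859-10",
    0x07: "iso8859-11",
    0x09: "iso8859-13",
    0x0A: "iso8859-14",
    0x0B: "iso8859-15",
    0x11: "utf-16-be",  # ISO/IEC 10646 Basic Multilingual Plane
    0x15: "utf-8",
}


def decode_dvb_text(field: bytes) -> str:
    """Decode one DVB text field according to its leading selector byte."""
    if not field:
        return ""
    first = field[0]
    if first >= 0x20:
        # No selector: default table. ASSUMPTION: latin-1 stands in for
        # ISO 6937 here, which is only correct for the ASCII range.
        return field.decode("latin-1")
    if first == 0x10:
        # 0x10 0x00 n selects ISO 8859-n.
        if len(field) < 3:
            raise ValueError("truncated 0x10 selector")
        codec, payload = f"iso8859-{field[2]}", field[3:]
    elif first in SELECTOR_CODECS:
        codec, payload = SELECTOR_CODECS[first], field[1:]
    else:
        # Includes 0x1F (encoding_type_id follows), which needs its own decoder.
        raise ValueError(f"unsupported selector byte 0x{first:02x}")
    try:
        return payload.decode(codec, errors="replace")
    except LookupError:
        raise ValueError(f"no codec for {codec}") from None
```

Because the selector is read per field, a title in UTF-8 and a description in an ISO 8859 table can sit in the same file without any guessing. DVB control codes inside the text are not handled in this sketch.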
In your case it is a special encoding: Freesat Huffman. This can also be seen in the EIT file (0x1F 0x02). What you call "garbage characters" is just Huffman-compressed text. To bring it into its normal form I added the decoder to the plugin, because Python can't decode it out of the box the way it decodes utf-8 or the ISO encodings.
I just extended the parser so that it can correctly recognize several encodings. No workaround!
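For readers who want to check their own EIT files, a small Python sketch of spotting the marker mentioned above. `freesat_huffman_table` is a hypothetical helper; only the `0x1F 0x02` prefix comes from this thread, while accepting `0x01` as a second table id is an assumption. The actual decompression needs the Huffman tables and is not shown.

```python
from typing import Optional


def freesat_huffman_table(field: bytes) -> Optional[int]:
    """Return the table id if the field carries the Huffman marker.

    Hypothetical helper: checks only the 2-byte prefix, decodes nothing.
    """
    # 0x1F announces an encoding_type_id in the next byte.
    # 0x02 is the id seen in this thread; 0x01 is an ASSUMPTION.
    if len(field) >= 2 and field[0] == 0x1F and field[1] in (0x01, 0x02):
        return field[1]
    return None
```

A field for which this returns a table id has to go through a Huffman decoder. Handing it to a charset detector produces exactly the kind of output reported at the top of this issue.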
Hi, I did some recordings. The first one contains garbage; the second one is fine.
Between the recordings I opened the second infobar to display the description. Have a look at the meta files: the description is slightly different, and the first one has an extra [S] at the end. I copied the epg.dat, and saved it again when the recording was working. In the original EPG I can only find the Huffman-encoded text; in the second one I can find both the encoded and the clear-text description.
blue bloods.zip
> @dazulrich wouldn't I see garbage in the standard oatv movie player, in that case?
Maybe the standard movie player has access to the oatv Huffman decoder (clutching at straws). It does not use the meta file: I just tried, and the description is displayed OK while the meta file is recreated (but without description). So I wonder whether it has access to the EPGCache and pulls the data from there.
@dazulrich Thank you very much. I inspected the 2 files. In the first one, the event info is huffman encoded and in the second one with utf-8 (therefore readable). The second file also contains additional information about subtitles etc. The event info is a little different and seems to come from different sources.
I have a guess. For the first recording, the event info can come from the EPGCache, probably older info (e.g. from the nightly EPG refresh) and is huffman coded. This is written to the EIT file at the start of the first recording. Meanwhile, the EPG is updated from the current stream, is utf-8 encoded and contains additional information such as subtitles etc. This updates the EPGCache. The updated information is already used for the second recording and written to the 2nd EIT file.
Both EIT files have the correct format, regardless of whether they are coded in UTF-8 or Huffman. And no matter what file the recorder creates, EMC must be able to decode and display that information correctly from both. And that's the case now, hence this extension and not a workaround.
Thanks @pjsharp. I may have one or two other odd samples that might be good to have analysed as part of your work. Instead of having a whole bunch of encoded chars, it's only 1 or 2 at the end. And one eit actually crashes e2.
@dazulrich Please attach the files here. Thank you!
@pjsharp, here you go. With your new code those all display OK! No crash and no extra chars! They are probably nothing special, but with the original code they were not OK.
The crash is similar to Tyri0n's, but with a different byte. A log extract is inside the zip.
crash.zip extra_chars.zip
@dazulrich I analyzed your files. The errors with original code were caused by:
- using the wrong length when parsing (therefore more characters than necessary)
- incorrect handling of the "default encoding", which could sometimes lead to a crash (if illegal characters occurred)
All these issues are already solved in my code (PR).
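The first cause (wrong length) is easiest to see against the layout of a `short_event_descriptor` in ETSI EN 300 468: tag 0x4D, descriptor length, a 3-byte language code, then a length-prefixed event name and a length-prefixed text. The Python sketch below slices strictly by those length bytes. It is an illustration under that layout, not the code from the PR, and `parse_short_event` is a hypothetical name.

```python
from typing import Tuple


def parse_short_event(desc: bytes) -> Tuple[str, bytes, bytes]:
    """Split a short_event_descriptor into (language, raw name, raw text).

    Hypothetical sketch: the fields are returned undecoded, since each
    one may carry its own encoding selector byte.
    """
    if len(desc) < 2 or desc[0] != 0x4D:
        raise ValueError("not a short_event_descriptor")
    # descriptor_length bounds the body; bytes after it belong elsewhere.
    body = desc[2:2 + desc[1]]
    if len(body) < 5:
        raise ValueError("descriptor too short")
    language = body[0:3].decode("ascii", errors="replace")
    name_len = body[3]
    if 4 + name_len >= len(body):
        raise ValueError("event name length exceeds descriptor")
    name = body[4:4 + name_len]
    text_len = body[4 + name_len]
    text_start = 5 + name_len
    # Slice by the declared length, not to the end of the buffer, so that
    # following bytes never show up as extra characters.
    text = body[text_start:text_start + text_len]
    return language, name, text
```

Reading to the end of the buffer instead of honouring `text_len` is one way to end up with one or two stray characters after an otherwise correct description.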
When I read your post:
> And one eit actually crashes e2.
I initially thought that the crash happened with my code. :)
> I initially thought that the crash happened with my code. :)
So did I. But then I realized I had reverted the code to find the garbage chars.... Sorry for the confusion. Happy Christmas...
Thanks. Merry Christmas!