Unable to process idx file without palette definition
I have an idx file that I extracted with mkvextract from an mkv I ripped from a DVD. The process is roughly this:
mplayer -dvd-device $dvd dvd://$nav -dumpstream -dumpfile $vob
ffmpeg \
-fflags +genpts \
-analyzeduration 1000000k \
-probesize 1000000k \
-i $vob \
-c copy \
$mapping \
$metadata_audio \
$metadata_subs \
-y \
$mkv
mkvextract $mkv tracks id:idxfile
This generates .idx files which look like this:
# VobSub index file, v7 (do not modify this line!)
langidx: 0
id: en, index: 0
timestamp: 00:00:04:200, filepos: 000000000
# etc etc
For vobsub2srt I added the line custom colors: ON, tridx: 1000, colors: 000000, ffffff, 000000, 000000 to improve the OCR results, but that has been hit and miss.
So my idx files look like this:
# VobSub index file, v7 (do not modify this line!)
langidx: 0
custom colors: ON, tridx: 1000, colors: 000000, ffffff, 000000, 000000
id: en, index: 0
timestamp: 00:00:04:200, filepos: 000000000
# etc etc
However, vobsubocr croaks with an error on these idx files:
An error occured: Could not parse VOB subtitles from 13-dut.idx: Could not parse 13-dut.idx
It seems the tool needs a palette to be present in order to work (I snipped this piece from an idx I saw in one of the bug reports here):
# The palette of the generated file
palette: 000000, f0f0f0, cccccc, 999999, 3333fa, 1111bb, fa3333, bb1111, 33fa33, 11bb11, fafa33, bbbb11, fa33fa, bb11bb, 33fafa, 11bbbb
This is what you seem to be able to process:
# VobSub index file, v7 (do not modify this line!)
langidx: 0
palette: 000000, f0f0f0, cccccc, 999999, 3333fa, 1111bb, fa3333, bb1111, 33fa33, 11bb11, fafa33, bbbb11, fa33fa, bb11bb, 33fafa, 11bbbb
id: en, index: 0
timestamp: 00:00:04:200, filepos: 000000000
# etc etc
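As a stopgap until the tool handles palette-less files, the palette line can be added before running vobsubocr. Below is a minimal sketch in Python, assuming that a missing palette line is really the only thing the parser objects to (I have not confirmed that). The function name ensure_palette is made up for illustration, and the default palette is just the one from the snippet above, not the real palette of the DVD.

```python
# Palette copied from the snippet quoted in this issue. It is NOT the DVD's
# real palette, only a placeholder so that the idx file has one at all.
DEFAULT_PALETTE = (
    "000000, f0f0f0, cccccc, 999999, 3333fa, 1111bb, fa3333, bb1111, "
    "33fa33, 11bb11, fafa33, bbbb11, fa33fa, bb11bb, 33fafa, 11bbbb"
)


def ensure_palette(idx_text, palette=DEFAULT_PALETTE):
    """Return idx_text with a 'palette:' line added if it has none.

    The line is placed directly before the first 'id:' line, or at the
    end of the file if no 'id:' line is found.
    """
    lines = idx_text.splitlines()
    if any(line.startswith("palette:") for line in lines):
        return idx_text  # already has a palette, leave the file untouched

    out = []
    inserted = False
    for line in lines:
        if not inserted and line.startswith("id:"):
            out.append("palette: " + palette)
            inserted = True
        out.append(line)
    if not inserted:
        out.append("palette: " + palette)
    return "\n".join(out) + "\n"
```

Reading the idx file, passing its text through ensure_palette and writing the result back gives a file shaped like the working example above. Existing lines such as custom colors are left in place.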
I don't think we should need to have the palette (or custom colors, for that matter) present. It would be best to generate an image in the form tesseract likes best:
- black on white
- add a border around the image/text (https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html#borders)
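
To make the two points above concrete, here is a rough sketch in Python of what I mean, not a proposal for the actual implementation. It assumes the decoded subtitle is available as rows of small colour indices, and that we know which indices are the text. The name to_ocr_bitmap, the default text_indices=(1,) and the 10 pixel border are all my own guesses for illustration; which index carries the text likely varies per disc.

```python
WHITE, BLACK = 255, 0


def to_ocr_bitmap(pixels, text_indices=(1,), border=10):
    """Map an indexed bitmap to black text on white, padded with a white border.

    pixels: list of rows, each a list of integer colour indices.
    text_indices: indices treated as text (an assumption, see above).
    border: width in pixels of the white margin added on every side.
    Returns a list of rows of 8-bit grayscale values (0 = black, 255 = white).
    """
    width = len(pixels[0]) if pixels else 0
    out_width = width + 2 * border

    # White rows above the text
    out = [[WHITE] * out_width for _ in range(border)]

    for row in pixels:
        # Text pixels become black; background, outline etc. become white
        body = [BLACK if p in text_indices else WHITE for p in row]
        out.append([WHITE] * border + body + [WHITE] * border)

    # White rows below the text
    out.extend([WHITE] * out_width for _ in range(border))
    return out
```

Everything that is not text is flattened to white, so neither the palette nor the custom colors line would be needed for the OCR step.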