
Player's Handbook D&D 5th Edition table parsing issues

Open · krainboltgreene opened this issue 1 year ago · 1 comment

So I pushed an OCR'd copy of the PHB through marker, processed the first ten pages, and got this output: https://gist.github.com/krainboltgreene/48712b8947e20b4594259f90087ae181

A few notes: some of these issues come from the OCR of the PDF itself, but I suspect some may be issues with marker.

krainboltgreene, Dec 01 '23 20:12

Some OCR engines annoyingly insert spaces between the characters of a word, probably because of their character-spacing heuristics. I suspect that is what's happening here. I'm going to try to train an OCR model that doesn't do this in the next couple of months.
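To make the letter-spacing problem concrete, here is a minimal post-processing sketch. It assumes the OCR output separates the letters of an affected word with single spaces (e.g. "W e l c o m e") and treats any run of three or more single letters bounded by whitespace as one spaced-out word. The pattern and the `collapse_letter_spacing` function are hypothetical illustrations, not part of marker.

```python
import re

# Hypothetical heuristic (not part of marker): a "letter-spaced" run is three or
# more single ASCII letters separated by single spaces, bounded by whitespace or
# the start/end of the string, e.g. "W e l c o m e".
LETTER_SPACED_RUN = re.compile(r"(?<!\S)(?:[A-Za-z] ){2,}[A-Za-z](?!\S)")


def collapse_letter_spacing(text: str) -> str:
    """Join each run of space-separated single letters back into one word."""
    # Only the spaces inside a matched run are removed; surrounding text is untouched.
    return LETTER_SPACED_RUN.sub(lambda m: m.group(0).replace(" ", ""), text)


if __name__ == "__main__":
    print(collapse_letter_spacing("W e l c o m e to the game"))  # Welcome to the game
```

The trade-off is that it cannot recover word boundaries when an engine uses single spaces everywhere ("D u n g e o n s a n d" becomes one token), and it will also join legitimate runs of initials such as "J R R".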

Did you try it with the postprocessor model enabled (set `ENABLE_EDITOR_MODEL`)? That might improve things.

Can you share the source PDF?

VikParuchuri, Dec 05 '23 21:12