Edouard Belval
To add to this, there is currently a bug with NumPy on Windows for which the advised fix is to downgrade numpy to 1.19.3. Unfortunately, that does not fix...
Did you find a way to ignore these invisible characters? I am having a similar issue on my side.
The way the code is written, both parameters (`dpi` and `size`) are sent to `pdftoppm`. This means that the issue you are seeing is most likely not at the...
Is this only happening with a single PDF? If you run `pdftoppm -r 200 -jpeg your_file.pdf out` does it show any warnings?
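If it is easier to check this from Python, the command above can be wrapped so that anything `pdftoppm` writes to stderr is collected. This is only a minimal sketch: the helper names `build_pdftoppm_cmd` and `run_and_collect_stderr` are made up for illustration, and it assumes `pdftoppm` is on your `PATH`.

```python
import subprocess


def build_pdftoppm_cmd(pdf_path, out_prefix, dpi=200):
    """Build the command from the comment above: pdftoppm -r <dpi> -jpeg <pdf> <prefix>."""
    return ["pdftoppm", "-r", str(dpi), "-jpeg", pdf_path, out_prefix]


def run_and_collect_stderr(cmd):
    """Run cmd and return (exit code, list of non-empty stderr lines)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    lines = [line for line in result.stderr.splitlines() if line.strip()]
    return result.returncode, lines


# Example (hypothetical file name, requires pdftoppm to be installed):
# code, warnings = run_and_collect_stderr(build_pdftoppm_cmd("your_file.pdf", "out"))
# print(code, warnings)
```

An empty list with exit code 0 would suggest `pdftoppm` itself is not complaining about the file.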
If you still have this issue: FreeTDS truncates data by default. In your `/etc/freetds/freetds.conf`, add `text size = 1000000000` (or any value you see fit). EDIT: I just saw...
Clipping paths are indeed used to hide text in PDF documents. Here is an example that could be used as a starting point: https://mva.maryland.gov/Documents/VR-181.pdf There is hidden text slightly above...
Use `-fd` instead of `-ft`. The former is for a directory while the latter is for a single font file. The missing handwritten module message is because you do not...
You define the max height with `-f 64`. When skewing the text, TRDG will respect the max height given, so the width and text size are reduced.
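To see why a fixed max height shrinks the text, here is a rough geometric sketch. It is not the generator's actual code: it assumes the skew is a plain rotation of the rendered text box and that the result is scaled down uniformly to fit the height, and the function names and the 200x64 box are made up for illustration.

```python
import math


def rotated_height(width, height, angle_deg):
    """Height of the axis-aligned bounding box of a width x height box rotated by angle_deg."""
    theta = math.radians(angle_deg)
    return abs(width * math.sin(theta)) + abs(height * math.cos(theta))


def scale_to_fit(width, height, angle_deg, max_height):
    """Uniform scale factor that brings the rotated box back within max_height (never upscales)."""
    return min(1.0, max_height / rotated_height(width, height, angle_deg))


# A hypothetical 200x64 text box, limited to a height of 64 as with `-f 64`.
for angle in (0, 5, 10):
    s = scale_to_fit(200, 64, angle, 64)
    print(f"angle={angle:>2} deg  scale={s:.3f}  text height ~ {64 * s:.1f}px")
```

Under these assumptions, the longer the text line, the more a given angle adds to the bounding-box height, and the more the text has to shrink to stay within the limit.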
I already implemented the first option of outputting a mask with a different pixel value for each character. It's not ready to merge yet, because of tests and handling the...
Branch is https://github.com/Belval/TextRecognitionDataGenerator/tree/output-mask