doctr
Flipped text recognition prediction.
Bug description
When I set the option assume_straight_pages=False, some of the predictions come out upside down. I tried db_resnet34 and db_resnet50 for detection with master and parseq for recognition, and observed this bug with every pair.
Code snippet to reproduce the bug
from doctr.models import ocr_predictor
from doctr.io import DocumentFile
input = DocumentFile.from_images("./gh.png")
model = ocr_predictor(
'db_resnet50',
'parseq',
pretrained=True,
assume_straight_pages=False,
).cuda().half()
result = model(input)
print(result)
Error traceback
...
Line(
(words): [
Word(value='ster', confidence=1.0),
Word(value='and', confidence=1.0),
Word(value='Graham', confidence=1.0),
Word(value='6661]', confidence=0.95), <-- Should be '[1999'
Word(value='and', confidence=1.0),
Word(value='2012],', confidence=1.0),
Word(value='Gamba', confidence=1.0),
Word(value='and', confidence=1.0),
Word(value='Graham', confidence=1.0),
Word(value='[2018]', confidence=1.0),
Word(value='and', confidence=1.0),
Word(value='Axelrod', confidence=1.0),
Word(value='[2018).', confidence=0.99),
]
),
...
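The flipped word is consistent with the crop being read after a 180 degree rotation: reversing '[1999' and swapping each character for its upside-down counterpart gives '6661]'. A minimal sketch of that check follows. The `ROTATED` table and the `rotate_180` helper are hypothetical names made up for illustration, not part of docTR, and the table only covers a few characters.

```python
from typing import Optional

# Characters that read as another character when turned upside down.
# Illustrative and deliberately small; not part of docTR.
ROTATED = {
    "0": "0", "1": "1", "6": "9", "8": "8", "9": "6",
    "[": "]", "]": "[", "(": ")", ")": "(",
}


def rotate_180(text: str) -> Optional[str]:
    """Return how `text` reads upside down, or None if a character is not in the table."""
    try:
        # A 180 degree rotation reverses the character order and flips each glyph.
        return "".join(ROTATED[ch] for ch in reversed(text))
    except KeyError:
        return None


print(rotate_180("[1999"))  # 6661]
```

A check like this can only flag words made of such symmetric characters, so it is a way to confirm the diagnosis rather than a general fix.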
Environment
Collecting environment information...
DocTR version: 0.8.0a0
TensorFlow version: N/A
PyTorch version: 2.1.0a0+4136153 (torchvision 0.16.0a0)
OpenCV version: 4.9.0
OS: Ubuntu 22.04.2 LTS
Python version: 3.10.6
Is CUDA available (TensorFlow): N/A
Is CUDA available (PyTorch): Yes
CUDA runtime version: 12.1.105
GPU models and configuration: GPU 0: NVIDIA A30
Nvidia driver version: 525.147.05
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.9.2
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.9.2
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.9.2
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.9.2
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.9.2
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.9.2
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.9.2
Deep Learning backend
is_tf_available: False
is_torch_available: True
Hi @decadance-dance :wave:
Yeah, this depends on the crop orientation classifier, which isn't 100% robust atm. We will retrain it after the next release; a training script has already been added :)
CC @odulcy-mindee
@felixdittrich92, got you, thanks. BTW, maybe you know an easy way to work around it in my case. I want to get quads (4 pts) instead of rectangles (2 pts) as the output of the detector, even if my page is straight. That is, in a real scenario I will receive straight documents, so I don't really need to get their orientation and rotate them, but I still need rectification to feed crops to the recognizer.
Mh, could you explain this in a bit more detail? Because if your images contain only straight text, the rectification should not be a problem!?
If we are talking about modifying the detector output in the middle of the pipeline, before it is passed to the recognition model, then https://github.com/mindee/doctr/pull/1449 could help. (Note: the input and output signatures need to be the same, so converting from rect to quad within the same pipeline will not work.)
@felixdittrich92 All my documents are straight, so I could use assume_straight_pages = True, but in that case I would get rectangles (two points) as the detector output. I need to get quads (four points) from the detector, so I use assume_straight_pages = False. But this option sometimes causes problems, such as those described in this issue.
So I'm looking for a way to get four points from detector and avoid the upside down crops.
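One possible workaround, as a minimal sketch: keep assume_straight_pages=True and expand each two-point box into four corner points after the pipeline has run. This assumes a straight box is given as ((xmin, ymin), (xmax, ymax)); `rect_to_quad` is a hypothetical helper, not a docTR function. It only yields axis-aligned quads, so it does not help with text that is actually rotated.

```python
from typing import Tuple

Point = Tuple[float, float]


def rect_to_quad(geometry: Tuple[Point, Point]) -> Tuple[Point, Point, Point, Point]:
    """Expand a two-point box into four corners.

    Assumes `geometry` is ((xmin, ymin), (xmax, ymax)).
    Corners are returned clockwise, starting from the top-left.
    """
    (xmin, ymin), (xmax, ymax) = geometry
    return ((xmin, ymin), (xmax, ymin), (xmax, ymax), (xmin, ymax))


# Example with relative coordinates in [0, 1]:
quad = rect_to_quad(((0.10, 0.20), (0.30, 0.25)))
print(quad)

# Assumed usage on a result obtained with assume_straight_pages=True:
# for page in result.pages:
#     for block in page.blocks:
#         for line in block.lines:
#             for word in line.words:
#                 quad = rect_to_quad(word.geometry)
```

Because the conversion happens outside the predictor, it does not run into the constraint that input and output signatures must match inside the pipeline.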
@felixdittrich92 Hi, I'm facing similar issues with v0.8.1 when operating on text that is rotated up to +/- 45 degrees. I see the issue mentions v0.9.0 and v0.10.0. Is there a way I can test the new model/checkpoint? PR #1608 has a new TF checkpoint, but I'm using PyTorch.