VSR Layout Recognition Test Issues
I have the same "Exception has occurred: ValueError: too many values to unpack (expected 4)" issue as https://github.com/hikopensource/DAVAR-Lab-OCR/issues/109, which can indeed be worked around by adding
img = img[0]
gt_bboxes = gt_bboxes[0]
as the first lines of forward() in bertgrid_embedding.py.
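My understanding of why that fix helps (a guess, not verified against the actual bertgrid_embedding.py source) is that the test pipeline hands forward() an extra-nested batch, so the usual 4-value shape unpack sees five values, and indexing the first element strips the spurious dimension. A standalone illustration, unrelated to the real code:
import torch
img = torch.randn(1, 1, 3, 800, 608)  # extra leading dim, same shape as in the conv2d error below
# b, c, h, w = img.shape              # ValueError: too many values to unpack (expected 4)
img = img[0]                          # the workaround from issue #109
b, c, h, w = img.shape                # now (1, 3, 800, 608)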
Next, there is a numpy problem: np.int was removed in numpy 1.24, so in the same file the line
w_start, h_start, w_end, h_end = gt_bboxes_arr[iter_b_l].round().astype(np.int).tolist()
has to be changed to
w_start, h_start, w_end, h_end = gt_bboxes_arr[iter_b_l].round().astype(np.int64).tolist()
to avoid an AttributeError.
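For anyone hitting the same thing, this fails on any numpy >= 1.24 regardless of this project. A standalone check (the box values are made up) showing that a concrete dtype such as np.int64 works:
import numpy as np
gt_bboxes_arr = np.array([[10.2, 20.7, 110.4, 60.1]])   # made-up box, not real data
# gt_bboxes_arr[0].round().astype(np.int).tolist()       # AttributeError on numpy >= 1.24
w_start, h_start, w_end, h_end = gt_bboxes_arr[0].round().astype(np.int64).tolist()
print(w_start, h_start, w_end, h_end)                    # 10 21 110 60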
After making those two modifications, PyTorch now complains:
File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Expected 3D (unbatched) or 4D (batched) input to conv2d, but got input of size: [1, 1, 3, 800, 608]
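Just to sanity-check the shape complaint (this toy snippet has nothing to do with the actual VSR code), conv2d indeed rejects a 5D input of that size, and squeezing the extra leading dimension makes it go through:
import torch
import torch.nn as nn
conv = nn.Conv2d(3, 64, kernel_size=3, padding=1)
x = torch.randn(1, 1, 3, 800, 608)   # same 5D shape as in the error
# conv(x)                            # raises the RuntimeError above
y = conv(x.squeeze(1))               # (1, 3, 800, 608) is accepted
print(y.shape)                       # torch.Size([1, 64, 800, 608])
So somewhere a tensor is still carrying an extra batch/list dimension into the embedding module, but I'm not sure where the right place to drop it is.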
Perhaps the suggestion in https://github.com/hikopensource/DAVAR-Lab-OCR/issues/109 does not address everything after all, or even breaks something? Or maybe there is a compatibility issue with the versions I am using, torch==1.13.1 and numpy==1.24.2? I couldn't find any information on the expected versions (except for the lower bounds) for the DAVAR-Lab-OCR project.
I'm trying to run DAVAR-Lab-OCR/demo/text_layout/VSR/DocBank/test.sh and have prepared the models and adjusted config/docbank_x101 as required.
Any suggestions? Thanks.