
VSR Layout Recognition Test Issues

Open Schneipi opened this issue 1 year ago • 0 comments

I am hitting the same ValueError: too many values to unpack (expected 4) as https://github.com/hikopensource/DAVAR-Lab-OCR/issues/109, which that issue suggests resolving by adding

img = img[0]
gt_bboxes = gt_bboxes[0]

as the first lines in forward() in bertgrid_embedding.py.
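As a sanity check, the effect of that workaround can be sketched as follows. This is an illustrative stand-in, not the real forward() signature from bertgrid_embedding.py, and the isinstance guard is my own addition; the assumption is that the test-time data pipeline hands forward() its inputs wrapped in single-element lists, which later makes a four-way unpack fail.

```python
import numpy as np

# Illustrative sketch (not the real forward() from bertgrid_embedding.py):
# at test time the data pipeline can hand forward() its inputs wrapped in
# single-element lists, which makes a downstream unpack fail with
# "too many values to unpack (expected 4)".
def forward(img, gt_bboxes):
    # Unwrap the extra list level, as suggested in issue #109; the
    # isinstance guard keeps already-unwrapped inputs working too.
    if isinstance(img, (list, tuple)):
        img = img[0]
    if isinstance(gt_bboxes, (list, tuple)):
        gt_bboxes = gt_bboxes[0]
    return img.shape, len(gt_bboxes)

shape, n = forward([np.zeros((1, 3, 800, 608))], [[np.zeros((2, 4))]])
print(shape, n)  # (1, 3, 800, 608) 1
```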

Next, there is a NumPy problem in the same file: the line w_start, h_start, w_end, h_end = gt_bboxes_arr[iter_b_l].round().astype(np.int).tolist() raises an AttributeError, because the np.int alias was removed in NumPy 1.24. Changing np.int to np.int64 fixes it.
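The np.int64 fix can be verified in isolation like this (the box values are made up for illustration; gt_bboxes_arr is assumed to be a float array of [x1, y1, x2, y2] boxes):

```python
import numpy as np

# Stand-in for gt_bboxes_arr: one [x1, y1, x2, y2] box with float coords.
gt_bboxes_arr = np.array([[10.4, 20.6, 110.2, 220.8]])

# np.int was removed in NumPy 1.24, so .astype(np.int) raises
# AttributeError there; np.int64 (or plain int) works on all versions.
w_start, h_start, w_end, h_end = gt_bboxes_arr[0].round().astype(np.int64).tolist()
print(w_start, h_start, w_end, h_end)  # 10 21 110 221
```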

Having done those two modifications, PyTorch now complains about:

File "/opt/homebrew/Caskroom/miniforge/base/lib/python3.9/site-packages/torch/nn/modules/conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Expected 3D (unbatched) or 4D (batched) input to conv2d, but got input of size: [1, 1, 3, 800, 608]
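A minimal reproduction of that traceback, with the tensor sizes scaled down from the [1, 1, 3, 800, 608] in the error message, suggests the batched image has been wrapped in one extra dimension somewhere upstream; squeezing it out yields a valid 4D input. This is only a diagnostic sketch, not a claim about where the extra dimension is introduced in VSR:

```python
import torch

# Minimal reproduction of the traceback (sizes assumed from the error
# message, scaled down here): conv2d expects a 4D [N, C, H, W] batch,
# but the model receives a 5D [1, 1, 3, H, W] tensor, i.e. the already
# batched image appears to be wrapped one level too many.
x = torch.zeros(1, 1, 3, 16, 12)  # stand-in for [1, 1, 3, 800, 608]
conv = torch.nn.Conv2d(3, 8, kernel_size=3, padding=1)

# Dropping the spurious leading dimension restores a valid 4D batch.
y = conv(x.squeeze(0))
print(y.shape)  # torch.Size([1, 8, 16, 12])
```

If this matches what happens inside the model, it would hint that the issue #109 workaround unwraps at the wrong place (or one level too few) on this code path.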

Perhaps the suggestion in https://github.com/hikopensource/DAVAR-Lab-OCR/issues/109 does not address everything after all, or even breaks something? Or is there a compatibility issue with the versions I'm using, torch==1.13.1 and numpy==1.24.2? I couldn't find any information on the expected versions (beyond the lower bounds) for the DAVAR-Lab-OCR project.

I'm trying to run DAVAR-Lab-OCR/demo/text_layout/VSR/DocBank/test.sh, and I have prepared the models correctly and adjusted config/docbank_x101.

Any suggestions? Thanks.

Schneipi, Mar 01 '23 13:03