OCR_tablenet

ValueError: operands could not be broadcast together with shapes (896,896,4) (3,) (896,896,4)

Open JKYang01 opened this issue 1 year ago • 1 comments

I am testing the model on different inputs and ran into something strange. If I feed it a .png screenshot of a table (sample attached: sample_table2) and run

python predict.py --model_weights='./tablenet_pretrained.ckpt' --image_path='./sample_table2.png'

it raises a ValueError: operands could not be broadcast together with shapes (896,896,4) (3,) (896,896,4)

Here is the full error log:

  File "/home/bluespinach/Documents/projects/tool_exp/OCR_tablenet/predict.py", line 148, in <module>
    predict()
  File "/home/bluespinach/miniconda3/envs/tablenet/lib/python3.9/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/bluespinach/miniconda3/envs/tablenet/lib/python3.9/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/bluespinach/miniconda3/envs/tablenet/lib/python3.9/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/bluespinach/miniconda3/envs/tablenet/lib/python3.9/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/bluespinach/Documents/projects/tool_exp/OCR_tablenet/predict.py", line 144, in predict
    print(pred.predict(image))
  File "/home/bluespinach/Documents/projects/tool_exp/OCR_tablenet/predict.py", line 50, in predict
    processed_image = self.transforms(image=np.array(image))["image"]
  File "/home/bluespinach/miniconda3/envs/tablenet/lib/python3.9/site-packages/albumentations/core/composition.py", line 182, in __call__
    data = t(force_apply=force_apply, **data)
  File "/home/bluespinach/miniconda3/envs/tablenet/lib/python3.9/site-packages/albumentations/core/transforms_interface.py", line 89, in __call__
    return self.apply_with_params(params, **kwargs)
  File "/home/bluespinach/miniconda3/envs/tablenet/lib/python3.9/site-packages/albumentations/core/transforms_interface.py", line 102, in apply_with_params
    res[key] = target_function(arg, **dict(params, **target_dependencies))
  File "/home/bluespinach/miniconda3/envs/tablenet/lib/python3.9/site-packages/albumentations/augmentations/transforms.py", line 1496, in apply
    return F.normalize(image, self.mean, self.std, self.max_pixel_value)
  File "/home/bluespinach/miniconda3/envs/tablenet/lib/python3.9/site-packages/albumentations/augmentations/functional.py", line 141, in normalize
    img -= mean
ValueError: operands could not be broadcast together with shapes (896,896,4) (3,) (896,896,4) 

Strangely, if I input a .png file converted from a PDF page, it works well. Here is my test sample, the PDF page: output_page_0

Could you help explain why the smaller table screenshot doesn't work?

Thank you very much!

Best regards, JKyang01

JKYang01, Feb 18 '24 23:02
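
The shapes in the traceback point to a likely cause: the array has 4 channels (896,896,4) while the normalization mean has 3 values (3,). Screenshots are often saved as RGBA PNGs with an alpha channel, whereas pages rendered from a PDF are typically plain RGB, which would explain why only the screenshot fails. The following is a minimal sketch of that mismatch and one possible workaround, not the repository's own code: `ensure_rgb` is a hypothetical helper, the input is a synthetic array instead of the real screenshot, and the mean values are ImageNet-style placeholders, since the values configured in predict.py are not visible in the log.

```python
import numpy as np

# Placeholder 3-channel mean (ImageNet-style); the values actually used
# by predict.py are an assumption here, only their length (3) is known.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)


def ensure_rgb(image: np.ndarray) -> np.ndarray:
    """Return an H x W x 3 array (hypothetical helper, not part of OCR_tablenet)."""
    if image.ndim == 2:
        # Grayscale: repeat the single channel three times.
        return np.stack([image] * 3, axis=-1)
    if image.shape[-1] == 4:
        # RGBA: drop the alpha channel and keep R, G, B.
        return image[..., :3]
    return image


# Synthetic stand-in for a screenshot that was saved with an alpha channel.
rgba = np.zeros((896, 896, 4), dtype=np.float32)

try:
    rgba - MEAN  # 4 channels against a 3-value mean cannot broadcast
except ValueError as err:
    print("fails:", err)

rgb = ensure_rgb(rgba)
normalized = rgb - MEAN  # (896, 896, 3) against (3,) broadcasts fine
print(normalized.shape)
```

If the image is loaded with Pillow (an assumption, since the loading code is not shown in the log), calling `image.convert("RGB")` before `np.array(image)` would have the same effect of discarding the alpha channel.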

Some follow-ups: I tried another way to crop the table. I converted the PDF page into a .png file and saved the table as a crop of that image. The output picture looks like this: test3_table_2. Then I ran the detection again: python predict.py --model_weights='./tablenet_pretrained.ckpt' --image_path='./sample_table/test3_table_2.png'

It doesn't raise the error, but it returns an empty list: []

That means the model cannot detect a table when the picture is the table itself (lol). I think the model is probably more used to seeing a table within surrounding context (given the training data).

Another thing I found: if a PDF page has tables that are connected to each other, the model struggles to identify them as separate tables and reads them as one. In the data I am processing, most PDF pages have tables that are connected to each other.

JKYang01, Feb 19 '24 19:02