
bug/infer_table_structure on docker with M1 chip

Open • snova-amitk opened this issue 7 months ago • 5 comments

Describe the bug

The partition_pdf function crashes with a segmentation fault when infer_table_structure=True.

To Reproduce

Follow the Docker instructions at https://unstructured-io.github.io/unstructured/installation/docker.html, then run:

from unstructured.partition.pdf import partition_pdf
elements = partition_pdf(filename="example-docs/layout-parser-paper-with-Table.pdf", infer_table_structure=True)

Expected behavior

partition_pdf completes without a segmentation fault.

Screenshots

Downloading yolox_l0.05.onnx: 100% 217M/217M [00:14<00:00, 14.7MB/s]
Downloading (…)lve/main/config.json: 100% 1.47k/1.47k [00:00<00:00, 2.07MB/s]
Downloading model.safetensors: 100% 115M/115M [00:07<00:00, 14.9MB/s]
Downloading model.safetensors: 100% 46.8M/46.8M [00:03<00:00, 15.1MB/s]
Some weights of the model checkpoint at microsoft/table-transformer-structure-recognition were not used when initializing TableTransformerForObjectDetection: ['model.backbone.conv_encoder.model.layer3.0.downsample.1.num_batches_tracked', 'model.backbone.conv_encoder.model.layer2.0.downsample.1.num_batches_tracked', 'model.backbone.conv_encoder.model.layer4.0.downsample.1.num_batches_tracked']

  • This IS expected if you are initializing TableTransformerForObjectDetection from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing TableTransformerForObjectDetection from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).

Segmentation fault

Environment Info

Using the Docker instructions.

snova-amitk, Nov 16 '23

Hi @snova-amitk, thanks for reporting this bug; we are tracking it. In the meantime, details of your hardware setup would be very helpful (which CPUs you are using and which instruction sets they support).
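The basic details requested here can be gathered with Python's standard platform module. This is a minimal sketch; hardware_summary is a hypothetical helper, not part of unstructured. The standard library does not expose detailed instruction-set flags, so on Linux those still have to be read from /proc/cpuinfo.

```python
import platform


def hardware_summary() -> dict:
    """Hypothetical helper: collect the CPU/OS details asked for in this thread."""
    return {
        # Typically 'arm64' on Apple silicon macOS, 'aarch64' on ARM Linux
        # (including native ARM Docker images), 'x86_64' on Intel/AMD.
        "machine": platform.machine(),
        # Free-form processor name; may be an empty string on some Linux systems.
        "processor": platform.processor(),
        "system": platform.system(),
        "release": platform.release(),
        "python": platform.python_version(),
        # Pointer width of the running interpreter, e.g. '64bit'.
        "pointer_bits": platform.architecture()[0],
    }


if __name__ == "__main__":
    for key, value in hardware_summary().items():
        print(f"{key}: {value}")
```

Running it inside the container as well as on the host shows whether the image is executing natively or under emulation.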

badGarnet, Nov 16 '23

  Model Name: MacBook Pro
  Model Identifier: MacBookPro18,3
  Chip: Apple M1 Pro
  arm64 instruction set

snova-amitk, Nov 16 '23

@snova-amitk, unfortunately we don't support Apple ARM chips with the Docker image at the moment. The combination of a different CPU architecture and OS makes the model binary incompatible with the CPU's instruction set. This shouldn't be a problem on an x86 CPU. We are tracking this problem, but there is no plan for an immediate resolution.
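Until ARM support lands, one possible stopgap is to request table inference only on non-ARM hosts. The sketch below is a hypothetical guard, not an unstructured API, and it assumes that every 64-bit ARM host is affected (matching the reports in this thread) rather than detecting the actual fault.

```python
import platform
from typing import Optional

# Assumption: treat both spellings of 64-bit ARM as affected. macOS on Apple
# silicon reports 'arm64'; Linux on ARM (including native ARM Docker images)
# reports 'aarch64'.
ARM_MACHINES = {"arm64", "aarch64"}


def table_inference_supported(machine: Optional[str] = None) -> bool:
    """Hypothetical guard: False on 64-bit ARM hosts, True otherwise.

    `machine` defaults to platform.machine(); pass a value explicitly to test.
    """
    arch = (machine or platform.machine()).lower()
    return arch not in ARM_MACHINES
```

A caller would then pass infer_table_structure=table_inference_supported() to partition_pdf, which falls back to plain partitioning (reported later in this thread to work on ARM) instead of crashing.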

badGarnet, Nov 16 '23

I'm seeing the same issue on ARM with Ubuntu, using a pip install (no Docker).

My hardware is an NVIDIA IGX Orin Devkit with an A6000 dGPU.

huvers, Dec 07 '23

On a Mac Pro with an M1 chip, I have a similar issue when infer_table_structure is set to True. I am running a REST API Django server locally; when infer_table_structure is set to True, the whole server process dies without raising any error, so I am not sure exactly what the problem is. I suspect an incompatibility with the M1 chip, and with the ARM architecture in general. I have installed everything listed at https://unstructured-io.github.io/unstructured/installation/full_installation.html; I am not using Docker in this case.

partition_pdf works fine without infer_table_structure set to True.
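A segmentation fault terminates the entire Python process, so calling partition_pdf directly inside a long-running server takes the server down with it and no exception is ever raised. That matches the silent crash described above. A minimal sketch of one mitigation, assuming a POSIX host and that one child interpreter per document is affordable, is to run the call in a fresh interpreter and inspect its exit status. PARTITION_SCRIPT and run_isolated are hypothetical names, not part of unstructured.

```python
import signal
import subprocess
import sys
from typing import Tuple

# Hypothetical worker script: runs the call from this thread in a child process.
# Assumes `unstructured` is installed in the same interpreter; the PDF path is
# passed as sys.argv[1].
PARTITION_SCRIPT = """
import sys
from unstructured.partition.pdf import partition_pdf

elements = partition_pdf(filename=sys.argv[1], infer_table_structure=True)
print(len(elements))
"""


def run_isolated(
    script: str, *args: str, timeout: float = 600.0
) -> Tuple[bool, subprocess.CompletedProcess]:
    """Run `script` in a fresh interpreter; return (segfaulted, completed_process).

    Raises subprocess.TimeoutExpired if the child runs longer than `timeout` seconds.
    """
    proc = subprocess.run(
        [sys.executable, "-c", script, *args],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    # On POSIX, a child terminated by a signal reports returncode == -signum,
    # so a segmentation fault shows up as -SIGSEGV (-11 on Linux and macOS).
    segfaulted = proc.returncode == -signal.SIGSEGV
    return segfaulted, proc
```

For example, run_isolated(PARTITION_SCRIPT, "example-docs/layout-parser-paper-with-Table.pdf") returns (True, proc) when the child segfaults, letting the server log the failure or retry without infer_table_structure instead of dying.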

@badGarnet, why isn't this incompatibility with the ARM architecture mentioned in the docs? Mentioning it would save people a lot of time spent second-guessing their setup!

@badGarnet, can you confirm that infer_table_structure can't be used on the ARM architecture?

I can confirm that everything worked fine after switching to an x86 device.

CROmartin, Mar 31 '24