unstructured
bug/infer_table_structure on docker with M1 chip
Describe the bug
The partition_pdf function fails with a segmentation fault when infer_table_structure=True.
To Reproduce
Follow the Docker instructions here: https://unstructured-io.github.io/unstructured/installation/docker.html
from unstructured.partition.pdf import partition_pdf
elements = partition_pdf(filename="example-docs/layout-parser-paper-with-Table.pdf", infer_table_structure=True)
Expected behavior
No segmentation fault.
Screenshots
Downloading yolox_l0.05.onnx: 100%|██████████| 217M/217M [00:14<00:00, 14.7MB/s]
Downloading (…)lve/main/config.json: 100%|██████████| 1.47k/1.47k [00:00<00:00, 2.07MB/s]
Downloading model.safetensors: 100%|██████████| 115M/115M [00:07<00:00, 14.9MB/s]
Downloading model.safetensors: 100%|██████████| 46.8M/46.8M [00:03<00:00, 15.1MB/s]
Some weights of the model checkpoint at microsoft/table-transformer-structure-recognition were not used when initializing TableTransformerForObjectDetection: ['model.backbone.conv_encoder.model.layer3.0.downsample.1.num_batches_tracked', 'model.backbone.conv_encoder.model.layer2.0.downsample.1.num_batches_tracked', 'model.backbone.conv_encoder.model.layer4.0.downsample.1.num_batches_tracked']
- This IS expected if you are initializing TableTransformerForObjectDetection from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing TableTransformerForObjectDetection from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
Segmentation fault
Environment Info
Using the Docker instructions.
Hi @snova-amitk, thanks for reporting this bug; we are tracking it. In the meantime, it would be very helpful if you could provide details of your hardware setup (what kind of CPUs they are and which instruction sets they support).
Model Name: MacBook Pro
Model Identifier: MacBookPro18,3
Chip: Apple M1 Pro
arm64 instruction set
@snova-amitk unfortunately we don't support Apple ARM chips with the Docker image at the moment. The combination of a different CPU architecture and OS makes the model binary incompatible with the CPU's instruction set. This shouldn't be a problem on an x86 CPU. We are tracking this problem, but there is no plan for immediate resolution.
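Until this is resolved, one possible workaround is to turn table inference off when the code detects an ARM host. The following is only a minimal sketch: the helper names `table_inference_supported` and `safe_partition_kwargs` are hypothetical and not part of unstructured, and the set of architecture strings is an assumption about what `platform.machine()` reports on Apple Silicon and ARM Linux. The check only sees the architecture reported inside the running environment, so an x86 image emulated on an ARM host may report `x86_64` and pass it.

```python
import platform

# Assumption: these platform.machine() spellings cover Apple Silicon
# ("arm64") and ARM Linux ("aarch64"), the hosts reported in this thread.
ARM_MACHINES = {"arm64", "aarch64"}


def table_inference_supported(machine=None):
    """Return False on ARM hosts, where this thread reports a segfault."""
    machine = (machine or platform.machine()).lower()
    return machine not in ARM_MACHINES


def safe_partition_kwargs(machine=None):
    """Build keyword arguments that disable table inference on ARM."""
    return {"infer_table_structure": table_inference_supported(machine)}
```

It could then be used as `partition_pdf(filename="example-docs/layout-parser-paper-with-Table.pdf", **safe_partition_kwargs())`, which falls back to the plain partitioning that works without `infer_table_structure`.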
I'm seeing the same on ARM for Ubuntu with a pip install (no docker).
My hardware is an NVIDIA IGX Orin Devkit w/ A6000 dGPU.
On a Mac Pro with an M1 chip, I have a similar issue when infer_table_structure is set to True. I am running a Django REST API server, and if infer_table_structure is set to True, it kills my whole server (locally) without throwing any error, so I am not sure what exactly the problem is. I am guessing it is an incompatibility with the M1 chip, and with the ARM architecture in general. I have installed everything listed at https://unstructured-io.github.io/unstructured/installation/full_installation.html, and I am not using Docker in this case.
partition_pdf works fine without infer_table_structure set to True.
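A segmentation fault kills the whole Python process, which would explain why the server dies without raising any exception. One way to keep the server alive and see what happened is to run the risky call in a child interpreter. The sketch below uses only the standard library; `run_isolated` is a hypothetical helper name, and the interpretation of negative return codes as signals applies to POSIX systems.

```python
import subprocess
import sys


def run_isolated(source, timeout=600):
    """Run Python source in a child interpreter and report how it ended.

    Returns (ok, detail). A crash in the child cannot take down the caller.
    """
    proc = subprocess.run(
        # -X faulthandler makes the child dump a traceback to stderr
        # when it receives a fatal signal such as SIGSEGV.
        [sys.executable, "-X", "faulthandler", "-c", source],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode < 0:
        # On POSIX a negative return code means the child died from a
        # signal (for example -11 for SIGSEGV).
        return False, f"killed by signal {-proc.returncode}\n{proc.stderr}"
    if proc.returncode != 0:
        return False, f"exited with code {proc.returncode}\n{proc.stderr}"
    return True, proc.stdout
```

Passing a source string that imports `partition_pdf` and calls it with `infer_table_structure=True` would let the server return an error response instead of dying; the child has to print or save its results, since only its stdout comes back to the caller.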
@badGarnet why isn't this incompatibility with the ARM architecture mentioned in the docs? It would save so much time not to have to doubt everything!
@badGarnet Can you confirm that I can't use infer_table_structure on ARM architecture?
I can confirm that everything worked fine after switching to an x86 device.