TableStructureModel initialization fails: "Cannot copy out of meta tensor" when using CPU device

Open rafaelghiorzi opened this issue 7 months ago • 10 comments

Bug

When processing PDFs with table structure extraction enabled, TableStructureModel initialization fails with the error: "Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device."

This happens specifically in the TFPredictor when trying to load the TableFormer model. The error occurs when using CPU as the accelerator device, preventing successful processing of documents with tables.

Steps to reproduce

  1. Create a DocumentConverter with table structure extraction enabled
  2. Set accelerator device to 'CPU'
  3. Try to process a PDF document containing tables
  4. The process fails with the PyTorch error about meta tensors

Sample code to reproduce:

from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import AcceleratorDevice, AcceleratorOptions, PdfPipelineOptions

# Configure options with table structure enabled
pipeline_options = PdfPipelineOptions()
pipeline_options.do_ocr = True
pipeline_options.do_table_structure = True # The error is very much here
pipeline_options.table_structure_options.do_cell_matching = True
pipeline_options.accelerator_options = AcceleratorOptions(num_threads=4, device=AcceleratorDevice.CPU)

# Create converter
converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
    }
)

# Try to process a PDF - this will fail
result = converter.convert("path/to/pdf_with_tables.pdf")

Docling version

docling 2.30.0, pytorch 2.7.0

Python version

Python 3.12.7

Full Error Message

ERROR - Error processing Aula_Kanban in Practice.pdf: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.
Traceback (most recent call last):
  File "/home/rafael-dias/anaconda3/envs/ipea/lib/python3.12/site-packages/docling_ibm_models/tableformer/data_management/tf_predictor.py", line 178, in _load_model
    model = TableModel04_rs(self._config, self._init_data, self._device)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rafael-dias/anaconda3/envs/ipea/lib/python3.12/site-packages/docling_ibm_models/tableformer/models/table04_rs/tablemodel04_rs.py", line 40, in __init__
    self._encoder = Encoder04(self._enc_image_size, self._encoder_dim).to(device)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/rafael-dias/anaconda3/envs/ipea/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1355, in to
    return self._apply(convert)
           ^^^^^^^^^^^^^^^^^^^^
[...]
  File "/home/rafael-dias/anaconda3/envs/ipea/lib/python3.12/site-packages/torch/nn/modules/module.py", line 942, in _apply
    param_applied = fn(param)
                    ^^^^^^^^^
  File "/home/rafael-dias/anaconda3/envs/ipea/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1348, in convert
    raise NotImplementedError(
NotImplementedError: Cannot copy out of meta tensor; no data! Please use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when moving module from meta to a different device.

Environment Information

  • OS: Linux (also tested on Windows, with the same error)
  • Using PyTorch with CPU (CUDA not available or not properly configured)
  • Docling is using newer PyTorch meta device features but seems to have compatibility issues when initializing models on CPU

Suggested Fix

The TableModel04_rs class in docling_ibm_models/tableformer/models/table04_rs/tablemodel04_rs.py needs to be updated to use torch.nn.Module.to_empty() instead of torch.nn.Module.to() when initializing the model on CPU devices.
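
For reference, the behaviour named in the error message can be reproduced in plain PyTorch, independent of Docling. The sketch below is only an illustration: the nn.Linear layer and the `reference` module are stand-ins, not TableFormer code. It also shows that to_empty() allocates uninitialized storage, so swapping .to() for .to_empty() would likely only help if the real weights are loaded afterwards.

```python
import torch
from torch import nn

# A layer on the "meta" device has shapes and dtypes but no actual storage.
layer = nn.Linear(4, 2, device="meta")

try:
    # Same call pattern as Encoder04(...).to(device) in the traceback.
    layer.to("cpu")
    to_failed = False
except NotImplementedError as exc:
    to_failed = True
    message = str(exc)

# to_empty() allocates uninitialized storage on the target device.
layer = layer.to_empty(device="cpu")

# The parameter values are arbitrary at this point, so weights must be loaded.
# `reference` is an illustrative stand-in for a real checkpoint.
reference = nn.Linear(4, 2)
layer.load_state_dict(reference.state_dict())
```

Whether the TableFormer modules end up on the meta device because of a genuine bug or because of something else in the environment is not established in this report.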

rafaelghiorzi avatar Apr 23 '25 18:04 rafaelghiorzi

Just discovered a compatibility problem with threading's ThreadPoolExecutor, but I don't know why it happens. I switched to multiprocessing's ProcessPoolExecutor and it started working much better.

rafaelghiorzi avatar Apr 23 '25 18:04 rafaelghiorzi

I’m running into the same error with v2.30.0. Can this be resolved?

martin-liu avatar Apr 24 '25 21:04 martin-liu

I don't really know how I got rid of this error. Didn't reinstall anything. As I said, moving to multiprocessing resolved my problems. You can check my package codebase here to look for any key differences, and come back if necessary!

rafaelghiorzi avatar Apr 24 '25 23:04 rafaelghiorzi

I'm hitting the same issue on v2.31.0.

fabianofranz avatar Apr 30 '25 13:04 fabianofranz

@martin-liu were you able to resolve the problem? I'm not running into this issue anymore.

@fabianofranz you can also check my code to look for differences!

rafaelghiorzi avatar Apr 30 '25 17:04 rafaelghiorzi

+1 Running into this.

We run Docling inside a GRPC server which requires a ThreadPoolExecutor, so moving to ProcessPoolExecutor is not an option (at least not a straightforward one)
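
One possible mitigation when threads are mandatory is to build the converter once behind a lock and run conversions one at a time, so that model loading never overlaps across threads. This is a minimal sketch under the unconfirmed assumption that the error comes from concurrent initialization. The names build_converter, get_converter and convert_serialized are hypothetical, and FakeConverter stands in for a real DocumentConverter.

```python
import threading
from concurrent.futures import ThreadPoolExecutor

_converter = None
_init_lock = threading.Lock()
_convert_lock = threading.Lock()


def build_converter():
    """Hypothetical factory; replace the body with DocumentConverter(...)."""
    class FakeConverter:
        def convert(self, path):
            return f"converted:{path}"
    return FakeConverter()


def get_converter():
    """Create the converter once, even if many threads ask at the same time."""
    global _converter
    with _init_lock:
        if _converter is None:
            _converter = build_converter()
    return _converter


def convert_serialized(path):
    """Run one conversion at a time so lazy model loading cannot overlap."""
    converter = get_converter()
    with _convert_lock:
        return converter.convert(path)


# Worker threads share the single converter instance.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(convert_serialized, ["a.pdf", "b.pdf", "c.pdf"]))
```

Serializing conversions gives up parallelism inside the process, so this trades throughput for stability.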

kiratp avatar May 07 '25 00:05 kiratp

https://github.com/docling-project/docling-serve/issues/175

They were trying to figure out this issue here!

rafaelghiorzi avatar May 07 '25 00:05 rafaelghiorzi

@rafaelghiorzi What exactly did you do? I am facing the same issue, but only when calling docling in parallel.

vishaldasnewtide avatar May 23 '25 09:05 vishaldasnewtide

@vishaldasnewtide I don't really remember now. I switched back to multiprocessing and tried parallel processing on GPU. I later realized through my research that threading or multiprocessing on a CUDA device is not good practice and decided not to use it anymore. I encountered this error while adding picture description, picture analysis and forced OCR to my converter preferences, and I don't really know why it started working again. You can check the code in my forked pdfplucker repository to see if it matches your implementation, or look for the people who also had this problem but probably solved it.

rafaelghiorzi avatar May 26 '25 11:05 rafaelghiorzi

@rafaelghiorzi Well, I have a RAG requirement that starts with picking up messages from queues (Azure Service Bus). The process is async with threading: each message triggers document processing, which requires an async call to docling.

vishaldasnewtide avatar May 26 '25 15:05 vishaldasnewtide
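
For an async consumer like the one described above, one option is to funnel every Docling call through a dedicated single-thread executor, so that message handling stays concurrent while conversions run strictly one after another. This is a minimal sketch, assuming (unconfirmed in this thread) that overlapping model initialization is what triggers the error. convert_blocking is a hypothetical stand-in for converter.convert, and the file names are placeholders.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor

# One worker thread: every conversion, including lazy model loading,
# runs here in sequence.
_docling_worker = ThreadPoolExecutor(max_workers=1, thread_name_prefix="docling")


def convert_blocking(path):
    """Hypothetical stand-in for converter.convert(path)."""
    return f"converted:{path}"


async def handle_message(path):
    """Async handler that offloads the blocking conversion to the single worker."""
    loop = asyncio.get_running_loop()
    return await loop.run_in_executor(_docling_worker, convert_blocking, path)


async def main():
    # Messages are handled concurrently; conversions are queued on one thread.
    return await asyncio.gather(*(handle_message(p) for p in ["a.pdf", "b.pdf"]))


results = asyncio.run(main())
_docling_worker.shutdown()
```

If throughput matters more than sharing one process, running several such workers as separate processes is the variant closer to the ProcessPoolExecutor workaround reported earlier in the thread.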