TypeError: can only concatenate list (not "NoneType") to list

Open rjrobben opened this issue 8 months ago • 1 comments

Describe the bug

When processing a PDF using marker_single, a TypeError occurs during the line merging process.

Traceback

Traceback (most recent call last):
  File "/Users/xxxxx/.local/bin/marker_single", line 8, in <module>
    sys.exit(convert_single_cli())
             ^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxxxx/.local/pipx/venvs/marker-pdf/lib/python3.12/site-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxxxx/.local/pipx/venvs/marker-pdf/lib/python3.12/site-packages/click/core.py", line 1082, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/Users/xxxxx/.local/pipx/venvs/marker-pdf/lib/python3.12/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxxxx/.local/pipx/venvs/marker-pdf/lib/python3.12/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxxxx/.local/pipx/venvs/marker-pdf/lib/python3.12/site-packages/marker/scripts/convert_single.py", line 35, in convert_single_cli
    rendered = converter(fpath)
               ^^^^^^^^^^^^^^^^
  File "/Users/xxxxx/.local/pipx/venvs/marker-pdf/lib/python3.12/site-packages/marker/converters/pdf.py", line 154, in __call__
    document = self.build_document(filepath)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxxxx/.local/pipx/venvs/marker-pdf/lib/python3.12/site-packages/marker/converters/pdf.py", line 149, in build_document
    processor(document)
  File "/Users/xxxxx/.local/pipx/venvs/marker-pdf/lib/python3.12/site-packages/marker/processors/line_merge.py", line 130, in __call__
    self.merge_lines(lines, block)
  File "/Users/xxxxx/.local/pipx/venvs/marker-pdf/lib/python3.12/site-packages/marker/processors/line_merge.py", line 104, in merge_lines
    line.merge(other_line)
  File "/Users/xxxxx/.local/pipx/venvs/marker-pdf/lib/python3.12/site-packages/marker/schema/text/line.py", line 99, in merge
    self.structure = self.structure + other.structure
                     ~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
TypeError: can only concatenate list (not "NoneType") to list

Cause

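The error occurs in the merge method of the Line class (marker/schema/text/line.py). The statement self.structure = self.structure + other.structure concatenates the two structure attributes directly, with no None check. The observed message means self.structure was a list and other.structure was None. If self.structure were None instead, the same statement would also raise a TypeError, but with a different message (unsupported operand type(s) for +).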
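
The failure can be reproduced without marker at all. The snippet below is a minimal sketch in plain Python; the helper name concat_error and the placeholder list contents are made up for illustration and are not part of marker. It shows the message Python produces for each ordering of list and None.

```python
def concat_error(left, right):
    """Return the TypeError message raised by `left + right`, or None if it succeeds."""
    try:
        left + right
    except TypeError as exc:
        return str(exc)
    return None


# Case seen in the traceback: the left operand is a list, the right is None.
list_plus_none = concat_error(["placeholder-id"], None)

# Reverse case: the left operand is None, so the message differs.
none_plus_list = concat_error(None, ["placeholder-id"])

print(list_plus_none)
print(none_plus_list)
```

The first printed message matches the one in the traceback, which suggests that it was other.structure, not self.structure, that was None in this run.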

Proposed Fix

Modify the merge method to handle potential None values by treating them as empty lists before concatenation:

    def merge(self, other: "Line"):
        self.polygon = self.polygon.merge([other.polygon])
        # Handle potential None values for structure
        self_structure = self.structure if self.structure is not None else []
        other_structure = other.structure if other.structure is not None else []
        self.structure = self_structure + other_structure
        if self.formats is None:
            self.formats = other.formats
        elif other.formats is not None:
            self.formats = list(set(self.formats + other.formats))

I am not sure whether the fix is acceptable for the original intended purpose of merge.
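
To check the None handling in isolation, here is a rough, self-contained sketch. SimpleLine is a hypothetical stand-in that I made up for testing; it is not marker's Line class, it leaves out the polygon merge, and its field types are assumptions. It also sorts the merged formats, which the proposed fix does not do, only to make the result deterministic.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SimpleLine:
    """Hypothetical stand-in for marker's Line; only structure and formats are modeled."""

    structure: Optional[List[str]] = None
    formats: Optional[List[str]] = None

    def merge(self, other: "SimpleLine") -> None:
        # `x or []` maps None (and an empty list) to an empty list before concatenation.
        self.structure = (self.structure or []) + (other.structure or [])
        if self.formats is None:
            self.formats = other.formats
        elif other.formats is not None:
            # Deduplicate; sorted() is an addition here to keep the order stable.
            self.formats = sorted(set(self.formats + other.formats))


# A populated line merged with one whose structure is None no longer raises.
line = SimpleLine(structure=["s1"], formats=["bold"])
line.merge(SimpleLine(structure=None, formats=["italic", "bold"]))

# Two lines with nothing set: structure becomes [] rather than staying None.
empty = SimpleLine()
empty.merge(SimpleLine())
```

One behavior change to be aware of: after merging two lines that both have structure set to None, the result is an empty list instead of None. Any downstream code that checks structure is None would need to be reviewed, which is part of why I am unsure about the fix.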

Environment (if relevant)

  • marker-pdf version: (Please add the version you are using)
  • Python version: 3.12
  • OS: macOS Sonoma

Additional context

This error was encountered while processing a PDF converted from Microsoft Word. The documents are quite dense with text.

rjrobben avatar Apr 02 '25 09:04 rjrobben

Same issue with some PDFs: "error": "Marker failed: 2025-04-04 15:45:58.866704: I tensorflow/core/util/port.cc:153] oneDNN custom operations are on. self.structure = self.structure + other.structure\nTypeError: can only concatenate list (not \"NoneType\") to list\n",

ramirolc02 avatar Apr 06 '25 10:04 ramirolc02

Same issue occurred while parsing this PDF document. https://www.indiabudget.gov.in/doc/eb/allsbe.pdf

arunpkm avatar May 20 '25 10:05 arunpkm

@rjrobben @VikParuchuri Is this issue fixed by this commit? https://github.com/VikParuchuri/marker/commit/c6dae45c76b389b10a109eac27fe461cacce913d

arunpkm avatar May 20 '25 11:05 arunpkm

Yes, this should be fixed now

VikParuchuri avatar May 20 '25 13:05 VikParuchuri