supporting footnotes

Open dil-mhajiabadi opened this issue 1 year ago • 3 comments

Requested feature

dil-mhajiabadi, Nov 25 '24 22:11

@dil-mhajiabadi We do support footnotes. What are you looking for specifically?

cau-git, Nov 26 '24 13:11

page_37.pdf: I tested this PDF with the Docling pipeline, extracting the text parts of the PDF and skipping tables. It didn't work. I tested some other examples and it consistently failed to extract footnotes.


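from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import EasyOcrOptions, PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption


def docling_extract_text(input_doc_path, do_ocr):
    pipeline_options = PdfPipelineOptions()
    # Use the do_ocr argument instead of hard-coding False
    pipeline_options.do_ocr = do_ocr
    # Skip table structure recognition: only the text parts are wanted
    pipeline_options.do_table_structure = False
    pipeline_options.table_structure_options.do_cell_matching = False

    # Any of the OCR options can be used: EasyOcrOptions, TesseractOcrOptions,
    # TesseractCliOcrOptions, OcrMacOptions (Mac only).
    # They only take effect when do_ocr is True.
    ocr_options = EasyOcrOptions()
    # ocr_options = TesseractOcrOptions()
    # ocr_options = OcrMacOptions()
    pipeline_options.ocr_options = ocr_options

    converter = DocumentConverter(
        format_options={
            InputFormat.PDF: PdfFormatOption(
                pipeline_options=pipeline_options,
            )
        }
    )
    doc = converter.convert(input_doc_path).document
    md = doc.export_to_markdown()
    return md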
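The function above only returns the Markdown string, so it cannot tell whether the footnotes were never detected or were detected but left out of the Markdown. A minimal sketch for checking the converted document directly follows. It is not part of Docling: the helper names `label_counts` and `footnote_texts` are hypothetical, and it assumes that `doc.export_to_dict()` stores text items in a `"texts"` list, each with `"label"` and `"text"` keys, and that footnotes carry the label `"footnote"`. Those assumptions are worth checking against the actual export.

```python
from collections import Counter
from typing import Any

# Hypothetical helpers, not part of Docling. They assume doc_dict is the
# result of DoclingDocument.export_to_dict(), with text items stored in a
# "texts" list and footnotes labelled "footnote".


def label_counts(doc_dict: dict[str, Any]) -> Counter:
    """Count how many text items carry each label."""
    return Counter(item.get("label") for item in doc_dict.get("texts", []))


def footnote_texts(doc_dict: dict[str, Any]) -> list[str]:
    """Return the text of every item labelled "footnote", in stored order."""
    return [
        item.get("text", "")
        for item in doc_dict.get("texts", [])
        if item.get("label") == "footnote"
    ]


if __name__ == "__main__":
    # Toy stand-in for a real export; with Docling installed, use e.g.
    # doc_dict = converter.convert("page_37.pdf").document.export_to_dict()
    doc_dict = {
        "texts": [
            {"label": "text", "text": "Body paragraph."},
            {"label": "footnote", "text": "1 A footnote."},
        ]
    }
    print(label_counts(doc_dict))
    print(footnote_texts(doc_dict))
```

If `footnote_texts` returns an empty list on a real export, that would suggest no items were tagged as footnotes during conversion; if it returns the footnotes, the gap is more likely in the Markdown export step.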

dil-mhajiabadi, Nov 27 '24 22:11

@cau-git I am trying to process a PDF and do not see any footnotes. What could be the issue? Link to the paper with PDF (free). Table 1 and Table 2 are good examples.
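Footnotes under a table, like those in Table 1 and Table 2, may be attached to the table object rather than only appearing in reading order. A minimal sketch for pulling them out of an exported document follows, assuming (not confirmed here) that each entry of the `"tables"` list in `doc.export_to_dict()` holds a `"footnotes"` list of JSON-pointer references such as `{"$ref": "#/texts/12"}` into the `"texts"` list. The helpers `resolve_ref` and `table_footnotes` are hypothetical, not Docling API.

```python
from typing import Any

# Hypothetical helpers, not part of Docling. They assume the layout of
# DoclingDocument.export_to_dict(): a "tables" list whose entries hold a
# "footnotes" list of {"$ref": "#/texts/N"} references into "texts".


def resolve_ref(doc_dict: dict[str, Any], ref: str) -> Any:
    """Follow a JSON-pointer-style reference such as "#/texts/12"."""
    node: Any = doc_dict
    for part in ref.removeprefix("#/").split("/"):
        # List segments are indices; dict segments are keys.
        node = node[int(part)] if isinstance(node, list) else node[part]
    return node


def table_footnotes(doc_dict: dict[str, Any]) -> list[list[str]]:
    """For each table, in stored order, list the texts of its footnotes."""
    result: list[list[str]] = []
    for table in doc_dict.get("tables", []):
        notes = [
            resolve_ref(doc_dict, ref["$ref"]).get("text", "")
            for ref in table.get("footnotes", [])
        ]
        result.append(notes)
    return result


if __name__ == "__main__":
    # Toy stand-in for a real export with one table and one linked footnote.
    doc_dict = {
        "texts": [
            {"label": "caption", "text": "Table 1: Results."},
            {"label": "footnote", "text": "a Measured at room temperature."},
        ],
        "tables": [{"footnotes": [{"$ref": "#/texts/1"}]}],
    }
    print(table_footnotes(doc_dict))
```

An empty inner list for a table would mean no footnotes were linked to it in that export; they may still show up as standalone `"footnote"` items in `"texts"`.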

chupvl, Feb 05 '25 22:02