docling
supporting footnotes
Requested feature
@dil-mhajiabadi We do support footnotes. What are you looking for specifically?
page_37.pdf I tested this PDF with the Docling pipeline to extract the text parts of the PDF without extracting tables. It didn't work. I tested some other examples and it consistently failed to extract footnotes.
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import EasyOcrOptions, PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption


def docling_extract_text(input_doc_path, do_ocr):
    # Text-only extraction: table structure recognition is switched off
    pipeline_options = PdfPipelineOptions()
    pipeline_options.do_ocr = do_ocr
    pipeline_options.do_table_structure = False
    pipeline_options.table_structure_options.do_cell_matching = False

    # Any of the OCR options can be used: EasyOcrOptions, TesseractOcrOptions, TesseractCliOcrOptions, OcrMacOptions (Mac only)
    ocr_options = EasyOcrOptions()
    # ocr_options = TesseractOcrOptions()
    # ocr_options = OcrMacOptions()
    pipeline_options.ocr_options = ocr_options

    converter = DocumentConverter(
        format_options={
            InputFormat.PDF: PdfFormatOption(
                pipeline_options=pipeline_options,
            )
        }
    )

    # Convert the PDF and export the result as Markdown
    doc = converter.convert(input_doc_path).document
    md = doc.export_to_markdown()
    return md
@cau-git I am trying to process a PDF and do not see any footnotes. What could be the issue? Link to the paper with PDF (free). Table 1 and Table 2 are good examples.
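One way to narrow this down might be to check whether footnotes are missing from the converted document itself or only from the Markdown export. The sketch below is a rough diagnostic, not Docling's own API: `collect_footnotes` and `label_name` are hypothetical helpers, and it assumes each text item exposes a `label` and a `text` attribute and that the footnote label has the string value "footnote". The sample items are stand-ins so the snippet runs without Docling installed.

```python
from types import SimpleNamespace

# Assumption: the string value Docling uses for its footnote label
FOOTNOTE_LABEL = "footnote"


def label_name(label):
    """Return a label as a lowercase string, whether it is enum-like or a plain str."""
    return str(getattr(label, "value", label)).lower()


def collect_footnotes(items):
    """Return the text of every item whose label matches the footnote label."""
    return [item.text for item in items if label_name(item.label) == FOOTNOTE_LABEL]


# Stand-in items for illustration only; with a converted document one
# would presumably pass its list of text items (e.g. `doc.texts`) instead.
sample_items = [
    SimpleNamespace(label="text", text="Body paragraph."),
    SimpleNamespace(label="footnote", text="a Values are means of three runs."),
    SimpleNamespace(label="caption", text="Table 1. Results."),
]

footnotes = collect_footnotes(sample_items)
print(footnotes)
```

If the list comes back empty for a real document, the footnotes were probably not detected (or were labeled as something else) during layout analysis. If it is non-empty but the Markdown lacks them, the export step would be the place to look.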