pdftotree
:evergreen_tree: A tool for converting PDF into hOCR with text, tables, and figures being recognized and preserved.
## Description of the problems or issues **Is your pull request related to a problem? Please describe.** During `pdftotree` installation I get this error: ``` [16:00:54 /tmp]$ python3.11 -m venv...
Pip has started to refuse to install sklearn sometimes. I believe the dependency should be updated to scikit-learn. ``` Step #0 - "Build": Collecting sklearn==0.0.post1 Step #0 - "Build": Downloading...
## Description of the problems or issues A lot of Errors while importing. **Is your pull request related to a problem? Please describe.** A clear and concise description of what...
Attempting to use `model_type=vision` breaks due to outdated imports with error: ``` ImportError: cannot import name 'img_to_array' from 'keras.preprocessing.image' ``` in line 7 of `pdftotree/pdftotree/visual/visual_utils.py`, likely because these imports have...
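One common way to cope with a name that has moved between library releases is to try several candidate module paths in order. The sketch below is a generic helper, not the project's actual fix: the name `import_first` is hypothetical, and the candidate paths in the usage comment (including the assumption that newer Keras releases expose `img_to_array` under `keras.utils`) should be checked against the installed version.

```python
import importlib


def import_first(attr, module_names):
    """Return `attr` from the first module in `module_names` that provides it."""
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            continue  # module not importable in this environment, try the next one
        if hasattr(module, attr):
            return getattr(module, attr)
    raise ImportError(f"cannot import name {attr!r} from any of {module_names}")


# Hypothetical usage for the failing import (candidate paths are assumptions):
# img_to_array = import_first(
#     "img_to_array", ["keras.utils", "keras.preprocessing.image"]
# )
```

Listing the newer location first means the old path is only consulted as a fallback on older installations.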
**Describe the bug** A clear and concise description of what the bug is. I'm getting the following stack trace error when running pdftotree on a PDF that contains scientific chemical...
**Describe the bug** The first and second pages of the output contain the same text. Pages 4 and 5 are also identical. **To Reproduce** Steps...
Traceback (most recent call last): File "C:\dirsearch\dirsearch.py", line 27, in <module> from lib.core.argument_parser import ArgumentParser File "C:\dirsearch\lib\core\argument_parser.py", line 24, in <module> from lib.parse.headers import HeadersParser File "C:\dirsearch\lib\parse\headers.py", line 23, in <module> from lib.utils.fmt...
I tried to run the demo from the library documentation (https://pypi.org/project/pdftotree/) ``` import pdftotree import pathlib pdf_file = pathlib.Path.cwd() / "test.pdf" pdftotree.parse(pdf_file, html_path=None, model_type=None, model_path=None, favor_figures=True, visualize=False) ``` Here is bug i...
``` pdftotree-0.5.0/tests> ls __init__.py test_basic.py test_table_detection.py ``` As a result, running the tests from the PyPI sdist isn't currently possible. The usual/old solution is to create a MANIFEST.in, which...
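Whether test data actually made it into an sdist can be checked by inspecting the archive directly. The following is a minimal sketch using only the standard library; the function name `sdist_members` and the `tests/` prefix are illustrative assumptions, not part of pdftotree or its packaging.

```python
import tarfile


def sdist_members(fileobj, prefix="tests/"):
    """List archive paths under `prefix`, ignoring the top-level project directory."""
    found = []
    with tarfile.open(fileobj=fileobj, mode="r:*") as tar:
        for member in tar.getmembers():
            # sdist paths look like "<name>-<version>/tests/...": drop the first part
            _, _, rel = member.name.partition("/")
            if member.isfile() and rel.startswith(prefix):
                found.append(rel)
    return sorted(found)


# Hypothetical usage (the archive file name is an assumption):
# with open("pdftotree-0.5.0.tar.gz", "rb") as f:
#     print(sdist_members(f))
```

If the returned list lacks the input files the tests depend on, the packaging configuration is the likely place to look.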
While looking further into #114, I found that some of the docs in `tests/input` are copyrighted and non-free, and thus should not be included in GitHub unless there is some...