unstructured
bug/unstructured-ingest-hanging
Describe the bug
We are running the unstructured-ingest CLI command and it hangs. We think it is treating the root page as a Page Block and trying to parse it, and that this is the point at which it hangs.
To Reproduce
We are still investigating and will update this issue with details.
The best we can offer for now is that we're running in recursive mode and passing a page ID whose parent is a database, and that database's parent is the workspace.
Expected behavior
The ingest should either throw an error or run to completion.
Screenshots
No screenshots or logs available.
Environment Info
We're using the Docker container:
docker image ls | grep unstructured
downloads.unstructured.io/unstructured-io/unstructured latest 104a18d9e603 3 days ago 8.17GB
I couldn't find the environment-collection script (collect.py) in the container, so I copied it in and ran it. It hit a few dependency errors, but otherwise it looks like it collected the information you need:
python3 collect.py
/home/notebook-user/collect.py:5: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
  import pkg_resources
OS version: Linux-6.4.16-linuxkit-aarch64-with-glibc2.34
Python version: 3.10.13
unstructured version: 0.12.5
unstructured-inference version: 0.7.23
pytesseract version: 0.3.10
Torch version: 2.2.0
Detectron2 is not installed
[notice] A new release of pip is available: 23.2.1 -> 24.0
[notice] To update, run: pip install --upgrade pip
PaddleOCR is not installed
Traceback (most recent call last):
  File "/home/notebook-user/collect.py", line 242, in <module>
    main()
  File "/home/notebook-user/collect.py", line 224, in main
    libmagic_version = get_libmagic_version()
  File "/home/notebook-user/collect.py", line 146, in get_libmagic_version
    result = subprocess.run(
  File "/usr/local/lib/python3.10/subprocess.py", line 503, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/usr/local/lib/python3.10/subprocess.py", line 971, in __init__
    self._execute_child(args, executable, preexec_fn, close_fds,
  File "/usr/local/lib/python3.10/subprocess.py", line 1863, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'file'
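For what it's worth, the collect.py crash itself looks separate from the hang. subprocess.run raises FileNotFoundError when the executable it is asked to launch is not on PATH, and the error names 'file', so the container apparently lacks the `file` utility. We haven't checked exactly what collect.py runs at line 146; the sketch below assumes it shells out to `file --version`. It is a hypothetical, more defensive version of get_libmagic_version that reports "not installed" instead of aborting the rest of the collection:

```python
import subprocess
from typing import Optional


def get_libmagic_version() -> Optional[str]:
    """Best-effort probe of the `file` utility; returns None instead of crashing.

    Hypothetical rewrite: assumes the original shells out to `file --version`.
    """
    try:
        result = subprocess.run(
            ["file", "--version"], capture_output=True, text=True, check=False
        )
    except FileNotFoundError:
        # Raised by subprocess when `file` is not on PATH, as in the traceback above.
        return None
    # Some builds may print the version to stderr rather than stdout, so check both.
    lines = (result.stdout or result.stderr).splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    print("libmagic version:", get_libmagic_version() or "not installed")
```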