Issue with Unstructured Document Converter (Related to Asyncio)
I'm testing a pipeline that utilizes the Unstructured Converter component for processing PDFs. The pipeline works locally, but fails with Hayhooks.
The error is as follows:
2024-06-07 14:41:27 Converting files to Haystack Documents: 0it [00:00, ?it/s]Unstructured could not process file /data/file.pdf. Error: Traceback (most recent call last):
2024-06-07 14:41:27 File "/opt/venv/lib/python3.12/site-packages/haystack_integrations/components/converters/unstructured/converter.py", line 198, in _partition_file_into_elements
2024-06-07 14:41:27 elements = partition_via_api(
2024-06-07 14:41:27 ^^^^^^^^^^^^^^^^^^
2024-06-07 14:41:27 File "/opt/venv/lib/python3.12/site-packages/unstructured/partition/api.py", line 70, in partition_via_api
2024-06-07 14:41:27 sdk = UnstructuredClient(api_key_auth=api_key, server_url=base_url)
2024-06-07 14:41:27 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-06-07 14:41:27 File "/opt/venv/lib/python3.12/site-packages/unstructured_client/sdk.py", line 54, in __init__
2024-06-07 14:41:27 self.sdk_configuration = SDKConfiguration(
2024-06-07 14:41:27 ^^^^^^^^^^^^^^^^^
2024-06-07 14:41:27 File "<string>", line 13, in __init__
2024-06-07 14:41:27 File "/opt/venv/lib/python3.12/site-packages/unstructured_client/sdkconfiguration.py", line 38, in __post_init__
2024-06-07 14:41:27 self._hooks = SDKHooks()
2024-06-07 14:41:27 ^^^^^^^^^^
2024-06-07 14:41:27 File "/opt/venv/lib/python3.12/site-packages/unstructured_client/_hooks/sdkhooks.py", line 15, in __init__
2024-06-07 14:41:27 init_hooks(self)
2024-06-07 14:41:27 File "/opt/venv/lib/python3.12/site-packages/unstructured_client/_hooks/registration.py", line 28, in init_hooks
2024-06-07 14:41:27 split_pdf_hook = SplitPdfHook()
2024-06-07 14:41:27 ^^^^^^^^^^^^^^
2024-06-07 14:41:27 File "/opt/venv/lib/python3.12/site-packages/unstructured_client/_hooks/custom/split_pdf_hook.py", line 73, in __init__
2024-06-07 14:41:27 nest_asyncio.apply()
2024-06-07 14:41:27 File "/opt/venv/lib/python3.12/site-packages/nest_asyncio.py", line 19, in apply
2024-06-07 14:41:27 _patch_loop(loop)
2024-06-07 14:41:27 File "/opt/venv/lib/python3.12/site-packages/nest_asyncio.py", line 193, in _patch_loop
2024-06-07 14:41:27 raise ValueError('Can\'t patch loop of type %s' % type(loop))
2024-06-07 14:41:27 ValueError: Can't patch loop of type <class 'uvloop.Loop'>
The call that fails is nest_asyncio.apply(), which raises because the running event loop is a uvloop.Loop.
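To illustrate why the loop type matters, here is a minimal stdlib-only sketch. It assumes, based on the error message in the traceback, that the patching step only accepts loops derived from asyncio.BaseEventLoop; can_be_patched and FakeNonAsyncioLoop are hypothetical names used for illustration and are not part of nest_asyncio or uvloop.

```python
import asyncio


def can_be_patched(loop) -> bool:
    """Return True if `loop` is built on asyncio's own base event loop class.

    Assumption: this mirrors the kind of type check that produces
    "Can't patch loop of type ..." in the traceback above.
    """
    return isinstance(loop, asyncio.BaseEventLoop)


class FakeNonAsyncioLoop:
    """Hypothetical stand-in for a loop type not derived from asyncio.BaseEventLoop."""


std_loop = asyncio.new_event_loop()
try:
    # The default asyncio loop passes the check; the stand-in does not.
    print(can_be_patched(std_loop))
    print(can_be_patched(FakeNonAsyncioLoop()))
finally:
    std_loop.close()
```

If that assumption holds, any fix has to make the server run on a standard asyncio loop rather than on uvloop.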
After some research, I fixed the issue for myself by changing the CLI code to the following. I'm not an expert here, so I'd like to know whether this approach is fine.
import click
import uvicorn
import os
import sys
import asyncio


@click.command()
@click.option('--host', default="localhost")
@click.option('--port', default=1416)
@click.option('--pipelines-dir', default=os.environ.get("HAYHOOKS_PIPELINES_DIR"))
@click.option('--additional-python-path', default=os.environ.get("HAYHOOKS_ADDITIONAL_PYTHONPATH"))
def run(host, port, pipelines_dir, additional_python_path):
    if not pipelines_dir:
        pipelines_dir = "pipelines.d"
    os.environ["HAYHOOKS_PIPELINES_DIR"] = pipelines_dir

    if additional_python_path:
        sys.path.append(additional_python_path)

    loop = asyncio.new_event_loop()
    config = uvicorn.Config("hayhooks.server:app", host=host, port=port, loop=loop)
    server = uvicorn.Server(config)
    loop.run_until_complete(server.serve())
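
For comparison, a possibly simpler variant is to let uvicorn create the loop itself and ask it for the standard asyncio implementation through its loop setting, which takes a string such as "auto", "asyncio", or "uvloop". The sketch below is untested against Hayhooks; build_uvicorn_kwargs and serve are hypothetical helper names, and the host and port defaults are copied from the CLI options above.

```python
def build_uvicorn_kwargs(host="localhost", port=1416):
    """Collect the arguments for uvicorn.run (hypothetical helper).

    loop="asyncio" asks uvicorn for the standard asyncio event loop
    instead of uvloop.
    """
    return {
        "app": "hayhooks.server:app",
        "host": host,
        "port": port,
        "loop": "asyncio",
    }


def serve(host="localhost", port=1416):
    """Start the server on the standard asyncio loop (hypothetical helper)."""
    # Imported here so the helper above can be inspected without uvicorn installed.
    import uvicorn

    uvicorn.run(**build_uvicorn_kwargs(host, port))
```

Calling serve() from the click command in place of the last four lines of run() should keep the rest of the CLI unchanged, assuming the loop setting behaves the same way in the installed uvicorn version.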