OnlinePDFLoader crashes with import error on Google Colab
Checked other resources
- [X] I added a very descriptive title to this issue.
- [X] I searched the LangChain documentation with the integrated search.
- [X] I used the GitHub search to find a similar question and didn't find it.
- [X] I am sure that this is a bug in LangChain rather than my code.
- [X] The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
Example Code
Steps to Replicate:
Requirements.txt
%%writefile requirements.txt
replicate
langchain
langchain-community
sentence-transformers
pdf2image
pdfminer
pdfminer.six
unstructured
faiss-gpu
uvicorn
ctransformers
python-box
streamlit
Installing on Colab
!pip install -r requirements.txt
Code I am trying to run
# Load the external data source
from langchain_community.document_loaders import OnlinePDFLoader
loader = OnlinePDFLoader("https://ai.meta.com/static-resource/responsible-use-guide/")
documents = loader.load()
Error Message and Stack Trace (if applicable)
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
[<ipython-input-90-759c82deb3bb>](https://localhost:8080/#) in <cell line: 4>()
2 from langchain_community.document_loaders import OnlinePDFLoader
3 loader = OnlinePDFLoader("https://ai.meta.com/static-resource/responsible-use-guide/")
----> 4 documents = loader.load()
5
6 # Step 2: Get text splits from Document
4 frames
[/usr/local/lib/python3.10/dist-packages/langchain_community/document_loaders/pdf.py](https://localhost:8080/#) in load(self)
157 """Load documents."""
158 loader = UnstructuredPDFLoader(str(self.file_path))
--> 159 return loader.load()
160
161
[/usr/local/lib/python3.10/dist-packages/langchain_core/document_loaders/base.py](https://localhost:8080/#) in load(self)
27 def load(self) -> List[Document]:
28 """Load data into Document objects."""
---> 29 return list(self.lazy_load())
30
31 async def aload(self) -> List[Document]:
[/usr/local/lib/python3.10/dist-packages/langchain_community/document_loaders/unstructured.py](https://localhost:8080/#) in lazy_load(self)
86 def lazy_load(self) -> Iterator[Document]:
87 """Load file."""
---> 88 elements = self._get_elements()
89 self._post_process_elements(elements)
90 if self.mode == "elements":
[/usr/local/lib/python3.10/dist-packages/langchain_community/document_loaders/pdf.py](https://localhost:8080/#) in _get_elements(self)
69
70 def _get_elements(self) -> List:
---> 71 from unstructured.partition.pdf import partition_pdf
72
73 return partition_pdf(filename=self.file_path, **self.unstructured_kwargs)
[/usr/local/lib/python3.10/dist-packages/unstructured/partition/pdf.py](https://localhost:8080/#) in <module>
36 from pdfminer.utils import open_filename
37 from PIL import Image as PILImage
---> 38 from pillow_heif import register_heif_opener
39
40 from unstructured.chunking import add_chunking_strategy
ModuleNotFoundError: No module named 'pillow_heif'
---------------------------------------------------------------------------
NOTE: If your import is failing due to a missing package, you can
manually install dependencies using either !pip or !apt.
To view examples of installing some common dependencies, click the
"Open Examples" button below.
---------------------------------------------------------------------------
Description
- I am trying to use LangChain in a Google Colab notebook to load a PDF from a URL.
- Expected behavior: `loader.load()` returns the documents parsed from the PDF.
- Actual behavior: `loader.load()` fails with
ModuleNotFoundError: No module named 'pillow_heif'
System Info
LangChain versions on Google Colab
langchain==0.1.16
langchain-community==0.0.34
langchain-core==0.1.45
langchain-text-splitters==0.0.1
I am trying to follow Meta Developer's Llama 2 tutorial. Link for reference: https://youtu.be/Z5MFSlDrOdA?t=1539
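Hi @ishan-siddiqui, the plain unstructured install in your requirements.txt does not pull in the optional PDF dependencies, which is why pillow_heif is missing. You will need to install the unstructured package with its extras before the import (prefix the command with ! in a Colab cell):
pip install "unstructured[all-docs]"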
Source: unstructured_file.ipynb
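If it helps, here is a minimal sketch for checking the environment before calling `loader.load()`. It only uses the standard library to look up the top-level modules that appear in the requirements and the traceback above. The helper name `missing_modules` and the `REQUIRED_MODULES` list are made up for illustration and are not part of LangChain or unstructured.

```python
import importlib.util

# Top-level module names taken from the requirements and the traceback above.
# This list is an assumption for illustration, not an official dependency list.
REQUIRED_MODULES = ["unstructured", "pdfminer", "pdf2image", "pillow_heif"]


def missing_modules(names):
    """Return the module names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED_MODULES)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All listed modules were found.")
```

If `pillow_heif` shows up in the output, the extras install above has not taken effect yet. In Colab, restarting the runtime after installing may also be needed.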