ModuleNotFoundError: No module named 'langchain.docstore'

Open MichaelZhe opened this issue 2 months ago • 13 comments

🔎 Search before asking

  • [x] I have searched the PaddleOCR Docs and found no similar bug report.
  • [x] I have searched the PaddleOCR Issues and found no similar bug report.
  • [x] I have searched the PaddleOCR Discussions and found no similar bug report.

🐛 Bug (Problem Description)

On Ubuntu 22.04, I installed PaddlePaddle via Docker:

    docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.2.0

Then I started and entered the container:

    docker run --name paddle -it -v $PWD:/paddle ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.2.0 /bin/bash

Inside the container, I installed PaddleOCR:

    python -m pip install "paddleocr[all]"

After installation, I ran the command-line inference example from the documentation inside the container:

    paddleocr doc_parser -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/paddleocr_vl_demo.png

The following error occurred:

    Traceback (most recent call last):
      File "/usr/local/bin/paddleocr", line 5, in <module>
        from paddleocr.__main__ import console_entry
      File "/usr/local/lib/python3.10/dist-packages/paddleocr/__init__.py", line 15, in <module>
        from paddlex.inference.utils.benchmark import benchmark
      File "/usr/local/lib/python3.10/dist-packages/paddlex/__init__.py", line 49, in <module>
        from .inference import create_pipeline, create_predictor
      File "/usr/local/lib/python3.10/dist-packages/paddlex/inference/__init__.py", line 17, in <module>
        from .pipelines import create_pipeline, load_pipeline_config
      File "/usr/local/lib/python3.10/dist-packages/paddlex/inference/pipelines/__init__.py", line 23, in <module>
        from .attribute_recognition import (
      File "/usr/local/lib/python3.10/dist-packages/paddlex/inference/pipelines/attribute_recognition/__init__.py", line 15, in <module>
        from .pipeline import PedestrianAttributeRecPipeline, VehicleAttributeRecPipeline
      File "/usr/local/lib/python3.10/dist-packages/paddlex/inference/pipelines/attribute_recognition/pipeline.py", line 27, in <module>
        from ..components import CropByBoxes
      File "/usr/local/lib/python3.10/dist-packages/paddlex/inference/pipelines/components/__init__.py", line 29, in <module>
        from .retriever.base import BaseRetriever
      File "/usr/local/lib/python3.10/dist-packages/paddlex/inference/pipelines/components/retriever/__init__.py", line 15, in <module>
        from .openai_bot_retriever import OpenAIBotRetriever
      File "/usr/local/lib/python3.10/dist-packages/paddlex/inference/pipelines/components/retriever/openai_bot_retriever.py", line 16, in <module>
        from .base import BaseRetriever
      File "/usr/local/lib/python3.10/dist-packages/paddlex/inference/pipelines/components/retriever/base.py", line 25, in <module>
        from langchain.docstore.document import Document
    ModuleNotFoundError: No module named 'langchain.docstore'

🏃‍♂️ Environment

OS ubuntu22.04
CPU I5-7400 x86_64

🌰 Minimal Reproducible Example

docker pull ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.2.0
docker run --name paddle -it -v $PWD:/paddle ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/paddle:3.2.0 /bin/bash
python -m pip install "paddleocr[all]"
paddleocr doc_parser -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/paddleocr_vl_demo.png

MichaelZhe avatar Oct 18 '25 09:10 MichaelZhe

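The installed version may be incorrect. Please try re-running the installation steps from the documentation. If the problem persists, we will follow up promptly.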

Sunting78 avatar Oct 18 '25 10:10 Sunting78

Same here. I would recommend pinning the dependency versions.

haoyun avatar Oct 18 '25 11:10 haoyun

Change Langchain version <1.0.0
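
To check whether an environment is affected before pinning, the installed version can be inspected. This is a minimal sketch, assuming the package is installed under the distribution name langchain; the helper names major_version and needs_pin are made up for illustration.

```python
from importlib import metadata


def major_version(version_string):
    """Return the leading (major) component of a dotted version string."""
    return int(version_string.split(".")[0])


def needs_pin(package="langchain", max_major=1):
    """Return True if `package` is installed at major version >= max_major.

    Hypothetical helper: returns False when the package is not installed.
    """
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return False
    return major_version(installed) >= max_major


if __name__ == "__main__":
    print("langchain needs pinning:", needs_pin())
```

If this reports True, running python -m pip install "langchain<1.0.0" applies the pin suggested above.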

yk690520 avatar Oct 18 '25 12:10 yk690520

Mine won't run either.

C:\Users\24723\Desktop\test>paddleocr text_detection -i 1.png
Traceback (most recent call last):
  File "<frozen runpy>", line 198, in _run_module_as_main
  File "<frozen runpy>", line 88, in _run_code
  File "D:\python\Scripts\paddleocr.exe\__main__.py", line 2, in <module>
  File "D:\python\Lib\site-packages\paddleocr\__init__.py", line 15, in <module>
    from paddlex.inference.utils.benchmark import benchmark
  File "D:\python\Lib\site-packages\paddlex\__init__.py", line 49, in <module>
    from .inference import create_pipeline, create_predictor
  File "D:\python\Lib\site-packages\paddlex\inference\__init__.py", line 17, in <module>
    from .pipelines import create_pipeline, load_pipeline_config
  File "D:\python\Lib\site-packages\paddlex\inference\pipelines\__init__.py", line 23, in <module>
    from .attribute_recognition import (
  File "D:\python\Lib\site-packages\paddlex\inference\pipelines\attribute_recognition\__init__.py", line 15, in <module>
    from .pipeline import PedestrianAttributeRecPipeline, VehicleAttributeRecPipeline
  File "D:\python\Lib\site-packages\paddlex\inference\pipelines\attribute_recognition\pipeline.py", line 27, in <module>
    from ..components import CropByBoxes
  File "D:\python\Lib\site-packages\paddlex\inference\pipelines\components\__init__.py", line 29, in <module>
    from .retriever.base import BaseRetriever
  File "D:\python\Lib\site-packages\paddlex\inference\pipelines\components\retriever\__init__.py", line 15, in <module>
    from .openai_bot_retriever import OpenAIBotRetriever
  File "D:\python\Lib\site-packages\paddlex\inference\pipelines\components\retriever\openai_bot_retriever.py", line 16, in <module>
    from .base import BaseRetriever
  File "D:\python\Lib\site-packages\paddlex\inference\pipelines\components\retriever\base.py", line 25, in <module>
    from langchain.docstore.document import Document
ModuleNotFoundError: No module named 'langchain.docstore'

as882301 avatar Oct 20 '25 08:10 as882301

Change Langchain version <1.0.0

@as882301

ExitPupil avatar Oct 20 '25 08:10 ExitPupil

> Change Langchain version <1.0.0
>
> @as882301

Switching to a different version did the trick. Thanks a lot!

as882301 avatar Oct 20 '25 08:10 as882301

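> > Change Langchain version <1.0.0
> >
> > @as882301
>
> Switching to a different version did the trick. Thanks a lot!

Which version did you use? When installing LangChain, does langchain-community need to be installed separately?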

greyPetCat avatar Oct 20 '25 09:10 greyPetCat

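> > > Change Langchain version <1.0.0
> > >
> > > @as882301
> >
> > Switching to a different version did the trick. Thanks a lot!
>
> Which version did you use? When installing LangChain, does langchain-community need to be installed separately?

I only switched the version and it ran; I didn't touch anything else. The version I switched to is 0.3.27.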

as882301 avatar Oct 20 '25 13:10 as882301

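> > > > Change Langchain version <1.0.0
> > > >
> > > > @as882301
> > >
> > > Switching to a different version did the trick. Thanks a lot!
> >
> > Which version did you use? When installing LangChain, does langchain-community need to be installed separately?
>
> I only switched the version and it ran; I didn't touch anything else. The version I switched to is 0.3.27.

Works for me too.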

UGhost-X avatar Oct 25 '25 09:10 UGhost-X

I also updated to LangChain v1.0 and ran into this issue. The fix described here does not require reverting LangChain to an older version.

Error:

  File "../venv/lib/python3.13/site-packages/paddlex/inference/pipelines/components/retriever/base.py", line 25, in <module>
    from langchain.docstore.document import Document
ModuleNotFoundError: No module named 'langchain.docstore'

Solution

I changed just two lines in base.py (the file referenced above), replacing

    if is_dep_available("langchain"):
        from langchain.docstore.document import Document
        from langchain.text_splitter import RecursiveCharacterTextSplitter

with

    if is_dep_available("langchain"):
        from langchain_core.documents import Document
        from langchain_text_splitters import RecursiveCharacterTextSplitter

This works because, in current LangChain releases (those compatible with LangChain 1.x), Document and RecursiveCharacterTextSplitter are imported from langchain_core and langchain_text_splitters:

from langchain_core.documents import Document
from langchain_text_splitters import RecursiveCharacterTextSplitter
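
For code that has to run against both older and newer LangChain releases, one option is to try the new import paths first and fall back to the old ones. The following is a minimal sketch, not PaddleX's actual code: the helpers import_first and load_langchain_symbols are hypothetical, and the module paths are the ones quoted in this thread.

```python
import importlib


def import_first(name, module_paths):
    """Return attribute `name` from the first importable module in `module_paths`."""
    last_error = None
    for path in module_paths:
        try:
            return getattr(importlib.import_module(path), name)
        except (ImportError, AttributeError) as err:
            # Module missing (or lacks the attribute): try the next candidate.
            last_error = err
    raise ImportError(
        f"could not import {name!r} from any of {module_paths}"
    ) from last_error


def load_langchain_symbols():
    """Resolve Document and RecursiveCharacterTextSplitter, new paths first."""
    document_cls = import_first(
        "Document",
        ["langchain_core.documents", "langchain.docstore.document"],
    )
    splitter_cls = import_first(
        "RecursiveCharacterTextSplitter",
        ["langchain_text_splitters", "langchain.text_splitter"],
    )
    return document_cls, splitter_cls
```

Nothing is imported until load_langchain_symbols() is called, so defining these helpers does not itself require LangChain to be installed.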

vibhudost avatar Oct 25 '25 09:10 vibhudost

Using paddleocr in a repository that has already migrated to LangChain 1.0 or later is not possible right now.

Is there any update on this? Ideally, the dependencies would be decoupled a bit: LangChain should presumably not be required just to use a standard OCR model without paddleocr's built-in LLM features.
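
One common way to get this kind of decoupling in Python is to defer an optional import until the feature that needs it is actually used. Below is a minimal sketch of that pattern, not how PaddleOCR or PaddleX is implemented; LazyModule is a hypothetical helper.

```python
import importlib


class LazyModule:
    """Proxy that imports the named module on first attribute access."""

    def __init__(self, name):
        self._name = name
        self._module = None

    def __getattr__(self, attr):
        # Only called for attributes not found on the proxy itself,
        # so _name and _module are resolved normally.
        if self._module is None:
            self._module = importlib.import_module(self._name)
        return getattr(self._module, attr)


# Creating the proxy never fails, even if the package is not installed;
# an ImportError would only surface when an attribute is first used.
langchain_core_documents = LazyModule("langchain_core.documents")
```

With this approach, code paths that never touch the proxy (for example, plain OCR inference) would not need the optional package at all.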

Luux avatar Nov 14 '25 10:11 Luux