
Ran into a cfg problem when using the examples in demos

Open calledice opened this issue 5 months ago • 4 comments

TypeError: 'NoneType' object is not callable
Traceback:
File "D:\python_code\pdf_process\data-juicer\demos\process_cft_zh_data\app.py", line 231, in <module>
    main()
File "D:\python_code\pdf_process\data-juicer\demos\process_cft_zh_data\app.py", line 227, in main
    Visualize.visualize()
File "D:\python_code\pdf_process\data-juicer\demos\process_cft_zh_data\app.py", line 223, in visualize
    Visualize.analyze_process()
File "D:\python_code\pdf_process\data-juicer\demos\process_cft_zh_data\app.py", line 164, in analyze_process
    process_and_show_res()
File "D:\python_code\pdf_process\data-juicer\demos\process_cft_zh_data\app.py", line 100, in process_and_show_res
    analyzer = Analyzer(cfg)
File "C:\Users\zj.conda\envs\dj\lib\site-packages\data_juicer\core\analyzer.py", line 49, in __init__
    self.dataset_builder = DatasetBuilder(self.cfg,
File "C:\Users\zj.conda\envs\dj\lib\site-packages\data_juicer\core\data\dataset_builder.py", line 86, in __init__
    stra = DataLoadStrategyRegistry.get_strategy_class(
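
As general background on the error message only, not a diagnosis of this issue: Python raises `TypeError: 'NoneType' object is not callable` when code calls a value that turned out to be `None`, for example the result of a lookup that found no match. The sketch below uses a hypothetical `StrategyRegistry` and `LocalStrategy`; these are illustrative names and do not reflect data-juicer's actual implementation.

```python
class StrategyRegistry:
    """Hypothetical registry for illustration; not data-juicer's code."""

    _strategies = {}

    @classmethod
    def register(cls, key, strategy_cls):
        cls._strategies[key] = strategy_cls

    @classmethod
    def get_strategy_class(cls, key):
        # dict.get returns None when nothing is registered under the key.
        return cls._strategies.get(key)


class LocalStrategy:
    """Hypothetical strategy that just remembers the path it was given."""

    def __init__(self, path):
        self.path = path


StrategyRegistry.register("local", LocalStrategy)


def unsafe_build(key, path):
    # If the lookup returns None, the call on the next line raises
    # TypeError: 'NoneType' object is not callable.
    strategy_cls = StrategyRegistry.get_strategy_class(key)
    return strategy_cls(path)


def safe_build(key, path):
    # Checking for None first turns the failure into a clearer message.
    strategy_cls = StrategyRegistry.get_strategy_class(key)
    if strategy_cls is None:
        raise ValueError(f"no strategy registered for key {key!r}")
    return strategy_cls(path)
```

With these definitions, `unsafe_build("remote", "data.jsonl")` fails with the `TypeError` above because nothing is registered under `"remote"`, while `safe_build` reports the unmatched key instead.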

calledice avatar Jun 18 '25 06:06 calledice

When I run the same demo, there is no problem processing the data. Can you share the environment setup and how you run the demo?

cyruszhang avatar Jun 18 '25 17:06 cyruszhang

> When I run the same demo, there is no problem processing the data. Can you share the environment setup and how you run the demo?

Hello, I ran into the same problem. My environment is as follows:

aiohappyeyeballs==2.6.1 aiohttp==3.12.13 aiosignal==1.4.0 alabaster==1.0.0 altair==5.5.0 annotated-types==0.7.0 anyio==4.9.0 appdirs==1.4.4 argcomplete==3.6.2 asgiref==3.9.0 attr==0.3.1 attrs==25.3.0 audioread==3.0.1 av==13.1.0 azure-core==1.35.0 azure-storage-blob==12.25.1 babel==2.17.0 beautifulsoup4==4.13.4 black==25.1.0 bleach==5.0.1 blinker==1.9.0 blis==0.7.11 boto==2.49.0 boto3==1.39.3 botocore==1.39.3 bs4==0.0.2 build==1.2.2.post1 cachetools==5.5.2 catalogue==2.0.10 certifi==2025.6.15 cffi==1.17.1 cfgv==3.4.0 charset-normalizer==3.4.2 click==8.2.1 cloudpathlib==0.16.0 colorama==0.4.6 commonmark==0.9.1 confection==0.1.5 contourpy==1.3.2 coverage==7.9.2 cryptography==45.0.5 cycler==0.12.1 cymem==2.0.11 datamodel-code-generator==0.26.1 datasets==3.6.0 decorator==5.2.1 defusedxml==0.7.1 dill==0.3.4 distlib==0.3.9 distro==1.9.0 Django==5.1.11 django-annoying==0.10.6 django-cors-headers==3.6.0 django-csp==3.7 django-debug-toolbar==3.2.1 django-environ==0.10.0 django-extensions==3.2.3 django-filter==24.3 django-migration-linter==5.2.0 django-model-utils==4.1.1 django-ranged-fileresponse==0.1.2 django-rq==2.10.3 django-storages==1.12.3 django-user-agents==0.4.0 djangorestframework==3.15.2 djangorestframework_simplejwt==5.4.0 dnspython==2.7.0 docker-pycreds==0.4.0 docstring_parser==0.16 docutils==0.21.2 drf-dynamic-fields==0.3.0 drf-flex-fields==0.9.5 drf-generators==0.3.0 email_validator==2.2.0 emoji==2.2.0 expiringdict==1.2.2 Faker==37.4.0 fastapi==0.110.3 filelock==3.18.0 fire==0.7.0 flake8==7.3.0 flake8-black==0.3.6 fonttools==4.58.5 frozenlist==1.7.0 fsspec==2023.5.0 furo==2024.8.6 genson==1.3.0 gitdb==4.0.12 GitPython==3.1.44 google-api-core==2.25.1 google-auth==2.40.3 google-cloud-appengine-logging==1.6.2 google-cloud-audit-log==0.3.2 google-cloud-core==2.4.3 google-cloud-logging==3.12.1
google-cloud-storage==2.19.0 google-crc32c==1.7.1 google-resumable-media==2.7.2 googleapis-common-protos==1.70.0 grpc-google-iam-v1==0.14.2 grpcio==1.73.1 grpcio-status==1.73.1 h11==0.16.0 httpcore==1.0.9 httpx==0.28.1 httpx-sse==0.4.1 huggingface-hub==0.33.2 humansignal-drf-yasg==1.21.10.post1 identify==2.6.12 idna==3.10 ijson==3.4.0 imagesize==1.4.1 importlib_metadata==8.7.0 importlib_resources==6.5.2 inflect==5.6.2 inflection==0.5.1 iniconfig==2.1.0 isodate==0.7.2 isort==5.13.2 Jinja2==3.1.6 jiter==0.10.0 jmespath==1.0.1 joblib==1.5.1 jsf==0.11.2 jsonargparse==4.40.0 jsonlines==4.0.0 jsonschema==4.24.0 jsonschema-specifications==2025.4.1 kiwisolver==1.4.8 label-studio==1.17.0 label-studio-sdk==1.0.11 langcodes==3.5.0 language_data==1.3.0 launchdarkly-server-sdk==8.2.1 lazy_loader==0.4 librosa==0.11.0 linkify-it-py==2.0.3 llvmlite==0.44.0 lockfile==0.12.2 loguru==0.7.3 lxml==6.0.0 lxml_html_clean==0.4.2 lz4==4.4.4 marisa-trie==1.2.1 markdown-it-py==3.0.0 MarkupSafe==3.0.2 matplotlib==3.10.3 mccabe==0.7.0 mcp==1.10.1 mdit-py-plugins==0.4.2 mdurl==0.1.2 mpmath==1.3.0 msgpack==1.1.1 multidict==6.6.3 multiprocess==0.70.12 murmurhash==1.0.13 mwparserfromhell==0.7.2 mypy_extensions==1.1.0 myst-parser==4.0.1 narwhals==1.45.0 networkx==3.5 nltk==3.9.1 nodeenv==1.9.1 numba==0.61.2 numpy==1.26.4 openai==1.93.0 opentelemetry-api==1.34.1 ordered-set==4.0.2 packaging==25.0 pandas==2.3.0 pathlib_abc==0.1.1 pathspec==0.12.1 pathy==0.11.0 pbr==6.1.1 pdfminer.six==20250506 pdfplumber==0.11.7 pillow==11.3.0 platformdirs==4.3.8 plotly==6.2.0 pluggy==1.6.0 pooch==1.8.2 pre_commit==4.2.0 preshed==3.0.10 propcache==0.3.2 proto-plus==1.26.1 protobuf==5.29.5 psutil==7.0.0 psycopg2-binary==2.9.10 -e git+https://github.com/modelscope/data-juicer.git@1fec6533a3852022070031536bf344220aab44be#egg=py_data_juicer pyarrow==20.0.0 pyasn1==0.6.1 pyasn1_modules==0.4.2 pyboxen==1.3.0 pycodestyle==2.14.0 pycparser==2.22 pydantic==2.11.7 pydantic-settings==2.10.1 pydantic_core==2.33.2 pydeck==0.9.1 
pyflakes==3.4.0 Pygments==2.19.2 PyJWT==2.10.1 pylance==0.31.0 pyparsing==3.2.3 pypdfium2==4.30.1 pyproject_hooks==1.2.0 pyRFC3339==2.0.1 pytest==8.4.1 pytest-cov==6.2.1 python-dateutil==2.9.0.post0 python-docx==1.2.0 python-dotenv==1.1.1 python-json-logger==2.0.4 python-multipart==0.0.20 pytz==2022.7.1 PyYAML==6.0.2 ray==2.40.0 recommonmark==0.7.1 redis==5.2.1 referencing==0.36.2 regex==2024.11.6 requests==2.32.4 requests-file==2.1.0 requests-mock==1.12.1 resampy==0.4.3 rich==14.0.0 roman-numerals-py==3.1.0 rpds-py==0.26.0 rq==1.16.2 rsa==4.9.1 rstr==3.2.2 rules==3.4 s3transfer==0.13.0 samplerate==0.1.0 scikit-learn==1.7.0 scipy==1.16.0 seaborn==0.13.2 semver==3.0.4 sentry-sdk==2.32.0 setproctitle==1.3.6 setuptools==78.1.1 shellingham==1.5.4 six==1.17.0 smart-open==6.4.0 smmap==5.0.2 sniffio==1.3.1 snowballstemmer==3.0.1 soundfile==0.13.1 soupsieve==2.7 soxr==0.5.0.post1 spacy==3.8.0 spacy-legacy==3.0.12 spacy-loggers==1.0.5 Sphinx==8.2.3 sphinx-autobuild==2024.10.3 sphinx-basic-ng==1.0.0b2 sphinx-copybutton==0.5.2 sphinx-multiversion==0.2.4 sphinx-rtd-theme==3.0.2 sphinxcontrib-apidoc==0.6.0 sphinxcontrib-applehelp==2.0.0 sphinxcontrib-devhelp==2.0.0 sphinxcontrib-htmlhelp==2.1.0 sphinxcontrib-jquery==4.1 sphinxcontrib-jsmath==1.0.1 sphinxcontrib-qthelp==2.0.0 sphinxcontrib-serializinghtml==2.0.0 sqlparse==0.5.3 srsly==2.5.1 sse-starlette==2.4.1 starlette==0.37.2 streamlit==1.46.1 sympy==1.14.0 tabulate==0.9.0 tenacity==9.1.2 termcolor==3.1.0 thinc==8.2.5 threadpoolctl==3.6.0 tldextract==5.3.0 toml==0.10.2 tomli==2.2.1 tomli_w==1.2.0 torch==2.7.1 tornado==6.5.1 tqdm==4.67.1 typer==0.16.0 typeshed_client==2.7.0 typing-inspection==0.4.1 typing_extensions==4.14.1 tzdata==2025.2 ua-parser==1.0.1 ua-parser-builtins==0.18.0.post1 uc-micro-py==1.0.3 ujson==5.10.0 uritemplate==4.2.0 urllib3==1.26.20 user-agents==2.2.0 uv==0.7.19 uvicorn==0.35.0 virtualenv==20.31.2 wandb==0.19.0 wasabi==1.1.3 watchdog==6.0.0 watchfiles==1.1.0 weasel==0.4.1 webencodings==0.5.1 
websockets==15.0.1 wget==3.2 wheel==0.40.0 win32_setctime==1.2.0 wordcloud==1.9.4 xmljson==0.2.1 xxhash==3.5.0 yarl==1.20.1 zipp==3.23.0 zstandard==0.23.0

I ran the demo in both ways described in the README: from source with python tools/process_data.py --config configs/demo/process.yaml, and through the command dj-process --config configs/demo/process.yaml. The problem occurs either way.

Sycamore-777 avatar Jul 07 '25 08:07 Sycamore-777


Could you share your Python version, the dj version or branch, and your operating system? That would help locate the problem. Thanks.
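
For anyone hitting the same error, here is a minimal sketch of one way to collect the three details requested above. It uses only the Python standard library. The distribution name `py-data-juicer` is an assumption based on the `egg=py_data_juicer` entry in the environment listing earlier in this thread, and `collect_env_info` is a hypothetical helper name.

```python
import platform
import sys
from importlib import metadata


def collect_env_info(dist_name: str = "py-data-juicer") -> dict:
    """Gather Python version, installed package version, and OS details."""
    try:
        # dist_name is an assumed distribution name; adjust if yours differs.
        dj_version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        dj_version = "not installed"
    return {
        "python": sys.version.split()[0],
        "data_juicer": dj_version,
        "os": f"{platform.system()} {platform.release()}",
    }


if __name__ == "__main__":
    for key, value in collect_env_info().items():
        print(f"{key}: {value}")
```

For a source install, the branch or commit can additionally be read with `git rev-parse HEAD` inside the checkout.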

cyruszhang avatar Jul 08 '25 00:07 cyruszhang


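> Could you share your Python version, the dj version or branch, and your operating system? That would help locate the problem. Thanks.

Python 3.12.11, dj version 1.4.0, Windows 11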

Sycamore-777 avatar Jul 10 '25 02:07 Sycamore-777