Zyda2 tutorial - TypeError when initializing Dask CPU cluster

Open ronjer30 opened this issue 1 year ago • 1 comments

Describe the bug

In the Zyda2 tutorial, several scripts, such as process_dclm.py, attempt to start a Dask LocalCluster. These scripts read an environment variable with CPU_WORKERS = os.environ.get("CPU_WORKERS") and use it to set up a cluster with that many workers: cluster = LocalCluster(n_workers=CPU_WORKERS, processes=True, memory_limit="48GB"). A TypeError is raised because os.environ.get returns a string (or None if the variable is unset), while n_workers is expected to be an integer.

Steps/Code to reproduce bug

  1. Follow the steps in the tutorial
  2. Run python3 0_processing/process_dclm.py
  3. The script fails with the following error:
Traceback (most recent call last):
  File "...NeMo-Curator/tutorials/zyda2-tutorial/0_processing/process_dclm.py", line 21, in <module>
    cluster = LocalCluster(n_workers=CPU_WORKERS, processes=True, memory_limit="48GB")
  File "/usr/local/lib/python3.10/dist-packages/distributed/deploy/local.py", line 211, in __init__
    threads_per_worker = max(1, int(math.ceil(CPU_COUNT / n_workers)))
TypeError: unsupported operand type(s) for /: 'int' and 'str' 
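
The failing expression from the traceback can be reproduced without Dask. The snippet below is only a sketch: the value "4" for CPU_WORKERS and the use of os.cpu_count() as a stand-in for distributed's CPU_COUNT are assumptions made for illustration.

```python
import math
import os

# Environment variables are always strings, even when they look numeric.
os.environ["CPU_WORKERS"] = "4"  # assumed example value
n_workers = os.environ.get("CPU_WORKERS")

# Stand-in for distributed's CPU_COUNT (assumption for this sketch).
cpu_count = os.cpu_count() or 1

try:
    # Same expression as line 211 of distributed/deploy/local.py in the traceback.
    threads_per_worker = max(1, int(math.ceil(cpu_count / n_workers)))
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(type(n_workers).__name__)
print(error_message)
```

The division fails before Dask does anything cluster-specific, which points at the type of the value read from the environment rather than at the cluster configuration.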

Expected behavior

The Dask cluster is created, the data is processed, and the script completes successfully.
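
One possible workaround is to convert the environment variable to an integer before passing it to LocalCluster. This is a minimal sketch, not the fix adopted by the tutorial: read_worker_count and start_cluster are hypothetical helper names, and falling back to os.cpu_count() when CPU_WORKERS is unset is an assumption.

```python
import os


def read_worker_count(env=os.environ, name="CPU_WORKERS", default=None):
    """Return the worker count from `env` as a positive int (hypothetical helper)."""
    raw = env.get(name)
    if raw is None or raw.strip() == "":
        # Assumed fallback: the given default, else one worker per CPU.
        count = default if default is not None else (os.cpu_count() or 1)
    else:
        try:
            count = int(raw)
        except ValueError:
            raise ValueError(f"{name} must be an integer, got {raw!r}") from None
    if count < 1:
        raise ValueError(f"{name} must be at least 1, got {count}")
    return count


def start_cluster():
    """Create the cluster with the same arguments as the tutorial script."""
    # Imported here so the helper above can be used without Dask installed.
    from dask.distributed import LocalCluster

    return LocalCluster(
        n_workers=read_worker_count(),
        processes=True,
        memory_limit="48GB",
    )
```

With this in place, running the script as CPU_WORKERS=4 python3 0_processing/process_dclm.py would pass the integer 4 to n_workers. Validating the value up front also turns a malformed setting into a clear ValueError instead of a TypeError raised inside distributed.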

Environment overview (please complete the following information)

  • Environment location: Slurm
  • Method of NeMo-Curator install: docker container, dev image from nvcr.io/nvidia/nemo:dev

ronjer30, Nov 05 '24 20:11

@ronjer30 Could you share the latest on this?

sithape2025, Jan 22 '25 19:01

This issue is stale because it has been open for 30 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot], Jul 26 '25 02:07

This issue was closed because it has been inactive for 7 days since being marked as stale.

github-actions[bot], Aug 03 '25 02:08