Zyda2 tutorial - TypeError when initializing Dask CPU cluster

Open ronjer30 opened this issue 1 year ago • 1 comments

Describe the bug

In the Zyda2 tutorial, several scripts, such as process_dclm.py, attempt to start a Dask LocalCluster. These scripts read the worker count from an environment variable, CPU_WORKERS = os.environ.get("CPU_WORKERS"), and pass it straight to the cluster constructor: cluster = LocalCluster(n_workers=CPU_WORKERS, processes=True, memory_limit="48GB"). Because os.environ.get returns a string (or None when the variable is unset) while n_workers is expected to be an integer, a TypeError is raised.

Steps/Code to reproduce bug

  1. Follow the steps in the tutorial
  2. Run python3 0_processing/process_dclm.py
  3. The script fails with the following error:
Traceback (most recent call last):
  File "...NeMo-Curator/tutorials/zyda2-tutorial/0_processing/process_dclm.py", line 21, in <module>
    cluster = LocalCluster(n_workers=CPU_WORKERS, processes=True, memory_limit="48GB")
  File "/usr/local/lib/python3.10/dist-packages/distributed/deploy/local.py", line 211, in __init__
    threads_per_worker = max(1, int(math.ceil(CPU_COUNT / n_workers)))
TypeError: unsupported operand type(s) for /: 'int' and 'str' 
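
The failure can be reproduced without Dask. The sketch below assumes CPU_WORKERS is exported as "4" (a made-up value) and repeats the arithmetic from the traceback line in distributed/deploy/local.py, with os.cpu_count() standing in for distributed's CPU_COUNT:

```python
import math
import os

# Assumption for illustration: the variable was exported in the shell as "4".
os.environ["CPU_WORKERS"] = "4"

# Same lookup as in the tutorial script; the result is a str, not an int.
CPU_WORKERS = os.environ.get("CPU_WORKERS")

# Stand-in for distributed's CPU_COUNT (an int).
cpu_count = os.cpu_count() or 1

try:
    # Same expression as the line shown in the traceback.
    threads_per_worker = max(1, int(math.ceil(cpu_count / CPU_WORKERS)))
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(type(CPU_WORKERS).__name__)
print(error_message)
```

Dividing an int by a str is what triggers the error, so any value read from the environment hits it unless it is converted first.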

Expected behavior

The Dask cluster is created, the data is processed, and the script completes successfully.
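
One possible way to get this behavior is to convert the environment variable to an integer before passing it to LocalCluster. This is a minimal sketch, not the tutorial's actual fix: read_worker_count and start_cluster are hypothetical helpers, and falling back to os.cpu_count() when the variable is unset is an assumption.

```python
import os


def read_worker_count(name="CPU_WORKERS", default=None):
    """Return the worker count from the environment as an int.

    Hypothetical helper. If the variable is unset or empty, fall back to
    `default`, or to os.cpu_count() (an assumed default, not from the tutorial).
    """
    raw = os.environ.get(name)
    if raw is None or raw.strip() == "":
        count = default if default is not None else (os.cpu_count() or 1)
    else:
        # int() raises ValueError for non-numeric input such as "abc".
        count = int(raw)
    if count < 1:
        raise ValueError(f"{name} must be a positive integer, got {count}")
    return count


def start_cluster():
    """Create the cluster with an integer n_workers (hypothetical wrapper)."""
    # Imported lazily so the helper above can be used without Dask installed.
    from dask.distributed import LocalCluster

    return LocalCluster(
        n_workers=read_worker_count(),
        processes=True,
        memory_limit="48GB",
    )
```

With CPU_WORKERS=8 exported, read_worker_count() returns the integer 8, and a non-numeric value fails early with a ValueError instead of deep inside the cluster constructor.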

Environment overview

  • Environment location: Slurm
  • Method of NeMo-Curator install: Docker container, dev image from nvcr.io/nvidia/nemo:dev

ronjer30, Nov 05 '24 20:11