NeMo-Curator
Zyda2 tutorial - TypeError when initializing Dask CPU cluster
Describe the bug
In the Zyda2 tutorial, several scripts, such as process_dclm.py, attempt to start a Dask LocalCluster. These scripts read an environment variable with CPU_WORKERS = os.environ.get("CPU_WORKERS") and use it to set the number of workers: cluster = LocalCluster(n_workers=CPU_WORKERS, processes=True, memory_limit="48GB"). Because os.environ.get returns a string (or None if the variable is unset), a TypeError is raised: n_workers is expected to be an integer.
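The root cause can be reproduced without Dask. The following is a minimal sketch, not the library's code: the `threads_per_worker` helper and the `cpu_count` variable are hypothetical stand-ins that only mirror the arithmetic shown in the traceback, and the value "4" is an arbitrary example.

```python
import math
import os

# Simulate the tutorial's setup: environment values always arrive as text.
os.environ["CPU_WORKERS"] = "4"  # arbitrary example value
CPU_WORKERS = os.environ.get("CPU_WORKERS")
print(type(CPU_WORKERS).__name__)

# Hypothetical stand-in for the CPU count used inside Dask.
cpu_count = os.cpu_count() or 1


def threads_per_worker(n_workers):
    """Mirror the arithmetic from the traceback line (illustrative only)."""
    return max(1, int(math.ceil(cpu_count / n_workers)))


try:
    threads_per_worker(CPU_WORKERS)  # int / str is not defined
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# Converting to int first avoids the error.
print(threads_per_worker(int(CPU_WORKERS)))
```

The division fails only because the value is a string; the same call succeeds once it is converted with int().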
Steps/Code to reproduce bug
- Follow the steps in the tutorial
- Run python3 0_processing/process_dclm.py
- The script fails with the following error:
Traceback (most recent call last):
  File "...NeMo-Curator/tutorials/zyda2-tutorial/0_processing/process_dclm.py", line 21, in <module>
    cluster = LocalCluster(n_workers=CPU_WORKERS, processes=True, memory_limit="48GB")
  File "/usr/local/lib/python3.10/dist-packages/distributed/deploy/local.py", line 211, in __init__
    threads_per_worker = max(1, int(math.ceil(CPU_COUNT / n_workers)))
TypeError: unsupported operand type(s) for /: 'int' and 'str'
Expected behavior
The Dask cluster is created, the data is processed, and the script completes successfully.
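One possible way to get this behavior is to convert the environment variable to an integer before passing it to LocalCluster. The sketch below is an assumption about how the tutorial scripts could be changed, not the maintainers' fix: `read_worker_count` is a hypothetical helper, and falling back to the machine's CPU count when the variable is unset is an illustrative choice. The LocalCluster call is left as a comment so the snippet runs without Dask installed.

```python
import os


def read_worker_count(var_name="CPU_WORKERS", default=None):
    """Return the worker count from the environment as a positive int.

    Hypothetical helper: falls back to `default`, or to the local CPU
    count, when the variable is unset or empty.
    """
    raw = os.environ.get(var_name)
    if raw is None or raw.strip() == "":
        return default if default is not None else (os.cpu_count() or 1)
    try:
        value = int(raw)
    except ValueError:
        raise ValueError(f"{var_name} must be an integer, got {raw!r}") from None
    if value < 1:
        raise ValueError(f"{var_name} must be at least 1, got {value}")
    return value


os.environ["CPU_WORKERS"] = "8"  # arbitrary example value
CPU_WORKERS = read_worker_count()
print(CPU_WORKERS, type(CPU_WORKERS).__name__)

# With an integer in hand, the tutorial's call receives the type it expects:
# cluster = LocalCluster(n_workers=CPU_WORKERS, processes=True, memory_limit="48GB")
```

Validating the value up front also turns a typo such as CPU_WORKERS=eight into a clear ValueError at startup rather than a TypeError raised inside Dask.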
Environment overview (please complete the following information)
- Environment location: Slurm
- Method of NeMo-Curator install: docker container, dev image from nvcr.io/nvidia/nemo:dev