
Running Curator under SLURM Cluster

Open philm001 opened this issue 11 months ago • 6 comments

Hello all,

I have a quick question: I want to make sure my workflow and my installation path are correct.

I want to run the entire NeMo framework/ecosystem on a SLURM cluster. However, for Curator, the README says to run it with the Framework Launcher.

Because I would like to get a head start with NeMo 2.0, I am leaning toward using NeMo-Run.

But before I start setting everything up, I wanted to verify that Curator can still run on a SLURM cluster with NeMo-Run.

philm001 • Feb 07 '25 20:02

NeMo-Curator should work on SLURM clusters in both single-node and multi-node setups, either through NeMo-Run or through the bash scripts that NeMo-Run wraps, which can also be used to set up the cluster manually.

cc: @ryantwolf, if you want to add anything.
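For readers wondering what the manual bash-script route involves, here is a minimal sketch, assuming a Dask-based layout in which node 0 hosts the scheduler and the curation driver while every node runs one worker. The function `node_commands`, the script name, and the driver's `--scheduler-file` flag are hypothetical illustrations, not the contents of NeMo-Curator's actual SLURM scripts; `dask scheduler`, `dask worker`, and `dask cuda worker` are the real Dask/dask-cuda CLIs.

```python
"""Hypothetical sketch: which processes each SLURM node would start for a
Dask-based curation run. Not the actual NeMo-Curator scripts."""

from typing import List


def node_commands(
    node_id: int, scheduler_file: str, script: str, device: str = "cpu"
) -> List[str]:
    """Return the shell commands a given node would run.

    Assumed layout: node 0 hosts the Dask scheduler and the driver script;
    every node (including node 0) hosts one Dask worker process.
    """
    # GPU runs use dask-cuda's worker; CPU runs use the plain Dask worker.
    worker = "dask cuda worker" if device == "gpu" else "dask worker"
    cmds: List[str] = []
    if node_id == 0:
        # Start the scheduler in the background; it writes its address to scheduler_file.
        cmds.append(f"dask scheduler --scheduler-file {scheduler_file} &")
    # Every node starts a worker that locates the scheduler via the shared file.
    cmds.append(f"{worker} --scheduler-file {scheduler_file} &")
    if node_id == 0:
        # Hypothetical driver: the curation script connects as a client and submits work.
        cmds.append(f"python {script} --scheduler-file {scheduler_file}")
    return cmds
```

Inside a batch job, each node could pick its role from the `SLURM_NODEID` environment variable that `srun` sets, e.g. `node_commands(int(os.environ["SLURM_NODEID"]), ...)`. With `--nodes=1`, only the node-0 branch runs, which is the single-node case. A real script would also need to wait for the scheduler file and the workers to come up before starting the driver; the sketch omits that.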

ayushdg • Feb 10 '25 17:02

Thanks @ayushdg, I just needed confirmation on that.

philm001 • Feb 10 '25 21:02

Yeah, independently of this, I'm also probably going to build a better integration that does more than just wrap CLI scripts. I'll let you know when I open a PR for it.

ryantwolf • Feb 10 '25 22:02

@ryantwolf Quick question: is there a tutorial in NeMo-Curator on running Curator on a SLURM cluster with NeMo-Run?

philm001 • Feb 12 '25 18:02

We have this example: https://github.com/NVIDIA/NeMo-Curator/tree/main/examples/nemo_run

ryantwolf • Feb 12 '25 19:02

Thanks for that! It is much simpler than I was expecting.

One thing, though: I won't be using Docker containers right now. I will in the future, once I have a better handle on NeMo and NeMo-Curator.

The script there references the Docker container, but since I am going to do a local install of NeMo-Curator and everything else (via pip), I should remove all of the lines that reference the location of the Docker container.
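Dropping the container for a pip install mostly means replacing the container flags on `srun` with an environment activation. Below is a minimal sketch, assuming Pyxis-style `--container-image`/`--container-mounts` flags on the container path and a virtual environment on a filesystem shared by all nodes. The function `build_sbatch`, the job name, the venv path `$HOME/curator-venv`, and the example command are hypothetical.

```python
"""Hypothetical sketch: build an sbatch script that runs a curation command
either inside a container (Pyxis/Enroot) or from a local pip install."""

from typing import Optional


def build_sbatch(
    command: str,
    nodes: int = 1,
    container_image: Optional[str] = None,
    container_mounts: Optional[str] = None,
    venv: str = "$HOME/curator-venv",  # hypothetical venv location
) -> str:
    lines = [
        "#!/bin/bash",
        "#SBATCH --job-name=curator",
        f"#SBATCH --nodes={nodes}",
        "#SBATCH --ntasks-per-node=1",
    ]
    if container_image:
        # Container path: Pyxis adds these srun flags; the software lives in the image.
        flags = f"--container-image={container_image}"
        if container_mounts:
            flags += f" --container-mounts={container_mounts}"
        lines.append(f"srun {flags} {command}")
    else:
        # Local pip install: drop the container flags and activate the venv instead.
        lines.append(f"source {venv}/bin/activate")
        lines.append(f"srun {command}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_sbatch("python curate.py", nodes=2))
```

`srun` exports the caller's environment by default, so activating the venv in the batch script before `srun` makes the pip-installed packages visible to every task, provided the venv path is mounted on every node.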

philm001 • Feb 14 '25 20:02

This issue is stale because it has been open for 30 days with no activity. Remove the stale label or comment, or this will be closed in 7 days.

github-actions[bot] • Jul 25 '25 02:07

Hi @philm001, closing this now. Please LMK if there are any other concerns here and I will be happy to re-open and discuss. Thanks!

sarahyurick • Jul 25 '25 17:07