fitsmap
Don't raise error if ray already initialized.
I was having problems running fitsmap via convert.dir_to_map on a certain cluster using slurm. A little digging suggests that on clusters where slurm does not provide exclusive node access, ray still attempts to use all cores on the node, leading to errors.
The errors can be avoided by initializing ray with only a single cpu (I haven't checked if it works using the number of cpus requested via slurm) before calling fitsmap, but only if ray is then re-initialized within fitsmap with ignore_reinit_error=True.
I'm not sure if this is the best way to address the issue, but thought I'd provide the fix in case it's helpful. Happy to close this and just raise an issue or rework this PR if you have suggestions.
Hi @bd-j, thanks for the PR, I will take a look. I haven't done a lot of testing on slurm, so there could be an issue with how ray gets initialized. How do you make the call to fitsmap: is it via sbatch/srun, or manually in an interactive session?
Sorry for the delayed reply. I seem to have somehow unsubscribed from notifications on this repo.
Hi @ryanhausen, it was sbatch, but not requesting an entire node (actually only requesting a single cpu). I added the following to the top of the script that called fitsmap.convert:
import ray
ray.init(ignore_reinit_error=True, num_cpus=1)
I didn't test replacing num_cpus=1 with something like $SLURM_NTASKS for 1 < $SLURM_NTASKS < $SLURM_CPUS_ON_NODE.
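For reference, a rough and untested sketch of what that variant might look like. The helper names slurm_cpu_count and init_ray_for_slurm are made up for illustration and are not part of fitsmap or ray. It assumes the slurm job exports SLURM_NTASKS (or SLURM_CPUS_PER_TASK, which slurm only sets when --cpus-per-task is given), and it falls back to a single cpu otherwise. Which variable is the right one probably depends on how the job was requested.

```python
import os


def slurm_cpu_count(env=None, default=1):
    """Guess how many cpus ray should use from slurm environment variables.

    Hypothetical helper: the lookup order below is an assumption,
    not something taken from fitsmap or ray.
    """
    env = os.environ if env is None else env
    for name in ("SLURM_NTASKS", "SLURM_CPUS_PER_TASK"):
        value = env.get(name, "")
        # Ignore variables that are unset, empty, or not a positive integer.
        if value.isdigit() and int(value) > 0:
            return int(value)
    return default


def init_ray_for_slurm():
    """Start ray with a cpu count matching the slurm allocation."""
    import ray  # imported here so the helper above works without ray installed

    # ignore_reinit_error=True keeps a later ray.init call from raising.
    return ray.init(ignore_reinit_error=True, num_cpus=slurm_cpu_count())
```

Calling init_ray_for_slurm() at the top of the script, before fitsmap.convert is used, would take the place of the two lines above. Outside of a slurm job it behaves the same as num_cpus=1.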
@bd-j thanks. I need to read up on how best to use ray with slurm. I want to make sure I don't implement things in a way that breaks other things. Thanks!