nltools ISC function
Hi all,
I was trying both the nltools and Brainiak (brainiak.isc.bootstrap_isc) implementations of ISC, and was getting very different p-values between the two. I was running it on Discovery with 1 node and 16 ppn, and set the parameter n_jobs=-1 (the default, which uses all available processors). I played around with the number of processors just to compare speed, and realized that specifying fewer processors (around 1-4) gave me results more similar to brainiak.isc.bootstrap_isc, while increasing the number of processors gave me smaller and smaller p-values (and, strangely, slower computation). This happens both when I change the n_jobs parameter and when I request different numbers of processors in the job script I submit to Discovery.
I've double-checked the versions of nltools and joblib that I was using and confirmed that they're up to date. I'm wondering if this is a bug and the implementation wasn't optimized to be parallelized across that many processors. I'm hoping someone can shed some light on this or look into it! Thanks in advance!
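For reference, the comparison I was running looked roughly like the sketch below. This is simplified and written from memory, so the exact parameter names and return formats (e.g. n_bootstraps, the structure of the returned stats) are approximations of the two APIs rather than my actual script:

```python
import numpy as np
from nltools.stats import isc
from brainiak.isc import isc as brainiak_isc, bootstrap_isc

# Toy data for a single ROI: 200 timepoints x 20 subjects
np.random.seed(0)
data = np.random.randn(200, 20)

# nltools ISC; n_jobs is the parameter I was varying (default is -1)
nlt_stats = isc(data, n_jobs=16)
print("nltools:", nlt_stats)  # prints whatever isc() returns (ISC estimate and p-value)

# brainiak: compute leave-one-out ISCs first, then bootstrap them
iscs = brainiak_isc(data[:, np.newaxis, :], pairwise=False)
observed, ci, p, distribution = bootstrap_isc(iscs, pairwise=False, n_bootstraps=5000)
print("brainiak p-value:", p)
```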
@josiequita Can you provide a little more information about how you're calling ISC from within nltools? Also, are you able to reproduce this on a non-cluster computer? Here's a link to a notebook running on my local machine where I'm not able to reproduce this error with randomly generated data: notebook link
Also, in general I tend to avoid using n_jobs=-1 on the cluster because of how resource sharing works. To avoid jobs being killed prematurely and to be a good citizen toward other users, it's preferable to be explicit (e.g. n_jobs=16). For example, if you requested 16 ppn and your job lands on a node with 64 cores, the scheduler on Discovery will mark the other 48 (64 - 16) cores as available for others to use. However, n_jobs=-1 will try to use everything on that node, potentially causing issues for yourself or other users. This isn't an issue if your job lands on a machine with exactly the number of ppn you request and no one else is using that node.
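One way to stay explicit without hard-coding a number is to derive the worker count from what you were actually allocated. This is just a sketch: os.sched_getaffinity reflects the CPU set your process is bound to on Linux, which may or may not match your ppn request depending on how the scheduler on Discovery is configured, so hard-coding n_jobs=16 to match your job script is the simplest safe option.

```python
import os
import numpy as np
from nltools.stats import isc

# CPUs this process is actually allowed to use (Linux only). If the scheduler
# binds your job to the requested cores, this matches your ppn request.
n_cpus = len(os.sched_getaffinity(0))

# Toy data just to make the call concrete (timepoints x subjects)
data = np.random.randn(200, 20)

# Be explicit instead of relying on n_jobs=-1
stats = isc(data, n_jobs=n_cpus)
```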
I believe we have addressed this issue with #411. Closing for now.