Setting nproc

Open esheldon opened this issue 1 year ago • 9 comments

I have output.nproc: 1 but galsim is using 2 cores.

I can get it down to 1 core by setting OMP_NUM_THREADS to 1.

esheldon avatar Dec 18 '23 14:12 esheldon

Yes. This is a known problem. Cc @erykoff

beckermr avatar Dec 18 '23 14:12 beckermr

This could be an issue when running imsim if you assume you should, for example, set nproc to the number of cores on the machine.

esheldon avatar Dec 18 '23 14:12 esheldon

Before running imsim or galsim you must set all the num threads vars. I thought this would be put into imsim (galsim wants to keep the flexibility of implicit multithreading for reasons that I don't understand).
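A minimal sketch of what "set all the num threads vars" could look like from Python, for readers who want to do it inside a driver script. The list of variable names is an assumption based on the commonly used OpenMP, OpenBLAS, MKL, numexpr, and Accelerate settings. It is not taken from imSim or from the lsst.utils code, and `limit_implicit_threads` is a hypothetical helper name.

```python
import os

# Assumed list of environment variables read by common threaded backends.
# This is an illustration, not the list used by imSim or lsst.utils.
THREAD_ENV_VARS = (
    "OMP_NUM_THREADS",
    "OPENBLAS_NUM_THREADS",
    "MKL_NUM_THREADS",
    "NUMEXPR_NUM_THREADS",
    "NUMEXPR_MAX_THREADS",
    "VECLIB_MAXIMUM_THREADS",
)


def limit_implicit_threads(n_threads=1, environ=os.environ):
    """Set each thread-count variable to n_threads and return what was written."""
    written = {}
    for name in THREAD_ENV_VARS:
        environ[name] = str(n_threads)
        written[name] = environ[name]
    return written


# Call this before importing numpy, galsim, or imsim: these variables are
# typically read once, when the threaded library is first loaded.
limit_implicit_threads(1)
```

Setting the variables after the numerical libraries have been imported generally has no effect, which is why the call sits at the very top of the script.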

erykoff avatar Dec 18 '23 14:12 erykoff

Note using 1 core vs 2 cores gave very similar run times as well, so I'm not sure what's using the extra cpu time.

esheldon avatar Dec 18 '23 14:12 esheldon

https://github.com/lsst/utils/blob/main/python/lsst/utils/threads.py#L38-L57

It may be that @cwwalter is waiting for my standalone shut-it-all-down package which I'll put together during the break.

erykoff avatar Dec 18 '23 14:12 erykoff

Implicit multithreading takes more resources and only occasionally improves runtime. Often it greatly increases the runtime by x10 or in some cases x100. I hates it.

erykoff avatar Dec 18 '23 14:12 erykoff

Yes, that can happen if you end up oversubscribing the cores because each of the processes set by output.nproc uses more than one core.

Setting OMP_NUM_THREADS to 1 does force it to use one core per proc, as set in output.nproc.
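The oversubscription arithmetic can be sketched as follows. The 8-core machine is a hypothetical example, the 2 threads per process matches the behaviour reported at the top of this issue, and `is_oversubscribed` is an illustrative helper, not part of galsim or imsim.

```python
def total_threads(nproc, threads_per_proc):
    """Threads running at once when each process spawns its own thread pool."""
    return nproc * threads_per_proc


def is_oversubscribed(nproc, threads_per_proc, n_cores):
    """True when more threads are runnable than there are cores."""
    return total_threads(nproc, threads_per_proc) > n_cores


# Hypothetical 8-core machine with output.nproc set to the core count.
n_cores = 8
nproc = 8

# Implicit multithreading: 2 threads per process, as reported in this issue.
print(total_threads(nproc, 2), is_oversubscribed(nproc, 2, n_cores))

# With OMP_NUM_THREADS=1: one thread per process.
print(total_threads(nproc, 1), is_oversubscribed(nproc, 1, n_cores))
```

In the first case 16 threads compete for 8 cores. In the second the thread count matches the core count.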

esheldon avatar Dec 18 '23 14:12 esheldon

Not just oversubscribing. Weird cache contention issues maybe. Unclear but it’s broken everywhere and should never be used.

erykoff avatar Dec 18 '23 15:12 erykoff

When running on places like USDF with many cores we find we need to use

export OMP_NUM_THREADS=1
export NUMEXPR_MAX_THREADS=1
export OMP_PROC_BIND=false

and are telling people running at scale to use that right now. I haven't bothered on things like my laptop for testing (but maybe I should).
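For anyone launching jobs from a Python wrapper instead of a shell, the same three settings can be passed to a child process. This is only a sketch: `run_with_thread_limits` is a hypothetical helper, and the child command here is a stand-in that echoes one variable, where the real galsim or imsim invocation would go.

```python
import os
import subprocess
import sys

# The three settings quoted above, as strings for the child environment.
THREAD_SETTINGS = {
    "OMP_NUM_THREADS": "1",
    "NUMEXPR_MAX_THREADS": "1",
    "OMP_PROC_BIND": "false",
}


def run_with_thread_limits(cmd):
    """Run cmd with the thread settings added to a copy of the current environment."""
    env = dict(os.environ)
    env.update(THREAD_SETTINGS)
    return subprocess.run(cmd, env=env, check=True, capture_output=True, text=True)


# Stand-in child process: prints the value it sees for OMP_NUM_THREADS.
result = run_with_thread_limits(
    [sys.executable, "-c", "import os; print(os.environ['OMP_NUM_THREADS'])"]
)
print(result.stdout.strip())  # prints 1
```

Copying `os.environ` leaves the parent process untouched, so the limits apply only to the launched job.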

When @erykoff has his Rubin function ready to turn this all off, we will call that instead (too?). I think @jchiang87 may have a branch with some of this functionality if you want to try it instead. This is some basic issue with one of the libraries we use in Rubin and it also seems machine dependent.

cwwalter avatar Dec 18 '23 16:12 cwwalter

I don't think there is more for us to do here on the imSim side. @jchiang87 do you have a comment?

cwwalter avatar Jun 10 '24 15:06 cwwalter

Right, I think this is handled by #441.

jchiang87 avatar Jun 10 '24 17:06 jchiang87