
Radiative transfer memory challenge

Open natashabatalha opened this issue 8 months ago • 3 comments

All our calculations are memory limited because of the opacities. Specifically, for both GPUs and CPUs, the number of simultaneous calculations we can run is limited by 1) the number of molecular opacity sources you are considering, 2) the resolution of the opacities you are loading, and 3) the wavelength range. The problem manifests differently on CPUs and GPUs.

For CPUs, the opacities are queried from a sqlite database on demand and then summed together. So the memory profile looks like a staircase as each molecule is queried via the get_opacities function. Then, once everything is summed into TAUGAS in compute_opacity, it all gets compressed. See the figure showing this memory behavior:

[Figure: memory profile showing staircase growth during the per-molecule get_opacities queries, followed by a drop once the opacities are summed]
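The staircase can be reproduced without the real database. Below is a minimal sketch, assuming a hypothetical `query_opacity` function standing in for the per-molecule sqlite read (the name and grid shapes are illustrative, not PICASO's actual API): each query adds one step, and the final sum releases the stack.

```python
import numpy as np
import tracemalloc

# Hypothetical stand-in for the per-molecule sqlite query; the name and
# shapes are illustrative, not PICASO's actual API.
def query_opacity(molecule, n_layers=50, n_wno=10_000):
    rng = np.random.default_rng(sum(map(ord, molecule)))
    return rng.random((n_layers, n_wno))

tracemalloc.start()
opacities = {}
for mol in ["H2O", "CH4", "CO2", "NH3"]:
    opacities[mol] = query_opacity(mol)        # each query adds one "stair"
    current, _ = tracemalloc.get_traced_memory()
    print(f"after {mol}: {current / 1e6:.1f} MB")

# Summing (as in compute_opacity/TAUGAS) collapses the staircase back down
total = sum(opacities.values())
del opacities
print(f"after sum: {tracemalloc.get_traced_memory()[0] / 1e6:.1f} MB")
tracemalloc.stop()
```

With these illustrative dimensions each grid is ~4 MB, so the trace climbs by that amount per molecule and falls back to roughly one grid's worth after the sum.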

For GPUs it's slightly different, as it is not a "get what you need" model. Instead, all the opacities must be loaded onto the GPU up front, so right now we are limited by the size of the entire opacity database. You can imagine the memory load staying high and constant throughout the calculation. Yinan Zhao has begun working on an opacity emulator that reduces the memory footprint of the opacity database. Will link to the ongoing branch ASAP.
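To see why the load-everything model hits GPU limits quickly, here is a back-of-envelope sizing sketch; the grid dimensions are made up for illustration, not the actual database shape:

```python
# Back-of-envelope sizing for the "load everything up front" GPU model.
# All grid dimensions below are illustrative assumptions, not the real
# opacity database shape.
def opacity_db_bytes(n_molecules, n_pressure, n_temperature, n_wno, itemsize=8):
    """Bytes needed to hold every molecule's opacity grid at once."""
    return n_molecules * n_pressure * n_temperature * n_wno * itemsize

# e.g. 20 molecules on a 20x30 pressure-temperature grid with 300,000
# wavenumber points in float64
gb = opacity_db_bytes(20, 20, 30, 300_000) / 1e9
print(f"{gb:.0f} GB")  # prints "29 GB"
```

Even these modest assumed dimensions exceed the memory of most consumer GPUs, which is why an emulator or on-demand loading scheme helps.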

natashabatalha avatar Apr 11 '25 16:04 natashabatalha

Can you provide instructions/tests to recreate this issue and to recreate the graph of mem usage? I'd like to work on optimizing the memory usage. I believe that the method for improving this depends on where the data is being read in and where/how it is processed.

kbpatel-333 avatar Nov 15 '25 17:11 kbpatel-333

Hi @kbpatel-333! Thanks for showing interest. I'm attaching a script you can use to create simple graphs like the one I showed. It isn't identical to the one I uploaded because the profile depends largely on the opacity file. For this test I was using the simple default R=15k opacity file, but our Zenodo repository has opacity files that are much larger (~60 GB) if you want to test with a heavier memory load. Ultimately, though, the behavior in the plot is the same.

memory_profile.py

To expand a bit on what I stated above:

get_opacities stores the opacities requested by the user in a dictionary, structured so that the first key is the molecule name. This was done for convenience, so that later in this loop I could simply recall the opacities by name along with the mixing ratios. Doing it this way means we have to store all the opacities individually first and then sum them in that loop. You could instead perform the query inside that same loop: grab one molecule's opacities, add them to a running sum, grab the next, and so on. That way you only ever hold two opacity arrays in memory: the running sum and the newly queried one.
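The two strategies can be sketched side by side. This is a minimal illustration, assuming a hypothetical `query_opacity` stand-in for the sqlite read and made-up mixing ratios; it is not PICASO's actual code:

```python
import numpy as np

# Hypothetical stand-in for the per-molecule sqlite query; names and
# shapes are illustrative, not PICASO's API.
def query_opacity(molecule, shape=(50, 10_000)):
    rng = np.random.default_rng(sum(map(ord, molecule)))
    return rng.random(shape)

molecules = {"H2O": 1e-3, "CH4": 1e-4, "CO2": 1e-5}  # made-up mixing ratios

# Current approach: store every molecule's opacity first, then combine.
stored = {mol: query_opacity(mol) for mol in molecules}
tau_stored = sum(vmr * stored[mol] for mol, vmr in molecules.items())

# Streaming approach: only the running sum plus one new grid in memory.
tau = None
for mol, vmr in molecules.items():
    opa = query_opacity(mol)           # grab one molecule's opacity
    if tau is None:
        tau = vmr * opa                # first molecule starts the sum
    else:
        tau += vmr * opa               # accumulate in place
    del opa                            # only `tau` survives each iteration

assert np.allclose(tau, tau_stored)    # same result, flat memory profile
```

The streaming version keeps peak memory independent of the number of molecules, at the cost of no longer being able to recall an individual molecule's opacity after the loop.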

Please let me know if you have any questions. I am open to any and all suggestions for making this more efficient.

natashabatalha avatar Nov 17 '25 20:11 natashabatalha

Thank you, this is very helpful! I agree, it should be straightforward to make this more efficient. The second method you describe is essentially the best solution, since it uses a constant amount of memory rather than memory that grows with the number of inputs, so it would effectively eliminate the high memory usage.

I should be able to fix this in a branch pretty soon, and I'll see if I find other improvements as I test.

kbpatel-333 avatar Nov 19 '25 17:11 kbpatel-333