
How many copies of the neural networks are created during inference?

rmehta1987 opened this issue on Feb 24, 2023 · 5 comments

I'm trying to figure out how many copies of the neural networks are created during inference, as the memory requirements grow from 6 GB during rounds 1-10 to 24 GB by round 30. I have an embedding net which takes up space, since the in-features are 100k and the last layer has dimension 1x200, though I am not sure if that is the problem.

rmehta1987 · Feb 24, 2023

Hey,

generally, a copy of the network is saved for each round, as you can see here.

Unfortunately, as far as I know there is no nice way to disable caching of all previous models. But after each round you can always manually set

inference._model_bank = inference._model_bank[-1:]

Note that you need to keep at least the model of the last round in the model bank because, depending on the algorithm, it might be required to compute the loss.
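
For example, something along these lines (a minimal sketch of a multi-round loop with the trimming step; simulator, prior, num_rounds, num_sims, and x_observed are placeholders, and the trimming touches the private _model_bank attribute):

from sbi.inference import SNPE_C

inference = SNPE_C(prior=prior, device="cuda")
proposal = prior
for _ in range(num_rounds):
    theta = proposal.sample((num_sims,))
    x = simulator(theta)
    inference.append_simulations(theta, x, proposal=proposal)
    density_estimator = inference.train()
    posterior = inference.build_posterior(density_estimator)
    proposal = posterior.set_default_x(x_observed)
    # Keep only the most recent model to bound memory growth across rounds.
    inference._model_bank = inference._model_bank[-1:]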

Kind regards, Manuel

manuelgloeckler · Feb 27, 2023

Hi Manuel,

A deepcopy of the density estimator is also done during training, https://github.com/mackelab/sbi/blob/main/sbi/inference/snpe/snpe_base.py#L424 . Is this also necessary?

I don't think a deepcopy of the posterior would be costly memory-wise, as it only depends on the dimensionality of the prior (I think).

rmehta1987 · Feb 28, 2023

Hey,

This deepcopy just ensures that any modification of the returned density estimator is not propagated to the _neural_net attribute, which is managed by the NeuralInference class. You do not have to use it and can delete it right away; you can still build the posterior without passing the density estimator, in which case the class just uses the _neural_net attribute. See here, which also seems not to use a deepcopy.
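
That is, something like this (a minimal sketch, assuming an already-configured inference object):

# Train and immediately drop the returned (deep-copied) estimator;
# build_posterior() without arguments falls back to the internal _neural_net.
density_estimator = inference.train()
del density_estimator
posterior = inference.build_posterior()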

A deepcopy of the posterior is equally (or more) costly, as all of the different posterior classes also hold the _neural_net attribute, now renamed to posterior_estimator.

Kind regards, Manuel

manuelgloeckler · Feb 28, 2023

I commented out the line in https://github.com/mackelab/sbi/blob/main/sbi/inference/snpe/snpe_base.py, as it seems to be used only for SNPE-B, which has not yet been implemented.

The other memory usage comes from storing the simulated data after each round. Since each simulation is large, with dimension (1x111709), the number of simulations per round multiplied by the total number of rounds leads to a large number of tensors holding the dataset. For example, if I create 50 simulations per round and perform 100 rounds, the total memory usage of the dataset is approximately 2.1 GB. Therefore, if I wanted to increase the number of simulations to improve inference, most of the GPU memory would be taken up by the dataset.
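
As a rough back-of-the-envelope check of that figure (assuming the data is stored as float32, 4 bytes per value):

num_rounds, sims_per_round, dim = 100, 50, 111_709
bytes_total = num_rounds * sims_per_round * dim * 4
print(bytes_total / 1e9)  # ~2.2 GB (about 2.1 GiB), consistent with the observed usage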

For now I put the data onto the CPU and reload it to the GPU during retraining after multiple rounds. Any other suggestions would be awesome!

Thank you for answering all the questions!

rmehta1987 · Mar 1, 2023

Hey, it is generally common/recommended to keep the whole dataset on the CPU for high-dimensional data (or even on disk if it does not fit into RAM). In these cases one typically only moves the current batch of data required to compute the loss to the GPU at each iteration of optimization. The batch size can be chosen freely so that the data always fits into GPU memory.

This is also the behavior implemented in sbi, i.e. even if your data is stored on the CPU, the current batch required to compute the loss will be moved to the GPU (see here). As long as you have enough RAM I would recommend always storing the data on the CPU; the memory cost on the GPU should then be constant across rounds. If it is still too large you might have to reduce the training_batch_size.
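
The generic PyTorch pattern behind this looks roughly as follows (an illustrative sketch, not sbi's internal code; theta and x are placeholders for CPU tensors and the loss/optimizer step is elided):

import torch
from torch.utils.data import DataLoader, TensorDataset

# Dataset stays on the CPU; only each mini-batch is moved to the GPU.
dataset = TensorDataset(theta, x)
loader = DataLoader(dataset, batch_size=50, shuffle=True, pin_memory=True)

for theta_batch, x_batch in loader:
    theta_batch = theta_batch.to("cuda", non_blocking=True)
    x_batch = x_batch.to("cuda", non_blocking=True)
    # ... compute the loss on this batch and take an optimizer step ...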

As you can see here, you can set a data_device, which can be different from the compute device. As you can see here, if you do not pass it, it defaults to the compute device, thus saving all data on e.g. the GPU. So I would recommend using something like

infer = SNPE_C(..., device="cuda")
infer.append_simulations(theta, x, data_device="cpu")

Maybe this is what you are already doing :)

Kind regards, Manuel

manuelgloeckler · Mar 3, 2023

Manuel's answer is a valid solution for the presented issue.

In the long run, we want to enable passing a custom dataloader that handles all the data and memory management.

janfb · Jul 22, 2024