
Training on large datasets: memory overhead

Open · margaritageleta opened this issue on Aug 04, 2021 · 0 comments

Hello, I wanted to train the biLSTM model on human chromosome data (a 15 GB training set). Hardware-wise, I have 240 GB of RAM. The parser runs fine:

[Screenshot 1: parser output]

However, when I run the training, I get a memory error: it tries to allocate 849 GB, far more than I have:

[Screenshot 2: MemoryError, 849 GB requested]
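For what it's worth, here is my back-of-the-envelope guess at where a blow-up of this size could come from, assuming the preprocessing materializes one overlapping context window per input symbol (the window length and dtype below are my assumptions, not values I found in the repository):

```python
import numpy as np

# Assumed preprocessing parameters -- NOT taken from the DeepZip code,
# just illustrative values to show how the expansion factor could arise.
raw_bytes = 15 * 1024**3   # 15 GB of raw sequence data, one symbol per byte
window_len = 64            # assumed context window per training sample
dtype_bytes = 1            # assumed bytes per stored symbol (np.uint8)

# If one overlapping window is materialized per input symbol, every symbol
# gets duplicated roughly window_len times in the training tensor.
n_symbols = raw_bytes // dtype_bytes
expanded_bytes = n_symbols * window_len * dtype_bytes

print(f"expanded tensor: {expanded_bytes / 1024**3:.0f} GB "
      f"({expanded_bytes / raw_bytes:.0f}x the raw data)")
# -> roughly 960 GB, a ~64x expansion; the 849 GB I see is in the same
#    ballpark, so overlapping-window materialization alone could explain it.
```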

I wonder how you trained on chromosome 1.

How many samples did you use, and how many SNPs? What was the tensor size? Also, what is the expansion factor from raw data to processed data (the model input)? Going from 15 GB to 849 GB seems excessive.
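In case it is useful, here is a minimal sketch of the workaround direction I am considering, assuming the parsed symbols live in a single `uint8` array on disk (the function name, file name, and window length below are hypothetical, not part of DeepZip):

```python
import numpy as np

def window_batches(path, window_len=64, batch_size=4096):
    """Yield (context, next-symbol) batches without materializing all windows.

    `path` points to the parsed uint8 symbol array on disk; np.memmap keeps
    it out of RAM, and each batch is gathered on the fly by index arithmetic.
    """
    data = np.memmap(path, dtype=np.uint8, mode="r")
    n_windows = len(data) - window_len
    for start in range(0, n_windows, batch_size):
        idx = np.arange(start, min(start + batch_size, n_windows))
        # Overlapping windows for this batch only: shape (batch, window_len).
        x = data[idx[:, None] + np.arange(window_len)[None, :]]
        y = data[idx + window_len]  # the symbol each window should predict
        yield x.astype(np.float32), y

# Hypothetical usage with a Keras-style model:
# model.fit(window_batches("parsed_chr1.npy"), steps_per_epoch=...)
```

Since each batch is built on demand, peak memory stays at one batch rather than the full 849 GB window tensor, at the cost of recomputing the gather per batch.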

Thank you.
