
Fix insane memory usage when loading datasets

Open · daemon opened this issue 6 years ago · 2 comments

@achyudhk reports that CharCNN uses 63 GB of RAM on some dataset (Hydra and Dragon each have 64 GB). I think one solution would be some mechanism for moving data between disk and RAM as needed?

daemon, Nov 26 '18 02:11
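One way to picture the "move data between disk and RAM when needed" idea is a memory-mapped file: the operating system pages in only the records that are actually touched. The following is a minimal sketch using the standard-library `mmap` module, not Castor's implementation. `SEQ_LEN`, `write_records`, and `DiskBackedDataset` are hypothetical names introduced for illustration, and the fixed-width record layout is an assumption.

```python
import mmap
import os
import tempfile

SEQ_LEN = 8  # hypothetical fixed record length, in bytes (one byte per character index)


def write_records(path, records):
    """Write equal-length byte records back to back into one file."""
    with open(path, "wb") as f:
        for rec in records:
            if len(rec) != SEQ_LEN:
                raise ValueError("every record must be SEQ_LEN bytes long")
            f.write(rec)


class DiskBackedDataset:
    """Random access to fixed-width records without reading the whole file."""

    def __init__(self, path):
        self._file = open(path, "rb")
        # Length 0 maps the entire file; pages are loaded lazily by the OS.
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    def __len__(self):
        return len(self._map) // SEQ_LEN

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        start = index * SEQ_LEN
        # Slicing a mmap returns a bytes copy of just this record.
        return self._map[start:start + SEQ_LEN]

    def close(self):
        self._map.close()
        self._file.close()


# Usage: write five toy records, then read one back by index.
path = os.path.join(tempfile.mkdtemp(), "records.bin")
write_records(path, [bytes([i] * SEQ_LEN) for i in range(5)])
dataset = DiskBackedDataset(path)
third = dataset[2]
n_records = len(dataset)
dataset.close()
```

Storing compact character indices on disk and expanding them to one-hot form only per batch would keep both the file and the resident memory small, at the cost of some extra work at batch time.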

For now this is specific to CharCNN, because the character-quantized matrices are so large. In general, though, I feel it's better to stream the dataset from disk and preprocess it on the fly rather than cache all of it in memory.

achyudh, Nov 26 '18 03:11
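The streaming approach suggested above could look roughly like the sketch below: a generator reads one line at a time and quantizes characters only when a batch is requested, so at most one batch of one-hot matrices is held in memory. This is an illustration under stated assumptions, not the code used in the repo. `ALPHABET`, `quantize`, `stream_batches`, and the one-document-per-line file layout are all hypothetical.

```python
import os
import tempfile

ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 "  # hypothetical alphabet
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def quantize(text, max_len):
    """One-hot encode up to max_len characters.

    Positions past the end of the text, and characters outside ALPHABET,
    are left as all-zero rows.
    """
    rows = [[0] * len(ALPHABET) for _ in range(max_len)]
    for pos, ch in enumerate(text.lower()[:max_len]):
        idx = CHAR_TO_INDEX.get(ch)
        if idx is not None:
            rows[pos][idx] = 1
    return rows


def stream_batches(path, batch_size, max_len):
    """Read one line at a time and yield quantized batches lazily."""
    batch = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            batch.append(quantize(line.rstrip("\n"), max_len))
            if len(batch) == batch_size:
                yield batch
                batch = []
    if batch:  # final, possibly smaller, batch
        yield batch


# Usage: three toy documents, streamed in batches of two.
path = os.path.join(tempfile.mkdtemp(), "train.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("abc\nhello world\nz9\n")

batch_sizes = [len(b) for b in stream_batches(path, batch_size=2, max_len=4)]
first_batch = next(stream_batches(path, batch_size=2, max_len=4))
```

The trade-off is that quantization is repeated every epoch instead of being cached once, which exchanges CPU time for a much smaller memory footprint.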

@achyudhk, I think we can fix this issue after 10th Dec, given that we have been using the repo for quite a while now. Sounds good? @daemon, can you assign us to the issue? Even HAN was facing a similar issue, though perhaps not as alarming as CharCNN's.

Ashutosh-Adhikari, Nov 26 '18 04:11