castor
Fix insane memory usage when loading datasets
@achyudhk reports that CharCNN uses 63GB of RAM on some dataset (Hydra and Dragon both have 64GB). I think a solution would be some mechanism for moving data between disk and RAM as needed.
For now this is specific to CharCNN because of the large size of the character-quantized matrices. In general, though, I feel it's better to stream the dataset from disk and preprocess it on the fly rather than cache all of it in memory.
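To make the streaming idea concrete, here is a rough sketch of what it could look like. It is not the repo's actual loader: the alphabet, the one-example-per-line file layout, and the names `quantize` and `stream_batches` are all assumptions made for illustration. The point is only that each batch is quantized when it is requested, so at most one batch of one-hot matrices lives in RAM at a time.

```python
from typing import Iterator, List

# Hypothetical alphabet; the real CharCNN alphabet may differ.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 "
CHAR_TO_INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def quantize(text: str, max_len: int) -> List[List[int]]:
    """One-hot encode up to max_len characters of text.

    Characters outside ALPHABET and padding positions become all-zero rows.
    """
    rows = []
    for ch in text.lower()[:max_len]:
        row = [0] * len(ALPHABET)
        idx = CHAR_TO_INDEX.get(ch)
        if idx is not None:
            row[idx] = 1
        rows.append(row)
    # Pad short inputs so every example has exactly max_len rows.
    while len(rows) < max_len:
        rows.append([0] * len(ALPHABET))
    return rows


def stream_batches(
    path: str, batch_size: int, max_len: int
) -> Iterator[List[List[List[int]]]]:
    """Lazily yield batches of quantized examples from a text file.

    Assumes one example per line (a hypothetical layout). The file is read
    line by line, so only the current batch is held in memory.
    """
    batch = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            batch.append(quantize(line.rstrip("\n"), max_len))
            if len(batch) == batch_size:
                yield batch
                batch = []
    # Emit the final partial batch, if any.
    if batch:
        yield batch
```

A training loop would then iterate with `for batch in stream_batches(path, 64, 1000): ...` and convert each batch to a tensor right before the forward pass. The trade-off is that quantization is redone every epoch, which costs CPU time in exchange for the lower memory footprint.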
@achyudhk, I think we can fix this issue after 10th Dec, given that we have been using the repo for quite a while now. Sound good? @daemon, can you assign us to the issue? HAN was facing a similar issue too, though perhaps not as alarming as CharCNN.