
Big Data Problem

Open xljhtq opened this issue 6 years ago • 1 comments

When I load a file with a lot of data, I run into a problem: free memory keeps shrinking because of the sorting algorithm in the preprocessing step. What should I do to optimize it?

xljhtq, Mar 21 '18 11:03

I think one solution is to modify the "InstanceBatch" class in "SentenceMatchDataStream.py". Right now, my code loads all the data into memory and pads all variables beforehand (https://github.com/zhiguowang/BiMPM/blob/master/src/SentenceMatchDataStream.py#L165). That padding costs a lot of memory.

One way to fix this is to skip the padding while loading the data and pad each batch right before you use it. This line (https://github.com/zhiguowang/BiMPM/blob/master/src/SentenceMatchTrainer.py#L92) may be a good place to insert your padding function.
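A minimal sketch of that idea, not the repository's actual code: keep the instances as unpadded lists of token ids and pad only the batch that is about to be used. The names `pad_batch` and `iter_padded_batches`, the `(sentence1, sentence2)` pair layout, and the pad id of 0 are all assumptions made for illustration; the real "InstanceBatch" class holds more fields than this.

```python
from typing import Iterator, List, Sequence, Tuple

# Hypothetical instance layout: (sentence1 token ids, sentence2 token ids).
Pair = Tuple[Sequence[int], Sequence[int]]


def pad_batch(seqs: Sequence[Sequence[int]], pad_id: int = 0) -> List[List[int]]:
    """Pad every sequence to the longest length found in this batch only."""
    max_len = max((len(s) for s in seqs), default=0)
    return [list(s) + [pad_id] * (max_len - len(s)) for s in seqs]


def iter_padded_batches(
    instances: Sequence[Pair], batch_size: int, pad_id: int = 0
) -> Iterator[Tuple[List[List[int]], List[List[int]]]]:
    """Yield one padded batch at a time; the stored instances stay unpadded."""
    for start in range(0, len(instances), batch_size):
        chunk = instances[start:start + batch_size]
        sent1 = pad_batch([a for a, _ in chunk], pad_id)
        sent2 = pad_batch([b for _, b in chunk], pad_id)
        yield sent1, sent2


if __name__ == "__main__":
    data = [([1, 2, 3], [4]), ([5], [6, 7]), ([8, 9], [10])]
    for s1, s2 in iter_padded_batches(data, batch_size=2):
        print(s1, s2)
```

With this shape, only one padded batch is alive at a time, so peak memory is driven by the unpadded data plus a single batch rather than by padded copies of the whole dataset.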

zhiguowang, Apr 02 '18 17:04