Memory Issue
Hello @guxd, I downloaded the real dataset from Google Drive and trained the model for 2 epochs, which worked fine. However, when I run code embedding with the last epoch as the optimal checkpoint, the cell gets terminated after running for some time. Searching online suggests this is a RAM issue, with the usual advice being to upgrade the RAM. Is there another workaround, such as decreasing `batch_size`, `chunk_size`, or some other parameter? (Currently `'batch_size': 100, 'chunk_size': 100000`.)
Update: I tried decreasing `batch_size` to 100 and then 64, but I am still facing the same issue.

How about reducing `chunk_size`? You can track the variable `vecs` and check whether it still holds allocated memory after calling `vecs = []`.
You can also try using a smaller codebase, given that you have limited memory.
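In case it helps, here is a minimal sketch of how one might track the memory held across chunks while embedding (assumptions: `n_chunks`, `encode_chunk`, and `save_vecs` are hypothetical stand-ins for the actual embedding loop in the repo; `psutil` is a third-party package installed via pip):

```python
import gc
import psutil

proc = psutil.Process()

def rss_mb():
    """Resident set size of this process in MB."""
    return proc.memory_info().rss / 1024 ** 2

vecs = []
for chunk_id in range(n_chunks):          # n_chunks: however many chunks you iterate over
    vecs.append(encode_chunk(chunk_id))   # encode_chunk: stand-in for the model's embedding call
    print(f"chunk {chunk_id}: rss={rss_mb():.0f} MB, n_vecs={len(vecs)}")

    save_vecs(vecs, chunk_id)             # persist to disk before freeing
    vecs = []                             # drop the reference...
    gc.collect()                          # ...and force collection so RSS should drop
    print(f"after reset: rss={rss_mb():.0f} MB")
```

If RSS keeps climbing even after `vecs = []` and `gc.collect()`, something else is holding references to the vectors, and reducing `chunk_size` alone may not be enough.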
@guxd When you say a small codebase, does that mean using a dummy dataset instead of the real dataset? Also, during the preprocessing step, how did you extract the <method name, API sequence, tokens, description> tuples from the Java code snippets?
I mean using a subset of the use.XXX.h5 files from Google Drive, for example, only 1 million code snippets.
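For reference, a small sketch of slicing the first 1 million entries out of an HDF5 file with `h5py` (the file name below keeps the XXX placeholder, and the sketch assumes a flat file of same-length datasets; if the file actually stores variable-length sequences via a separate index dataset, you would need to slice the index and data arrays consistently rather than row-for-row):

```python
import h5py

SRC = "use.XXX.h5"          # replace XXX with the actual file name from Google Drive
DST = "use.XXX.small.h5"    # hypothetical output name for the subset
N = 1_000_000               # keep only the first 1M entries

with h5py.File(SRC, "r") as src, h5py.File(DST, "w") as dst:
    for name, dset in src.items():              # assumes every top-level item is a dataset
        dst.create_dataset(name, data=dset[:N]) # copy the first N rows of each dataset
```

You can inspect the actual layout first with `list(h5py.File("use.XXX.h5", "r").keys())` to see which datasets the file contains.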