
Running out of GPU memory

Open jxzeng-git opened this issue 4 years ago • 4 comments

When I validate this model on the Reddit dataset, it always runs out of GPU memory. The validation is conducted on a machine with a Tesla V100-PCIE GPU (32 GB memory), so this is inconsistent with the results shown in Table 3 of your paper. The detailed error is as follows:

```
Traceback (most recent call last):
  File "pytorch_ladies_.py", line 321, in <module>
    output = best_model.forward(feat_data[input_nodes], adjs)[output_nodes]
  File "pytorch_ladies_.py", line 91, in forward
    x = self.encoder(feat, adjs)
  File "/root/anaconda3/envs/LADIES/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "pytorch_ladies_.py", line 81, in forward
    x = self.dropout(self.gcs[idx](x, adjs[idx]))
  File "/root/anaconda3/envs/LADIES/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "pytorch_ladies_.py", line 62, in forward
    return F.elu(torch.spmm(adj, out))
RuntimeError: CUDA out of memory. Tried to allocate 1.71 GiB (GPU 0; 31.75 GiB total capacity; 28.63 GiB already allocated; 453.50 MiB free; 1.63 GiB cached)
```

jxzeng-git · Jan 04 '21 13:01

I tried setting a smaller batch size when testing, but it did not help. Also, everything is fine during the training procedure. BTW, I tried changing `default_sampler()` in the testing procedure to `ladies_sampler()`, which also ran out of memory.

jxzeng-git · Jan 04 '21 13:01

To make the evaluation consistent, we use full-batch inference to get results for each node. So you probably need to use the CPU version to get the test result.

However, it is also feasible to use batch-wise sampling to get inference results on the GPU, which is very similar to the training procedure: first determine the output nodes, then sample the computation graph.
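The batch-wise inference described above can be sketched as follows. This is a minimal outline, not the repo's code: `sampler` and `forward` are hypothetical stand-ins for `ladies_sampler` and the model's forward pass.

```python
import numpy as np

def batched_inference(test_nodes, batch_size, sampler, forward):
    """Run inference batch by batch: for each batch of output nodes,
    sample a small computation graph, run the model on it, and keep
    only the rows belonging to the batch's output nodes."""
    outputs = []
    n_batches = int(np.ceil(len(test_nodes) / batch_size))
    for b in range(n_batches):
        batch_nodes = test_nodes[b * batch_size:(b + 1) * batch_size]
        # sample the per-layer adjacencies needed to compute this batch
        adjs, input_nodes, output_nodes = sampler(batch_nodes)
        out = forward(input_nodes, adjs)
        outputs.append(out[output_nodes])
    return np.concatenate(outputs)
```

Note the `ceil` so the trailing partial batch is not dropped; the snippet quoted later in this thread uses floor division (`len(test_nodes) // args.batch_size`) and would silently skip the last few nodes.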

acbull · Jan 05 '21 07:01

> To make the evaluation consistent, we use the full-batch inference to get results for each node. So you probably need to use the CPU version to get test result.
>
> However, it's also feasible to use batch-wise sampling to get inference results with GPU, which is very similar to the training procedure, by first determine the output node and then sample the computation graph.

Thank you for your reply!
I tried the CPU version. However, it is very slow and could not finish even after 26 hours, which confuses me further.
In addition, does the following code clip mean I am using batch-wise sampling during inference?

```python
for b in np.arange(len(test_nodes) // args.batch_size):
    batch_nodes = test_nodes[b * args.batch_size : (b + 1) * args.batch_size]
    adjs, input_nodes, output_nodes = default_sampler(np.random.randint(2**32 - 1), batch_nodes, samp_num_list * 10, len(feat_data), lap_matrix, args.n_layers)
```

This code clip results in "CUDA out of memory".

When I change `n_layers` from 5 to 2, the inference no longer crashes, but it still needs 16.627 GB of GPU memory.
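A common way to keep full-graph inference within memory for a deep GCN is to compute it layer by layer: materialize one layer's output for all nodes, in row chunks, before moving to the next layer, so peak memory is one layer's activations rather than the whole multi-layer computation. A minimal NumPy/SciPy sketch, assuming a hypothetical list of dense weight matrices `weights` and ReLU in place of the repo's ELU:

```python
import numpy as np
import scipy.sparse as sp

def layerwise_inference(feat, lap_matrix, weights, chunk=1024):
    """Sketch of layer-by-layer full-graph inference.  For each layer,
    apply the dense transform once, then propagate over the graph in
    row chunks of the (sparse) laplacian so only one chunk of the
    sparse-dense product is in flight at a time."""
    x = feat
    for W in weights:
        h = x @ W                                   # dense transform
        out = np.empty((lap_matrix.shape[0], W.shape[1]))
        for start in range(0, lap_matrix.shape[0], chunk):
            rows = lap_matrix[start:start + chunk]  # sparse row slice
            out[start:start + chunk] = np.maximum(rows @ h, 0.0)  # ReLU
        x = out
    return x
```

On a GPU the same chunking idea applies; only one chunk of the propagation needs to be resident at a time.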

jxzeng-git · Jan 05 '21 08:01

The time in the paper is shown for training.

Oh sorry, that code is not correct. If you want to use our sampling, please change `default_sampler` to `ladies_sampler`. Otherwise, batch-wise sampling is not needed.
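For context on what switching the sampler changes: LADIES draws each layer's nodes from a layer-dependent distribution, where a node's probability is proportional to the squared norm of its laplacian column restricted to the rows already sampled for the layer above. A simplified sketch (the function name is illustrative, not the repo's API):

```python
import numpy as np
import scipy.sparse as sp

def ladies_layer_probs(lap_matrix, prev_nodes):
    """Layer-dependent sampling distribution: restrict the laplacian to
    the rows of the previously sampled nodes, then weight each candidate
    column by its squared norm and normalize to a probability vector."""
    sub = lap_matrix[prev_nodes, :]                       # rows already in the batch
    col_norms = np.asarray(sub.multiply(sub).sum(axis=0)).ravel()
    return col_norms / col_norms.sum()
```

Because the distribution is conditioned on the layer above, the sampled computation graph stays dense between consecutive layers while its size per layer remains fixed, which is what bounds memory relative to full-batch propagation.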

I've modified that part to make it clearer now. Sorry for this mistake.

acbull · Jan 05 '21 19:01