LADIES
Running out of GPU memory
When I validate this model on the Reddit dataset, it always runs out of GPU memory. Validation is conducted on a machine with a Tesla V100-PCIE GPU (32 GB memory). This is inconsistent with the results shown in Table 3 of your paper. The detailed error is as follows: Traceback (most recent call last):
File "pytorch_ladies_.py", line 321, in
I tried to set a smaller batch size for testing, but it did not help. Also, everything is fine during the training procedure. BTW, I tried changing the "default_sampler()" in the testing procedure to "ladies_sampler()", which also crashed.
To make the evaluation consistent, we use full-batch inference to get results for each node. So you probably need to use the CPU version to get the test result.
However, it's also feasible to use batch-wise sampling to get inference results on GPU, which is very similar to the training procedure: first determine the output nodes, then sample the computation graph.
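To make the batch-wise inference idea concrete, here is a minimal sketch of that loop. `sample_batch`, `batched_inference`, and the `forward` callback are hypothetical stand-ins, not the repo's API; in the actual code the sampling step would be a call such as `ladies_sampler(...)`:

```python
import numpy as np

# Hypothetical stand-in for the repo's sampler: returns the per-layer
# adjacencies plus input/output node ids for one batch. In LADIES this
# would be ladies_sampler(...); here we just echo the batch so the loop
# structure is runnable on its own.
def sample_batch(batch_nodes, n_layers):
    adjs = [None] * n_layers
    return adjs, batch_nodes, batch_nodes

def batched_inference(test_nodes, batch_size, n_layers, forward):
    """Run inference batch by batch: fix the output nodes first,
    then build a small sampled computation graph for just those nodes."""
    outputs = {}
    n_batches = int(np.ceil(len(test_nodes) / batch_size))  # keep the last partial batch
    for b in range(n_batches):
        batch_nodes = test_nodes[b * batch_size : (b + 1) * batch_size]
        adjs, input_nodes, output_nodes = sample_batch(batch_nodes, n_layers)
        preds = forward(adjs, input_nodes)  # forward pass on the sampled subgraph
        for node, pred in zip(output_nodes, preds):
            outputs[node] = pred
    return outputs
```

Note that iterating over `len(test_nodes) // batch_size` batches, as in the snippet quoted later in this thread, silently drops the trailing nodes when the test set is not divisible by the batch size; the ceiling above covers them.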
Thank you for your reply!
I tried using the CPU version. However, it is very slow and could not finish even after 26 hours, which confuses me further.
In addition, does the following code snippet mean I am using batch-wise sampling during inference?
for b in np.arange(len(test_nodes) // args.batch_size):
    batch_nodes = test_nodes[b * args.batch_size : (b+1) * args.batch_size]
    adjs, input_nodes, output_nodes = default_sampler(np.random.randint(2**32 - 1), batch_nodes, samp_num_list * 10, len(feat_data), lap_matrix, args.n_layers)
This code snippet results in "CUDA out of memory".
When I change "n_layers" from 5 to 2, inference no longer crashes, but it needs 16.627 GB of GPU memory.
The time reported in the paper is for training.
Oh sorry, that code is not quite correct. If you want to use our sampling, please change default_sampler to ladies_sampler. Otherwise, batch-wise sampling is not needed.
I've modified that part to make it clearer now. Sorry for this mistake.
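For intuition about what swapping in layer-dependent sampling changes, here is a toy dense-matrix sketch of the idea. `toy_ladies_sampler` is a hypothetical illustration, not the repo's `ladies_sampler` (which works on sparse matrices and differs in detail): starting from the output nodes, each earlier layer's nodes are drawn with probability proportional to the squared laplacian entries of the rows already kept, so sampling concentrates on the neighbors that actually feed the batch.

```python
import numpy as np

def toy_ladies_sampler(rng, batch_nodes, samp_num, lap_matrix, n_layers):
    """Toy dense sketch of layer-dependent sampling: walk backwards from
    the output nodes, picking each previous layer's nodes with probability
    proportional to the squared laplacian entries of the current rows."""
    adjs, previous = [], np.asarray(batch_nodes)
    for _ in range(n_layers):
        sub = lap_matrix[previous, :]          # laplacian rows for current nodes
        prob = np.square(sub).sum(axis=0)      # importance of each candidate node
        prob = prob / prob.sum()
        n_pick = min(samp_num, np.count_nonzero(prob))
        chosen = rng.choice(len(prob), n_pick, replace=False, p=prob)
        adjs.append(sub[:, chosen])            # this layer's sampled adjacency
        previous = chosen
    adjs.reverse()                             # input layer first, output layer last
    return adjs, previous, np.asarray(batch_nodes)
```

Each layer's sampled adjacency only connects the chosen nodes to the layer above, which is what keeps the per-batch memory bounded compared with the full computation graph.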