Memory issue
How much memory is needed to run the script: 16GB, 32GB, or larger? Thank you!
@AIRobotZhang
16GB is enough for R8, R52, MR, and Ohsumed, but not for 20NG. I tried this on a Mac with 16GB of memory.
32GB may be enough for 20NG with 200-dimensional first-layer embeddings, but I am not sure; I successfully ran it on a server.
I also tried 20NG with lower-dimensional first-layer embeddings (e.g., 50 or 30) by changing this line in train.py:
flags.DEFINE_integer('hidden1', 200, 'Number of units in hidden layer 1.')
The script then runs successfully on the 16GB Mac. The classification accuracy is a bit lower (about 0.856), but still comparable to the results in the paper.
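To see why lowering hidden1 helps so much, here is a rough back-of-the-envelope estimate of the dense first-layer activation for 20NG. The node counts below are assumptions (roughly 18,846 documents plus about 42,757 vocabulary words; the real graph size depends on build_graph.py preprocessing), so treat this as a sketch, not the exact footprint:

```python
# Rough estimate: one dense float32 activation of shape (num_nodes, hidden1).
# Node counts are ASSUMED for illustration, not taken from the repo.

def activation_bytes(num_nodes, hidden_dim, bytes_per_float=4):
    """Bytes for one dense num_nodes x hidden_dim float32 matrix."""
    return num_nodes * hidden_dim * bytes_per_float

num_nodes = 18846 + 42757  # assumed: documents + vocabulary words in 20NG graph

for hidden1 in (200, 50):
    gib = activation_bytes(num_nodes, hidden1) / 1024 ** 3
    print("hidden1=%d -> %.3f GiB per dense activation" % (hidden1, gib))
```

This only counts a single forward activation; gradients and intermediate buffers multiply it further, which is why the 200-dimensional setting overwhelms 16GB while 50 dimensions fits.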
Thanks!
I was able to run build_graph for 20NG and R52, but I get the following error. I am new to TF; do you know how to tackle this? Any help would be greatly appreciated, thank you! :) Python 3, TF version: 1.12.0
Tensor("graphconvolution_2/SparseTensorDenseMatMul/SparseTensorDenseMatMul:0", shape=(?, 52), dtype=float32)
WARNING:tensorflow:From /home/ashutosh1adhikari/GG/text_gcn1.18/metrics.py:6: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version. Instructions for updating:
Future major versions of TensorFlow will allow gradients to flow into the labels input on backprop by default.
See tf.nn.softmax_cross_entropy_with_logits_v2.
2019-01-19 14:50:22.696244: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2019-01-19 14:50:23.604709: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.635
pciBusID: 0000:65:00.0
totalMemory: 10.73GiB freeMemory: 10.53GiB
2019-01-19 14:50:23.755504: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 1 with properties:
name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.635
pciBusID: 0000:17:00.0
totalMemory: 10.73GiB freeMemory: 26.62MiB
2019-01-19 14:50:23.755609: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0, 1
Traceback (most recent call last):
File "train.py", line 92, in
@Ashutosh-Adhikari
Hi, thanks for running the code.
This is likely because your GPU 1 does not have enough free memory.
Please try setting:
os.environ["CUDA_VISIBLE_DEVICES"] = "" or os.environ["CUDA_VISIBLE_DEVICES"] = "0"
in train.py; the first uses CPU only, and the second uses your device 0.
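A minimal sketch of what this looks like at the top of train.py. One detail worth noting: the environment variable only takes effect if it is set before TensorFlow is imported, so the assignment has to come first:

```python
import os

# Hide GPU 1 (which has almost no free memory) from TensorFlow.
# Use "" instead of "0" to force CPU-only execution (needed for 20NG).
# This MUST be set before `import tensorflow`, or it has no effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# import tensorflow as tf  # must come after the line above
```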
I can run R52, R8, and MR on my GPU with 11.10GB of memory, but 20NG could not fit into GPU memory (so I set os.environ["CUDA_VISIBLE_DEVICES"] = "" for 20NG). R52 costs about 8.9GB of memory:
Tensor("graphconvolution_2/SparseTensorDenseMatMul/SparseTensorDenseMatMul:0", shape=(?, 52), dtype=float32)
2019-01-19 18:38:17.703685: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-01-19 18:38:27.133182: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties: name: Tesla K40c major: 3 minor: 5 memoryClockRate(GHz): 0.745 pciBusID: 0000:02:00.0 totalMemory: 11.17GiB freeMemory: 11.10GiB
2019-01-19 18:38:27.133263: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla K40c, pci bus id: 0000:02:00.0, compute capability: 3.5)
Epoch: 0001 train_loss= 3.95103 train_acc= 0.06209 val_loss= 3.91855 val_acc= 0.65544 time= 1.27284
Epoch: 0002 train_loss= 3.91886 train_acc= 0.65215 val_loss= 3.85059 val_acc= 0.66003 time= 1.00218
Epoch: 0003 train_loss= 3.85277 train_acc= 0.65453 val_loss= 3.74156 val_acc= 0.65697 time= 0.98462
Epoch: 0004 train_loss= 3.73903 train_acc= 0.65589 val_loss= 3.58910 val_acc= 0.66462 time= 1.00455
Epoch: 0005 train_loss= 3.58286 train_acc= 0.65317 val_loss= 3.39421 val_acc= 0.66309 time= 1.00082
Epoch: 0006 train_loss= 3.38767 train_acc= 0.64824 val_loss= 3.16615 val_acc= 0.65390 time= 1.00847
Epoch: 0007 train_loss= 3.14865 train_acc= 0.64671 val_loss= 2.92268 val_acc= 0.65237 time= 2.04753
Epoch: 0008 train_loss= 2.91096 train_acc= 0.64399 val_loss= 2.68639 val_acc= 0.65084 time= 1.39975
Epoch: 0009 train_loss= 2.67882 train_acc= 0.64280 val_loss= 2.48329 val_acc= 0.64625 time= 0.97953
Another solution is to use lower-dimensional first-layer embeddings:
flags.DEFINE_integer('hidden1', 50, 'Number of units in hidden layer 1.')
But the classification performance may be slightly worse.
@yao8839836, I am able to replicate the results for everything but 20NG (20NG leads to a "Core dumped" error even on CPU). Thanks for your prompt reply! If I understand correctly, there is no batching in the code, and that is why we face memory issues even with such small datasets (compared to the likes of RCV1, IMDB, etc.). Right?
@Ashutosh-Adhikari
Yes, you are right. The current code does not support mini-batches; the whole graph is loaded into memory, which is why we face memory issues.
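The full-batch constraint can be seen directly from the GCN layer itself. Below is a tiny numpy stand-in (not the repo's TF sparse ops) for one propagation step: the normalized adjacency of the entire graph multiplies the features of all nodes at once, so there is no natural way to process a subset of nodes without also slicing the graph structure:

```python
# Toy full-batch GCN layer: H = ReLU(A_hat @ X @ W).
# A_hat covers the ENTIRE graph, so every forward pass touches all N nodes;
# this is a numpy sketch, not the repo's actual TF implementation.
import numpy as np

rng = np.random.default_rng(0)
n_nodes, n_feats, hidden = 6, 4, 3

A_hat = rng.random((n_nodes, n_nodes))  # normalized adjacency (dense toy version)
X = rng.random((n_nodes, n_feats))      # features for ALL nodes, held in memory
W = rng.random((n_feats, hidden))       # layer weights

H = np.maximum(A_hat @ X @ W, 0)        # one layer; output covers every node
print(H.shape)
```

Memory therefore scales with the total node count (documents plus vocabulary) rather than with a batch size, which is exactly what bites on 20NG.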
@yao8839836 Thank you for your reply !
Thank you so much for referring me to the paper.
On Fri, Feb 15, 2019 at 4:38 PM Dr. Liang Yao [email protected] wrote:
@Ashutosh-Adhikari
Hi, I have found an inductive way to train Text GCN, which can make predictions on brand-new data without retraining. I used a two-layer approximation version of FastGCN [1]:
https://github.com/matenure/FastGCN/blob/master/pubmed_inductive_appr2layers.py
This inductive GCN version also supports mini-batching. The test accuracy on 20NG is about 0.80 with rank0 = 100 and rank1 = 100, lower than the 0.8634 produced by our transductive Text GCN.
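For intuition, here is a toy sketch of the layer-wise importance sampling that lets FastGCN use mini-batches: instead of aggregating over all neighbors, each layer samples a fixed number of nodes (the rank0/rank1 above) with probability proportional to the squared column norms of the normalized adjacency, and rescales to keep the estimate unbiased. Everything below (dense toy adjacency, sizes) is assumed for illustration:

```python
# Toy FastGCN-style layer sampling: Monte Carlo estimate of A_hat @ H
# using only `rank` sampled columns. Sizes and the dense A_hat are
# illustrative assumptions, not the actual FastGCN code.
import numpy as np

rng = np.random.default_rng(0)
n_nodes, rank = 8, 4                        # rank = nodes sampled per layer

A_hat = rng.random((n_nodes, n_nodes))      # normalized adjacency (toy, dense)
col_norms = np.linalg.norm(A_hat, axis=0) ** 2
probs = col_norms / col_norms.sum()         # importance distribution q(u)

sampled = rng.choice(n_nodes, size=rank, replace=False, p=probs)

H = rng.random((n_nodes, 3))                # previous-layer activations
# Rescale each sampled column by 1 / (rank * q(u)) for an unbiased estimate.
approx = (A_hat[:, sampled] / (rank * probs[sampled])) @ H[sampled]
print(approx.shape)
```

Because each layer only ever gathers `rank` sampled nodes, the per-step memory no longer depends on the full graph size, which is what makes mini-batch training (and inductive prediction on unseen documents) possible.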
[1] Chen, J.; Ma, T.; and Xiao, C. 2018. FastGCN: Fast learning with graph convolutional networks via importance sampling. In ICLR.