text_gcn

Memory issue

Open AIRobotZhang opened this issue 7 years ago • 7 comments

How much memory is needed to run the script: 16 GB, 32 GB, or larger? Thank you!

AIRobotZhang avatar Jan 19 '19 12:01 AIRobotZhang

@AIRobotZhang

16 GB is enough for R8, R52, MR, and Ohsumed, but not for 20NG. I tried this on a Mac with 16 GB of memory.

32 GB may be enough for 20NG with 200-dimensional first-layer embeddings, but I am not sure; I successfully ran it on a server.

I tried 20NG with lower-dimensional first-layer embeddings (e.g., 50 or 30) by changing this line in train.py:

flags.DEFINE_integer('hidden1', 200, 'Number of units in hidden layer 1.')

Then the script runs successfully on the 16 GB Mac. The classification accuracy is a bit lower (about 0.856) but still comparable to the results in the paper.
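As a rough back-of-envelope illustration (my own sketch; the node count is an assumption of roughly docs plus vocabulary for 20NG, not an exact number from the repository), any densely materialized N x N intermediate over the whole graph already approaches the 16 GB limit, while the first-layer activations shrink linearly with hidden1:

```python
# Back-of-envelope memory estimates for Text GCN-style graphs.
# n_nodes below is an assumption, not an exact repo number.

def dense_matrix_gib(n_rows, n_cols, bytes_per_float=4):
    """Size of a dense float32 matrix in GiB."""
    return n_rows * n_cols * bytes_per_float / 2**30

n_nodes = 61_000  # assumed total graph nodes (documents + words) for 20NG

# A dense N x N float32 intermediate alone is ~14 GiB:
print(f"N x N matrix: {dense_matrix_gib(n_nodes, n_nodes):.1f} GiB")

# First-layer activations are far smaller and scale with hidden1:
for dim in (200, 50, 30):
    mib = dense_matrix_gib(n_nodes, dim) * 1024
    print(f"N x {dim:>3} activations: {mib:.1f} MiB")
```

This is only an order-of-magnitude sketch, but it is consistent with 16 GB failing for 20NG while smaller corpora fit.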

Thanks!

yao8839836 avatar Jan 19 '19 20:01 yao8839836

I was able to run build_graph for 20NG and R52, but I get the following error. I am new to TF; do you know how to tackle this? Any help would be greatly appreciated, thank you! :) (Python 3, TF version 1.12.0)

Tensor("graphconvolution_2/SparseTensorDenseMatMul/SparseTensorDenseMatMul:0", shape=(?, 52), dtype=float32)
WARNING:tensorflow:From /home/ashutosh1adhikari/GG/text_gcn1.18/metrics.py:6: softmax_cross_entropy_with_logits (from tensorflow.python.ops.nn_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Future major versions of TensorFlow will allow gradients to flow into the labels input on backprop by default.
See tf.nn.softmax_cross_entropy_with_logits_v2.

2019-01-19 14:50:22.696244: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2019-01-19 14:50:23.604709: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties: name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.635 pciBusID: 0000:65:00.0 totalMemory: 10.73GiB freeMemory: 10.53GiB
2019-01-19 14:50:23.755504: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 1 with properties: name: GeForce RTX 2080 Ti major: 7 minor: 5 memoryClockRate(GHz): 1.635 pciBusID: 0000:17:00.0 totalMemory: 10.73GiB freeMemory: 26.62MiB
2019-01-19 14:50:23.755609: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0, 1
Traceback (most recent call last):
  File "train.py", line 92, in <module>
    sess = tf.Session(config=session_conf)
  File "/home/ashutosh1adhikari/anaconda36/envs/ashenv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1551, in __init__
    super(Session, self).__init__(target, graph, config=config)
  File "/home/ashutosh1adhikari/anaconda36/envs/ashenv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 676, in __init__
    self._session = tf_session.TF_NewSessionRef(self._graph._c_graph, opts)
tensorflow.python.framework.errors_impl.InternalError: CUDA runtime implicit initialization on GPU:1 failed. Status: out of memory

Ashutosh-Adhikari avatar Jan 19 '19 22:01 Ashutosh-Adhikari

@Ashutosh-Adhikari

Hi, thanks for running the code.

This is likely because your GPU 1 doesn't have enough free memory.

Please try setting either:

os.environ["CUDA_VISIBLE_DEVICES"] = ""

or

os.environ["CUDA_VISIBLE_DEVICES"] = "0"

in train.py. The first uses CPU only; the second uses only your device 0.
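Concretely, the variable has to be set before TensorFlow initializes CUDA, so it should go near the top of train.py, before the tensorflow import (a minimal sketch; the surrounding code is illustrative, not the repo's exact file):

```python
import os

# Hide the nearly-full GPU 1 so TensorFlow only sees device 0.
# Set this to "" instead of "0" to force CPU-only execution.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# The tensorflow import must come AFTER the line above; otherwise the
# CUDA runtime may already have enumerated (and tried to claim) both GPUs.
# import tensorflow as tf
```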

I can run R52, R8, and MR on my GPU with 11.10 GiB of memory, but 20NG could not fit into GPU memory (so I set os.environ["CUDA_VISIBLE_DEVICES"] = "" for 20NG). R52 costs about 8.9 GB of memory:

Tensor("graphconvolution_2/SparseTensorDenseMatMul/SparseTensorDenseMatMul:0", shape=(?, 52), dtype=float32)
2019-01-19 18:38:17.703685: I tensorflow/core/platform/cpu_feature_guard.cc:137] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 FMA
2019-01-19 18:38:27.133182: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Found device 0 with properties: name: Tesla K40c major: 3 minor: 5 memoryClockRate(GHz): 0.745 pciBusID: 0000:02:00.0 totalMemory: 11.17GiB freeMemory: 11.10GiB
2019-01-19 18:38:27.133263: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla K40c, pci bus id: 0000:02:00.0, compute capability: 3.5)
Epoch: 0001 train_loss= 3.95103 train_acc= 0.06209 val_loss= 3.91855 val_acc= 0.65544 time= 1.27284
Epoch: 0002 train_loss= 3.91886 train_acc= 0.65215 val_loss= 3.85059 val_acc= 0.66003 time= 1.00218
Epoch: 0003 train_loss= 3.85277 train_acc= 0.65453 val_loss= 3.74156 val_acc= 0.65697 time= 0.98462
Epoch: 0004 train_loss= 3.73903 train_acc= 0.65589 val_loss= 3.58910 val_acc= 0.66462 time= 1.00455
Epoch: 0005 train_loss= 3.58286 train_acc= 0.65317 val_loss= 3.39421 val_acc= 0.66309 time= 1.00082
Epoch: 0006 train_loss= 3.38767 train_acc= 0.64824 val_loss= 3.16615 val_acc= 0.65390 time= 1.00847
Epoch: 0007 train_loss= 3.14865 train_acc= 0.64671 val_loss= 2.92268 val_acc= 0.65237 time= 2.04753
Epoch: 0008 train_loss= 2.91096 train_acc= 0.64399 val_loss= 2.68639 val_acc= 0.65084 time= 1.39975
Epoch: 0009 train_loss= 2.67882 train_acc= 0.64280 val_loss= 2.48329 val_acc= 0.64625 time= 0.97953


Another solution is to use lower-dimensional first-layer embeddings:

flags.DEFINE_integer('hidden1', 50, 'Number of units in hidden layer 1.')

But the classification performance may be a bit worse.

yao8839836 avatar Jan 20 '19 00:01 yao8839836

@yao8839836, I am able to replicate the results for everything but 20NG (20NG leads to a "core dumped" error even on CPU). Thanks for your prompt reply! If I understand correctly, there is no batching in the code, and that is why we face memory issues even with such small datasets (compared to the likes of RCV1, IMDB, etc.). Right?

Ashutosh-Adhikari avatar Jan 20 '19 17:01 Ashutosh-Adhikari

@Ashutosh-Adhikari

Yes, you are right: the current code does not support mini-batching; the whole graph is loaded into memory. That is why we face memory issues.
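To illustrate the point (a toy NumPy sketch, not the repo's TensorFlow code; all sizes are made up, and the real code keeps the adjacency sparse), a full-batch GCN layer multiplies the complete normalized adjacency by the complete feature matrix, so there is no batch axis that could be shrunk:

```python
import numpy as np

n_nodes, in_dim, hidden = 1_000, 300, 200  # toy sizes; 20NG has tens of thousands of nodes

rng = np.random.default_rng(0)
# Toy normalized adjacency and node features for the WHOLE graph.
A_hat = (rng.random((n_nodes, n_nodes)) < 0.01).astype(float)
X = rng.standard_normal((n_nodes, in_dim))
W = rng.standard_normal((in_dim, hidden))

# One GCN layer: H = ReLU(A_hat @ X @ W). Every node's row is computed
# at once -- the entire graph must be resident in memory simultaneously.
H = np.maximum(A_hat @ (X @ W), 0.0)
print(H.shape)  # (1000, 200)
```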

yao8839836 avatar Jan 21 '19 01:01 yao8839836

@yao8839836 Thank you for your reply !

AIRobotZhang avatar Jan 21 '19 02:01 AIRobotZhang

Thank you so much for referring me to the paper.

On Fri, Feb 15, 2019 at 4:38 PM Dr. Liang Yao [email protected] wrote:

@Ashutosh-Adhikari https://github.com/Ashutosh-Adhikari

Hi, I have found an inductive way to train Text GCN, which can make predictions on brand-new data without retraining. I used a two-layer approximation version of FastGCN [1]:

https://github.com/matenure/FastGCN/blob/master/pubmed_inductive_appr2layers.py

This inductive GCN version also supports mini-batching. The test accuracy for 20NG is about 0.80 with rank0 = 100 and rank1 = 100, lower than the 0.8634 produced by our transductive Text GCN.

[1] Chen, J.; Ma, T.; and Xiao, C. 2018. FastGCN: Fast learning with graph convolutional networks via importance sampling. In ICLR.
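For context, the core FastGCN trick — layer-wise importance sampling of nodes, reweighted so the layer's matrix product stays unbiased — can be sketched in a few lines of NumPy (a toy illustration following the paper's proposal of sampling with probability proportional to squared column norms of the adjacency; this is not the linked repository's code):

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, rank, feat = 500, 100, 16  # "rank" plays the role of rank0/rank1

A = (rng.random((n_nodes, n_nodes)) < 0.02).astype(float)  # toy adjacency
X = rng.standard_normal((n_nodes, feat))                   # toy features

# Sampling distribution q(v) proportional to the squared column norms of A.
col_sq = (A ** 2).sum(axis=0)
q = col_sq / col_sq.sum()

# Sample `rank` nodes, then reweight by 1 / (rank * q) for unbiasedness.
sampled = rng.choice(n_nodes, size=rank, replace=True, p=q)
A_s = A[:, sampled] / (rank * q[sampled])  # (n_nodes, rank)

# Monte Carlo estimate of A @ X that touches only the sampled rows of X,
# so each layer costs O(rank) instead of O(n_nodes) -- and mini-batching
# over output nodes becomes possible.
approx = A_s @ X[sampled]
print(approx.shape)  # (500, 16)
```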


Ashutosh-Adhikari avatar Feb 20 '19 00:02 Ashutosh-Adhikari