learn2branch
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed
Hello,
Thanks for your great work. I am currently trying to reproduce it and am stuck on a TensorFlow problem. I have successfully generated the training dataset for setcover, but I get the following error when I run python 03_train_gcnn.py setcover -m baseline:
[2023-02-04 16:21:35.606917] EPOCH 0...
2023-02-04 16:21:51.414814: E tensorflow/stream_executor/cuda/cuda_blas.cc:652] failed to run cuBLAS routine cublasSgemm_v2: CUBLAS_STATUS_EXECUTION_FAILED
Traceback (most recent call last):
File "03_train_gcnn.py", line 253, in <module>
n = pretrain(model=model, dataloader=pretrain_data)
File "03_train_gcnn.py", line 49, in pretrain
if not model.pre_train(batched_states, tf.convert_to_tensor(True)):
File "/scratch/anji/rush_exps/learn2branch/models/baseline/model.py", line 260, in pre_train
self.call(*args, **kwargs)
File "/scratch/anji/rush_exps/learn2branch/models/baseline/model.py", line 418, in call
constraint_features = self.cons_embedding(constraint_features)
File "/home/anji/softwares/miniconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 757, in __call__
outputs = self.call(inputs, *args, **kwargs)
File "/home/anji/softwares/miniconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/keras/engine/sequential.py", line 232, in call
inputs, training=training, mask=mask)
File "/home/anji/softwares/miniconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/keras/engine/sequential.py", line 250, in _call_and_compute_mask
x = layer.call(x, **kwargs)
File "/home/anji/softwares/miniconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/keras/layers/core.py", line 970, in call
outputs = gen_math_ops.mat_mul(inputs, self.kernel)
File "/home/anji/softwares/miniconda3/envs/py36/lib/python3.6/site-packages/tensorflow/python/ops/gen_math_ops.py", line 4586, in mat_mul
_six.raise_from(_core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InternalError: Blas GEMM launch failed : a.shape=(66090, 5), b.shape=(5, 64), m=66090, n=64, k=5 [Op:MatMul]
I am wondering whether something is wrong with my environment, but I followed install.md and am using Python 3.6 and tensorflow-gpu 1.12.0. I would appreciate any help in solving this problem.
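For what it is worth, the shapes reported in the last line of the traceback look consistent with a valid matrix product, which would suggest the failure comes from the GPU/cuBLAS side rather than from the model code. The sketch below only parses that error line and checks the dimensions; the helper names are hypothetical and not part of learn2branch or TensorFlow, and it assumes the message format shown above.

```python
import re

def parse_gemm_error(message):
    """Extract a.shape, b.shape and the m/n/k values from a
    'Blas GEMM launch failed' message (format assumed from the traceback)."""
    shapes = {
        name: (int(rows), int(cols))
        for name, rows, cols in re.findall(r"([ab])\.shape=\((\d+),\s*(\d+)\)", message)
    }
    dims = {name: int(value) for name, value in re.findall(r"\b([mnk])=(\d+)", message)}
    return shapes.get("a"), shapes.get("b"), dims

def shapes_are_compatible(a_shape, b_shape, dims):
    """True if a @ b is well defined and m, n, k match the operand shapes."""
    inner_ok = a_shape[1] == b_shape[0]
    expected = {"m": a_shape[0], "n": b_shape[1], "k": a_shape[1]}
    return inner_ok and dims == expected

def output_bytes_float32(a_shape, b_shape):
    """Size of the float32 result (cublasSgemm is single precision)."""
    return a_shape[0] * b_shape[1] * 4

message = ("Blas GEMM launch failed : a.shape=(66090, 5), b.shape=(5, 64), "
           "m=66090, n=64, k=5 [Op:MatMul]")
a_shape, b_shape, dims = parse_gemm_error(message)
print(shapes_are_compatible(a_shape, b_shape, dims))
print(output_bytes_float32(a_shape, b_shape))
```

If the check passes, the product itself is well formed and its result is only a few megabytes, so the CUDA/cuBLAS installation, the driver, or the GPU memory already held by another process would be the more likely places to look.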
Hi, I ran into the same issue. Have you solved it? If you have a solution, please let me know. Thanks!
@liebenxj Hello! I heard that you successfully obtained a training sample by running 02_generate_dataset.py. Could you please share a setcover training sample with me? I could not get the script to run, but I would like to inspect the data in the samples. Thank you!
@liebenxj Hello! I heard that you successfully obtained a training sample by running 02_generate_dataset.py. Could you please help me solve these two problems?
- File "02_generate_dataset.py", line 37, in branchexeclp
  utilities.extract_khalil_variable_features(self.model, [], self.khalil_root_buffer)
  File "/home/fengruixiang/2024_04/learn2branch/utilities.py", line 305, in extract_khalil_variable_features
  scip_state = model.getKhalilState(root_buffer, candidates)
  AttributeError: 'pyscipopt.scip.Model' object has no attribute 'getKhalilState'
- File "02_generate_dataset.py", line 134, in make_samples
  m.setBoolParam('branching/vanillafullstrong/integralcands', True)
  File "src/pyscipopt/scip.pyx", line 2783, in pyscipopt.scip.Model.setBoolParam
  File "src/pyscipopt/scip.pyx", line 216, in pyscipopt.scip.PY_SCIP_CALL
  KeyError: 'SCIP: the parameter with the given name was not found!'
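Both errors point at something the installed PySCIPOpt/SCIP does not expose: a getKhalilState method on Model and a branching/vanillafullstrong/integralcands parameter. One possible explanation, which is an assumption on my side, is that a stock PySCIPOpt and an unpatched SCIP are installed instead of the builds described in install.md. A quick way to check the first point before running 02_generate_dataset.py could look like the sketch below; the helper names are hypothetical, and the check simply returns None when pyscipopt cannot be imported.

```python
def missing_attributes(obj, names):
    """Return the names from `names` that `obj` does not provide."""
    return [name for name in names if not hasattr(obj, name)]

def check_pyscipopt_model(required=("getKhalilState",)):
    """Report which required methods the installed pyscipopt Model lacks.

    Returns None if pyscipopt is not importable in this environment.
    """
    try:
        from pyscipopt import Model
    except ImportError:
        return None
    return missing_attributes(Model, required)

if __name__ == "__main__":
    result = check_pyscipopt_model()
    if result is None:
        print("pyscipopt is not installed in this environment")
    elif result:
        print("Model is missing:", ", ".join(result))
    else:
        print("Model provides all required methods")
```

If getKhalilState is reported as missing, reinstalling PySCIPOpt and SCIP exactly as install.md describes would be the first thing to try.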