PaddleHelix icon indicating copy to clipboard operation
PaddleHelix copied to clipboard

Run error in GEM

Open kaisermoon opened this issue 2 years ago • 5 comments

An error raised when I am runing GEM I am using single GPU which is GeForce RTX 2080 Ti with 11G memory my code is the same as that in github: `### build model init_model = '/home/outdo/PaddleHelix/apps/pretrained_compound/ChemRL/GEM/pretrain_models-chemrl_gem/regr.pdparams'

compound_encoder = GeoGNNModel(compound_encoder_config) model = DownstreamModel(model_config, compound_encoder) if metric == 'square': criterion = nn.MSELoss() else: criterion = nn.L1Loss() encoder_params = compound_encoder.parameters() head_params = exempt_parameters(model.parameters(), encoder_params) encoder_opt = paddle.optimizer.Adam(args.encoder_lr, parameters=encoder_params) head_opt = paddle.optimizer.Adam(args.head_lr, parameters=head_params) print('Total param num: %s' % (len(model.parameters()))) print('Encoder param num: %s' % (len(encoder_params))) print('Head param num: %s' % (len(head_params))) for i, param in enumerate(model.named_parameters()): print(i, param[0], param[1].name)

if not init_model is None and not args.init_model == "": compound_encoder.set_state_dict(paddle.load(args.init_model)) print('Load state_dict from %s' % args.init_model)`

error information: `--------------------------------------------------------------------------- OSError Traceback (most recent call last) Input In [25], in <cell line: 5>() 4 import paddle.fluid as fluid 5 with fluid.device_guard("cpu"): ----> 6 compound_encoder = GeoGNNModel(compound_encoder_config) 7 model = DownstreamModel(model_config, compound_encoder) 8 if metric == 'square':

File ~/PaddleHelix/pahelix/model_zoo/gem_model.py:81, in GeoGNNModel.init(self, model_config) 78 self.bond_float_names = model_config['bond_float_names'] 79 self.bond_angle_float_names = model_config['bond_angle_float_names'] ---> 81 self.init_atom_embedding = AtomEmbedding(self.atom_names, self.embed_dim) 82 self.init_bond_embedding = BondEmbedding(self.bond_names, self.embed_dim) 83 self.init_bond_float_rbf = BondFloatRBF(self.bond_float_names, self.embed_dim)

File ~/PaddleHelix/pahelix/networks/compound_encoder.py:38, in AtomEmbedding.init(self, atom_names, embed_dim) 36 self.embed_list = nn.LayerList() 37 for name in self.atom_names: ---> 38 embed = nn.Embedding( 39 CompoundKit.get_atom_feature_size(name) + 5, 40 embed_dim, 41 weight_attr=nn.initializer.XavierUniform()) 42 self.embed_list.append(embed)

File ~/anaconda3/envs/paddlehelix/lib/python3.8/site-packages/paddle/nn/layer/common.py:1453, in Embedding.init(self, num_embeddings, embedding_dim, padding_idx, sparse, weight_attr, name) 1451 self._remote_prefetch = False 1452 self._name = name -> 1453 self.weight = self.create_parameter( 1454 attr=self._weight_attr, 1455 shape=self._size, 1456 dtype=self._dtype, 1457 is_bias=False) 1459 if in_dynamic_mode() and padding_idx != -1: 1460 with paddle.no_grad():

File ~/anaconda3/envs/paddlehelix/lib/python3.8/site-packages/paddle/fluid/dygraph/layers.py:423, in Layer.create_parameter(self, shape, attr, dtype, is_bias, default_initializer) 421 if isinstance(temp_attr, six.string_types) and temp_attr == "": 422 temp_attr = None --> 423 return self._helper.create_parameter(temp_attr, shape, dtype, is_bias, 424 default_initializer)

File ~/anaconda3/envs/paddlehelix/lib/python3.8/site-packages/paddle/fluid/layer_helper_base.py:376, in LayerHelperBase.create_parameter(self, attr, shape, dtype, is_bias, default_initializer, stop_gradient, type) 370 if is_used: 371 raise ValueError( 372 "parameter name [{}] have be been used. " 373 "In dygraph mode, the name of parameter can't be same." 374 "Please check the parameter attr value passed to self.create_parameter or " 375 "constructor of dygraph Layers".format(attr.name)) --> 376 return self.main_program.global_block().create_parameter( 377 dtype=dtype, 378 shape=shape, 379 type=type, 380 stop_gradient=stop_gradient, 381 **attr._to_kwargs(with_initializer=True)) 382 else: 383 self.startup_program.global_block().create_parameter( 384 dtype=dtype, 385 shape=shape, 386 type=type, 387 **attr._to_kwargs(with_initializer=True))

File ~/anaconda3/envs/paddlehelix/lib/python3.8/site-packages/paddle/fluid/framework.py:3572, in Block.create_parameter(self, *args, **kwargs) 3570 pass 3571 else: -> 3572 initializer(param, self) 3573 return param

File ~/anaconda3/envs/paddlehelix/lib/python3.8/site-packages/paddle/fluid/initializer.py:605, in XavierInitializer.call(self, var, block) 603 if self._uniform: 604 limit = np.sqrt(6.0 / float(fan_in + fan_out)) --> 605 out_var = _C_ops.uniform_random('shape', out_var.shape, 'min', 606 -limit, 'max', limit, 'seed', 607 self._seed, 'dtype', out_dtype) 608 else: 609 std = math.sqrt(2.0 / float(fan_in + fan_out))

OSError: [operator < uniform_random > error]`

Could anyone help me? Thank you so much!

kaisermoon avatar May 26 '22 04:05 kaisermoon

Which version of PaddlePaddle are you using?

LihangLiu avatar Jun 07 '22 10:06 LihangLiu

Which version of PaddlePaddle are you using?

I run this script on PaddlePaddle 2.3.0

kaisermoon avatar Jun 08 '22 07:06 kaisermoon

Hi, this issue seems to be related to the cudnn version. Could you please check your cudnn version by running

paddle.device.get_cudnn_version()

You can also refer to this issue: https://github.com/PaddlePaddle/Paddle/issues/34938

LihangLiu avatar Jun 08 '22 07:06 LihangLiu

Hi, this issue seems to be related to the cudnn version. Could you please check your cudnn version by running

paddle.device.get_cudnn_version()

You can also refer to this issue: PaddlePaddle/Paddle#34938

8100 was returned when I runed 'paddle.device.get_cudnn_version()'

kaisermoon avatar Jun 08 '22 07:06 kaisermoon

Hi, kaisermoon, are you running the code on cpu or gpu? If it's on gpu, some one has met the same problem as yours, and he solves it by reducing the bach size. If that does not work, you may also try to install cudnn 7.6 to see if that helps. Hope this can be helpful to you.

Noisyntrain avatar Jun 08 '22 10:06 Noisyntrain