
ValueError: Variable model/wpe already exists, disallowed

Open · yissachar opened this issue on Jul 1, 2019 · 7 comments

Seeing the same error as outlined in https://github.com/minimaxir/gpt-2-simple/issues/12; however, I am on 0.5.3.

Fine-tuning the first time:

sess = gpt2.start_tf_sess()
gpt2.finetune(sess,
              '../output/instructions.csv',
              model_name=model_name,
              steps=1000) 

It works fine. In a new cell, I copy-paste the above to fine-tune further, but I get an error about model/wpe already existing. I tried explicitly setting restore_from='latest' (even though that seems to be the default), and it didn't help.
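For reference, the second cell is essentially a copy of the first; a minimal sketch of the repro, with restore_from='latest' spelled out even though it appears to be the default:

# Second cell, same Python session; this raises the model/wpe error shown below
gpt2.finetune(sess,
              '../output/instructions.csv',
              model_name=model_name,
              restore_from='latest',
              steps=1000)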


ValueError                                Traceback (most recent call last)
<ipython-input> in <module>
      4               model_name=model_name,
      5               restore_from='latest',
----> 6               steps=1000)

/opt/conda/lib/python3.6/site-packages/gpt_2_simple/gpt_2.py in finetune(sess, dataset, steps, model_name, combine, batch_size, learning_rate, accumulate_gradients, restore_from, run_name, sample_every, sample_length, sample_num, save_every, print_every, max_checkpoints, use_memory_saving_gradients, only_train_transformer_layers, overwrite)
    163
    164     context = tf.placeholder(tf.int32, [batch_size, None])
--> 165     output = model.model(hparams=hparams, X=context)
    166     loss = tf.reduce_mean(
    167         tf.nn.sparse_softmax_cross_entropy_with_logits(

/opt/conda/lib/python3.6/site-packages/gpt_2_simple/src/model.py in model(hparams, X, past, scope, reuse)
    151
    152     wpe = tf.get_variable('wpe', [hparams.n_ctx, hparams.n_embd],
--> 153                          initializer=tf.random_normal_initializer(stddev=0.01))
    154     wte = tf.get_variable('wte', [hparams.n_vocab, hparams.n_embd],
    155                          initializer=tf.random_normal_initializer(stddev=0.02))

/opt/conda/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py in get_variable(name, shape, dtype, initializer, regularizer, trainable, collections, caching_device, partitioner, validate_shape, use_resource, custom_getter, constraint, synchronization, aggregation)
   1477       constraint=constraint,
   1478       synchronization=synchronization,
-> 1479       aggregation=aggregation)
   1480
   1481

/opt/conda/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py in get_variable(self, var_store, name, shape, dtype, initializer, regularizer, reuse, trainable, collections, caching_device, partitioner, validate_shape, use_resource, custom_getter, constraint, synchronization, aggregation)
   1218       constraint=constraint,
   1219       synchronization=synchronization,
-> 1220       aggregation=aggregation)
   1221
   1222   def _get_partitioned_variable(self,

/opt/conda/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py in get_variable(self, name, shape, dtype, initializer, regularizer, reuse, trainable, collections, caching_device, partitioner, validate_shape, use_resource, custom_getter, constraint, synchronization, aggregation)
    545       constraint=constraint,
    546       synchronization=synchronization,
--> 547       aggregation=aggregation)
    548
    549   def _get_partitioned_variable(self,

/opt/conda/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py in _true_getter(name, shape, dtype, initializer, regularizer, reuse, trainable, collections, caching_device, partitioner, validate_shape, use_resource, constraint, synchronization, aggregation)
    497           constraint=constraint,
    498           synchronization=synchronization,
--> 499           aggregation=aggregation)
    500
    501       # Set trainable value based on synchronization value.

/opt/conda/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py in _get_single_variable(self, name, shape, dtype, initializer, regularizer, partition_info, reuse, trainable, collections, caching_device, validate_shape, use_resource, constraint, synchronization, aggregation)
    846         tb = [x for x in tb if "tensorflow/python" not in x[0]][:3]
    847         raise ValueError("%s Originally defined at:\n\n%s" % (err_msg, "".join(
--> 848             traceback.format_list(tb))))
    849       found_var = self._vars[name]
    850       if not shape.is_compatible_with(found_var.get_shape()):

ValueError: Variable model/wpe already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope? Originally defined at:

  File "/opt/conda/lib/python3.6/site-packages/gpt_2_simple/src/model.py", line 153, in model
    initializer=tf.random_normal_initializer(stddev=0.01))
  File "/opt/conda/lib/python3.6/site-packages/gpt_2_simple/gpt_2.py", line 165, in finetune
    output = model.model(hparams=hparams, X=context)
  File "<ipython-input>", line 5, in <module>
    steps=1000)  # steps is max number of training steps

yissachar avatar Jul 01 '19 20:07 yissachar

In a new cell, I copy-paste the above to fine-tune further

Try restarting the Python session. See #77.

From the README:

NB: Restart the Python session first if you want to finetune on another dataset or load another model.

From the notebook:

IMPORTANT NOTE: If you want to rerun this cell, restart the VM first (Runtime -> Restart Runtime). You will need to rerun imports but not recopy files.

woctezuma avatar Jul 01 '19 20:07 woctezuma

Thanks, that is what I did to work around this, but it seems like it would be desirable to allow users to re-finetune without restarting. Is there some fundamental limitation that prevents this?

Also, I had read the README but it wasn't clear to me that re-finetuning on the same model was covered by this - perhaps the wording can be tweaked to make this clearer?

yissachar avatar Jul 01 '19 20:07 yissachar

it wasn't clear to me that re-finetuning on the same model was covered by this - perhaps the wording can be tweaked to make this clearer?

I agree. The README makes it sound like one does not have to restart the VM if the dataset is identical.

It could be changed to:

NB: Restart the Python session first if you want to finetune further.

woctezuma avatar Jul 01 '19 20:07 woctezuma

Agree that a README change would be clearer (my use case for retraining on the same dataset is through the CLI, which refreshes the session; I hadn't considered the Colab notebook use case).

I'll push a change today.

Thanks, that is what I did to work around this, but it seems like it would be desirable to allow users to re-finetune without restarting. Is there some fundamental limitation that prevents this?

It's more or less due to how TensorFlow works, and I'm not skilled enough with low-level TF to find a workaround.

However, I think I can add a reset function to avoid reloading the notebook, as the implementations used in the Cloud Run APIs reset correctly.
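
For what it's worth, a minimal sketch of what such a reset helper could look like, assuming all it needs to do is close the old session, clear the default graph, and hand back a fresh one (the helper shown here is illustrative, not necessarily the exact implementation that would land in the library):

import tensorflow as tf
import gpt_2_simple as gpt2

def reset_session(sess):
    # Close the old session and clear the default graph so that a later
    # finetune()/load_gpt2() call can recreate model/wpe, model/wte, etc.
    # without hitting the "already exists" error.
    sess.close()
    tf.reset_default_graph()
    return gpt2.start_tf_sess()

Calling sess = reset_session(sess) between runs would then take the place of restarting the notebook.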

minimaxir avatar Jul 01 '19 22:07 minimaxir

Try adding tf.reset_default_graph() before each fine-tuning session. This works for me when continuing fine-tuning:

import tensorflow as tf
import gpt_2_simple as gpt2
# ...

tf.reset_default_graph()
sess = gpt2.start_tf_sess()
gpt2.finetune(sess,
              'dataset.txt',
              model_name='345M',
              steps=10)
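
This works because the second finetune() call otherwise tries to recreate model/wpe and the other model variables inside the same default graph where the first run already defined them; clearing the graph and opening a fresh session lets the model be rebuilt from scratch, and training then resumes from the latest checkpoint.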

tkocmathla avatar Jul 02 '19 13:07 tkocmathla

In my case, I put it here:

    # Clear the default graph, then start a fresh session (or reset the
    # existing one) before loading the model
    tf.reset_default_graph()
    if not sess:
        sess = gpt2.start_tf_sess()
    else:
        sess = gpt2.reset_session(sess)

    gpt2.load_gpt2(sess, run_name=run_name)

and it worked perfectly! Thanks!

loretoparisi avatar Feb 18 '21 20:02 loretoparisi

For users encountering the error AttributeError: module 'tensorflow' has no attribute 'reset_default_graph', try adding the following at the top of the fine-tuning code:

import tensorflow as tf
tf.compat.v1.reset_default_graph()

(Source)

Mennaruuk avatar Mar 10 '22 05:03 Mennaruuk