gpt-2-simple

Continue training with checkpoint loaded from Google Drive failed

Open chutaklee opened this issue 5 years ago • 8 comments

gpt2.mount_gdrive()
gpt2.copy_checkpoint_from_gdrive("gpt2_medium_run1")
sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess, run_name='run1')
gpt2.finetune(
    sess,
    dataset=file_path,
    steps=500,
    print_every=10,
    sample_every=200,
    save_every=500,
    overwrite=True
)

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

<ipython-input-7-b53c2e790190> in <module>()
      6     sample_every=200,
      7     save_every=500,
----> 8     overwrite=True
      9 )

6 frames

/usr/local/lib/python3.6/dist-packages/tensorflow/python/ops/variable_scope.py in _get_single_variable(self, name, shape, dtype, initializer, regularizer, partition_info, reuse, trainable, collections, caching_device, validate_shape, use_resource, constraint, synchronization, aggregation)
    862         tb = [x for x in tb if "tensorflow/python" not in x[0]][:5]
    863         raise ValueError("%s Originally defined at:\n\n%s" %
--> 864                          (err_msg, "".join(traceback.format_list(tb))))
    865       found_var = self._vars[name]
    866       if not shape.is_compatible_with(found_var.get_shape()):

ValueError: Variable model/wpe already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope? Originally defined at:

  File "/usr/local/lib/python3.6/dist-packages/gpt_2_simple/src/model.py", line 183, in model
    initializer=tf.compat.v1.random_normal_initializer(stddev=0.01))
  File "/usr/local/lib/python3.6/dist-packages/gpt_2_simple/gpt_2.py", line 345, in load_gpt2
    output = model.model(hparams=hparams, X=context)
  File "<ipython-input-3-f81553695c16>", line 4, in <module>
    gpt2.load_gpt2(sess, run_name='run1')
  File "/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py", line 2882, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "/usr/local/lib/python3.6/dist-packages/IPython/core/interactiveshell.py", line 2828, in run_ast_nodes
    if self.run_code(code, result):

I have tried the method from #80, but it did not help.

chutaklee avatar Aug 17 '19 15:08 chutaklee

Isn't there a discrepancy between your two "run names"?

Moreover, what happens if you leave overwrite to its default value (False)?

woctezuma avatar Aug 21 '19 15:08 woctezuma

Thanks for your reply. The same error occurs regardless of the value of overwrite, and gpt2.generate() works fine with that checkpoint. I have trained several models, so I can't use the default checkpoint name.

chutaklee avatar Aug 21 '19 15:08 chutaklee

Any updates on this issue? I am having the same problem.

lucasmgomez avatar May 02 '20 19:05 lucasmgomez

Run sess = gpt2.reset_session(sess=sess) before running finetune.

zacc avatar Aug 28 '20 04:08 zacc
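For later readers: the traceback in the first post shows that model/wpe was first created by load_gpt2, and finetune then tried to create it again in the same graph. A minimal sketch of the order suggested above follows. The helper name finetune_after_load is hypothetical, and the gpt2 module is passed in as an argument only so the sketch can be tried without TensorFlow installed; it assumes reset_session returns a fresh session, as the suggestion above implies.

```python
def finetune_after_load(gpt2, sess, dataset, run_name, **finetune_kwargs):
    """Reset the session, then fine-tune the run stored under `run_name`.

    `gpt2` is the imported gpt_2_simple module and `sess` is a session
    that has already been used with load_gpt2 (hypothetical helper).
    """
    # load_gpt2 already built the model variables in the current graph,
    # so start from a clean session before finetune builds them again.
    sess = gpt2.reset_session(sess=sess)
    # Pass run_name explicitly so finetune targets the same checkpoint
    # folder that was copied from Google Drive.
    gpt2.finetune(sess, dataset=dataset, run_name=run_name, **finetune_kwargs)
    return sess
```

Usage would look like sess = finetune_after_load(gpt2, sess, file_path, "run1", steps=500, overwrite=True), keeping the returned session for any later calls.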

That line did not work for me in Colab. Sharing the code:

gpt2.mount_gdrive()
file_name = "trainset.txt"
gpt2.copy_file_from_gdrive(file_name)
gpt2.copy_checkpoint_from_gdrive(run_name='model-3.0')
sess = gpt2.start_tf_sess(threads=4)
gpt2.load_gpt2(sess, run_name='model-3.0')
sess = gpt2.reset_session(sess=sess)
gpt2.finetune(
    sess,
    dataset=file_name,
    steps=1000,
    print_every=10,
    multi_gpu=True,
    learning_rate=0.002,
    sample_every=200,
    save_every=500,
    overwrite=True
)

You need to download the GPT-2 model first via download_gpt2()


FileNotFoundError                         Traceback (most recent call last)

<ipython-input-9-77b3dd4c4586> in <module>()
      8     sample_every=200,
      9     save_every=500,
---> 10     overwrite=True
     11 )

/usr/lib/python3.6/shutil.py in copyfile(src, dst, follow_symlinks)
    118         os.symlink(os.readlink(src), dst)
    119     else:
--> 120         with open(src, 'rb') as fsrc:
    121             with open(dst, 'wb') as fdst:
    122                 copyfileobj(fsrc, fdst)

FileNotFoundError: [Errno 2] No such file or directory: 'models/124M/hparams.json'

cyberosa avatar Feb 18 '21 19:02 cyberosa

"You need to download the GPT-2 model first via download_gpt2()"

I think the answer is right there for you.

zacc avatar Feb 19 '21 10:02 zacc

Thanks, zacc. My question then is: shouldn't the model already be loaded after doing this?

gpt2.load_gpt2(sess, run_name='model-3.0')

My worry with calling:

def download_gpt2(model_dir='models', model_name='124M')

is that you need to give the model_name, and that is going to download the pretrained model from Google Cloud. I don't want to use the pretrained model; I want the finetuned model that I saved to my Google Drive with the checkpoint. Maybe both actions are compatible and my fear is for nothing. Do you mean that I should first call:

gpt2.download_gpt2(model_name='124M')

and afterwards call

gpt2.load_gpt2(sess, run_name='model-3.0')

and that is fine?

cyberosa avatar Feb 19 '21 11:02 cyberosa

Yes, you still need to call download_gpt2() even though you are training from your saved model.

zacc avatar Feb 20 '21 02:02 zacc
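The FileNotFoundError above shows finetune reading models/124M/hparams.json, so the base model folder has to exist even when training resumes from a saved checkpoint. A small sketch for checking this before calling finetune is below. The helper name missing_base_files is hypothetical, and the file list is an assumption: only hparams.json is confirmed by the traceback, while encoder.json and vocab.bpe are assumed to be needed as well.

```python
import os

# Assumption: finetune() expects these files under <model_dir>/<model_name>/.
# Only hparams.json is confirmed by the traceback in this thread.
BASE_MODEL_FILES = ("hparams.json", "encoder.json", "vocab.bpe")


def missing_base_files(model_dir="models", model_name="124M"):
    """Return the base model files that are not present on disk."""
    base = os.path.join(model_dir, model_name)
    return [
        name
        for name in BASE_MODEL_FILES
        if not os.path.isfile(os.path.join(base, name))
    ]
```

If the returned list is not empty, calling gpt2.download_gpt2(model_name='124M') first, as advised above, should put the files in place; load_gpt2 and finetune can then still use the run copied from Google Drive.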