Has anyone managed to get this working on Windows? Which OS did you use to make it work?
I have Windows 10 x64, a Core i7-2600K CPU, 32 GB of RAM, and a GTX 1050 Ti GPU.
I have installed the latest Python and TensorFlow.
I also ran these commands:
1) pip3 install tensorflow-gpu regex
2) pip3 install requests tqdm
3) cd GPT2 folder (cloned via bash)
4) python download_model.py PrettyBig
I believe everything is ready, but I am not able to make it work.
Here are my configuration and the errors I am getting.
Main folder

PrettyBig folder

PrettyBig.json - file paths are correct and working
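For anyone who wants to double-check the same thing, here is a minimal sketch of such a path check. It assumes the config is plain JSON and uses the `encoder_path` and `model_path` keys that show up in the printed parameters; `check_config_paths` is a hypothetical helper, not something from the repository.

```python
import json
import os


def check_config_paths(config_path, keys=("encoder_path", "model_path")):
    """Return {key: (path, exists)} for the path entries of a JSON config.

    Hypothetical helper: the key names are taken from the parameters the
    script prints, and the config is assumed to be a flat JSON object.
    """
    with open(config_path, encoding="utf-8") as f:
        params = json.load(f)

    report = {}
    for key in keys:
        path = params.get(key)  # None if the key is absent
        report[key] = (path, path is not None and os.path.exists(path))
    return report


# Example usage (assumes PrettyBig.json is in the current directory):
# for key, (path, ok) in check_config_paths("PrettyBig.json").items():
#     print(f"{key}: {path!r} -> {'found' if ok else 'MISSING'}")
```

Note that backslashes in Windows paths have to be escaped inside JSON (`"C:\\GPT2\\encoder"`), otherwise `json.load` may reject the file or decode a different path.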

Here is the command line I used:
C:\GPT2>python main.py --model PrettyBig.json --predict_text "Pikachu"
At first it runs for several minutes at around 70% CPU usage and more than 2 GB of RAM.
Here is the full output of the above command:
C:\GPT2>python main.py --model PrettyBig.json --predict_text "Pikachu"
{'n_head': 16, 'encoder_path': 'C:\GPT2\encoder', 'n_vocab': 50257, 'embed_dropout': 0.0, 'lr': 0.00025, 'warmup_steps': 2000, 'weight_decay': 0.01, 'beta1': 0.9, 'beta2': 0.98, 'epsilon': 1e-09, 'opt_name': 'adam', 'train_batch_size': 256, 'attn_dropout': 0.0, 'train_steps': 10000, 'eval_steps': 10, 'max_steps': 604800, 'data_path': 'gs://connors-datasets/openwebtext/', 'scale': 0.14433756729740646, 'res_dropout': 0.1, 'predict_batch_size': 1, 'eval_batch_size': 256, 'iterations': 100, 'n_embd': 1024, 'input': 'openwebtext_longbiased', 'model': 'GPT2', 'model_path': 'C:\GPT2\PrettyBig', 'n_ctx': 1024, 'predict_path': 'logs/predictions_SortaBig.txt', 'n_layer': 25, 'use_tpu': False, 'precision': 'float32'}
Using config: {'_model_dir': 'C:\GPT2\PrettyBig', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': , '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_train_distribute': None, '_device_fn': None, '_protocol': None, '_eval_distribute': None, '_experimental_distribute': None, '_service': None, '_cluster_spec': <tensorflow.python.training.server_lib.ClusterSpec object at 0x0000016DD33ECEB8>, '_task_type': 'worker', '_task_id': 0, '_global_id_in_cluster': 0, '_master': '', '_evaluation_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}
Generating predictions...
From C:\Python37\lib\site-packages\tensorflow\python\framework\op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
Calling model_fn.
From C:\GPT2\models\gpt2\sample.py:57: to_float (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.
From C:\GPT2\models\gpt2\sample.py:59: multinomial (from tensorflow.python.ops.random_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.random.categorical instead.
Done calling model_fn.
Graph was finalized.
From C:\Python37\lib\site-packages\tensorflow\python\training\saver.py:1266: checkpoint_exists (from tensorflow.python.training.checkpoint_management) is deprecated and will be removed in a future version.
Instructions for updating:
Use standard file APIs to check for files with this prefix.
Restoring parameters from C:\GPT2\PrettyBig\model.ckpt
Running local_init_op.
Done running local_init_op.
Traceback (most recent call last):
  File "C:\Python37\lib\site-packages\tensorflow\python\client\session.py", line 1334, in _do_call
    return fn(*args)
  File "C:\Python37\lib\site-packages\tensorflow\python\client\session.py", line 1319, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "C:\Python37\lib\site-packages\tensorflow\python\client\session.py", line 1407, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0,0] = 1024 is not in [0, 1024)
  [[{{node sample_sequence/while/model/GatherV2_1}}]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "main.py", line 131, in <module>
    predict_fn(network, text, params)
  File "C:\GPT2\predict_fns.py", line 18, in gpt2_predict
    for i, p in enumerate(predictions):
  File "C:\Python37\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py", line 629, in predict
    preds_evaluated = mon_sess.run(predictions)
  File "C:\Python37\lib\site-packages\tensorflow\python\training\monitored_session.py", line 676, in run
    run_metadata=run_metadata)
  File "C:\Python37\lib\site-packages\tensorflow\python\training\monitored_session.py", line 1171, in run
    run_metadata=run_metadata)
  File "C:\Python37\lib\site-packages\tensorflow\python\training\monitored_session.py", line 1270, in run
    raise six.reraise(*original_exc_info)
  File "C:\Python37\lib\site-packages\six.py", line 693, in reraise
    raise value
  File "C:\Python37\lib\site-packages\tensorflow\python\training\monitored_session.py", line 1255, in run
    return self._sess.run(*args, **kwargs)
  File "C:\Python37\lib\site-packages\tensorflow\python\training\monitored_session.py", line 1327, in run
    run_metadata=run_metadata)
  File "C:\Python37\lib\site-packages\tensorflow\python\training\monitored_session.py", line 1091, in run
    return self._sess.run(*args, **kwargs)
  File "C:\Python37\lib\site-packages\tensorflow\python\client\session.py", line 929, in run
    run_metadata_ptr)
  File "C:\Python37\lib\site-packages\tensorflow\python\client\session.py", line 1152, in _run
    feed_dict_tensor, options, run_metadata)
  File "C:\Python37\lib\site-packages\tensorflow\python\client\session.py", line 1328, in _do_run
    run_metadata)
  File "C:\Python37\lib\site-packages\tensorflow\python\client\session.py", line 1348, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0,0] = 1024 is not in [0, 1024)
  [[node sample_sequence/while/model/GatherV2_1 (defined at C:\GPT2\models\gpt2\gpt2.py:208) ]]

Caused by op 'sample_sequence/while/model/GatherV2_1', defined at:
  File "main.py", line 131, in <module>
    predict_fn(network, text, params)
  File "C:\GPT2\predict_fns.py", line 18, in gpt2_predict
    for i, p in enumerate(predictions):
  File "C:\Python37\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py", line 611, in predict
    features, None, model_fn_lib.ModeKeys.PREDICT, self.config)
  File "C:\Python37\lib\site-packages\tensorflow_estimator\python\estimator\estimator.py", line 1112, in _call_model_fn
    model_fn_results = self._model_fn(features=features, **kwargs)
  File "C:\GPT2\model_fns.py", line 62, in gpt2_model
    temperature=1.0, top_k=params["top_k"]
  File "C:\GPT2\models\gpt2\sample.py", line 82, in sample_sequence
    back_prop=False,
  File "C:\Python37\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 3556, in while_loop
    return_same_structure)
  File "C:\Python37\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 3087, in BuildLoop
    pred, body, original_loop_vars, loop_vars, shape_invariants)
  File "C:\Python37\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 3022, in _BuildLoop
    body_result = body(*packed_vars_for_body)
  File "C:\Python37\lib\site-packages\tensorflow\python\ops\control_flow_ops.py", line 3525, in <lambda>
    body = lambda i, lv: (i + 1, orig_body(*lv))
  File "C:\GPT2\models\gpt2\sample.py", line 56, in body
    next_outputs = step(params, prev[:, tf.newaxis], past=past)
  File "C:\GPT2\models\gpt2\sample.py", line 40, in step
    lm_output = lm_output = gpt2.model(params=params, X=tokens, past=past, reuse=tf.AUTO_REUSE)
  File "C:\GPT2\models\gpt2\gpt2.py", line 208, in model
    h = tf.gather(wte, X) + tf.gather(wpe, positions_for(X, past_length))
  File "C:\Python37\lib\site-packages\tensorflow\python\util\dispatch.py", line 180, in wrapper
    return target(*args, **kwargs)
  File "C:\Python37\lib\site-packages\tensorflow\python\ops\array_ops.py", line 3273, in gather
    return gen_array_ops.gather_v2(params, indices, axis, name=name)
  File "C:\Python37\lib\site-packages\tensorflow\python\ops\gen_array_ops.py", line 4390, in gather_v2
    "GatherV2", params=params, indices=indices, axis=axis, name=name)
  File "C:\Python37\lib\site-packages\tensorflow\python\framework\op_def_library.py", line 788, in _apply_op_helper
    op_def=op_def)
  File "C:\Python37\lib\site-packages\tensorflow\python\util\deprecation.py", line 507, in new_func
    return func(*args, **kwargs)
  File "C:\Python37\lib\site-packages\tensorflow\python\framework\ops.py", line 3300, in create_op
    op_def=op_def)
  File "C:\Python37\lib\site-packages\tensorflow\python\framework\ops.py", line 1801, in __init__
    self._traceback = tf_stack.extract_stack()

InvalidArgumentError (see above for traceback): indices[0,0] = 1024 is not in [0, 1024)
  [[node sample_sequence/while/model/GatherV2_1 (defined at C:\GPT2\models\gpt2\gpt2.py:208) ]]

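One possible reading of the error, offered only as a guess: the failing op is the gather at gpt2.py:208, and the range `[0, 1024)` matches `n_ctx` (1024) rather than `n_vocab` (50257), so the out-of-range index may be a position index into `wpe` that reached `n_ctx` during sampling. The sketch below only illustrates that arithmetic in plain Python; `first_invalid_position` is a hypothetical helper, and the assumption that every token fed to the model gets the next position index is mine, not confirmed by the repository.

```python
N_CTX = 1024  # 'n_ctx' from the printed parameters


def first_invalid_position(prompt_len, n_steps, n_ctx=N_CTX):
    """Return the first position index outside [0, n_ctx), or None.

    Assumption (not confirmed by the repository): the prompt occupies
    positions 0..prompt_len-1 and each sampling step feeds one more token
    at the next position, so positions run 0..prompt_len+n_steps-1.
    """
    total_positions = prompt_len + n_steps
    if total_positions > n_ctx:
        # Valid indices are 0..n_ctx-1, so n_ctx itself is the first bad one.
        return n_ctx
    return None


# A 1-token prompt plus 1023 steps uses positions 0..1023 and still fits.
print(first_invalid_position(1, 1023))
# One more step asks for position 1024, the index named in the error.
print(first_invalid_position(1, 1024))
```

If this reading is right, limiting the number of generated tokens so that prompt plus output stays within `n_ctx` would avoid the lookup, but that is untested here.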
I used a single word of text, as the author suggested, but it still fails.
I have also tested the input.txt method.
This is definitely strange and I will have to investigate it more carefully. I'm afraid I don't currently know a solution.
Thanks for the reply.
Your trained model does work in a clone of the official GPT-2 repository, though: https://github.com/openai/gpt-2
I cloned the official repository and put your files there, and it works in their setup.