leejason

13 issues by leejason

Thank you for the nice example, but I ran into the following error when executing "estimator.train(input_fn=train_input_fn, max_steps=num_train_steps)" in run_classifier_multi_labels_bert.py: RuntimeError: Attempted to use a closed Session. Any suggestions?
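
For context, here is a self-contained sketch of the Estimator training pattern I am using; the real model_fn and train_input_fn come from run_classifier_multi_labels_bert.py, and the toy stand-ins below are only there to make the snippet runnable (TensorFlow 1.x API).

```python
# Toy stand-ins for the repo's model_fn/train_input_fn, just to show the call
# pattern where the error appears (TensorFlow 1.x Estimator API).
import numpy as np
import tensorflow as tf

def train_input_fn():
    # The repo builds this from TFRecords; a tiny in-memory dataset here.
    x = np.random.rand(32, 4).astype(np.float32)
    y = np.random.randint(0, 2, size=(32,)).astype(np.int32)
    ds = tf.data.Dataset.from_tensor_slices(({"x": x}, y))
    return ds.repeat().batch(8)

def model_fn(features, labels, mode):
    logits = tf.layers.dense(features["x"], 2)
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)
    train_op = tf.train.AdamOptimizer(1e-3).minimize(
        loss, global_step=tf.train.get_global_step())
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)

estimator = tf.estimator.Estimator(model_fn=model_fn, model_dir="/tmp/toy_model")
# This is the call where the RuntimeError appears for me.
estimator.train(input_fn=train_input_fn, max_steps=100)
```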

Thank you for the great work. Appendix B of the [GPT-3 paper](https://arxiv.org/abs/2005.14165) mentions the following. I'm wondering whether the idea has been implemented in gpt2-ml. If not yet, what would...

Thank you for the great work. Is it possible to train with variable-length inputs and padding, like the following? > One last detail. GPT2 was pre-trained by OpenAI on large spans...
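
To make the question concrete, here is a minimal sketch of what I mean by variable length with padding, written against the Hugging Face transformers API (my assumption; the training code in this repo may look different):

```python
# Pad a variable-length batch and mask the padded positions out of the loss.
import torch
from transformers import GPT2TokenizerFast, GPT2LMHeadModel

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token   # GPT-2 has no pad token by default

texts = ["a short example", "a somewhat longer example sentence for padding"]
batch = tokenizer(texts, padding=True, return_tensors="pt")

labels = batch["input_ids"].clone()
labels[batch["attention_mask"] == 0] = -100  # ignore padded positions in the loss

model = GPT2LMHeadModel.from_pretrained("gpt2")
loss = model(input_ids=batch["input_ids"],
             attention_mask=batch["attention_mask"],
             labels=labels).loss
loss.backward()
```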

It would be great if TPU support were possible.

Will the code for training from scratch be released after the 1.5B model?

Sorry for the newbie question. If I'd like to integrate OpenAI GPT-2 for autocompletion, which source code should I start from?
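
For reference, the kind of integration I have in mind looks roughly like the sketch below, which uses the Hugging Face transformers API rather than any particular repo's code (the model name and prompt are just placeholders):

```python
# Continue a text prefix by a few tokens, as an editor autocomplete would.
from transformers import GPT2TokenizerFast, GPT2LMHeadModel

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

prefix = "The quick brown fox"
input_ids = tokenizer(prefix, return_tensors="pt").input_ids

output_ids = model.generate(input_ids, max_new_tokens=10, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```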

Is the pretraining of GPT-J-6B based on CausalTransformerV2 or simply CausalTransformer? Why? Thanks for any advice.

I updated "6B_roto_256.json" with the following for trying a smaller model. > "d_model": 768 The pretraining works on one TPU v3-8, but the slimmed model after using "slim_model.py" produces gibberish...

Is "to_hf_weights.py" specific to "6B_roto_256.json" only? I was trying to make this codebase work for smaller models (e.g., "layers": 12, "d_model": 768, "n_heads": 16). However, the HF model produced by...

For making "to_hf_weights.py" work correctly, do I have to modify the following if I have my own tokenizer trained with vocab_size=50400? Or, can I assume that "GPT2Tokenizer" does not matter...