Barun Patra
Hi @fding, would it be possible to share some of the TensorBoard logs for the byte-level LM pretraining and/or specifics on what final MLM loss the models converge...
I had trained the models using the Theano backend, so I am not sure about the TensorFlow backend. That said, I hope you changed line 22 of LUNA_unet.py: `K.set_image_dim_ordering('th')  # Theano dimension...`
This was on a K40 GPU. Make sure your Theano configuration is correct; see "Install and configure GPU drivers": http://deeplearning.net/software/theano/install_ubuntu.html
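For reference, a minimal `.theanorc` that enables the GPU looks roughly like this (a sketch of the commonly documented settings, assuming the legacy `device = gpu` backend; your exact flags may differ by Theano version):

```ini
# ~/.theanorc -- minimal GPU configuration sketch
[global]
device = gpu
floatX = float32
```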
https://github.com/Theano/Theano/issues/6507
The link to the data set: https://nlp.stanford.edu/projects/snli/
The model expects a Lang class object. Essentially, it is a two-way dictionary between words and integer indices. You can take a look at the Lang.py file for constructing one. The...
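For illustration, a minimal sketch of such a two-way dictionary, assuming the usual pattern (the actual Lang.py may differ; `word2index`, `index2word`, and the method names here are assumptions):

```python
class Lang:
    """Sketch of a two-way mapping between words and integer indices.

    This mirrors the common word2index / index2word pattern; the real
    Lang.py in the repo may use different names or add special tokens.
    """

    def __init__(self, name):
        self.name = name
        self.word2index = {}  # word -> integer index
        self.index2word = {}  # integer index -> word
        self.n_words = 0      # size of the vocabulary so far

    def add_word(self, word):
        # Register a word only once; indices are assigned in insertion order.
        if word not in self.word2index:
            self.word2index[word] = self.n_words
            self.index2word[self.n_words] = word
            self.n_words += 1

    def add_sentence(self, sentence):
        # Whitespace tokenization, purely for illustration.
        for word in sentence.split(" "):
            self.add_word(word)
```

You would build one by calling `add_sentence` over your training corpus, then use `word2index` to encode inputs and `index2word` to decode model outputs.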
Concretely, consider the Qwen 7B model:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-chat", trust_remote_code=True)
assert tokenizer.bos_token_id is None  # This is true
```

The presence of a BOS token...
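As a hypothetical illustration of why this matters, any preprocessing step that prepends a BOS token has to handle the `None` case; `prepend_bos` below is an assumed helper, not part of transformers:

```python
def prepend_bos(token_ids, bos_token_id):
    """Prepend a BOS id only if the tokenizer actually defines one.

    For tokenizers like Qwen's, where bos_token_id is None, the
    sequence is returned unchanged.
    """
    if bos_token_id is None:
        return token_ids
    return [bos_token_id] + token_ids
```

So a pipeline written as `prepend_bos(ids, tokenizer.bos_token_id)` would silently skip the BOS step for such models.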
Thank you! I meant: is the sequence length finite (similar to the vocab size)?