Barun Patra

Comments by Barun Patra (8 results)

Hi @fding, would it be possible to share some of the TensorBoard logs for the byte-level LM pretraining and/or specifics on the final MLM loss the models converge...

I trained the models using the Theano backend, so I'm not sure about the TensorFlow backend. That said, I hope you changed line 22 of LUNA_unet.py: `K.set_image_dim_ordering('th') # Theano dimension...`
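For context: with `'th'` ordering, Keras expects image batches shaped (samples, channels, rows, cols), while the TensorFlow-style `'tf'` ordering is (samples, rows, cols, channels). If your arrays were prepared for one ordering and the backend expects the other, a transpose converts between them. A minimal sketch, assuming 4-D NumPy batches; the helper names and the example shape are hypothetical and not taken from LUNA_unet.py.

```python
import numpy as np

def th_to_tf(batch):
    # (samples, channels, rows, cols) -> (samples, rows, cols, channels)
    return np.transpose(batch, (0, 2, 3, 1))

def tf_to_th(batch):
    # (samples, rows, cols, channels) -> (samples, channels, rows, cols)
    return np.transpose(batch, (0, 3, 1, 2))

# Arbitrary example: 2 single-channel 64x64 images in 'th' ordering.
x_th = np.zeros((2, 1, 64, 64), dtype=np.float32)
x_tf = th_to_tf(x_th)
print(x_tf.shape)  # (2, 64, 64, 1)
```

Converting the data is an alternative to changing the ordering flag; whichever you choose, the data layout and the backend setting have to agree.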

On a K40 GPU. Make sure your Theano configuration is correct; see "Install and configure GPU drivers": http://deeplearning.net/software/theano/install_ubuntu.html
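Besides `~/.theanorc`, Theano also reads its settings from the `THEANO_FLAGS` environment variable, which is consulted when `theano` is first imported. A minimal sketch, assuming the newer gpuarray backend (`device=cuda`); older Theano releases used `device=gpu` instead, so check which one your installation expects. The helper name is hypothetical.

```python
import os

def set_theano_flags(device="cuda", floatx="float32"):
    """Build THEANO_FLAGS and export it for the current process.

    Must run before `import theano`, because Theano reads the
    variable only at import time.
    """
    flags = {"device": device, "floatX": floatx}
    value = ",".join(f"{key}={val}" for key, val in flags.items())
    os.environ["THEANO_FLAGS"] = value
    return value

print(set_theano_flags())  # device=cuda,floatX=float32
```

The same setting can be given on the command line, e.g. `THEANO_FLAGS=device=cuda,floatX=float32 python train.py` (the script name here is only illustrative).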

https://github.com/Theano/Theano/issues/6507

The link to the dataset: https://nlp.stanford.edu/projects/snli/
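The SNLI release includes JSON Lines files (e.g. `snli_1.0_train.jsonl`) in which each line holds `sentence1` (premise), `sentence2` (hypothesis) and `gold_label`; a `gold_label` of `"-"` marks pairs without annotator consensus, which are usually dropped. A minimal reader sketch using only the standard library; the function name is hypothetical.

```python
import io
import json

LABELS = ("entailment", "neutral", "contradiction")

def read_snli(lines):
    """Yield (premise, hypothesis, label) triples from SNLI JSON Lines.

    Pairs whose gold_label is "-" (no annotator consensus) are skipped.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue
        ex = json.loads(line)
        if ex["gold_label"] not in LABELS:
            continue
        yield ex["sentence1"], ex["sentence2"], ex["gold_label"]

sample = io.StringIO(
    '{"gold_label": "entailment", "sentence1": "A man plays a guitar.", "sentence2": "A person makes music."}\n'
    '{"gold_label": "-", "sentence1": "A dog runs.", "sentence2": "An animal sleeps."}\n'
)
print(list(read_snli(sample)))
# [('A man plays a guitar.', 'A person makes music.', 'entailment')]
```

With a downloaded file, pass the open file handle instead: `with open("snli_1.0_train.jsonl") as f: data = list(read_snli(f))`.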

The model expects a Lang class object. Essentially, it is a two-way dictionary between words and integer indices. You can look at the Lang.py file to see how to construct one. The...
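To illustrate the idea of a two-way word/index dictionary, here is a minimal sketch. It is hypothetical and not the repository's Lang.py: the attribute names, the whitespace tokenization and the `<pad>`/`<unk>` special tokens are assumptions, so check Lang.py for what the model actually expects.

```python
class Lang:
    """Hypothetical two-way word <-> integer index mapping."""

    def __init__(self, name, specials=("<pad>", "<unk>")):
        self.name = name
        self.word2index = {}
        self.index2word = {}
        # Reserve the lowest indices for special tokens (an assumption).
        for tok in specials:
            self.add_word(tok)

    def add_word(self, word):
        # Assign the next free index to unseen words; keep both directions in sync.
        if word not in self.word2index:
            idx = len(self.word2index)
            self.word2index[word] = idx
            self.index2word[idx] = word

    def add_sentence(self, sentence):
        for word in sentence.split():
            self.add_word(word)

    def encode(self, sentence, unk="<unk>"):
        # Unknown words map to the <unk> index.
        return [self.word2index.get(w, self.word2index[unk]) for w in sentence.split()]

    def decode(self, indices):
        return [self.index2word[i] for i in indices]

lang = Lang("en")
lang.add_sentence("the cat sat")
print(lang.encode("the dog sat"))  # [2, 1, 4]
print(lang.decode([2, 3, 4]))      # ['the', 'cat', 'sat']
```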

Concretely, consider the Qwen 7B model:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-chat", trust_remote_code=True)
assert tokenizer.bos_token_id is None  # This is true
```

The presence of a BOS token...
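Because `bos_token_id` can be `None`, code that unconditionally prepends a BOS id (e.g. `[tokenizer.bos_token_id] + ids`) would put `None` into the sequence for such a tokenizer. A minimal guard, sketched as a hypothetical helper that works on plain lists of token ids:

```python
def with_bos(token_ids, bos_token_id):
    """Prepend a BOS id only when the tokenizer defines one (hypothetical helper)."""
    ids = list(token_ids)
    if bos_token_id is None:
        # Tokenizers such as Qwen-7B's report no BOS token; leave ids unchanged.
        return ids
    if ids and ids[0] == bos_token_id:
        # BOS already present; avoid adding a second one.
        return ids
    return [bos_token_id] + ids

print(with_bos([10, 11], None))  # [10, 11]
print(with_bos([10, 11], 1))     # [1, 10, 11]
```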

Thank you! I meant: is the sequence-length aspect finite (similar to vocab size)?