Barun Patra
Hi @fding, would it be possible to share some of the TensorBoard logs for the byte-level LM pretraining and/or specifics on what final MLM loss the models converge...
I had trained the models using the Theano backend, so I am not sure about the TensorFlow backend. That said, I hope you changed line 22 of LUNA_unet.py: `K.set_image_dim_ordering('th')  # Theano dimension...`
This was on a K40 GPU. Make sure your Theano configuration is correct; see "Install and configure GPU drivers": http://deeplearning.net/software/theano/install_ubuntu.html
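For reference, a minimal `.theanorc` that enables the GPU looks roughly like this (a sketch of the commonly documented settings, assuming the legacy `device = gpu` backend; your exact flags may differ by Theano version):

```ini
# ~/.theanorc -- minimal GPU configuration sketch
[global]
device = gpu
floatX = float32
```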
https://github.com/Theano/Theano/issues/6507
The link to the data set: https://nlp.stanford.edu/projects/snli/
The model expects a Lang class object. Essentially, it is a two-way dictionary between words and integer indices. You can take a look at the Lang.py file for constructing one. The...
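For illustration, a minimal sketch of such a two-way dictionary, assuming the usual pattern (the actual Lang.py may differ; `word2index`, `index2word`, and the method names here are assumptions):

```python
class Lang:
    """Sketch of a two-way mapping between words and integer indices.

    This mirrors the common word2index / index2word pattern; the real
    Lang.py in the repo may use different names or add special tokens.
    """

    def __init__(self, name):
        self.name = name
        self.word2index = {}  # word -> integer index
        self.index2word = {}  # integer index -> word
        self.n_words = 0      # size of the vocabulary so far

    def add_word(self, word):
        # Register a word only once; indices are assigned in insertion order.
        if word not in self.word2index:
            self.word2index[word] = self.n_words
            self.index2word[self.n_words] = word
            self.n_words += 1

    def add_sentence(self, sentence):
        # Whitespace tokenization, purely for illustration.
        for word in sentence.split(" "):
            self.add_word(word)
```

You would build one by calling `add_sentence` over your training corpus, then use `word2index` to encode inputs and `index2word` to decode model outputs.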
Concretely, consider the Qwen 7B model:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen-7B-chat", trust_remote_code=True)
assert tokenizer.bos_token_id is None  # This is true
```

The presence of a BOS token...
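As a hypothetical illustration of why this matters, any preprocessing step that prepends a BOS token has to handle the `None` case; `prepend_bos` below is an assumed helper, not part of transformers:

```python
def prepend_bos(token_ids, bos_token_id):
    """Prepend a BOS id only if the tokenizer actually defines one.

    For tokenizers like Qwen's, where bos_token_id is None, the
    sequence is returned unchanged.
    """
    if bos_token_id is None:
        return token_ids
    return [bos_token_id] + token_ids
```

So a pipeline written as `prepend_bos(ids, tokenizer.bos_token_id)` would silently skip the BOS step for such models.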
Thank you! I meant: is the sequence length finite (similar to the vocab size)?