Pierre Colombo

15 comments of Pierre Colombo

Same with this config:

```yaml
base_model: toto/toto
model_type: MistralForCausalLM
tokenizer_type: LlamaTokenizer
is_mistral_derived_model: true
resize_token_embeddings_to_32x: true
datasets:
  - path: data_processed/toto.jsonl
    ds_type: json
    type: sharegpt
    conversation: mistral
load_in_8bit: false
load_in_4bit: false
strict: false
...
```

Either I'm not doing something correctly, or there is something off in the way gradient accumulation and micro batch size influence the losses. Is this related: https://discuss.huggingface.co/t/gradient-accumulation-gives-different-results-compared-to-full-batch/65889 ?

The loss averaging is not handled consistently, and the same issue shows up in both the eval and training losses when batched!
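
For illustration, a minimal sketch (not axolotl's or HF Trainer's actual code, just hypothetical per-token losses) of why a mean-of-means over micro-batches diverges from a full-batch, token-weighted mean when micro-batches contain different numbers of valid tokens:

```python
# Sketch of the gradient-accumulation loss-averaging mismatch.
import torch

torch.manual_seed(0)

# Hypothetical per-token losses for two micro-batches with unequal
# numbers of non-padding tokens (5 vs. 13).
micro_batch_losses = [torch.rand(5), torch.rand(13)]

# Naive accumulation: average each micro-batch, then average the averages.
naive = torch.stack([l.mean() for l in micro_batch_losses]).mean()

# Full-batch equivalent: one average over all tokens (token-weighted).
full_batch = torch.cat(micro_batch_losses).mean()

print(f"naive mean-of-means : {naive.item():.4f}")
print(f"token-weighted mean : {full_batch.item():.4f}")
# The two values differ whenever micro-batches have unequal token counts,
# so the logged loss (and the gradients) can depend on micro_batch_size /
# gradient_accumulation_steps even though the effective batch is the same.
```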