mrcabbage972

46 issues by mrcabbage972

Adding the bitsandbytes dependency to requirements.txt. Using the currently unused quantization option in the config file to set whether to use the BNB 8-bit optimizer. Details on using the 8-bit optimizer...
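
A minimal sketch of what gating the optimizer on that config option could look like; the `quantization` key and its `"8bit"` value are assumptions here, not the repo's actual config schema:

```python
# Sketch: choose the BNB 8-bit optimizer based on a config flag.
# The "quantization" key and "8bit" value are assumed, not confirmed.
import bitsandbytes as bnb
import torch


def build_optimizer(model, config):
    if config.get("quantization") == "8bit":
        # Adam8bit keeps optimizer state in 8-bit, cutting optimizer memory.
        return bnb.optim.Adam8bit(model.parameters(), lr=config["lr"])
    return torch.optim.AdamW(model.parameters(), lr=config["lr"])
```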

ml

80B tokens each of language data from vi, en, fi, hi, ja. For some of these languages we won't have enough data, so we will need to do multiple passes on...
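
For a rough sense of the pass counts involved, a back-of-envelope sketch; the per-language token counts below are placeholders, not the corpus's real statistics:

```python
import math

TARGET_TOKENS = 80_000_000_000  # 80B tokens per language

# Hypothetical available token counts per language.
available = {"vi": 12e9, "en": 500e9, "fi": 8e9, "hi": 10e9, "ja": 90e9}

for lang, tokens in available.items():
    passes = math.ceil(TARGET_TOKENS / tokens)
    print(f"{lang}: {passes} pass(es) over {tokens / 1e9:.0f}B tokens")
```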

Train 3B experts again, with the following variations. Data size: 500K, 1M, 10M, and, if there are enough examples, 50M (basically max data). Data mix: 1. 80% Pile / 20% expert; 2. expert only. Both should...
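
A sketch of what the 80% Pile / 20% expert mixture could look like with `datasets.interleave_datasets`; the dataset paths are placeholders, not necessarily the ones the repo uses:

```python
# Sketch: build an 80% Pile / 20% expert streaming mixture.
from datasets import interleave_datasets, load_dataset

pile = load_dataset("the_pile", split="train", streaming=True)
expert = load_dataset(
    "Multi-Domain-Expert-Layers/arxiv", split="train", streaming=True
)

# Sample each example from Pile with p=0.8 and from the expert set with p=0.2.
mixed = interleave_datasets([pile, expert], probabilities=[0.8, 0.2], seed=42)
```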

Our analysis in #53 has shown that the expert models we had previously trained actually have a higher perplexity than the base model. Here are some issues that may have...

Evaluation

We need a way to create a [config file](https://huggingface.co/datasets/Multi-Domain-Expert-Layers/arxiv/blob/main/arxiv.py) for each dataset that is being uploaded via the upload script, so that the trainer will track the metrics split by...
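
A minimal loading-script skeleton in the style of the linked arxiv.py; the data path, builder name, and feature schema below are assumptions for illustration:

```python
# Sketch: a per-dataset loading script, one per uploaded dataset.
import json

import datasets

_DATA_URL = "data/train.jsonl"  # placeholder path inside the dataset repo


class ArxivExpert(datasets.GeneratorBasedBuilder):
    def _info(self):
        return datasets.DatasetInfo(
            features=datasets.Features({"text": datasets.Value("string")})
        )

    def _split_generators(self, dl_manager):
        path = dl_manager.download(_DATA_URL)
        return [
            datasets.SplitGenerator(
                name=datasets.Split.TRAIN, gen_kwargs={"filepath": path}
            )
        ]

    def _generate_examples(self, filepath):
        with open(filepath, encoding="utf-8") as f:
            for idx, line in enumerate(f):
                yield idx, {"text": json.loads(line)["text"]}
```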

Trainer

The goal is to do a [perplexity](https://huggingface.co/docs/transformers/perplexity) calculation on a few models: 1. A model that is a weighted average of a few expert models 2. A baseline model which...
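
A sketch of both pieces under stated assumptions: a parameter-wise weighted average of expert checkpoints, then perplexity as exp(mean cross-entropy) in the spirit of the linked guide. Checkpoint names, weights, and the eval text are placeholders, and the experts are assumed to share one architecture:

```python
# Sketch: (1) weighted-average expert checkpoints, (2) compute perplexity.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

expert_names = ["expert-a", "expert-b"]  # placeholder checkpoint names
weights = [0.5, 0.5]

# Parameter-wise weighted average of the experts (same architecture assumed).
experts = [AutoModelForCausalLM.from_pretrained(n) for n in expert_names]
states = [e.state_dict() for e in experts]
avg_state = {
    k: sum(w * s[k].float() for w, s in zip(weights, states))
    for k in states[0]
}
merged = AutoModelForCausalLM.from_pretrained(expert_names[0])
merged.load_state_dict(avg_state)

# Perplexity = exp(mean cross-entropy); HF shifts the labels internally.
tok = AutoTokenizer.from_pretrained(expert_names[0])
enc = tok("Some held-out evaluation text.", return_tensors="pt")
with torch.no_grad():
    loss = merged(**enc, labels=enc["input_ids"]).loss
print("perplexity:", torch.exp(loss).item())
```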

Evaluation