mrcabbage972

46 issues by mrcabbage972

Adding the bitsandbytes dependency to requirements.txt. Using the currently unused quantization option in the config file to set whether to use the BNB 8-bit optimizer. Details on using the 8-bit optimizer...
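
A minimal sketch of what gating the optimizer on that config option could look like; the `quantization` key and its `"8bit"` value are assumptions here, not the repo's actual config schema:

```python
# Sketch: choose the BNB 8-bit optimizer based on a config flag.
# The "quantization" key and "8bit" value are assumed, not confirmed.
import bitsandbytes as bnb
import torch


def build_optimizer(model, config):
    if config.get("quantization") == "8bit":
        # Adam8bit keeps optimizer state in 8-bit, cutting optimizer memory.
        return bnb.optim.Adam8bit(model.parameters(), lr=config["lr"])
    return torch.optim.AdamW(model.parameters(), lr=config["lr"])
```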

ml

80B tokens each of language data from vi, en, fi, hi, ja. For some of these languages we won't have enough data, so we will need to do multiple passes on...
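
For a rough sense of the pass counts involved, a back-of-envelope sketch; the per-language token counts below are placeholders, not the corpus's real statistics:

```python
import math

TARGET_TOKENS = 80_000_000_000  # 80B tokens per language

# Hypothetical available token counts per language.
available = {"vi": 12e9, "en": 500e9, "fi": 8e9, "hi": 10e9, "ja": 90e9}

for lang, tokens in available.items():
    passes = math.ceil(TARGET_TOKENS / tokens)
    print(f"{lang}: {passes} pass(es) over {tokens / 1e9:.0f}B tokens")
```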

Train 3B experts again, with the following variations. Data size: 500K, 1M, 10M, and, if there are enough examples, 50M (basically max data). Data mix: 1. 80% Pile / 20% expert; 2. expert only. Both should...
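
A sketch of what the 80% Pile / 20% expert mixture could look like with `datasets.interleave_datasets`; the dataset paths are placeholders, not necessarily the ones the repo uses:

```python
# Sketch: build an 80% Pile / 20% expert streaming mixture.
from datasets import interleave_datasets, load_dataset

pile = load_dataset("the_pile", split="train", streaming=True)
expert = load_dataset(
    "Multi-Domain-Expert-Layers/arxiv", split="train", streaming=True
)

# Sample each example from Pile with p=0.8 and from the expert set with p=0.2.
mixed = interleave_datasets([pile, expert], probabilities=[0.8, 0.2], seed=42)
```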

Our analysis in #53 has shown that the expert models we had previously trained actually have a higher perplexity than the base model. Here are some issues that may have...

Evaluation

We need a way to create a [config file](https://huggingface.co/datasets/Multi-Domain-Expert-Layers/arxiv/blob/main/arxiv.py) for each dataset that is being uploaded via the upload script, so that the trainer will track the metrics split by...
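
A minimal loading-script skeleton in the style of the linked arxiv.py; the data path, builder name, and feature schema below are assumptions for illustration:

```python
# Sketch: a per-dataset loading script, one per uploaded dataset.
import json

import datasets

_DATA_URL = "data/train.jsonl"  # placeholder path inside the dataset repo


class ArxivExpert(datasets.GeneratorBasedBuilder):
    def _info(self):
        return datasets.DatasetInfo(
            features=datasets.Features({"text": datasets.Value("string")})
        )

    def _split_generators(self, dl_manager):
        path = dl_manager.download(_DATA_URL)
        return [
            datasets.SplitGenerator(
                name=datasets.Split.TRAIN, gen_kwargs={"filepath": path}
            )
        ]

    def _generate_examples(self, filepath):
        with open(filepath, encoding="utf-8") as f:
            for idx, line in enumerate(f):
                yield idx, {"text": json.loads(line)["text"]}
```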

Trainer

The goal is to do a [perplexity](https://huggingface.co/docs/transformers/perplexity) calculation on a few models: 1. A model that is a weighted average of a few expert models 2. A baseline model which...
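
A sketch of both pieces under stated assumptions: a parameter-wise weighted average of expert checkpoints, then perplexity as exp(mean cross-entropy) in the spirit of the linked guide. Checkpoint names, weights, and the eval text are placeholders, and the experts are assumed to share one architecture:

```python
# Sketch: (1) weighted-average expert checkpoints, (2) compute perplexity.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

expert_names = ["expert-a", "expert-b"]  # placeholder checkpoint names
weights = [0.5, 0.5]

# Parameter-wise weighted average of the experts (same architecture assumed).
experts = [AutoModelForCausalLM.from_pretrained(n) for n in expert_names]
states = [e.state_dict() for e in experts]
avg_state = {
    k: sum(w * s[k].float() for w, s in zip(weights, states))
    for k in states[0]
}
merged = AutoModelForCausalLM.from_pretrained(expert_names[0])
merged.load_state_dict(avg_state)

# Perplexity = exp(mean cross-entropy); HF shifts the labels internally.
tok = AutoTokenizer.from_pretrained(expert_names[0])
enc = tok("Some held-out evaluation text.", return_tensors="pt")
with torch.no_grad():
    loss = merged(**enc, labels=enc["input_ids"]).loss
print("perplexity:", torch.exp(loss).item())
```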

Evaluation