pythia
Hello team, when I do:
```
from transformers import AutoTokenizer

pretrained_model = "EleutherAI/pythia-160m"
tokenizer = AutoTokenizer.from_pretrained(
    pretrained_model,
    padding_side="left",
    cache_dir=pretrained_model + '_tokenizer',
)
print(tokenizer.pad_token)
```
it seems like the `pad_token` is empty (`None`...
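A common workaround (a sketch, not necessarily this thread's resolution) is to reuse the EOS token as the pad token, since the GPT-NeoX tokenizer ships without one:
```
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-160m", padding_side="left")
# Reuse EOS as the pad token so batched generation can left-pad
# without adding a new embedding row.
tokenizer.pad_token = tokenizer.eos_token
print(tokenizer.pad_token)  # '<|endoftext|>'
```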
Do we have instruct-tuned versions of the Pythia models, so that we can do conversational inference?
Hi, I found that the parameter initialization of the pythia-6.9b model is inconsistent with the standard deviation of the [step0 checkpoint](https://huggingface.co/EleutherAI/pythia-6.9b/tree/step0). Table 6 in the [paper](https://arxiv.org/abs/2304.01373) shows that the init-method...
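For reference, the empirical std can be checked directly against the step0 revision on the Hub; a minimal sketch (the layer picked here is just an example, and the `step0` revision naming follows the Pythia model cards):
```
import torch
from transformers import GPTNeoXForCausalLM

# Load the untrained step0 revision and measure the empirical init std.
model = GPTNeoXForCausalLM.from_pretrained(
    "EleutherAI/pythia-6.9b", revision="step0", torch_dtype=torch.float16
)
for name, param in model.named_parameters():
    if name.endswith("dense_4h_to_h.weight"):
        print(name, param.float().std().item())
        break
```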
Hi, I was wondering whether you could provide the index_mapping files generated by the [GPT2Dataset](https://github.com/EleutherAI/gpt-neox/blob/03186decef022dc35e6adee1a66619968812e0a9/megatron/data/gpt2_dataset.py#L29)? From the construction of the GPT2Dataset [here](https://github.com/EleutherAI/gpt-neox/blob/03186decef022dc35e6adee1a66619968812e0a9/megatron/data/gpt2_dataset.py#L158), I can see there are three `npy`...
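For reference, those three mapping arrays can be inspected with memory-mapped numpy loads; a sketch, with a hypothetical cache prefix (the real filenames also encode sample count, sequence length, and seed):
```
import numpy as np

# Hypothetical cache prefix; real names also carry ns/sl/seed suffixes.
prefix = "pile_0.87_deduped_text_document_train_indexmap"
doc_idx = np.load(prefix + "_doc_idx.npy", mmap_mode="r")
sample_idx = np.load(prefix + "_sample_idx.npy", mmap_mode="r")
shuffle_idx = np.load(prefix + "_shuffle_idx.npy", mmap_mode="r")
print(doc_idx.shape, sample_idx.shape, shuffle_idx.shape)
```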
Hi folks -- thanks for the great work on this. I've been doing some fine-tuning experiments off the Hugging Face checkpoints and was wondering whether anyone has converted the neox optimizer...
Hello everyone! I found a weird inconsistency in the tokenizer vocabulary and wanted to ask why it might be happening. I have loaded a tokenizer from HF: ``` tokenizer =...
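A minimal sketch of the kind of check that surfaces a mismatch (model and tokenizer names are just examples; the GPT-NeoX tokenizer's vocabulary is smaller than the model's padded embedding matrix):
```
from transformers import AutoTokenizer, GPTNeoXForCausalLM

tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-160m")
model = GPTNeoXForCausalLM.from_pretrained("EleutherAI/pythia-160m")

# The tokenizer vocab and the embedding matrix need not agree:
# the embedding is padded up to a hardware-friendly size.
print(len(tokenizer))                                # tokenizer vocab entries
print(model.get_input_embeddings().weight.shape[0])  # padded embedding rows
```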
Task description: "Collect all loss values into CSV files from WandB and -- if needed -- from log files." The most important file is `pythia_runs.tsv`, in which I manually collect the...
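A sketch of the export step, using the public wandb API with a hypothetical run path and metric keys:
```
import csv
import wandb

api = wandb.Api()
run = api.run("eleutherai/pythia/abc123")  # hypothetical entity/project/run_id

with open("loss.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["step", "loss"])
    # scan_history streams full (unsampled) history rows for the given keys;
    # "train/lm_loss" is an assumed metric name.
    for row in run.scan_history(keys=["_step", "train/lm_loss"]):
        writer.writerow([row["_step"], row["train/lm_loss"]])
```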
Per [this](https://github.com/EleutherAI/gpt-neox/pull/1144), my understanding is that the `gas` config in neox doesn't do anything and should be removed; we should be using `gradient_accumulation_steps` instead. It [appears](https://github.com/search?q=repo%3AEleutherAI%2Fpythia+gas&type=code&p=1)...
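For illustration, the replacement in a neox-style config fragment would look like this (values made up):
```
# illustrative neox-style config fragment; values are made up
"train_micro_batch_size_per_gpu": 32,
"gradient_accumulation_steps": 4,
```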
I followed the readme:
```
git lfs clone https://huggingface.co/datasets/EleutherAI/pythia_deduped_pile_idxmaps
python utils/unshard_memmap.py --input_file ./pythia_deduped_pile_idxmaps/pile_0.87_deduped_text_document-00000-of-00082.bin --num_shards 83 --output_dir ./pythia_pile_idxmaps/
```
I got a 600+ GB file, and then I used gpt-neox's dataloader to read...
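For a quick sanity check of the unsharded file, something like the following should work (a sketch; class location per the gpt-neox repo, output path assumed from the command above):
```
# Sanity-check the merged memmap with gpt-neox's indexed dataset reader.
from megatron.data.indexed_dataset import MMapIndexedDataset

# The path is the common prefix, without the .bin/.idx extension.
ds = MMapIndexedDataset("./pythia_pile_idxmaps/pile_0.87_deduped_text_document")
print(len(ds))     # number of documents
print(ds[0][:10])  # first ten token ids of the first document
```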
Fixed invalid filename.