
Results: 35 pythia issues

The pythia 12b config has: `"attention-config": [[["flash"], 40]],`. However, in the gpt-neox repo the 40 is replaced by 36, and in the file [https://huggingface.co/EleutherAI/neox-ckpt-pythia-12b-v1/blob/main/12B.yml](https://huggingface.co/EleutherAI/neox-ckpt-pythia-12b-v1/blob/main/12B.yml) the value is also 36. Is this...
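If I understand the gpt-neox shorthand correctly, `[[["flash"], 40]]` is a compact form that gets expanded into one attention type per layer (pattern repeated `count` times). A minimal sketch of that expansion, assuming this reading of the config; `expand_attention_config` is a hypothetical helper name, not the repo's actual function:

```python
def expand_attention_config(config, num_layers):
    """Expand shorthand like [[["flash"], 36]] into a per-layer list.

    Each entry is (pattern, repeat_count); the pattern is cycled to fill
    repeat_count layers. This mirrors my understanding of the gpt-neox
    shorthand, not its exact implementation.
    """
    layers = []
    for pattern, count in config:
        for i in range(count):
            layers.append(pattern[i % len(pattern)])
    return layers[:num_layers]

# The 12B question above: 36 layers of flash attention.
per_layer = expand_attention_config([[["flash"], 36]], 36)
```

Under this reading, the second element should match the model's layer count, which is why 40 vs. 36 matters.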

I am trying to use Pythia following the quickstart, as shown below. However, I get an error: it seems the model and tokenizer files cannot be found. How to deal with...

Can you please convert this to gguf? I tried to use llama.cpp's convert.py with the following command:

```
python convert.py pythia-12b/ --outfile pythia-12b/pythia-12b-f16.gguf --outtype f16
```

It gives me this...

Hi, I've recently been trying to run lm-eval on Pythia models using the benchmarks listed in the paper. All the benchmarks show results similar to those reported in the paper, except...

Fixed dead links for LLM360 papers. I confirmed that they're the same papers here: [Amber on arXiv](https://arxiv.org/abs/2312.06550) - [Amber dead link on web.archive.org](https://web.archive.org/web/20231217021206/https://www.llm360.ai/paper.pdf) [K2 on arXiv](https://arxiv.org/abs/2501.07124) - [K2 dead link...

- `extract_metrics.py`: collect the parameter statistics used to induce the HMM training maps. - `training_map.py`: find the best training maps and visualize the Markov chains.

Hi there, I was wondering whether the shard hashes for the `EleutherAI/pile-deduped-pythia-preshuffled` are available. Best, Pietro

```
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-160m")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-160m")
input_ids = tokenizer.encode("Hello, my dog is cute", return_tensors="pt")
model.eval()
with torch.no_grad():
    logits = model(input_ids).logits
print(logits)
print(torch.topk(logits,...
```

Many thanks for kindly sharing!! When reproducing the training results, the Dockerfile specifies `torch==1.8.1`. However, torch 1.8.1 had not yet introduced `torch.concat`, which causes errors on this...
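If I recall correctly, `torch.concat` was added in a later release as an alias of the long-standing `torch.cat`, so on `torch==1.8.1` only `torch.cat` exists. One workaround, sketched here with a hypothetical helper (`get_concat` is my name, not part of the repo), is to resolve whichever name is available at runtime:

```python
from types import SimpleNamespace

def get_concat(torch_module):
    """Return torch.concat if the installed torch provides it,
    otherwise fall back to torch.cat (the two behave the same
    where both exist, to my understanding)."""
    return getattr(torch_module, "concat", torch_module.cat)

# Simulated old-torch namespace (like 1.8.1): only `cat` is defined.
old_torch = SimpleNamespace(cat=lambda tensors, dim=0: ("cat", tensors, dim))
concat = get_concat(old_torch)  # falls back to `cat`
```

The alternative, of course, is simply to replace `torch.concat` call sites with `torch.cat` when pinning the older torch.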