pythia
The pythia-12b config has: `"attention-config": [[["flash"], 40]],`. However, in the gpt-neox repo the 40 is replaced by 36, and the file [https://huggingface.co/EleutherAI/neox-ckpt-pythia-12b-v1/blob/main/12B.yml](https://huggingface.co/EleutherAI/neox-ckpt-pythia-12b-v1/blob/main/12B.yml) also uses 36. Is this...
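For context, gpt-neox expands `attention-config` into one attention type per transformer layer, so the per-pair counts have to sum to the model's total layer count, and Pythia-12B has 36 layers. Below is a minimal sketch of that expansion (the function is an illustration of the format, not gpt-neox's actual code):

```
# a minimal sketch (not gpt-neox's actual code) of how an attention-config
# like [[["flash"], 36]] expands into one attention type per layer
def expand_attention_config(attention_config, num_layers):
    layers = []
    for patterns, count in attention_config:
        for i in range(count):
            layers.append(patterns[i % len(patterns)])
    # the per-pair counts must sum to num_layers; pythia-12b has 36 layers,
    # which is why 36 rather than 40 matches the released config
    assert len(layers) == num_layers
    return layers

print(expand_attention_config([[["flash"], 36]], num_layers=36))  # ['flash'] * 36
```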
I am trying to use Pythia, following the quickstart as shown below. However, I got an error: it seems the model and tokenizer files cannot be found. How to deal with...
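For reference, the quickstart loading pattern looks roughly like the sketch below (the checkpoint size, `revision`, and `cache_dir` values are illustrative); a files-not-found error often means the model name or revision is mistyped, or the download did not complete:

```
from transformers import GPTNeoXForCausalLM, AutoTokenizer

# roughly the quickstart pattern; the checkpoint, revision, and cache_dir
# here are illustrative values
model = GPTNeoXForCausalLM.from_pretrained(
    "EleutherAI/pythia-70m-deduped",
    revision="step3000",
    cache_dir="./pythia-70m-deduped/step3000",
)
tokenizer = AutoTokenizer.from_pretrained(
    "EleutherAI/pythia-70m-deduped",
    revision="step3000",
    cache_dir="./pythia-70m-deduped/step3000",
)
```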
Can you please convert this to GGUF? I tried to use llama.cpp's `convert.py` with the following command:
```
python convert.py pythia-12b/ --outfile pythia-12b/pythia-12b-f16.gguf --outtype f16
```
It gives me this...
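As a hedged suggestion: llama.cpp's `convert.py` historically targeted llama-family checkpoints, while GPT-NeoX-style models such as Pythia are covered by `convert-hf-to-gguf.py` in recent checkouts. A sketch, assuming a recent llama.cpp:

```
# convert-hf-to-gguf.py handles GPTNeoXForCausalLM checkpoints like Pythia
python convert-hf-to-gguf.py pythia-12b/ --outfile pythia-12b/pythia-12b-f16.gguf --outtype f16
```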
Hi, I've recently been trying to run lm-eval on Pythia models using the benchmarks listed in the paper. All the benchmarks show results similar to those reported in the paper, except...
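For reference, a typical invocation with a recent lm-evaluation-harness looks like the sketch below (the task list and batch size are illustrative; older harness versions used `python main.py --model hf-causal ...` instead):

```
lm_eval --model hf \
    --model_args pretrained=EleutherAI/pythia-12b \
    --tasks lambada_openai,piqa,winogrande \
    --batch_size 8
```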
just to feel something
Fixed dead links for LLM360 papers. I confirmed that they're the same papers here:
- [Amber on arXiv](https://arxiv.org/abs/2312.06550) - [Amber dead link on web.archive.org](https://web.archive.org/web/20231217021206/https://www.llm360.ai/paper.pdf)
- [K2 on arXiv](https://arxiv.org/abs/2501.07124) - [K2 dead link...
- `extract_metrics.py`: collect the parameter statistics for inducing the HMM training maps.
- `training_map.py`: find the best training maps and visualize the Markov chains.
Hi there, I was wondering whether the shard hashes for `EleutherAI/pile-deduped-pythia-preshuffled` are available. Best, Pietro
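In the meantime, a hypothetical verification sketch: compute SHA-256 digests of the downloaded shards locally so they can be compared against any published hashes (the directory path and `.bin` glob are assumptions):

```
import hashlib
from pathlib import Path

# hash every shard in the (illustrative) download directory so the digests
# can be compared against published values
for shard in sorted(Path("pile-deduped-pythia-preshuffled").glob("*.bin")):
    digest = hashlib.sha256()
    with shard.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    print(shard.name, digest.hexdigest())
```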
```
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# load the 160M-parameter Pythia checkpoint and its tokenizer
model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-160m")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-160m")

# tokenize a prompt and run a single forward pass without gradients
input_ids = tokenizer.encode("Hello, my dog is cute", return_tensors="pt")
model.eval()
with torch.no_grad():
    logits = model(input_ids).logits
print(logits)
print(torch.topk(logits,...
```
Many thanks for the kind sharing!! When reproducing the training results, the Dockerfile specifies `torch==1.8.1`. However, torch 1.8.1 had not yet introduced `torch.concat`, which will cause errors on this...
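As a hedged note, `torch.concat` is a newer alias for `torch.cat`, and `torch.cat` does exist in torch 1.8.1, so one workaround sketch is to substitute the older name:

```
import torch

a, b = torch.ones(2), torch.zeros(2)
# torch.cat predates the torch.concat alias and is available in torch 1.8.1
x = torch.cat([a, b], dim=0)
```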