
Results: 35 pythia issues

The pythia 12b config has: `"attention-config": [[["flash"], 40]],`. However, in the gpt-neox repo the 40 is replaced by 36, and in the file [https://huggingface.co/EleutherAI/neox-ckpt-pythia-12b-v1/blob/main/12B.yml](https://huggingface.co/EleutherAI/neox-ckpt-pythia-12b-v1/blob/main/12B.yml) the value is also 36. Is this...
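If I understand the gpt-neox shorthand correctly, `[[["flash"], 40]]` is a compact form that gets expanded into one attention type per layer (pattern repeated `count` times). A minimal sketch of that expansion, assuming this reading of the config; `expand_attention_config` is a hypothetical helper name, not the repo's actual function:

```python
def expand_attention_config(config, num_layers):
    """Expand shorthand like [[["flash"], 36]] into a per-layer list.

    Each entry is (pattern, repeat_count); the pattern is cycled to fill
    repeat_count layers. This mirrors my understanding of the gpt-neox
    shorthand, not its exact implementation.
    """
    layers = []
    for pattern, count in config:
        for i in range(count):
            layers.append(pattern[i % len(pattern)])
    return layers[:num_layers]

# The 12B question above: 36 layers of flash attention.
per_layer = expand_attention_config([[["flash"], 36]], 36)
```

Under this reading, the second element should match the model's layer count, which is why 40 vs. 36 matters.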

I am trying to use Pythia following the quickstart, as shown below. However, I get an error: it seems the model and tokenizer files cannot be found. How to deal with...

Can you please convert this to gguf? I tried to use llama.cpp's convert.py with the following command:

```
python convert.py pythia-12b/ --outfile pythia-12b/pythia-12b-f16.gguf --outtype f16
```

It gives me this...

Hi, I've recently been trying to run lm-eval on Pythia models using the benchmarks listed in the paper. All the benchmarks show results similar to those reported in the paper, except...

Fixed dead links for LLM360 papers. I confirmed that they're the same papers here: [Amber on arXiv](https://arxiv.org/abs/2312.06550) - [Amber dead link on web.archive.org](https://web.archive.org/web/20231217021206/https://www.llm360.ai/paper.pdf) [K2 on arXiv](https://arxiv.org/abs/2501.07124) - [K2 dead link...

- `extract_metrics.py`: collect the parameter statistics used to induce the HMM training maps. - `training_map.py`: find the best training maps and visualize the Markov chains.

Hi there, I was wondering whether the shard hashes for the `EleutherAI/pile-deduped-pythia-preshuffled` are available. Best, Pietro

```
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

model = AutoModelForCausalLM.from_pretrained("EleutherAI/pythia-160m")
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/pythia-160m")
input_ids = tokenizer.encode("Hello, my dog is cute", return_tensors="pt")
model.eval()
with torch.no_grad():
    logits = model(input_ids).logits
print(logits)
print(torch.topk(logits,...
```

Many thanks for kindly sharing!! When reproducing the training results, the Dockerfile specifies `torch==1.8.1`. However, torch 1.8.1 had not yet introduced `torch.concat`, which causes errors on this...
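If I recall correctly, `torch.concat` was added in a later release as an alias of the long-standing `torch.cat`, so on `torch==1.8.1` only `torch.cat` exists. One workaround, sketched here with a hypothetical helper (`get_concat` is my name, not part of the repo), is to resolve whichever name is available at runtime:

```python
from types import SimpleNamespace

def get_concat(torch_module):
    """Return torch.concat if the installed torch provides it,
    otherwise fall back to torch.cat (the two behave the same
    where both exist, to my understanding)."""
    return getattr(torch_module, "concat", torch_module.cat)

# Simulated old-torch namespace (like 1.8.1): only `cat` is defined.
old_torch = SimpleNamespace(cat=lambda tensors, dim=0: ("cat", tensors, dim))
concat = get_concat(old_torch)  # falls back to `cat`
```

The alternative, of course, is simply to replace `torch.concat` call sites with `torch.cat` when pinning the older torch.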