Anindyadeep

Results 69 comments of Anindyadeep

Seems like HuggingFace already has implementations for both. Llama implementation: https://github.com/huggingface/transformers/pull/24587 Mistral implementation: https://github.com/huggingface/transformers/pull/26943 We can come back to this once we are done with the initial ones.

Hey @nsosio, just clarifying here: for PyTorch (#21), it is simply using the HF PyTorch `.bin` file for Llama-2 7B at `fp16/32` precision, whereas for gpt-fast, it is this...
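
For reference, here is a minimal sketch of that fp16/32 PyTorch baseline using plain `transformers`; the model id, dtype, and prompt are assumptions from this thread, not the benchmark's exact code:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed model id from the thread; swap in your local checkpoint path if needed.
model_id = "meta-llama/Llama-2-7b-chat-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # or torch.float32 for the fp32 run
    device_map="auto",
)

# Quick sanity-check generation.
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```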

Hey, thanks for the reply. The process was:
1. Use a HuggingFace model.
2. Then use the `litgpt convert to_litgpt --checkpoint_dir` command to convert it to the litgpt format.
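
Roughly, those two steps look like the sketch below. The repo id and local path are assumptions, and the conversion step just shells out to the exact command quoted above (the subcommand syntax may differ between litgpt versions):

```python
import subprocess
from huggingface_hub import snapshot_download

# Step 1: pull the HuggingFace checkpoint locally (hypothetical target directory).
checkpoint_dir = snapshot_download(
    "meta-llama/Llama-2-7b-chat-hf",
    local_dir="models/Llama-2-7b-chat-hf",
)

# Step 2: convert it to the litgpt layout using the command from the comment above.
subprocess.run(
    ["litgpt", "convert", "to_litgpt", "--checkpoint_dir", checkpoint_dir],
    check=True,
)
```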

> Ok, but what is the dtype of the HuggingFace model? If it's already in a quantized form (`torch.uint8`), then it might explain the error. You can provide a link to...

I see, got it. Let me try this out and will keep you posted in this thread. Thanks for the heads-up.

> Correct. In order to use quantization you just need weights in a standard precision (`fp32`, `fp16`, `bf16`). When the model is loaded and quantization is specified (e.g. `bnb.nf4`), the...
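
As a rough illustration of that flow (full-precision weights on disk, quantization applied at load time), here is the analogous route with `transformers` + `bitsandbytes` rather than litgpt's own loader; the model id is an assumption from this thread:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# NF4 quantization config: the weights on disk stay fp16/bf16/fp32,
# and are quantized to 4-bit as the model is loaded.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-7b-chat-hf",  # assumed model id from the thread
    quantization_config=bnb_config,
    device_map="auto",
)
```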

> But have you checked the dtype of the weights in `./models/Llama-2-7b-chat-hf/`? You mean the weights for the litgpt model or the HF model? Also, as far as the HF models...
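
One quick way to answer that dtype question is to open the `.bin` shards and print the tensor dtypes; the path below is the one mentioned in the thread and may need adjusting:

```python
import glob
import torch

# Hypothetical local checkpoint directory from the thread.
for shard in sorted(glob.glob("./models/Llama-2-7b-chat-hf/*.bin")):
    state_dict = torch.load(shard, map_location="cpu")
    dtypes = {str(t.dtype) for t in state_dict.values()}
    print(shard, dtypes)
```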

Here is the HF config:

```json
{
  "_name_or_path": "meta-llama/Llama-2-7b-chat-hf",
  "architectures": ["LlamaForCausalLM"],
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "max_position_embeddings": 4096,
  "model_type": "llama",
  "num_attention_heads": ...
```