Suraj Subramanian

Results: 44 comments by Suraj Subramanian

We can't diagnose the issue without more information about the platform you're running on and the full error you encounter. Also, please use the issue template in the future as...

I'm not sure what the error is; please paste the full stack trace. If you made any modifications to the script, include them. Also, please adhere to the...

Sounds interesting, but I'm not sure embeddings can be meaningfully visualized like this. Perhaps dimensionality-reduction approaches like t-SNE/UMAP might provide more insight? cc @melanierk

CUDA supports float16, which is more efficient. See L:118, where this is set as the default dtype. You can comment that out to load the model as bf16 if you'd...

Thanks @aakashapoorv! I think it might be better to add the asserts in the `build` function instead of in the example scripts. https://github.com/meta-llama/llama3/blob/cc44ca2e1c269f0e56e6926d7f4837c983c060dc/llama/generation.py#L37

Hi! The example scripts in this repo are for running inference on single-GPU (8B) and multi-GPU (70B) setups using CUDA, but Windows is not currently supported. You...

Yes, you can use `AutoTokenizer.from_pretrained('meta-llama/Meta-Llama-3-8B-Instruct')`

How many GPUs are you using? The 70B model needs 8 GPUs to run from this repo. If you have fewer than 8 GPUs, please use the model from HF