Suraj Subramanian
We can't diagnose the issue without more information about the platform you're running on and the full error you encounter. Also, please use the issue template in the future as...
I'm not sure what the error is; please paste the full stacktrace. If you made any modifications to the script, include the changes you made. Also, please adhere to the...
Sounds interesting, but I'm not sure embeddings can be meaningfully visualized like this. Perhaps approaches like t-SNE/UMAP might provide more insight? cc @melanierk
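For reference, a minimal sketch of the kind of projection I mean, using scikit-learn's t-SNE (the `embeddings` array here is a random placeholder for whatever vectors you extract from the model; UMAP via `umap-learn` works analogously):

```python
import numpy as np
from sklearn.manifold import TSNE

# Placeholder: stand-in for extracted model embeddings, shape (num_items, hidden_dim)
embeddings = np.random.randn(500, 4096).astype(np.float32)

# Project to 2D for a scatter plot; structure in the projection is what gives insight
coords = TSNE(n_components=2, perplexity=30, init="pca").fit_transform(embeddings)
print(coords.shape)  # (500, 2)
```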
Please share the full stacktrace, which contains the actual error.
CUDA supports float16, which is more efficient. See L:118 where this is set as the default dtype. You can comment that out to load the model as bf16 if you'd...
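Roughly, the change looks like this (a sketch assuming the script sets the fp16 default via `torch.set_default_tensor_type`; the exact line may differ in your copy):

```python
import torch

# Default in the example script (the referenced line, reproduced for illustration):
# torch.set_default_tensor_type(torch.cuda.HalfTensor)

# To load in bf16 instead, comment out the line above, or set bf16 explicitly:
torch.set_default_dtype(torch.bfloat16)
```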
Thanks @aakashapoorv I think it might be better to add the asserts in the `build` function instead of in the example scripts. https://github.com/meta-llama/llama3/blob/cc44ca2e1c269f0e56e6926d7f4837c983c060dc/llama/generation.py#L37
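Something along these lines, inside `build` (the specific checks below are illustrative, not the exact ones from the PR):

```python
import os

def build(ckpt_dir: str, tokenizer_path: str, max_seq_len: int, max_batch_size: int):
    # Validating arguments once in `build` means every example script
    # benefits without duplicating the checks in each script.
    assert os.path.isdir(ckpt_dir), f"Checkpoint directory '{ckpt_dir}' does not exist"
    assert os.path.isfile(tokenizer_path), f"Tokenizer file '{tokenizer_path}' does not exist"
    assert 1 <= max_seq_len <= 8192, f"max_seq_len must be in [1, 8192], got {max_seq_len}"
    assert max_batch_size >= 1, f"max_batch_size must be >= 1, got {max_batch_size}"
    ...
```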
Hi! The example scripts in this repo are for running inference on single-GPU (for 8B) and multi-GPU (for 70B) setups using CUDA, but Windows is not currently supported. You...
Thanks for your contribution @pchng!
Yes, you can use `AutoTokenizer.from_pretrained('meta-llama/Meta-Llama-3-8B-Instruct')`
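For example (assuming you have `transformers` installed and access to the gated repo on the Hub):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
input_ids = tokenizer("Hello, Llama 3!")["input_ids"]
print(input_ids)
```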
How many GPUs are you using? The 70B model needs 8 GPUs to run from this repo. If you have fewer than 8 GPUs, please use the model from HF.
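A minimal sketch of loading from HF, assuming the Instruct variant and the `transformers`/`accelerate` stack (`device_map="auto"` shards the model across whatever GPUs you have):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-70B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # bf16 halves memory vs fp32
    device_map="auto",            # requires `accelerate`; shards across available GPUs
)
```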