Suraj Subramanian
This is not an API that Meta Llama offers. Please reach out to https://www.llama-api.com/ regarding this issue.
Yes, running the 70B requires 8 GPUs since the checkpoint has 8 shards. You can run it on a different number of GPUs via Hugging Face.
https://github.com/meta-llama/llama3?tab=readme-ov-file#access-to-hugging-face
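As a rough sketch of the Hugging Face route (the model id and prompt below are just placeholders; you need `accelerate` installed and license access granted on the Hub):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-70B-Instruct"  # assumes you have accepted the license on the Hub

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # accelerate shards the layers across however many GPUs are visible
)

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

With `device_map="auto"` the layer placement adapts to the GPUs you actually have, so you are not tied to the 8-shard layout of the original checkpoint.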
Hi, you could try using `torch.compile(mode='reduce-overhead')` to speed up inference with CUDA graphs. We have some examples using vLLM here: https://github.com/meta-llama/llama-recipes
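Roughly something like this (the model id and generation args are placeholders; expect the first few calls to be slow while compilation and CUDA graph capture warm up):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # example checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).to("cuda")

# Compile just the forward pass; reduce-overhead uses CUDA graphs to cut
# per-token kernel launch overhead during decoding.
model.forward = torch.compile(model.forward, mode="reduce-overhead")

inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda")
with torch.inference_mode():
    out = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```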
Looks like you're using the quantized models, which might be hampering the model's performance on numerical data. I cannot replicate this issue on the official Meta Llama models; I get...
I agree with you. DDP does not explicitly do anything to enforce synchronization of the optimizers; the states only remain identical because the same states are sent to each process....
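To make that concrete, here is a minimal sketch (launched with torchrun; names and sizes are arbitrary): each rank builds its own optimizer and nothing syncs its state directly, but because DDP all-reduces the gradients and every rank starts from the same parameters, every rank applies the same update and the optimizer states stay in lockstep.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group("nccl")              # torchrun provides the env:// rendezvous vars
    local_rank = int(os.environ["LOCAL_RANK"])   # set by torchrun
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(16, 16).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])  # broadcasts params from rank 0 at construction

    # Each process creates its own optimizer; DDP never touches this object.
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

    x = torch.randn(8, 16, device=local_rank)
    loss = model(x).pow(2).mean()
    loss.backward()      # gradients are averaged across ranks during backward
    optimizer.step()     # same params + same grads -> same exp_avg / exp_avg_sq on every rank
    optimizer.zero_grad()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```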
Hi, I'm not sure what your question is. Can you share minimal code snippets so we can better understand your query?
Both are fine. In the first one you're letting the LLM determine what the first output token should be, whereas in the second one you are enforcing the first output...
Thanks - although this isn't a critical change, it does help readability. The correct token is `end_header_id`; if you can update the PR I'll merge it.
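For context, this is where the token appears in the Llama 3 chat format (the message content below is just an illustration):

```python
# Header tokens wrap the role name; each turn is closed with <|eot_id|>.
prompt = (
    "<|begin_of_text|>"
    "<|start_header_id|>system<|end_header_id|>\n\n"
    "You are a helpful assistant.<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\n"
    "What is the capital of France?<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)
```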
The error is probably related to the `init_method` arg you have passed... why are you passing that in? Also ensure your machine has 8 GPUs, as that is a requirement for...
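As a rough sketch of the expected setup (checkpoint paths are placeholders): when the example scripts are launched with torchrun, the rendezvous is driven by environment variables, so there is no need to pass a custom `init_method`.

```python
# Typical launch for the 70B model, one process per GPU:
#
#   torchrun --nproc_per_node 8 example_chat_completion.py \
#       --ckpt_dir Meta-Llama-3-70B-Instruct/ \
#       --tokenizer_path Meta-Llama-3-70B-Instruct/tokenizer.model
#
import torch.distributed as dist

# No init_method needed: the default env:// rendezvous reads
# MASTER_ADDR / MASTER_PORT / RANK / WORLD_SIZE that torchrun sets.
dist.init_process_group("nccl")
```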