Ditto P S comments


ChatML template issue with Llama-2-7b-chat-hf

It's happening because of the GrammarlessEngine. The chat template is restricted in GrammarlessEngine. Is there any specific accuracy issue if I extend the class to remove that restriction?

ChatML template issue with Llama-2-7b-chat-hf

Thanks for the detailed response. I have tried this hack, but now I'm getting the below error. ``` id: token for token, id in tokenizer.get_added_vocab().items() ^^^^^^^^^^^^^^^^^^^^^^^^^ AttributeError: 'GrammarlessTokenizer' object has...

ChatML template issue with Llama-2-7b-chat-hf

Thank you for the update, I will go with that for now.

Input template for Transformers vision language models ?

Same here, I got the issue while using "microsoft/Phi-3-medium-4k-instruct"

ModelAdapters do not dynamically route to new pods

I'm facing the same issue with v3.0.0. The controller is not triggering the adapter load for the new pods.

GPU optimiser replicas not scaling

The optimization calculation ran for some time, and then it started skipping and logging insufficient data. After that, the replica count went back to 0. Please see the logs below....

GPU optimiser replicas not scaling

Yes, you are right. I have 2 deployments in the cluster, and I'm testing only llama for the scaling. This is the current log, where both models are not skipping...

GPU optimiser replicas not scaling

$inf is for the deployment bud-qwen2-47f5298d, which doesn't have a profile. But if you look at llama-3-1-8b-instruct, it is skipping optimization. Is there a doc on how the optimizer works?...

[Misc] Support adapter scaling to all replicas

I used the following logic for adapter loading/unloading:
1. The pods are selected with label-based matching: adapter.model.aibrix.ai/enabled: "true"
2. The selected pods are added to the Status.Instances list
3. ...

[Misc] Support adapter scaling to all replicas

schedulePod was used to pick one pod, which was then assigned to instance.Status.Instances. Instead of choosing one pod, the new approach uses ALL pods that match the selector and adds all...
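
To illustrate the idea of selecting every matching pod instead of a single one, here is a minimal sketch in plain Python. It is not the controller's actual code: the pod dictionaries, the pod names, and the `select_instances` helper are hypothetical, and only the label key comes from the comment above.

```python
from typing import Dict, List

# Label key taken from the comment above; everything else is illustrative.
ADAPTER_LABEL = "adapter.model.aibrix.ai/enabled"


def matches(pod_labels: Dict[str, str], selector: Dict[str, str]) -> bool:
    """True if every key/value pair in the selector is present on the pod."""
    return all(pod_labels.get(key) == value for key, value in selector.items())


def select_instances(pods: List[dict], selector: Dict[str, str]) -> List[str]:
    """Return the names of ALL pods matching the selector, sorted for stability."""
    return sorted(
        pod["name"] for pod in pods if matches(pod.get("labels", {}), selector)
    )


# Hypothetical pods: two carry the adapter label, one does not.
pods = [
    {"name": "llama-0", "labels": {ADAPTER_LABEL: "true"}},
    {"name": "llama-1", "labels": {ADAPTER_LABEL: "true"}},
    {"name": "qwen-0", "labels": {}},
]

# Stand-in for the Status.Instances list described in the comment.
instances = select_instances(pods, {ADAPTER_LABEL: "true"})
print(instances)
```

With this shape, a newly created pod that carries the label appears in the list on the next pass, rather than being skipped because one pod was already chosen.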