Pankaj Gupta
@johnynek curious what alternatives to options you considered. I don't think many of our users grok how the application of settings via naming flows through the graph, so I agree...
One solution could be to keep the model load thread around for the lifetime of the inference server process. The thread should exit to allow for graceful termination. This can...
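A minimal sketch of that idea: a dedicated loader thread that stays alive for the process lifetime, parked on a shutdown event so it can still exit cleanly. All names here (`ModelHolder`, `load_fn`) are hypothetical, not the actual server's API.

```python
import threading

class ModelHolder:
    """Keeps the model-load thread alive for the lifetime of the
    process, while still allowing graceful termination (hypothetical
    names; a sketch, not the real inference server)."""

    def __init__(self, load_fn):
        self._load_fn = load_fn
        self.model = None
        self._loaded = threading.Event()
        self._shutdown = threading.Event()
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def _run(self):
        # Load on this thread, then park instead of returning, so any
        # thread-local state created during load stays valid.
        self.model = self._load_fn()
        self._loaded.set()
        self._shutdown.wait()

    def wait_until_loaded(self, timeout=None):
        return self._loaded.wait(timeout)

    def shutdown(self):
        # Signal the parked thread to exit and wait for it; this is
        # the graceful-termination path.
        self._shutdown.set()
        self._thread.join()

holder = ModelHolder(lambda: {"weights": [1, 2, 3]})
holder.wait_until_loaded()
print(holder.model["weights"])  # [1, 2, 3]
holder.shutdown()
```

The key design choice is that `_run` never returns on its own; it blocks on `_shutdown.wait()`, so the thread (and anything pinned to it) lives exactly as long as the server process wants it to.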
Thanks, great to know that the engine supports fp32 LoRA. The model is indeed private; let me share details shortly using the OSS alternative.
Would appreciate any updates on this issue, thanks!
Thanks for the update.
I think the real issue is that once a custom mask is set on the wrapper, it keeps getting used even if a subsequent call doesn't pass one. It seems...
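To illustrate the suspected bug and one possible fix: if the wrapper caches the mask and only overwrites it when a new one is passed, a mask from an earlier call leaks into later calls. Resetting the cached mask on every call avoids that. The class and method names below are hypothetical, not the actual wrapper's API.

```python
class MaskWrapper:
    """Hypothetical wrapper showing how a custom mask set on one call
    can silently stick around for later calls."""

    def __init__(self):
        self._custom_mask = None

    def forward_buggy(self, x, custom_mask=None):
        # Only updates the cache when a mask is passed, so an earlier
        # mask is reused on calls that pass none.
        if custom_mask is not None:
            self._custom_mask = custom_mask
        return (x, self._custom_mask)

    def forward_fixed(self, x, custom_mask=None):
        # Reset the cached mask on every call: a call without a mask
        # really runs without one.
        self._custom_mask = custom_mask
        return (x, self._custom_mask)

buggy = MaskWrapper()
buggy.forward_buggy("a", custom_mask="mask1")
print(buggy.forward_buggy("b")[1])  # mask1 -- stale mask reused

fixed = MaskWrapper()
fixed.forward_fixed("a", custom_mask="mask1")
print(fixed.forward_fixed("b")[1])  # None -- mask cleared as expected
```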
We'll need to pass the right settings in https://github.com/basetenlabs/truss/blob/main/truss/truss_handle.py#L194 (see https://docs.docker.com/config/containers/resource_constraints/).
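For reference, a sketch of the kind of kwargs involved, assuming the container is started via docker-py's `containers.run()` (the helper name and limit values are illustrative, not what truss actually passes today):

```python
def resource_constraint_kwargs(mem_limit="512m", cpus=1.0):
    """Build resource-constraint kwargs accepted by docker-py's
    `containers.run()` (illustrative values, hypothetical helper)."""
    return {
        "mem_limit": mem_limit,                   # hard memory cap
        "nano_cpus": int(cpus * 1_000_000_000),   # CPUs in 1e-9 units
    }

# These would be splatted into the container-run call, e.g.:
#   client.containers.run(image, **resource_constraint_kwargs())
kwargs = resource_constraint_kwargs(mem_limit="1g", cpus=2.0)
print(kwargs["nano_cpus"])  # 2000000000
```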
> What's the expected behavior around the `use_gpu` flag and the new `accelerator` spec?

Good point, this needs more thought. It's unlikely that someone would have the same accelerator locally...