tim-a-davis
+1 Adding this in as an option seems like a no-brainer. The optimum-nvidia team has said they plan to support more models soon.
Hey @Narsil, thanks for the reply. As far as throughput goes, though, the Hugging Face blog claims speeds of 1200 tokens/second on 7-billion-parameter models. I...
Yes, I am also interested in getting support for MPT models. I would love to assist in any way I can.
> did you try --trust-remote-code while running the docker

It's very slow. This model is not supported for sharding at the moment in text-generation-inference.
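For reference, a launch with that flag would look roughly like the sketch below. The image tag, model id, port, and volume are placeholders I am assuming here, not values from this thread:

```bash
# Hypothetical TGI launch; model id, image tag, port, and volume are placeholders.
model=mosaicml/mpt-7b
volume=$PWD/data

docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data \
    ghcr.io/huggingface/text-generation-inference:latest \
    --model-id $model \
    --trust-remote-code
```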
> Then try a rudimentary implementation of it: you can use Rust or JS as the router and Python for inference, copy the custom kernels from the repo, modify them...
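For the "Python for inference" part of that suggestion, a minimal sketch might look like the following. It assumes an MPT checkpoint loaded through transformers with trust_remote_code; the model id and generation settings are illustrative choices, not from the thread:

```python
# Minimal sketch of a Python inference backend for an MPT model.
# MODEL_ID and generation parameters are hypothetical placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "mosaicml/mpt-7b"  # any MPT variant loads the same way

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,  # MPT ships custom modeling code, so this is required
    device_map="auto",
)

def generate(prompt: str, max_new_tokens: int = 128) -> str:
    """Run a single greedy generation pass and return the decoded text."""
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output[0], skip_special_tokens=True)

if __name__ == "__main__":
    print(generate("MosaicML's MPT models are"))
```

A separate router (in Rust or JS, as suggested above) would then batch incoming requests and call into this generation loop.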