DeepSpeed-MII
I wonder if we can use batch inference and offload in the MII pipeline?
Since the latest FastGen in MII is built on continuous batching, I would expect there to be a way to adjust the batch size in the MII pipeline. However, I haven't found it in the documentation, and it is hard to locate the relevant parameters in the code because they are buried inside config dicts. Could someone please tell me what we should call to use batch inference, or whether (and why) it hasn't been implemented yet?

I also haven't found offloading in the MII pipeline. DeepSpeed itself supports it, and from another issue and its answers it looks like it can be configured there, but I don't know whether further parameters need to be set to enable offload. Without offload it is hard to run inference on a big model with a small GPU, and batch inference speeds up the inference process. I would appreciate it if someone could help me find an answer. Thank you for your time and happy new year.
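For context, this is roughly how I am calling the pipeline today; the model name and generation arguments below are only placeholders, not something I found in the docs:

```python
import mii

# Placeholder model; any Hugging Face causal LM supported by FastGen should behave the same way.
pipe = mii.pipeline("mistralai/Mistral-7B-v0.1")

# A list of prompts is passed in directly; I could not find an explicit batch_size
# knob, so presumably all of these are handed to the engine together.
prompts = [
    "DeepSpeed is",
    "Seattle is a city in",
    "The largest ocean on Earth is",
]

responses = pipe(prompts, max_new_tokens=64)
for r in responses:
    print(r.generated_text)
```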
Hi @Kevin-shihello-world, the latest MII does not include offloading. We still support this with MII-Legacy. For batching, we do not have a parameter exposed to users that allows defining a batch size. The pipeline will place all provided prompts onto the inference engine at once.
I will talk with @tohtana about how we may expose a batch_size parameter.
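In the meantime, offloading with MII-Legacy looks roughly like the sketch below (a minimal example; the model, deployment name, and ZeRO config values are placeholders, and depending on your installed version you may need the MII-Legacy package rather than the latest MII):

```python
import mii

# ZeRO-Inference config that offloads parameters to CPU so a large model
# can run on a GPU that cannot hold the full model.
ds_config = {
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu"},
    },
    "train_micro_batch_size_per_gpu": 1,
}

# enable_zero=True routes inference through ZeRO-Inference instead of the
# kernel-injection path, which is what makes offloading possible.
mii.deploy(
    task="text-generation",
    model="bigscience/bloom-560m",            # placeholder model
    deployment_name="bloom560m_deployment",   # placeholder deployment name
    enable_deepspeed=False,
    enable_zero=True,
    ds_config=ds_config,
)

# Query the deployment once it is up.
generator = mii.mii_query_handle("bloom560m_deployment")
result = generator.query({"query": ["DeepSpeed is"]}, max_new_tokens=64)
print(result)
```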
Thanks for your time @mrwyattii. However, the one in MII-Legacy might be slower, but how much slower would it be? Can you show us? Thanks again for your time, and happy new year.