Mayank Mishra
Hi @philippmtk, unfortunately they don't lead to identical outputs. Resharding checkpoints changes the order of operations, and this is a problem with floating-point arithmetic: fp operations are not...
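To make the point concrete, here is a minimal demonstration of why reordering operations changes floating-point results (plain Python, unrelated to any specific checkpoint code):

```python
# Floating-point addition is not associative: summing the same values in a
# different order (as resharding effectively does) can change the result.
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c   # 0.6000000000000001
right = a + (b + c)  # 0.6

print(left == right)  # False
```

In a large model the per-element differences are tiny, but they can compound across layers and flip sampled tokens, so bit-identical outputs after resharding are not expected.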
Seems like there is a check in place that prevents the new weights from working with MII.
Any updates on this? @jeffra @RezaYazdaniAminabadi
https://github.com/huggingface/transformers-bloom-inference/blob/abe365066fec6e03ce0ea2cc8136f2da1254e2ea/bloom-inference-server/ds_inference/grpc_server.py#L33 @cderinbogaz I hacked my way around it for now: I pass the downloaded model path and the checkpoint dict for the model I need to use, along with model="bigscience/bloom". I...
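Roughly, the workaround amounts to building a checkpoints dict that points at locally downloaded shards while still passing model="bigscience/bloom" so the config/tokenizer resolve. A minimal sketch (the local path and exact dict keys here are illustrative assumptions; check the linked grpc_server.py for the real ones):

```python
import glob
import json

# Assumed local directory containing the downloaded BLOOM shards.
local_path = "/data/models/bloom"

# Checkpoint dict pointing DS-inference at the local shard files.
# Key names ("type", "checkpoints", "version") are the commonly used
# DS-inference layout, but treat them as an assumption here.
checkpoints_json = {
    "type": "BLOOM",
    "checkpoints": sorted(glob.glob(f"{local_path}/*.pt")),
    "version": 1.0,
}

with open("checkpoints.json", "w") as f:
    json.dump(checkpoints_json, f)
```

The server is then given this file for weights while the model name stays "bigscience/bloom".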
@mrwyattii I believe your commit yesterday has fixed this? Let me know. I am closely watching this repo :)
Hi @TahaBinhuraib, I think MII doesn't support int8 models. Can you try vanilla DS-inference? https://github.com/huggingface/transformers-bloom-inference/tree/main/bloom-inference-server You can try running it via the CLI or deploy a generation server as given in the...
Thanks @mrwyattii
https://github.com/microsoft/DeepSpeed/issues/2382 is related to this
Running the new PR with queries = ["cat " * 2000] * 4:
- max_new_tokens = 10 → generated_tokens = [10, 10, 10, 10]
- max_new_tokens = 100 → generated_tokens = [100, 100, 100, 99]
- max_new_tokens = 300 → [299, 300, 299, ...
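For reference, the generated_tokens counts above can be derived as output length minus prompt length per sequence. A hypothetical helper (not from the repo) illustrating the computation:

```python
def count_generated(input_ids, output_ids):
    """Number of newly generated tokens per sequence:
    output length minus prompt length."""
    return [len(out) - len(inp) for inp, out in zip(input_ids, output_ids)]

# Example: 4 prompts of length 5, outputs of length 15, 15, 15, 14
# (the last sequence stopped one token early, e.g. on EOS).
print(count_generated([[0] * 5] * 4,
                      [[0] * 15, [0] * 15, [0] * 15, [0] * 14]))  # [10, 10, 10, 9]
```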
Hi, any update on the above @RezaYazdaniAminabadi ^^? Were you able to find the error?