Kevin Ko

22 issues by Kevin Ko

There are some improvements:

1. Improve the error message of the pipeline engine. See https://github.com/microsoft/DeepSpeed/pull/1438
2. Move `has_bool_tensors` into `kwargs` so it is usable with `deepspeed.initialize`, and rename `has` to `send`. See https://github.com/microsoft/DeepSpeed/pull/1399...

Are there features like `stop_texts` or `stop_ids` from FasterTransformer in LightSeq? For example, if I want to finish generation when the `hello` token is generated, can I stop the...
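Independently of how LightSeq exposes it, the behaviour being asked for can be sketched in plain Python: a decode loop that halts as soon as a designated stop token id is produced. The `next_token` callable and the token ids below are hypothetical stand-ins, not LightSeq or FasterTransformer APIs.

```python
def generate_with_stop(next_token, prompt_ids, stop_ids, max_len=32):
    """Decode loop that halts when any id in `stop_ids` is emitted.

    `next_token` is a hypothetical callable mapping the current id
    sequence to the next token id (standing in for a real model).
    """
    out = list(prompt_ids)
    for _ in range(max_len):
        tok = next_token(out)
        out.append(tok)
        if tok in stop_ids:  # early exit, like a stop_ids check
            break
    return out

# Toy "model": always emits the previous id plus one.
def toy(ids):
    return ids[-1] + 1

print(generate_with_stop(toy, [0], stop_ids={3}))  # → [0, 1, 2, 3]
```

The same check generalizes to `stop_texts` by decoding the running output and testing for a substring instead of a token id.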

Thanks for this great open-source project. I wonder whether you have any plans to support longer sequence lengths? Currently, the maximum length of the softmax kernel seems to be 512.
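The 512 cap is presumably a property of the fused kernel's block size rather than of the math: softmax over an arbitrarily long row can be computed in fixed-size chunks with the running-max ("online softmax") recurrence. A minimal pure-Python sketch of that idea (not LightSeq code):

```python
import math

def online_softmax(xs, chunk=512):
    """Numerically stable softmax over an arbitrarily long list,
    processed in fixed-size chunks via the online-softmax recurrence."""
    m, s = float("-inf"), 0.0  # running max and running normalizer
    for i in range(0, len(xs), chunk):
        block = xs[i:i + chunk]
        new_m = max(m, max(block))
        # Rescale the old normalizer to the new max, then add this
        # chunk's contribution.
        s = s * math.exp(m - new_m) + sum(math.exp(x - new_m) for x in block)
        m = new_m
    return [math.exp(x - m) / s for x in xs]

probs = online_softmax(list(range(1000)), chunk=128)
assert abs(sum(probs) - 1.0) < 1e-9
```

Fused GPU kernels use the same recurrence per thread block, which is how longer rows can be supported without materializing the whole row at once.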

```
CrossEntropyLoss:    0.0004372596740722656
LSCrossEntropyLayer: 0.0010995864868164062
```

Are there any benchmark results for this kernel? As a result of my experiments, it seems to be slower than the original torch kernel.
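One-shot sub-millisecond timings like those above are easily dominated by launch overhead and timer noise; a fairer comparison warms up first, averages many iterations, and (for CUDA kernels) synchronizes the device before reading the clock. A library-agnostic sketch of such a harness, with a stand-in workload rather than the actual loss kernels:

```python
import time

def bench(fn, *args, warmup=10, iters=100, sync=None):
    """Average wall-clock seconds per call of `fn(*args)`.

    `sync` is an optional callable (e.g. torch.cuda.synchronize for
    GPU kernels) invoked before each clock read so that queued
    asynchronous work is included in the measurement.
    """
    for _ in range(warmup):
        fn(*args)
    if sync:
        sync()
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    if sync:
        sync()
    return (time.perf_counter() - start) / iters

# Stand-in workload; in practice this would be each loss kernel.
def baseline(n):
    return sum(range(n))

print(f"{bench(baseline, 10_000):.2e} s/iter")
```

Without the synchronize step, a GPU benchmark can appear faster or slower than it really is, which may explain surprising single-call numbers.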

1. [AOTAutograd](https://github.com/pytorch/functorch) is a novel engine provided by functorch that can fuse all parts of a neural network. I added it to [OSLO](https://github.com/tunib-ai/oslo/tree/master/oslo/pytorch/kernel_fusion/mem_efficient) recently, and it makes training much faster....


At least as far as I know, ZeRO-2 partitions gradients while pipeline parallelism accumulates them, so there's no real performance boost when these two mechanisms work together.

### Related issues

-...
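The conflict can be made concrete with a toy memory model (an illustrative sketch, not DeepSpeed code): pipeline parallelism must keep the full local gradient buffer live across every micro-batch for accumulation, so a ZeRO-2-style reduce-scatter and shard release can only happen after the last micro-batch, leaving peak gradient memory unchanged.

```python
def simulate_step(micro_batches, num_params, world, zero2):
    """Toy model of one rank's live gradient floats across one
    optimizer step under pipeline-style gradient accumulation."""
    peak = 0
    for _ in range(micro_batches):
        # The full gradient buffer must stay live on every micro-batch
        # so local accumulation can proceed.
        peak = max(peak, num_params)
    # Only after the last micro-batch can ZeRO-2 reduce-scatter the
    # gradients and free the shards this rank does not own.
    resident = num_params // world if zero2 else num_params
    return peak, resident

# Peak during accumulation is identical with or without ZeRO-2; only
# the post-step resident size differs.
print(simulate_step(8, 1_000_000, world=4, zero2=True))
print(simulate_step(8, 1_000_000, world=4, zero2=False))
```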


https://github.com/ELS-RD/transformer-deploy/blob/main/demo/generative-model/gpt2.ipynb In this notebook, when testing the cache feature, I think you should use the `generate` function rather than the `forward` function, because the advantages of the cache can be obtained with saving...
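The difference can be illustrated with a simple token-processing count (a framework-free sketch, not the notebook's code): calling `forward` on the full sequence at every step re-encodes the whole prefix, which grows quadratically, while a cached generation loop prefills the prompt once and then touches only one new token per step.

```python
def decode_cost(prompt_len, new_tokens, cached):
    """Count token positions the model must process to generate
    `new_tokens` tokens after a `prompt_len`-token prompt."""
    if cached:
        # One prefill pass over the prompt, then one token per step
        # thanks to the saved key/value states.
        return prompt_len + new_tokens
    # Each uncached `forward` call re-encodes the full current sequence.
    return sum(prompt_len + i for i in range(new_tokens))

print(decode_cost(100, 20, cached=False))  # → 2190 (quadratic growth)
print(decode_cost(100, 20, cached=True))   # → 120  (linear)
```

This is why a cache benchmark built on repeated full-sequence `forward` calls understates the benefit that a `generate`-style loop actually delivers.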