Kevin Ko

22 issues by Kevin Ko

There are some improvements:

1. Improve the error message of the pipeline engine. See https://github.com/microsoft/DeepSpeed/pull/1438
2. Move `has_bool_tensors` into `kwargs` so it is usable with `deepspeed.initialize`, and rename `has` to `send`. See https://github.com/microsoft/DeepSpeed/pull/1399...

Are there features like `stop_texts` or `stop_ids` from FasterTransformer in LightSeq? For example, if I want to finish generation when the `hello` token is generated, can I stop the...
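Independently of how LightSeq exposes it, the behaviour being asked for can be sketched in plain Python: a decode loop that halts as soon as a designated stop token id is produced. The `next_token` callable and the token ids below are hypothetical stand-ins, not LightSeq or FasterTransformer APIs.

```python
def generate_with_stop(next_token, prompt_ids, stop_ids, max_len=32):
    """Decode loop that halts when any id in `stop_ids` is emitted.

    `next_token` is a hypothetical callable mapping the current id
    sequence to the next token id (standing in for a real model).
    """
    out = list(prompt_ids)
    for _ in range(max_len):
        tok = next_token(out)
        out.append(tok)
        if tok in stop_ids:  # early exit, like a stop_ids check
            break
    return out

# Toy "model": always emits the previous id plus one.
def toy(ids):
    return ids[-1] + 1

print(generate_with_stop(toy, [0], stop_ids={3}))  # → [0, 1, 2, 3]
```

The same check generalizes to `stop_texts` by decoding the running output and testing for a substring instead of a token id.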

Thanks for this great open-source project. I wonder whether you have any plans to support longer sequence lengths? Currently, the maximum length of the softmax kernel seems to be 512.
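The 512 cap is presumably a property of the fused kernel's block size rather than of the math: softmax over an arbitrarily long row can be computed in fixed-size chunks with the running-max ("online softmax") recurrence. A minimal pure-Python sketch of that idea (not LightSeq code):

```python
import math

def online_softmax(xs, chunk=512):
    """Numerically stable softmax over an arbitrarily long list,
    processed in fixed-size chunks via the online-softmax recurrence."""
    m, s = float("-inf"), 0.0  # running max and running normalizer
    for i in range(0, len(xs), chunk):
        block = xs[i:i + chunk]
        new_m = max(m, max(block))
        # Rescale the old normalizer to the new max, then add this
        # chunk's contribution.
        s = s * math.exp(m - new_m) + sum(math.exp(x - new_m) for x in block)
        m = new_m
    return [math.exp(x - m) / s for x in xs]

probs = online_softmax(list(range(1000)), chunk=128)
assert abs(sum(probs) - 1.0) < 1e-9
```

Fused GPU kernels use the same recurrence per thread block, which is how longer rows can be supported without materializing the whole row at once.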

```
CrossEntropyLoss:    0.0004372596740722656
LSCrossEntropyLayer: 0.0010995864868164062
```

Are there any benchmark results for this kernel? As a result of my experiments, it seems to be slower than the original torch kernel.
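One-shot sub-millisecond timings like those above are easily dominated by launch overhead and timer noise; a fairer comparison warms up first, averages many iterations, and (for CUDA kernels) synchronizes the device before reading the clock. A library-agnostic sketch of such a harness, with a stand-in workload rather than the actual loss kernels:

```python
import time

def bench(fn, *args, warmup=10, iters=100, sync=None):
    """Average wall-clock seconds per call of `fn(*args)`.

    `sync` is an optional callable (e.g. torch.cuda.synchronize for
    GPU kernels) invoked before each clock read so that queued
    asynchronous work is included in the measurement.
    """
    for _ in range(warmup):
        fn(*args)
    if sync:
        sync()
    start = time.perf_counter()
    for _ in range(iters):
        fn(*args)
    if sync:
        sync()
    return (time.perf_counter() - start) / iters

# Stand-in workload; in practice this would be each loss kernel.
def baseline(n):
    return sum(range(n))

print(f"{bench(baseline, 10_000):.2e} s/iter")
```

Without the synchronize step, a GPU benchmark can appear faster or slower than it really is, which may explain surprising single-call numbers.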

1. [AOTAutograd](https://github.com/pytorch/functorch) is a novel engine provided by functorch that can fuse all parts of a neural network. I added it to [OSLO](https://github.com/tunib-ai/oslo/tree/master/oslo/pytorch/kernel_fusion/mem_efficient) recently, and it makes training much faster....


At least as far as I know, ZeRO-2 partitions gradients while pipeline parallelism accumulates them, so there's no real performance boost when these two mechanisms work together.

### Related issues

-...
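The conflict can be made concrete with a toy memory model (an illustrative sketch, not DeepSpeed code): pipeline parallelism must keep the full local gradient buffer live across every micro-batch for accumulation, so a ZeRO-2-style reduce-scatter and shard release can only happen after the last micro-batch, leaving peak gradient memory unchanged.

```python
def simulate_step(micro_batches, num_params, world, zero2):
    """Toy model of one rank's live gradient floats across one
    optimizer step under pipeline-style gradient accumulation."""
    peak = 0
    for _ in range(micro_batches):
        # The full gradient buffer must stay live on every micro-batch
        # so local accumulation can proceed.
        peak = max(peak, num_params)
    # Only after the last micro-batch can ZeRO-2 reduce-scatter the
    # gradients and free the shards this rank does not own.
    resident = num_params // world if zero2 else num_params
    return peak, resident

# Peak during accumulation is identical with or without ZeRO-2; only
# the post-step resident size differs.
print(simulate_step(8, 1_000_000, world=4, zero2=True))
print(simulate_step(8, 1_000_000, world=4, zero2=False))
```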


https://github.com/ELS-RD/transformer-deploy/blob/main/demo/generative-model/gpt2.ipynb In this notebook, when testing the cache feature, I think you should use the `generate` function rather than the `forward` function, because the advantages of the cache can be obtained with saving...
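The difference can be illustrated with a simple token-processing count (a framework-free sketch, not the notebook's code): calling `forward` on the full sequence at every step re-encodes the whole prefix, which grows quadratically, while a cached generation loop prefills the prompt once and then touches only one new token per step.

```python
def decode_cost(prompt_len, new_tokens, cached):
    """Count token positions the model must process to generate
    `new_tokens` tokens after a `prompt_len`-token prompt."""
    if cached:
        # One prefill pass over the prompt, then one token per step
        # thanks to the saved key/value states.
        return prompt_len + new_tokens
    # Each uncached `forward` call re-encodes the full current sequence.
    return sum(prompt_len + i for i in range(new_tokens))

print(decode_cost(100, 20, cached=False))  # → 2190 (quadratic growth)
print(decode_cost(100, 20, cached=True))   # → 120  (linear)
```

This is why a cache benchmark built on repeated full-sequence `forward` calls understates the benefit that a `generate`-style loop actually delivers.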