Roman Ageev
Could this be due to slow communication between GPUs? After profiling, it turns out that communication takes up 66% of the time, and **ncclKernel_AllReduce_RING_LL_Sum_half(ncclWorkElem)** is used for this. Is it...
Hi @jimafisk @altacountbabi, we have a contributing guide for [model addition](https://xturing.stochastic.ai/contributing/model_contributing). We would appreciate any help in this area!
Hi again, @jimafisk @altacountbabi. In the latest release we added a [Generic model](https://github.com/stochasticai/xTuring/tree/main/examples/generic), which you can use for `Open Assistant` models in our library! Also, a good option for you might be...
Hi @vahuja4, we have a contributing guide for [model addition](https://xturing.stochastic.ai/contributing/model_contributing). We would appreciate any help in this area!
Hi @fpaupier, we will also take this up, but for now you can use `GenericModel` for them.
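To make this concrete, here is a minimal sketch of fine-tuning through `GenericModel`, roughly following the generic example linked above; the checkpoint name and dataset path are placeholders, not recommendations:

```python
from xturing.datasets import InstructionDataset
from xturing.models import GenericModel

# Placeholder paths: point these at your own instruction dataset
# and the Hugging Face checkpoint you want to tune.
dataset = InstructionDataset("./my_instruction_dataset")
model = GenericModel("OpenAssistant/oasst-sft-1-pythia-12b")

# Fine-tune, then sample from the tuned model.
model.finetune(dataset=dataset)
output = model.generate(texts=["What does the Generic model do?"])
print(output)
```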
Hi @cnbeining @shreyansh26, we have a contributing guide for [model addition](https://xturing.stochastic.ai/contributing/model_contributing). We would appreciate any help in this area!
Hi again, @cnbeining @shreyansh26. In the latest release we added a [Generic model](https://github.com/stochasticai/xTuring/tree/main/examples/generic), which you can use for `llama-13b` models in our library! Also, a good option for you might be...
We are also working on adding k-bit quantisation to the generic model, so it should be released soon. Then you will be able to use `llama-13b` or any model with 4-bit...
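Purely as an illustration of the intended workflow (this is not released yet), usage should mirror the other generic classes; the `GenericLoraKbitModel` class name below is a guess and may differ in the final release:

```python
from xturing.datasets import InstructionDataset
from xturing.models import GenericLoraKbitModel  # hypothetical class name, not yet released

# Placeholder dataset and checkpoint paths.
dataset = InstructionDataset("./alpaca_data")

# The idea: load llama-13b with 4-bit weights plus LoRA adapters
# so it fits on a single consumer GPU (assumed behaviour).
model = GenericLoraKbitModel("path/to/llama-13b")
model.finetune(dataset=dataset)
```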