Roman Ageev
Could this be due to slow communication between GPUs? After profiling, it turns out that communication takes up 66% of the time, and **ncclKernel_AllReduce_RING_LL_Sum_half(ncclWorkElem)** is used for this. Is it...
Hi @jimafisk @altacountbabi, we have a contributing guide for [model addition](https://xturing.stochastic.ai/contributing/model_contributing). We would appreciate any help in this area!
Hi again, @jimafisk @altacountbabi. In the latest release we added a [Generic model](https://github.com/stochasticai/xTuring/tree/main/examples/generic), which you can use for `Open Assistant` models in our library! Also, a good option for you might be...
Hi @vahuja4, we have a contributing guide for [model addition](https://xturing.stochastic.ai/contributing/model_contributing). We would appreciate any help in this area!
Hi @fpaupier, we will also take this up, but for now you can use `GenericModel` for them.
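To make this concrete, here is a minimal sketch of fine-tuning through `GenericModel`, roughly following the generic example linked above; the checkpoint name and dataset path are placeholders, not recommendations:

```python
from xturing.datasets import InstructionDataset
from xturing.models import GenericModel

# Placeholder paths: point these at your own instruction dataset
# and the Hugging Face checkpoint you want to tune.
dataset = InstructionDataset("./my_instruction_dataset")
model = GenericModel("OpenAssistant/oasst-sft-1-pythia-12b")

# Fine-tune, then sample from the tuned model.
model.finetune(dataset=dataset)
output = model.generate(texts=["What does the Generic model do?"])
print(output)
```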
Hi @cnbeining @shreyansh26, we have a contributing guide for [model addition](https://xturing.stochastic.ai/contributing/model_contributing). We would appreciate any help in this area!
Hi again, @cnbeining @shreyansh26. In the latest release we added a [Generic model](https://github.com/stochasticai/xTuring/tree/main/examples/generic), which you can use for `llama-13b` models in our library! Also, a good option for you might be...
We are also working on adding k-bit quantisation to the generic model, so it should be released soon. Then you will be able to use `llama-13b` or any model with 4-bit...
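Purely as an illustration of the intended workflow (this is not released yet), usage should mirror the other generic classes; the `GenericLoraKbitModel` class name below is a guess and may differ in the final release:

```python
from xturing.datasets import InstructionDataset
from xturing.models import GenericLoraKbitModel  # hypothetical class name, not yet released

# Placeholder dataset and checkpoint paths.
dataset = InstructionDataset("./alpaca_data")

# The idea: load llama-13b with 4-bit weights plus LoRA adapters
# so it fits on a single consumer GPU (assumed behaviour).
model = GenericLoraKbitModel("path/to/llama-13b")
model.finetune(dataset=dataset)
```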