Pegessi
> It seems like the model is loaded to the device during `transformers.AutoModelForCausalLM.from_pretrained`, and the num_beams error is caused by 'inf', 'nan' or ele
I believe this work is remarkable for its combination of memory and parallelism, and it is great for achieving higher throughput. However, one insufficient part is the experiments with Megatron-DeepSpeed as the baseline...
I have manually integrated GMLake into PyTorch 2.1.0. This code cannot be used directly to replace the files in PyTorch 2.1.0 because of interface changes in CUDACachingAllocator.h/.cpp. Although...
It seems that you have not built libtaso_runtime.so.