Taehyun Kim
Should lib_path be set as a relative path or an absolute path? I've tried both, but couldn't get it to work. The file libth_transformer.so is there, but can't...
There was an issue while rebuilding the library... I've seen reports of CUDA compatibility issues with FT, but I can't find the post again. I was using CUDA 11.8,...
Do the kernels even work? I initialized an MoE T5 with random weights, but I keep getting internal errors in the CUTLASS MoE GEMM kernel. Any thoughts?...
If you look at the generate function, the input to the model is updated by concatenating a generated token to the input sequence. If I am correct, nanoGPT does not...
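To illustrate the loop described in the comment above, here is a minimal sketch in plain Python of a generation loop that appends each new token to the input and feeds the whole sequence back in. It is not nanoGPT's actual code: `generate`, `toy_next_token`, and the `block_size` cropping are hypothetical stand-ins chosen for illustration.

```python
from typing import Callable, List


def generate(
    next_token: Callable[[List[int]], int],
    prompt: List[int],
    max_new_tokens: int,
    block_size: int,
) -> List[int]:
    """Autoregressive loop that re-feeds the growing sequence each step."""
    tokens = list(prompt)
    for _ in range(max_new_tokens):
        # Assumption: the model only sees the last `block_size` tokens.
        context = tokens[-block_size:]
        # The new token is concatenated onto the input sequence,
        # so the next step processes the full context again.
        tokens.append(next_token(context))
    return tokens


def toy_next_token(context: List[int]) -> int:
    # Hypothetical stand-in for a model forward pass plus sampling.
    return (sum(context) + 1) % 10


out = generate(toy_next_token, [1, 2], max_new_tokens=3, block_size=4)
print(out)
```

Because the full context is passed in at every step, the work per step grows with the sequence length up to `block_size`.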
Have you found a fix?
Yeah, I found the same cause... the multi-GPU processes are not using the same sequence lengths. You'll need to modify the _All2All function and manually set the input/output split sizes...
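As a rough illustration of what setting the split sizes manually involves, the sketch below simulates an uneven all-to-all in plain Python. It is not the DeepSpeed `_All2All` code; `split_sizes`, `simulate_all_to_all`, and the `send_counts` matrix are hypothetical names. In PyTorch, `torch.distributed.all_to_all_single` accepts `output_split_sizes` and `input_split_sizes` arguments, and the lists computed here correspond to those.

```python
from typing import List, Tuple


def split_sizes(send_counts: List[List[int]], rank: int) -> Tuple[List[int], List[int]]:
    """send_counts[src][dst] = number of items src sends to dst (assumed known on every rank)."""
    input_splits = send_counts[rank]  # what this rank sends to each peer
    # What this rank receives from each peer is the matching column.
    output_splits = [send_counts[src][rank] for src in range(len(send_counts))]
    return input_splits, output_splits


def simulate_all_to_all(buffers: List[List[str]], send_counts: List[List[int]]) -> List[List[str]]:
    """Single-process simulation: each rank's buffer is cut by its input splits."""
    world = len(buffers)
    received: List[List[str]] = [[] for _ in range(world)]
    for src in range(world):
        offset = 0
        for dst in range(world):
            n = send_counts[src][dst]
            received[dst].extend(buffers[src][offset:offset + n])
            offset += n
    return received


# Two ranks with different sequence lengths (3 and 4 items).
send_counts = [[1, 2], [3, 1]]
buffers = [["a0", "a1", "a2"], ["b0", "b1", "b2", "b3"]]
received = simulate_all_to_all(buffers, send_counts)
print(split_sizes(send_counts, 0), received)
```

In a real multi-GPU run the counts would first have to be exchanged between ranks, since each rank only knows its own row.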
I have a painstaking but correct fix for that.
- Initialize the DeepSpeed MoE layers
- Read the state dicts from your checkpoints
- Check the local rank of your...
Hello to both of you! I am trying to run model-parallel (2 GPUs) inference with Mixtral, but it seems the current implementation is not automatically moving the activation from GPU0...
I am also a beginner in AutoSA, but here is what I understand: Set the --hbm and --hbm-port-num flags to generate the code. The code should create as many HBM ports as...
@veronia-iskandar I know it has been over a year since you posted your question, but did you have any luck afterwards?