Daniel Stokes

15 comments by Daniel Stokes

Hi @akhoroshev, first off, thanks for the contribution. I agree with @nv-guomingz about having this be a separate model, but also that this is something we could handle separately after...

> I agree that the is_moe_layer function is better. But what about dense_intermidiate_size param? It's ok or we need more general solution?

This is a good question, perhaps a list...

Thanks @akhoroshev, that makes perfect sense to me. Feel free to make that change in this PR if you would like. I discussed the shared experts question, and the verdict was...

Hi @Ahmad-Magdy-Osman, currently these changes are being tested on our internal branch. Once they are accepted internally they will be released in one of our upcoming weekly releases. We will...

This LGTM, thanks @jinyangyuan-nvidia. One bigger change: with this sort of approach, I think it would be good to consider whether we could couple this with the DP...