Yiakwy comments

Results: 49 comments of Yiakwy

feat(attention): add Bi-Directional MLM attention model

Hi @TamirFriedman-RecoLabs, are you working on the encoder stack? For example, generative models for video, music, and so on. Are you still working on this branch? Happy to hear...

FlashAttention works on a single GPU but crashes with accelerate DP on multiple GPUs (FlashAttention only supports the fp16 and bf16 data types).
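Because of that dtype restriction, q, k and v have to be half precision by the time they reach the kernel. A minimal sketch of a guard that picks the target dtype, assuming dtypes are passed as plain names; the helper `flash_attn_input_dtype` and the bf16-first preference are illustrative assumptions, not part of FlashAttention or accelerate:

```python
# Hypothetical helper: FlashAttention kernels accept only fp16/bf16 inputs,
# so anything else (e.g. fp32 activations) must be down-cast before the call.
FLASH_ATTN_DTYPES = {"float16", "bfloat16"}


def flash_attn_input_dtype(dtype: str, bf16_supported: bool = True) -> str:
    """Return the dtype that q, k, v should be cast to before FlashAttention."""
    if dtype in FLASH_ATTN_DTYPES:
        return dtype  # already supported, no cast needed
    # Assumption: prefer bf16 (wider exponent range) when the GPU supports it.
    return "bfloat16" if bf16_supported else "float16"
```

In PyTorch the cast itself would be something like `q = q.to(torch.bfloat16)`, applied on every rank, so the precondition holds regardless of how the model is wrapped for multi-GPU training.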

> > this doesn't work for me again, might be because I have. cc @tridao not sure how relevant this is > > The q, k, v need to be...

impr(cpp/build-compiler-in-docker.sh)

It may be related to my local environment: `docker run --rm -w /app/ -v $(pwd):/app/ $image bash -c "./build-compiler.sh"`. If one wants to bind `pwd` to a docker path, this may...
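The comment above is cut off, so the following does not reconstruct its point. It only illustrates one common pitfall with binding `$(pwd)`: unquoted, a working directory containing spaces splits the `-v` argument. A minimal sketch, assuming the same image, mount point and script as the one-liner; the function name `docker_build_command` is hypothetical:

```python
import os
import shlex
from typing import Optional


def docker_build_command(image: str, host_dir: Optional[str] = None,
                         container_dir: str = "/app/") -> str:
    """Build a shell-safe version of the docker one-liner.

    Bind-mounts host_dir (default: the current directory, like $(pwd)) onto
    container_dir, uses it as the working directory, and runs the build script.
    """
    # -v expects an absolute host path.
    host_dir = os.path.abspath(host_dir or os.getcwd())
    argv = [
        "docker", "run", "--rm",
        "-w", container_dir,
        "-v", f"{host_dir}:{container_dir}",
        image,
        "bash", "-c", "./build-compiler.sh",
    ]
    # shlex.join quotes each argument, so a path with spaces stays a single
    # -v value, unlike the unquoted $(pwd) in the original command.
    return shlex.join(argv)
```

For example, `docker_build_command("my-image", "/home/a b")` yields `docker run --rm -w /app/ -v '/home/a b:/app/' my-image bash -c ./build-compiler.sh`; in plain shell the equivalent fix is writing `-v "$(pwd)":/app/`.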

impr(cpp/build-compiler-in-docker.sh)

@zyx-billy Usually, on this workstation, we don't build anything on the physical machine; we just start a docker container or use a CMake package management system to pull dependencies in a...

impr(cpp/build-compiler-in-docker.sh)

@zyx-billy Adding `-u $(id -u)` won't work:

```
# Adding -u $(id -u) won't work. One needs to register the user and configure it so that no password is required.
(py39) leiw@sgjur-pod006-3:~/WorkSpace/Github/mlir-playground/cpp$...
```

PyTorch 2.2.0 NVFuser deprecation is incompatible with TransformerEngine.

PyTorch decided to drop NVFuser support (see [PR#105789](https://github.com/pytorch/pytorch/pull/105789)), which was later reverted by @DanilBaibak. I'm not sure whether they still plan to move forward. But it...

[HIP] hipcc cannot compile .cu files on AMD machines (feature request).

Hi @ppanchad-amd, I am not sure this is the right place to discuss this. I have a related question: suppose I have handled a .cu file well with USE_ROCM control to...

Can I use g++ or mpic++ to link the hipcc object?

@anilbommareddy It is generally not recommended to compile a HIP program with g++ (i.e., link with g++), since the HIP API changes quickly. In recent programs, if you want to compile with...

Support sequence parallel on main branch

Lazy computation of partial weight gradients with the aid of a queue is really smart! @ufotalent However, I don't believe you need to support **sequence parallel**, a.k.a. it does...
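The general idea praised above can be sketched in a few lines: during backward, each layer computes its input gradient eagerly (the previous layer or pipeline stage needs it right away) and pushes a closure for its weight gradient onto a FIFO queue, which is drained later. This is a minimal illustration on a toy scalar layer `y = w * x`; the class `ScalarLinear` and function `flush_weight_grads` are hypothetical and are not the PR's implementation:

```python
from collections import deque
from typing import Callable, Deque

WeightGradQueue = Deque[Callable[[], None]]


class ScalarLinear:
    """Toy layer y = w * x, used only to illustrate deferred weight gradients."""

    def __init__(self, w: float):
        self.w = w
        self.grad_w = 0.0

    def forward(self, x: float) -> float:
        return self.w * x

    def backward(self, x: float, grad_y: float, queue: WeightGradQueue) -> float:
        # Defer dL/dw = x * grad_y: enqueue the work instead of doing it now.
        def weight_grad() -> None:
            self.grad_w += x * grad_y

        queue.append(weight_grad)
        # dL/dx = w * grad_y is needed immediately upstream, so compute it eagerly.
        return self.w * grad_y


def flush_weight_grads(queue: WeightGradQueue) -> None:
    """Run all deferred weight-gradient computations in FIFO order."""
    while queue:
        queue.popleft()()


# Usage: a two-layer chain y = w2 * (w1 * x0) with loss L = y.
layer1, layer2 = ScalarLinear(2.0), ScalarLinear(5.0)
x0 = 3.0
x1 = layer1.forward(x0)
queue: WeightGradQueue = deque()
grad_x1 = layer2.backward(x1, 1.0, queue)
grad_x0 = layer1.backward(x0, grad_x1, queue)
# At this point only input gradients exist; weight gradients are still queued.
flush_weight_grads(queue)
```

The usual motivation for this split is scheduling: a stage can send its input gradient upstream sooner and fill otherwise idle time with queued weight-gradient work.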

Analysis Tool

> > Hi @yxyOo: I have a few questions about total_parameters computing. Since you mentioned your experiments on llama, but I find some inconsistency: > > > > 1. llama...
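The quoted question is truncated, so the specific inconsistency is not recoverable here. As a reference point for checking a total_parameters computation, here is a minimal parameter-count sketch, assuming a standard LLaMA-style decoder: no biases, full multi-head attention (no GQA), a SwiGLU MLP with three projections, two RMSNorm weight vectors per layer plus a final norm, rotary embeddings (no learnable position parameters) and untied input/output embeddings. The function name `llama_total_parameters` is hypothetical:

```python
def llama_total_parameters(vocab_size: int, hidden: int, n_layers: int,
                           ffn_hidden: int, tied_embeddings: bool = False) -> int:
    """Count parameters of a LLaMA-style decoder under the stated assumptions."""
    embedding = vocab_size * hidden
    attention = 4 * hidden * hidden   # q, k, v, o projections, no biases
    mlp = 3 * hidden * ffn_hidden     # gate, up and down projections (SwiGLU)
    norms = 2 * hidden                # two RMSNorm weight vectors per layer
    per_layer = attention + mlp + norms
    final_norm = hidden
    lm_head = 0 if tied_embeddings else vocab_size * hidden
    return embedding + n_layers * per_layer + final_norm + lm_head
```

With the 7B configuration (vocabulary 32000, hidden size 4096, 32 layers, intermediate size 11008) this gives 6,738,415,616 parameters. Variants with grouped-query attention or tied embeddings need the corresponding terms adjusted.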