侯奇
> > [@qinghon](https://github.com/qinghon) Flux does support SM89, but not in the currently released version yet.
> > hi [@wenlei-bao](https://github.com/wenlei-bao), could you please provide an estimated timeline for PCIe support in [#32](https://github.com/bytedance/flux/issues/32#issuecomment-2300527153)...
flux #0: total 250.066 us, gemm 384.129 us, comm -134.063 us, gemm_only 191.983 us
* total is measured with AG+GEMM
* gemm is measured with a separate GEMM-only implementation...
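How the reported numbers relate, read off the log line itself rather than from Flux source: `comm` appears to be derived as `total - gemm`, so a negative value means the overlapped AG+GEMM run finished faster than the standalone GEMM measurement.

```python
# Interpretation of the log line above (derived from its numbers, not Flux source)
total, gemm = 250.066, 384.129   # microseconds, from the log line
comm = total - gemm              # -> -134.063 us, matching the log
print(f"comm {comm:.3f} us")     # negative comm: overlap fully hides communication
```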
1. For PCIe machines, it is better to use ring mode; all-to-all is for NVLink (see the sketch after this list).
2. Nope.
3. local_copy is not disabled?
4. There should not be any difference. If you find a...
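A hypothetical helper, not part of Flux, illustrating the rule of thumb in item 1: ring mode on PCIe-only machines, all-to-all when NVLink is present. The function name `pick_ag_mode` and the returned mode strings are made up for illustration; the NVLink probe uses pynvml.

```python
import pynvml

def pick_ag_mode(device_index: int = 0) -> str:
    """Hypothetical: return "all-to-all" if NVLink is active, else "ring"."""
    pynvml.nvmlInit()
    try:
        handle = pynvml.nvmlDeviceGetHandleByIndex(device_index)
        for link in range(pynvml.NVML_NVLINK_MAX_LINKS):
            try:
                if pynvml.nvmlDeviceGetNvLinkState(handle, link) == \
                        pynvml.NVML_FEATURE_ENABLED:
                    return "all-to-all"   # NVLink detected
            except pynvml.NVMLError:
                break                     # link not supported: PCIe-only box
        return "ring"
    finally:
        pynvml.nvmlShutdown()
```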
Please make sure you get the latest version. Follow the doc and start from scratch. If there is still a problem, please provide more info, such as your nvcc version and...
Is this fixed? Closing due to long inactivity. Feel free to re-open it.
> Does nvshmem support multi-machine p2p? Thanks! [@wenlei-bao](https://github.com/wenlei-bao)

It does support multi-machine, but this looks like a bug here. Please provide your test command.
For dense models with sequence parallelism, AR + LN is converted into RS + LN + AG, where AR stands for AllReduce, LN for LayerNorm, RS for ReduceScatter, and AG for AllGather. For the FFN part,...
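To make the conversion concrete, here is a minimal sketch using plain `torch.distributed` collectives (assumed shapes; not Flux's fused kernels). The equivalence holds because LayerNorm normalizes each token independently, so it commutes with sharding along the sequence dimension.

```python
import torch
import torch.distributed as dist

def ar_ln(partial, ln):
    # baseline: AllReduce the partial GEMM outputs, then LayerNorm
    out = partial.clone()
    dist.all_reduce(out)                        # AR
    return ln(out)

def rs_ln_ag(partial, ln):
    # sequence-parallel form: ReduceScatter -> LayerNorm -> AllGather
    world = dist.get_world_size()
    shard = torch.empty(partial.shape[0] // world, partial.shape[1],
                        device=partial.device, dtype=partial.dtype)
    dist.reduce_scatter_tensor(shard, partial)  # RS along the sequence dim
    shard = ln(shard)                           # LN on the local shard only
    out = torch.empty_like(partial)
    dist.all_gather_into_tensor(out, shard)     # AG restores the full sequence
    return out                                  # matches ar_ln(partial, ln)
```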
> In the end-to-end (E2E) implementation, you have used Tensor Parallelism, correct?

Yes.

> How are you handling Reduce Scatter (RS) after post-projection?

Post-projection is a GEMM too. Usually post-projection...
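A minimal sketch of that pattern with assumed shapes and plain collectives, not Flux's fused GEMM+RS kernel: the post-projection is a row-parallel GEMM whose partial sums are combined with a ReduceScatter when sequence parallelism is used.

```python
import torch
import torch.distributed as dist

def post_projection_rs(attn_out_local, w_proj_shard):
    # attn_out_local: [seq, hidden // tp], local shard of the attention output
    # w_proj_shard:   [hidden // tp, hidden], row shard of the projection weight
    partial = attn_out_local @ w_proj_shard     # GEMM producing partial sums
    world = dist.get_world_size()
    out = torch.empty(partial.shape[0] // world, partial.shape[1],
                      device=partial.device, dtype=partial.dtype)
    dist.reduce_scatter_tensor(out, partial)    # sum across ranks + split seq
    return out                                  # [seq // tp, hidden]
```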
> I have read articles about Flux and noticed that the paper mentions a `TP+SP` approach in Transformer, not pure `TP`. To confirm: During the decoding phase of the inference...
Can you provide more information, such as the compile environment (CUDA version) and hardware info? We do support FP8; I don't know why it fails.