Xiaodong Wang
It seems hc-kernel-assemble uses the system ld instead of $BINDIR/lld: https://github.com/RadeonOpenCompute/hcc/blob/44f47d857e1df2b1042a1d8f3a2e3225eee3d908/lib/hc-kernel-assemble.in#L185. This makes the build platform dependent (even if it works on the current Ubuntu/CentOS, it doesn't work for...
Summary: As a follow-up to https://github.com/pytorch/pytorch/pull/92664 (D42619405 (https://github.com/pytorch/pytorch/commit/e6a8267cf54af30e33de1ef22625e972afbf03ff)), clean up the TRITON_CACHE_DIR settings. A few places touch TRITON_CACHE_DIR (see the sketch after this list):
1. triton/fb/triton_util.py: when importing triton
2. caffe2/torch/_inductor/codecache.py
3. caffe2/torch/_inductor/triton_ops/autotune.py...
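A plausible shape for that consolidation is a single resolver that every call site imports. This is a minimal sketch only; the helper name, fallback path, and precedence are assumptions, not the actual code from the PR:

```python
import getpass
import os
import tempfile

def get_triton_cache_dir() -> str:
    """Resolve TRITON_CACHE_DIR in one place (hypothetical helper).

    Respects an explicit TRITON_CACHE_DIR if set; otherwise falls back
    to a per-user directory under the system temp dir, so multiple
    users on one host don't clobber each other's caches.
    """
    cache_dir = os.environ.get("TRITON_CACHE_DIR")
    if cache_dir:
        return cache_dir
    return os.path.join(tempfile.gettempdir(), f"triton_cache_{getpass.getuser()}")
```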
### ❓ The question Hi, I'm from the PyTorch team, and I recently became aware that we need some customization in layer norm, because it segfaults without a bias: https://github.com/allenai/OLMo/blob/cf121084409d844e4f540b7d08b8f37bbe1eec98/olmo/model.py#L203. I...
Summary: hipblas performs poorly on MI300, so turn on hipblaslt by default.  Test Plan: CI signal. Differential Revision: D54105053
Summary: We have consolidated the HIP and CUDA dependencies, but missed this place, which still has separate imports. Fixing. Differential Revision: D56543680
We can sometimes pass in np.bool_ by accident, e.g. we have code like:
```
multiple_q = attn_bias.q_seqinfo.max_seqlen > 1
IS_CAUSAL = multiple_q and _is_supported_causal_bias(attn_bias)
```
If the max_seqlen is a numpy.int,...
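The root cause is easy to reproduce: comparing a NumPy scalar with `>` returns np.bool_, which is not a subclass of Python's bool. A minimal sketch (variable names are illustrative):

```python
import numpy as np

max_seqlen = np.int64(4)              # a NumPy scalar, not a Python int
multiple_q = max_seqlen > 1           # comparison yields np.bool_, not bool

print(type(multiple_q))               # <class 'numpy.bool_'>
print(isinstance(multiple_q, bool))   # False: np.bool_ does not subclass bool

# Coercing at the boundary avoids handing np.bool_ to APIs that
# strictly type-check for Python bool:
is_causal = bool(multiple_q)
print(type(is_causal))                # <class 'bool'>
```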
I ran a test that exercised this code in dev mode, and ASAN found a memory access issue: the iterator returned by lower_bound was dereferenced unconditionally. I believe...
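The original code is C++, but the guard it was missing is the same in any language: std::lower_bound can return the end iterator when no element is >= the key, so the result must be range-checked before dereferencing. A Python analogue using bisect.bisect_left (hypothetical helper, for illustration only):

```python
import bisect

def find_at_least(sorted_vals: list[int], key: int) -> int | None:
    """Return the first element >= key, or None if there is none.

    bisect_left is Python's analogue of C++ std::lower_bound; like
    lower_bound returning end(), it can return an index one past the
    last element, which must be checked before indexing.
    """
    i = bisect.bisect_left(sorted_vals, key)
    if i == len(sorted_vals):   # the guard the buggy code was missing
        return None
    return sorted_vals[i]

print(find_at_least([1, 3, 5], 4))  # 5
print(find_at_least([1, 3, 5], 9))  # None instead of an out-of-range read
```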
We have seen a few cases where we fail to load the shared object from the cache because: 1. the cached object was built on aarch64 and we're trying to...
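One common way to rule out that failure mode is to bake the target architecture into the cache key, so an aarch64 build can never be handed to an x86_64 host. A minimal sketch; the helper and key layout are assumptions, not the actual cache's scheme:

```python
import hashlib
import platform
import sys

def shared_object_cache_key(source: str) -> str:
    """Hypothetical cache key that ties a built .so to its platform.

    Including the machine architecture (and interpreter version) in the
    key prevents loading an aarch64-built object on an x86_64 host.
    """
    tag = (
        f"{platform.machine()}-{platform.system()}"
        f"-py{sys.version_info.major}.{sys.version_info.minor}"
    )
    digest = hashlib.sha256(source.encode()).hexdigest()[:16]
    return f"{digest}-{tag}"

print(shared_object_cache_key("/* kernel source */"))
# e.g. '8b4f...-x86_64-Linux-py3.11'
```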
```
python3 fbgemm_gpu/experimental/gen_ai/bench/comm_bench.py --num_iters=20 --export_csv
Running benchmark with 8 ranks
[{'N': 1024,
  'fbgemm_1shot_bwidth': 0.21609986687366958, 'fbgemm_1shot_time': 0.009477099776268006,
  'fbgemm_2shot_bwidth': 0.16966774973717594, 'fbgemm_2shot_time': 0.012070649862289428,
  'nccl_bwidth': 0.07660384301398733, 'nccl_time': 0.026734951138496398,
  'symm_1shot_bwidth': 0.170175283451603, 'symm_1shot_time': 0.012034650146961211,
  'symm_2shot_bwidth': 0.08925419349643432, 'symm_2shot_time':...
```