Scott Wolchok

66 comments by Scott Wolchok

This instruction does seem to be generated in `xnn_f32_vrcopysignc_ukernel__sse2_u16` in my local build as well; I just have a machine handy that can execute it. I guess something in the...

> in `xnn_f32_vrcopysignc_ukernel__sse2_u16`

the symbol name here may or may not be accurate; I just found that XNNPACK has an avx512vnnigfni configuration

we are in fact building at least some avx512vnnigfni files per https://github.com/pytorch/executorch/actions/runs/15171458271/job/42808833402?pr=10362 however, this is not new; https://github.com/pytorch/executorch/actions/runs/15218616519/job/42809734929 (a recent trunk run from HUD) also builds them. either 1) we...

failures seem to pattern-match to known issues on main. merging.

this looks like it's a feature request for clang/LLVM, not torchchat

hm. portable sub should be working fine: https://github.com/pytorch/executorch/blame/main/kernels/portable/cpu/op_sub.cpp#L47 (@larryliu0820 are you sure you're running portable?) optimized just plain doesn't seem to have a resize except in one specific case, which...

> Are you sure? I do see this

OK, two specific cases. that resize is also gated. there are clearly uncovered cases

findings:
- the specific regression test I attempted to add for op_sub passes for both portable and optimized.
- I was wrong when I said "there are clearly uncovered cases"...