
WebGPU correctness tests are failing on new buildbot

Open abadams opened this issue 2 years ago • 5 comments

Perhaps dawn has changed and we need to account for it:

https://buildbot.halide-lang.org/master/#/builders/26/builds/32

HL_JIT_TARGET=host-webgpu
HL_TARGET=host-webgpu
#CTEST_RESOURCE_GROUP_COUNT=
/Users/halidenightly/build_bot/worker/halide-nightly-main-llvm16-x86-64-osx-cmake/halide-build/test/correctness/correctness_argmax
Error: Requested timedWaitAnyMaxCount is not supported
    at Initialize (/Users/halidenightly/dawn/src/dawn/native/EventManager.cpp:98)
    at Initialize (/Users/halidenightly/dawn/src/dawn/native/Instance.cpp:189)
Required regular expression not found. Regex=[Success!]

abadams, Oct 09 '23 18:10

This looks like it is being caused by a mismatch between Halide's WebGPU headers and Dawn's. I can't seem to find any documentation as to why Dawn has changed their headers, and Dawn's seem to differ from the other major implementations. @jrprice may know?

shoaibkamil, Oct 31 '23 19:10

> I can't seem to find any documentation as to why Dawn has changed their headers, and Dawn's seem to differ from the other major implementations.

The WebGPU native headers are not yet stable so you are still likely to hit incompatibilities if you're using a version of Dawn that doesn't match the ABI of the headers you're using. If you've just built the latest version of Dawn on the new buildbot then this would explain it.

I'd recommend either:

  1. Downgrading the version of Dawn on the new buildbot to match whichever Dawn commit is being used on the other buildbots.
  2. Upgrading all Dawn versions and mini_webgpu.h to the latest versions.

If you want to do option 2 then I can help find the most recent compatible versions of Dawn and the WebGPU headers.
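To make the ABI point concrete, here is a minimal sketch of how a header/library mismatch could plausibly produce an error like the one in the log. The struct names, field names, and the limit of 64 are hypothetical stand-ins, not Dawn's or Halide's actual declarations; the only assumption is that the newer header gained a trailing field that the older header does not know about.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical descriptor as declared in an older header snapshot.
struct OldInstanceDescriptor {
    const void *next_in_chain;
};

// Hypothetical descriptor after the library appended a new field.
struct NewInstanceDescriptor {
    const void *next_in_chain;
    std::size_t timed_wait_any_max_count;
};

static_assert(sizeof(OldInstanceDescriptor) < sizeof(NewInstanceDescriptor),
              "the new layout must be larger for the mismatch to matter");

// Models what a library built against the new header reads when the caller
// only filled in the old layout. The bytes past the old struct stand for
// whatever happened to be in memory; `fill` makes that garbage deterministic.
std::size_t max_count_seen_by_library(const OldInstanceDescriptor &caller_desc,
                                      unsigned char fill) {
    assert(fill != 0);  // a zero fill would not model uninitialized garbage
    unsigned char memory[sizeof(NewInstanceDescriptor)];
    std::memset(memory, fill, sizeof(memory));
    std::memcpy(memory, &caller_desc, sizeof(caller_desc));

    NewInstanceDescriptor seen;
    std::memcpy(&seen, memory, sizeof(seen));
    return seen.timed_wait_any_max_count;
}

// Hypothetical validation step: reject requests above a made-up limit.
bool library_accepts(std::size_t requested_max_count) {
    const std::size_t kSupportedMaxCount = 64;  // illustrative value only
    return requested_max_count <= kSupportedMaxCount;
}
```

In this sketch the caller never asked for anything, yet `library_accepts(max_count_seen_by_library(desc, 0xAB))` is false because the library interprets stray bytes as a request. Either option above removes the mismatch by making both sides agree on one layout.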

jrprice, Oct 31 '23 20:10

(briefly emerges from the depths...)

IMHO option 2 is the better answer; upgrading Dawn isn't hard.

(submerges once again, bloop)

steven-johnson, Oct 31 '23 21:10

Agreed: option 2 is the best option.

@jrprice If you point me to the most recent compatible version of Dawn, I can create a PR. I have something working with tip-of-tree Dawn (which now seems to support wgpuInstanceProcessEvents() so possibly we can eliminate one set of hacks). I also tested with the wgpu implementation, but that does not support overrides, so I'll add a note in README_webgpu.md saying we don't support it.

shoaibkamil, Oct 31 '23 21:10

> If you point me to the most recent compatible version of Dawn, I can create a PR.

I went ahead and made the PR since I had to fix up Halide in order to test ToT Dawn anyway.

I also documented the process of updating mini_webgpu.h, which I promised to do many months ago.

> tip-of-tree Dawn (which now seems to support wgpuInstanceProcessEvents() so possibly we can eliminate one set of hacks).

This might work on the Dawn side, but unfortunately Emscripten still does not support wgpuInstanceProcessEvents, so we'll still need the native vs. Emscripten split.
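As a rough illustration of what such a split can look like, the sketch below selects the event pump at compile time. `FakeInstance`, `pump_events`, and `wait_until_idle` are hypothetical names invented for this example, and the real WebGPU calls are mentioned only in comments as assumptions; this is not the code in Halide's runtime.

```cpp
#include <cassert>

// Hypothetical stand-in for a WebGPU instance with queued callbacks.
struct FakeInstance {
    int pending_callbacks;
};

// Platform-specific event pump. Only the comments differ here because the
// real calls are replaced by a stub that retires one callback per pump.
inline void pump_events(FakeInstance &instance) {
#ifdef __EMSCRIPTEN__
    // Assumption: a real Emscripten build would yield to the browser event
    // loop at this point, since wgpuInstanceProcessEvents is not available.
#else
    // Assumption: a real native build would call
    // wgpuInstanceProcessEvents(...) on the actual instance at this point.
#endif
    if (instance.pending_callbacks > 0) {
        --instance.pending_callbacks;
    }
}

// Pumps events until nothing is pending or the iteration budget runs out.
// Returns the number of pump calls that were made.
inline int wait_until_idle(FakeInstance &instance, int max_iterations) {
    assert(max_iterations >= 0);
    int iterations = 0;
    while (instance.pending_callbacks > 0 && iterations < max_iterations) {
        pump_events(instance);
        ++iterations;
    }
    return iterations;
}
```

Keeping the `#ifdef` inside a single helper means the waiting logic is shared, and only the pump itself has to change if Emscripten gains support later.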

jrprice, Nov 03 '23 21:11