Brian Chen

857 comments by Brian Chen

Reading through pages such as https://www.intel.com/content/www/us/en/docs/oneapi/optimization-guide-gpu/2023-0/sub-groups-and-simd-vectorization.html and https://intel.github.io/llvm-docs/cuda/opencl-subgroup-vs-cuda-crosslane-op.html, I was under the impression that subgroups do map pretty closely to warps/wavefronts? If that's the case, then having a cross-platform abstraction...

> KernelAbstractions is not One API, so the meaning of subgroup needs to be defined clearly and independently.

Yes, which is why I found the second link interesting. Digging around...

> Their API is supposed to be similar to OpenCL for compute, but I cannot find such topics in OpenCL.

It's kind of hidden away and took a while for...

I think dedicated structural tangent types are still useful for cases like storing (co)tangents of mutable structs for rules which use `setproperty!`. That said, it would be nice to rely...

Maybe `unbroadcast`'s behaviour should also be part of this `BroadcastThunk`? Then it becomes almost a dual to `Broadcasted`. The trick would be knowing when to materialize vs keep constructing the...
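For context, here is a rough sketch of the kind of reduction `unbroadcast` performs: summing a cotangent over the dimensions that broadcasting expanded, so the result has the input's shape again. This is a plain-Julia illustration under my own assumptions, not the actual library code. The name `unbroadcast_sketch` is hypothetical, and it ignores zero tangents, scalars, and other special cases.

```julia
# Hypothetical helper: reduce the cotangent `dy` back to the shape of `x`
# by summing over every dimension that broadcasting expanded.
function unbroadcast_sketch(x::AbstractArray, dy::AbstractArray)
    # Nothing was expanded, so the cotangent passes through unchanged.
    size(x) == size(dy) && return dy
    # Dimensions where `x` had length 1 (or did not exist) but `dy` is larger.
    dims = Tuple(d for d in 1:ndims(dy) if size(x, d) == 1 && size(dy, d) != 1)
    # Only trailing singleton dimensions differ, so a reshape is enough.
    isempty(dims) && return reshape(dy, size(x))
    return reshape(sum(dy; dims=dims), size(x))
end

x  = ones(3, 1)   # was broadcast against something of size (3, 4)
dy = ones(3, 4)   # incoming cotangent of the broadcast result
dx = unbroadcast_sketch(x, dy)
```

A lazy `BroadcastThunk` along the lines suggested above would presumably defer both the elementwise pullback and this reduction until something materializes it.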

If I'm not mistaken it'd have to be handled in `∇getindex`? That's what I meant by higher level. Otherwise the AD system would have to know a priori that this...

Collapsing arrays of Zeros seems reasonable. Is there a rule for what happens when that array contains both `ZeroTangent` and `NoTangent`? Would `simplify_cotangents(x::Array{

I will try to get back to this and the PR which spawned it this weekend. IIRC doing some types of collapsing made certain Zygote tests very unhappy.

Saw the GSoC idea this proposal is referring to, very interesting stuff. One question from me: would this help with being able to represent dynamically-bounded loops on the tape without...

The ultimate use case I have in mind is an RNN, but here is a simpler dependency-free example:

```julia
function f(xs)
    s = zero(eltype(xs))
    for (i, x) in enumerate(xs)
        s...
```