Brian Chen comments

857 comments by Brian Chen

exposing warp-level semantics

Reading through pages such as https://www.intel.com/content/www/us/en/docs/oneapi/optimization-guide-gpu/2023-0/sub-groups-and-simd-vectorization.html and https://intel.github.io/llvm-docs/cuda/opencl-subgroup-vs-cuda-crosslane-op.html, I was under the impression that subgroups do map pretty closely to warps/wavefronts? If that's the case, then having a cross-platform abstraction...

exposing warp-level semantics

> KernelAbstractions is not One API, so the meaning of subgroup needs to be defined clearly and independently.

Yes, which is why I found the second link interesting. Digging around...

exposing warp-level semantics

> Their API is supposed to be similar to OpenCL for compute, but I cannot find such topics in OpenCL.

It's kind of hidden away and took a while for...

Returning `Broadcasted` cotangents for `Broadcasted` arguments?

I think dedicated structural tangent types are still useful for cases like storing (co)tangents of mutable structs for rules which use `setproperty!`. That said, it would be nice to rely...

Returning `Broadcasted` cotangents for `Broadcasted` arguments?

Maybe `unbroadcast`'s behaviour should also be part of this `BroadcastThunk`? Then it becomes almost a dual to `Broadcasted`. The trick would be knowing when to materialize vs keep constructing the...

Array `getindex` rule unable to handle Zero types and `NotImplemented`

If I'm not mistaken it'd have to be handled in `∇getindex`? That's what I meant by higher level. Otherwise the AD system would have to know a priori that this...

Array `getindex` rule unable to handle Zero types and `NotImplemented`

Collapsing arrays of Zeros seems reasonable. Is there a rule for what happens when that array contains both `ZeroTangent` and `NoTangent`? Would `simplify_cotangents(x::Array{`...

Array `getindex` rule unable to handle Zero types and `NotImplemented`

I will try to get back to this and the PR which spawned it this weekend. IIRC doing some types of collapsing made certain Zygote tests very unhappy.

Enhancement proposal: Modular tape caching

Saw the GSoC idea this proposal is referring to, very interesting stuff. One question from me: would this help with being able to represent dynamically-bounded loops on the tape without...

Enhancement proposal: Modular tape caching

The ultimate use case I have in mind is an RNN, but here is a simpler dependency-free example:

```julia
function f(xs)
    s = zero(eltype(xs))
    for (i, x) in enumerate(xs)
        s...
```