fix: regression in non-fast scalar indexing support
fixes #759
- [x] Jacobian support for GPU Arrays has been restored
- [x] ForwardDiff.gradient now supports GPU Arrays
cc @ChrisRackauckas @devmotion
Codecov Report
:white_check_mark: All modified and coverable lines are covered by tests.
:white_check_mark: Project coverage is 90.05%. Comparing base (463e830) to head (5b8ffab).
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##           master     #760      +/-   ##
==========================================
+ Coverage   89.79%   90.05%   +0.25%     
==========================================
  Files          11       12       +1     
  Lines        1039     1066      +27     
==========================================
+ Hits          933      960      +27     
  Misses        106      106              
```
:umbrella: View full report in Codecov by Sentry.
In https://github.com/JuliaDiff/ForwardDiff.jl/pull/472, the seed! (etc.) functions were written in a generic (non-type-specific) way that should have supported GPU arrays. This PR adds further specializations of seed! for some specific types by using extensions. But why does the previous approach no longer work? It seemed better, since it supports non-fast scalar indexing even for arrays that are not GPU arrays.
Has it been properly explored whether the existing functions can be written in an alternative way that supports both fast and non-fast scalar-indexed arrays with the same generic code (which would avoid any new extensions)?
Edit: I had missed that https://github.com/JuliaDiff/ForwardDiff.jl/pull/739 reverted some of #472.
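For concreteness, this is the contrast I mean (a toy example assuming CUDA.jl, not ForwardDiff's actual seed! code): per-index iteration needs scalar setindex!, which is disallowed on GPU arrays, while the same update written as a broadcast over an explicit index set goes through.

```julia
using CUDA
CUDA.allowscalar(false)

x = CUDA.zeros(Float32, 4)

# iteration over indices (the style master reverted to in #739):
# for i in 1:2
#     x[i] = 1f0   # ERROR: scalar indexing is disallowed
# end

# broadcast over a device-resident index set (the #472 style):
idx = cu([1, 2])   # allocates, but never touches individual elements from the host
x[idx] .= 1f0
```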
Yes, on the master branch seeding is (again) performed without broadcasting. Depending on the structural array type, the set of indices is not readily available in an allocation-free, broadcastable form (e.g. the set of upper-triangular indices for UpperTriangular), and hence I reverted the code back to iterating over indices - without considering that it would break GPU compatibility.
If we want to avoid these allocations (and the broadcasting overhead) for non-GPU arrays, I don't immediately see how this could be solved by a single generic implementation. Possibly the amount of code duplication could be reduced by introducing a helper function or branch that switches between broadcasting and iterating based on the type of the input array (presumably defaulting to iteration?), but even then it would be necessary to add an extension that ensures that GPU arrays use broadcasting.
Alternatively, we could default to broadcasting (with the additional overhead of collecting the indices) and, as an additional optimization, only use iteration for a handful of selected base array types such as Array, UpperTriangular{<:Any,<:Matrix}, etc. This wouldn't require an extension, but with such optimizations we would still need both an implementation with broadcasting and one without.
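Roughly the shape of the helper I have in mind, with made-up names (use_broadcast_seeding and seed_indices! are placeholders, not existing ForwardDiff code, and the real seeding loop also has to construct the Dual/Partials values):

```julia
use_broadcast_seeding(::Type{<:AbstractArray}) = false   # default: iterate over indices

function seed_indices!(dest::AbstractArray, idxs, vals)
    if use_broadcast_seeding(typeof(dest))
        # materialize the (possibly lazy) index set so it is broadcastable; this
        # allocates, and on a GPU the collected indices would also have to live on
        # the device (e.g. via Adapt.jl)
        view(dest, collect(idxs)) .= vals
    else
        # allocation-free path for ordinary CPU arrays
        for (i, v) in zip(idxs, vals)
            dest[i] = v
        end
    end
    return dest
end

# in a package extension, GPU arrays would opt into the broadcasting branch:
# use_broadcast_seeding(::Type{<:GPUArraysCore.AbstractGPUArray}) = true
```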
What are your thoughts @KristofferC?
bump on this
Testing this patch with Lux.jl, it still causes regressions in cases where we have a wrapper over a CuArray.
> only use iteration for a handful of selected base array types such as Array

This seems like a good solution that avoids regressions on use-cases that were supported prior to #739. There's also ArrayInterface.fast_scalar_indexing, but I'm not sure how the maintainers feel about taking a dependency on ArrayInterface.
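For reference, that trait is just a query on the array type, so the seeding code could branch on it directly instead of defining its own trait plus extension (sketch only; whether it's worth the dependency is the open question):

```julia
using ArrayInterface

ArrayInterface.fast_scalar_indexing(Vector{Float64})   # true (the default for AbstractArray)
# with CUDA loaded, ArrayInterface's GPU support makes this return false:
# ArrayInterface.fast_scalar_indexing(typeof(CUDA.zeros(Float32, 3)))
```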
bump on this
Can you also add tests of https://github.com/JuliaDiff/ForwardDiff.jl/pull/739 and https://github.com/JuliaDiff/ForwardDiff.jl/pull/743 for these new branches?
The GPUArray backends don't support broadcasting with wrapped arrays nicely, so those tests will mostly fail.
```julia
julia> using CUDA, LinearAlgebra

julia> x = LowerTriangular(cu(rand(Float32, 4, 4)))
4×4 LowerTriangular{Float32, CuArray{Float32, 2, CUDA.DeviceMemory}}:
 0.960887   ⋅         ⋅          ⋅
 0.316333  0.612238   ⋅          ⋅
 0.236091  0.209854  0.0883058   ⋅
 0.370694  0.732681  0.0111619  0.270063

julia> x[diagind(x)] .= 10
```
And they won't have unassigned elements:
```julia
julia> CuArray{Float32}(undef, 2, 3)
2×3 CuArray{Float32, 2, CUDA.DeviceMemory}:
 -5.81535f-36  1.25147f7  -2.50125f-11
 -1.98624      1.84662     1.95155

julia> JLArray{Float32}(undef, 3, 4)
3×4 JLArray{Float32, 2}:
 6.771f-42  6.771f-42  9.42f-43  0.0
 6.771f-42  6.771f-42  0.0       0.0
 6.771f-42  6.771f-42  0.0       4.5657f-41
```
@ChrisRackauckas and @devmotion we're currently a bit in version hell between Lux and Bijectors with regard to ForwardDiff (https://github.com/LuxDL/Lux.jl/issues/1530). Is there any way this can be merged?
Not in its current form - it seems the PR also has merge conflicts.