Anton Smirnov
I guess it fails because `Test.@inferred` throws on the first non-inferrable function and `ChainRulesTestUtils` [doesn't do](https://github.com/JuliaDiff/ChainRulesTestUtils.jl/blob/dd0e246d1516451ec963db55a7a51910819082b3/src/testers.jl#L255) any special handling in this case. `test_rrule()` probably needs to wrap `@inferred` in a...
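For reference, a minimal sketch of the `Test.@inferred` behavior described above. The function `unstable` is a hypothetical example, not taken from the issue, and the `try`/`catch` here is only a way to observe the error; it is not meant as the fix proposed for `test_rrule()`.

```julia
using Test

# Hypothetical type-unstable function: returns an Int or a Float64
# depending on the runtime value of `x`.
unstable(x) = x > 0 ? 1 : 1.0

# `@inferred` compares the actual return type with the inferred one and
# throws as soon as they differ, so nothing after it would run.
threw = try
    @inferred unstable(1)
    false
catch err
    err isa ErrorException
end

println("@inferred threw: ", threw)
```

Because the error is thrown rather than recorded as a test failure, any caller that does not catch it aborts at that point.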
Inputs are already quite small (128x128), and the batch size is set to 1 for these timings as well. Timings for (in the same session in that order): ```julia @time train_loss(model, x,...
Tried running on 0.6.29, but I'm hitting this error:
```
ERROR: LoadError: Need an adjoint for constructor Base.Iterators.Enumerate{Vector{Int64}}.
```
I guess the code relies on #785, as I had to use `map`,...
> Worth trying with ideas from #1126, the simplest of which is to run this before your model: > > ``` > @eval Flux (c::Chain)(x) = foldl((y,f) -> f(y), (x,...
> What are the times for the resnet model on CPU? I think comparing compilation latency between CPU and GPU forward passes would be the easiest way to start...
Here's also the `ProfileView` flamegraph for the whole ResNet `tinf` (just in case). Everything in red is mostly Zygote-related.
> Also, those forward times are pretty eye-watering!

These are the timings for the very first run on a cold start. Subsequent ones are fast. Here's forward for the GPU:...
@torfjelde for my case (NN using Flux.jl), compile times can be significantly improved by fixing type-stability issues with layers. See https://github.com/FluxML/NNlib.jl/pull/370 for timings, mainly first forward pass. But it doesn't...
https://github.com/JuliaGPU/KernelAbstractions.jl/issues/298 seems to be a related issue.
@vchuravy here's the output: [error.txt](https://github.com/EnzymeAD/Enzyme.jl/files/8928691/error.txt) This is for the case when `device = CUDADevice()`.