Brian Chen
Give `VChain` from #1126 a try too. What are the times for the resnet model on CPU? I think comparing compilation latency between CPU and GPU forward passes would be...
Yeah, inference generally balks on Zygote and I'm not sure how we can get it inferring/precompiling well (if at all). Also, those forward times are pretty eye-watering! What's the smallest...
Also worth looking into precompiling IRTools. When you look at the SnoopCompile flamegraph, IRTools functions make up a very large percentage of the total area.
It runs at generated function generation time, so I'm not sure what that counts as...certainly worth a try though! (Edit: https://github.com/TuringLang/Turing.jl/issues/1754#issuecomment-1008817663 has some numbers) For reference, this is what I...
I think I missed some of the discussion on this, what do the top and bottom test runs represent? Is the bottom just a second run after warmup or did...
My recollection is that 15 (±10) seconds is the fixed cost of time to first gradient (TTFG), minus import time, regardless of what function is being compiled. Having also poked around the Zygote internals...
How do we feel about this? Would it help to do an `@adjoint -> rrule` conversion first so that `_unprotect` is no longer required?
Sounds good to me. Provenance tracking of possibly shared arrays has proven to be a consistent thorn in our side, so the less that has to be done the better.
We should probably toss an `@ignore` on that line, but to Dhairya's point this isn't a MWE because DiffEq(Flux) and NeuralPDE are doing so much behind the scenes.
Is there any reason this should be an adjoint for just Zygote and not an `rrule` in ChainRules itself?
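For anyone following along, here is a rough sketch of what the `rrule` form looks like. `double` is a made-up function used purely for illustration, not anything from the PRs above; the point is only that the rule depends on ChainRulesCore alone, so Zygote (and other ChainRules-aware ADs) can pick it up without a Zygote-specific `@adjoint`.

```julia
using ChainRulesCore

# Hypothetical function, for illustration only.
double(x) = 2 .* x

# Reverse rule defined against ChainRulesCore rather than Zygote.
function ChainRulesCore.rrule(::typeof(double), x)
    y = double(x)
    # The pullback returns one tangent per argument, including the function
    # itself, which has no tangent here.
    double_pullback(ȳ) = (NoTangent(), 2 .* ȳ)
    return y, double_pullback
end
```

The Zygote-only equivalent would be roughly `Zygote.@adjoint double(x) = double(x), ȳ -> (2 .* ȳ,)`, which other AD backends cannot reuse.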