Bolt.jl
Bolt.jl copied to clipboard
Ie
Toward a hierarchy-less version - eventually want to compare speed against hierarchy.
Codecov Report
Merging #57 (edbaa86) into main (de08e4d) will decrease coverage by
21.71%
. The diff coverage is0.28%
.
:mega: This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more
@@ Coverage Diff @@
## main #57 +/- ##
===========================================
- Coverage 46.09% 24.39% -21.71%
===========================================
Files 7 8 +1
Lines 692 1279 +587
===========================================
- Hits 319 312 -7
- Misses 373 967 +594
Impacted Files | Coverage Δ | |
---|---|---|
src/Bolt.jl | 100.00% <ø> (ø) |
|
src/ie.jl | 0.00% <0.00%> (ø) |
|
src/ionization/ionization.jl | 22.03% <ø> (-2.56%) |
:arrow_down: |
src/ionization/recfast.jl | 91.78% <ø> (-0.08%) |
:arrow_down: |
src/perturbations.jl | 0.00% <0.00%> (ø) |
|
src/spectra.jl | 0.00% <0.00%> (ø) |
|
src/background.jl | 80.00% <28.57%> (-6.37%) |
:arrow_down: |
... and 1 file with indirect coverage changes
Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.
About to add a cleaned up first scipt version (ie_single_k.jl) that (finally!) works. It is slow and is only tested for a single relatively large k (~0.03 h/Mpc), but it works!!
In action with a starting guess of zero (for quadrupole parts) on the photon monopole:
Upcoming improvements:
- [ ] clean up the iteration code
- [x] try different (namely, higher) k
- [ ] set some scheme for the time integration points that is more efficient than equispaced in x (to go faster)
- [ ] speed up the iteration (+ better initial guess for Pi)
The code I pushed has the file I described above, which uses the equations here (but in conformal newton gauge as I wrote out in the notes).
I also turned the RSA off in ie.jl (which is a file that only evolves the low-order moments of the photons and uses the splines for quadrupoles).
Some notes about this last point:
RSA serves two purposes:
- Give the solver less computational work to do by reducing the number of ODEs that need to be solved at late times.
- Prevent boundary condition reflection blow up of photon perturbations.
The latter is entirely a numerical artefact of the hierarchy - with the ie solution we can properly resolve photon oscillations without the boundary issue. While aesthetically pleasing to say we don't need to make the RSA assumption (for a finite number of multipoles), it is probably desirable from a computational standpoint to continue using RSA. But we no longer absolutely have to to get a correct answer (and I wonder if there are any science cases for late-time radiation-like perturbations...).
This PR is at a very exciting step! I'm finding your code to be very clear, especially for a first draft.
For my own understanding, I still need to do a bit of derivation to go from the hierarchy to the IE, as you've done in your notes!
Going to start posting some numerical experiment results (plots/timings/norms) here for the massless neutrino FFT dependence on numerical parameters. (Everything here is in conformal time integration, though I don't find much of a difference between stepping in $\eta$ and $x$ in overall accuracy).
Varying:
- Full-to-truncated switch $\eta_{\mathrm{switch}}$
- Number of FFT points $M$
- Number of iterations $N_{I}$
Also will look at truncating the number of multipoles in the initial solve and using a better initial ansatz, as by default here I will use $\ell_{\nu} = 50$ in the initial full hierarchy and the dumbest ansatz, which is an array of all zeros.
Timings should be taken with a grain of salt because the FFT code is 1. Unoptimized and 2. Still carrying the weight of the full hierarchy for the photons and massive neutrinos, but I think are still useful for a rough indication at least in the tradeoff of $M$ vs $N_{I}$.
Fiducial values of the main parameters (fixed to this unless otherwise stated): $\eta_{\mathrm{switch}} = 1 \mathrm{Mpc}$, $M=8192$, $N_{I}=5$. And right now this is all for $k=0.03 h/\mathrm{Mpc}$.
Vary $M$ and $N_{I}$ at fixed switch
Black is the hierarchy. Clearly more M helps. Here is some summary output:
3.914403 seconds (621.16 k allocations: 219.230 MiB, 15.97% gc time) (M = 4096, Nᵢ = 5), error against full hierarchy is ℓ=0: 0.024327736624258293, ℓ=2: 0.004968625571585811
3.337983 seconds (975.75 k allocations: 391.330 MiB, 1.96% gc time) (M = 8192, Nᵢ = 5), error against full hierarchy is ℓ=0: 5.628201856013467e-5, ℓ=2: 9.69840450718242e-6
3.989557 seconds (1.71 M allocations: 736.494 MiB, 5.39% gc time) (M = 16384, Nᵢ = 5), error against full hierarchy is ℓ=0: 9.016324731957222e-6, ℓ=2: 1.6088676485945645e-6
Without any look at the flamegraph it seems like most of the time is not going into the FFT, since increasing M doesn't affect the walltime (which, note, is @time and not @btime), even though it affects the accuracy.
Note here that the error is sum( (iter-hierarchy)^2 /M )
Here is the same test at higher $k = 0.3 ~h/\mathrm{Mpc}$:
14.687309 seconds (3.66 M allocations: 420.126 MiB, 0.61% gc time, 14.60% compilation time) (M = 4096, Nᵢ = 5), error against full hierarchy is ℓ=0: 0.0027027526587004207, ℓ=2: 0.0005436380175114952
12.744275 seconds (1.48 M allocations: 456.152 MiB, 0.79% gc time) (M = 8192, Nᵢ = 5), error against full hierarchy is ℓ=0: 1.7713887051272318e-5, ℓ=2: 2.6232322357963993e-6
12.747244 seconds (2.20 M allocations: 799.949 MiB, 1.47% gc time) (M = 16384, Nᵢ = 5), error against full hierarchy is ℓ=0: 6.691494413715124e-6, ℓ=2: 9.531342334122629e-7
Obviously this takes much longer (but we might explore relaxing the tolerance?). Either way convergence results are qualitatively similar (error is actually lower than for lower k).
Again most of the time is probably spent in the truncated ODE solve, and I would imagine dropping the photon and massive neutrino hierarchies will help down the line.
To follow up on this, the time for the solutions of the "late" full hierarchy is
3.097427 seconds (91.47 k allocations: 16.091 MiB)
while the "early" time is less than half a second.
Each truncated solve costs around 2.4 seconds
2.357683 seconds (115.19 k allocations: 17.830 MiB)
- which checks out with the 5 iter time of 12.7ish above.
So we are saving only around 30% of the runtime by truncating the neutrinos. Will revisit these timings again later, but this helps (at least me) get oriented.
Vary the switch and $M$ at fixed $N_{I}$
Here $N_{I}$ is fixed to 5 and I vary the switch from $1 \mathrm{Mpc}$ to [0.5,10.,100.] $\mathrm{Mpc}$. $M$ is varied as above.
For reference, for this k-mode, the "horizon scale" (for which there is no obvious definition) is where $\frac{k}{\mathcal{H}}=1$ is $\eta=37 \mathrm{Mpc}$. Note that the place where the variation in the monopole transitions from exponential to linear happens after this...
Output:
2.477748 seconds (640.01 k allocations: 220.874 MiB, 1.51% gc time) (M = 4096, Nᵢ = 5), error against full hierarchy is ℓ=0: 44.97438381582731, ℓ=2: 9.615684393834469
2.272631 seconds (996.48 k allocations: 391.946 MiB, 3.80% gc time) (M = 8192, Nᵢ = 5), error against full hierarchy is ℓ=0: 0.05185301616318226, ℓ=2: 0.010879956079524676
2.579722 seconds (1.72 M allocations: 737.093 MiB, 6.44% gc time) (M = 16384, Nᵢ = 5), error against full hierarchy is ℓ=0: 9.873425689145687e-5, ℓ=2: 1.799027061434055e-5
Output:
2.211860 seconds (618.90 k allocations: 218.756 MiB, 2.60% gc time) (M = 4096, Nᵢ = 5), error against full hierarchy is ℓ=0: 0.05024428535457524, ℓ=2: 0.010362831120741907
2.343475 seconds (979.42 k allocations: 391.376 MiB, 5.47% gc time) (M = 8192, Nᵢ = 5), error against full hierarchy is ℓ=0: 9.89168854707918e-5, ℓ=2: 1.764483646486915e-5
2.749802 seconds (1.72 M allocations: 736.774 MiB, 5.79% gc time) (M = 16384, Nᵢ = 5), error against full hierarchy is ℓ=0: 9.303373602457604e-6, ℓ=2: 1.617072960746139e-6
Output:
2.489212 seconds (591.94 k allocations: 217.901 MiB, 1.63% gc time) (M = 4096, Nᵢ = 5), error against full hierarchy is ℓ=0: 1.0948808134474916e-5, ℓ=2: 2.0282281959209817e-6
3.096040 seconds (963.15 k allocations: 390.454 MiB, 3.00% gc time) (M = 8192, Nᵢ = 5), error against full hierarchy is ℓ=0: 9.158093237790724e-6, ℓ=2: 1.7164577184716518e-6
2.870788 seconds (1.70 M allocations: 735.816 MiB, 6.01% gc time) (M = 16384, Nᵢ = 5), error against full hierarchy is ℓ=0: 8.752670467463331e-6, ℓ=2: 1.6362669057618164e-6
Output:
1.982466 seconds (567.07 k allocations: 216.046 MiB, 2.08% gc time) (M = 4096, Nᵢ = 5), error against full hierarchy is ℓ=0: 6.783373104924727e-5, ℓ=2: 3.2583090867381304e-5
2.086790 seconds (931.32 k allocations: 388.667 MiB, 3.72% gc time) (M = 8192, Nᵢ = 5), error against full hierarchy is ℓ=0: 6.71768974129022e-5, ℓ=2: 3.0038297397818197e-5
2.421192 seconds (1.67 M allocations: 733.976 MiB, 6.74% gc time) (M = 16384, Nᵢ = 5), error against full hierarchy is ℓ=0: 6.763186632413128e-5, ℓ=2: 2.8919112145994086e-5
#--------
Again, here is the same thing for the higher-$k$ mode ($k=0.03~h/\mathrm{Mpc}, "horizon scale" 3.4 Mpc):
Output:
9.759966 seconds (1.10 M allocations: 281.695 MiB) (M = 4096, Nᵢ = 5), error against full hierarchy is ℓ=0: 0.0021914827906542155, ℓ=2: 0.0004347068032540287
8.746645 seconds (1.49 M allocations: 455.735 MiB, 5.16% gc time) (M = 8192, Nᵢ = 5), error against full hierarchy is ℓ=0: 2.4660373189609712e-5, ℓ=2: 2.686193755159691e-6
7.930377 seconds (2.20 M allocations: 799.368 MiB, 1.31% gc time) (M = 16384, Nᵢ = 5), error against full hierarchy is ℓ=0: 8.324277287684964e-6, ℓ=2: 1.0597585399984886e-6
Output:
8.164170 seconds (1.11 M allocations: 282.128 MiB, 0.80% gc time) (M = 4096, Nᵢ = 5), error against full hierarchy is ℓ=0: 5.9543352472409525e-5, ℓ=2: 7.192616175038779e-6
8.267998 seconds (1.45 M allocations: 453.342 MiB, 1.47% gc time) (M = 8192, Nᵢ = 5), error against full hierarchy is ℓ=0: 1.5593479481400864e-5, ℓ=2: 1.9421464632261884e-6
8.791901 seconds (2.20 M allocations: 798.750 MiB, 1.27% gc time) (M = 16384, Nᵢ = 5), error against full hierarchy is ℓ=0: 6.781303864436087e-6, ℓ=2: 9.661082440228582e-7
Output:
8.845261 seconds (1.06 M allocations: 279.297 MiB, 0.77% gc time) (M = 4096, Nᵢ = 5), error against full hierarchy is ℓ=0: 3.380758869776408e-5, ℓ=2: 9.897320474102225e-6
8.691568 seconds (1.44 M allocations: 452.829 MiB, 0.70% gc time) (M = 8192, Nᵢ = 5), error against full hierarchy is ℓ=0: 1.416734005629872e-5, ℓ=2: 5.76125185991097e-6
9.181634 seconds (2.18 M allocations: 798.308 MiB, 1.96% gc time) (M = 16384, Nᵢ = 5), error against full hierarchy is ℓ=0: 9.540303946926921e-6, ℓ=2: 4.245814075267681e-6
Output:
8.330425 seconds (1.05 M allocations: 278.212 MiB, 0.73% gc time) (M = 4096, Nᵢ = 5), error against full hierarchy is ℓ=0: 6.854331049445173e-6, ℓ=2: 2.56038594633846e-6
8.464808 seconds (1.42 M allocations: 451.609 MiB, 0.65% gc time) (M = 8192, Nᵢ = 5), error against full hierarchy is ℓ=0: 6.4147161848718985e-6, ℓ=2: 2.4778285049901822e-6
8.088516 seconds (2.16 M allocations: 796.813 MiB, 2.21% gc time) (M = 16384, Nᵢ = 5), error against full hierarchy is ℓ=0: 6.309153736510417e-6, ℓ=2: 2.4354245288514206e-6
Walltime decreases as we start later (makes sense, less ODE to solve). The error metric also decreases as we start later. At face value this makes no sense, but looking at the plots for later switches (if you squint) you can see that this is because the later starts only have to do the FFT in the easy-to-FFT nearly flat wiggle zone, and I am only measuring the error where we actually do the FFT.
Hmm I may need to fix that first plot for $k=0.03$ but it should agree quite well for higher $M$
Here is that first $k=0.03$ plot at higher $M$ where the range of the plot is smaller (though only fair comparison with others is the lowest $M$ here vs highest $M$ in other plots)
Remainder of the tests outlined above:
Drop the number of multipoles in the initial hierarchy
Here I show $\ell_{\nu} = 3, 10, 50$, all other plots use 50. Main takeaway is that at fixed $M$ there is basically no difference in using fewer multipoles (which physically makes sense for this low a switch time).
Annoyingly this first plot got a little cut off...
Outputs:
$\ell_{\rm{max}}=50$:
4.783524 seconds (1.23 M allocations: 415.648 MiB, 2.18% gc time) (M = 8192, Nᵢ = 5), error against full hierarchy is ℓ=0: 9.840033848574503e-5, ℓ=2: 1.828074841420805e-5
4.868371 seconds (1.94 M allocations: 759.623 MiB, 2.08% gc time) (M = 16384, Nᵢ = 5), error against full hierarchy is ℓ=0: 8.84161724083902e-6, ℓ=2: 1.632847426010087e-6
$\ell_{\rm{max}}=10$:
4.877348 seconds (1.23 M allocations: 415.077 MiB, 1.63% gc time) (M = 8192, Nᵢ = 5), error against full hierarchy is ℓ=0: 0.0004217199376935739, ℓ=2: 6.547150731140973e-5
5.011039 seconds (1.96 M allocations: 760.168 MiB, 1.97% gc time) (M = 16384, Nᵢ = 5), error against full hierarchy is ℓ=0: 0.00024554041721549364, ℓ=2: 4.880646789688861e-5
$\ell_{\rm{max}}=3$:
2.178649 seconds (975.66 k allocations: 391.156 MiB, 3.54% gc time) (M = 8192, Nᵢ = 5), error against full hierarchy is ℓ=0: 0.0008982274748046986, ℓ=2: 9.30223852551783e-5
2.497343 seconds (1.72 M allocations: 737.040 MiB, 4.59% gc time) (M = 16384, Nᵢ = 5), error against full hierarchy is ℓ=0: 0.0008648703948309034, ℓ=2: 7.984194701288053e-5
#----------------------- Also showing this one at $k = 0.3 ~h/\mathrm{Mpc}$ since dropping multipoles will affect it differently
Output:
$\ell_{\rm{max}}=50$:
8.267421 seconds (1.47 M allocations: 455.798 MiB, 0.66% gc time) (M = 8192, Nᵢ = 5), error against full hierarchy is ℓ=0: 2.1757689130294498e-5, ℓ=2: 3.485634046116244e-6
8.619932 seconds (2.21 M allocations: 800.554 MiB, 1.12% gc time) (M = 16384, Nᵢ = 5), error against full hierarchy is ℓ=0: 6.650277576620795e-6, ℓ=2: 9.456550485422403e-7
$\ell_{\rm{max}}=10$:
9.603415 seconds (1.47 M allocations: 455.069 MiB, 6.76% gc time) (M = 8192, Nᵢ = 5), error against full hierarchy is ℓ=0: 9.358824288586378e-5, ℓ=2: 1.4679040840320153e-5
9.333176 seconds (2.21 M allocations: 800.297 MiB, 1.60% gc time) (M = 16384, Nᵢ = 5), error against full hierarchy is ℓ=0: 7.668981155599474e-5, ℓ=2: 1.2164503995921898e-5
$\ell_{\rm{max}}=3$:
8.938511 seconds (1.46 M allocations: 453.987 MiB, 1.72% gc time) (M = 8192, Nᵢ = 5), error against full hierarchy is ℓ=0: 0.0005319382002153873, ℓ=2: 3.767904234756597e-5
8.686364 seconds (2.20 M allocations: 800.172 MiB, 1.09% gc time) (M = 16384, Nᵢ = 5), error against full hierarchy is ℓ=0: 0.0005162221861296847, ℓ=2: 3.567816655348787e-5
Note that the errors quoted are always against a hierarchy using the ell_max specified (which in retrospect maybe is not the most useful number...)
So digging into this comparison a little bit more, it seems like when the switch is early enough (as above, at 1 Mpc), the reduced maximum multipole has really no effect on the subsequent evolution. This is exactly what we expect since high multipoles become more important over time.
To understand for what switch this breaks down a little better, I tried the same comparison for $\ell_{\rm{max}}=3,50$ for a switch of 100 Mpc for both k modes. Here gray is the truncated hierarchy at $\ell_{\rm{max}}=3$ and black is the full hierarchy at $\ell_{\rm{max}}=50$
$k = 0.03$:
Legend is annoyingly covering stuff, but the interesting part of this comparison is before that starts to happen - plus you can see the behavior in the quadrupole.
#-------------------------- $k=0.3$:
zoomed in:
For the lower-$k$ mode we are still in the situation where the truncation of 3 multipoles is good (since this mode enters the horizon so late). However, for the higher $k$ mode we basically mess everything up since we make the transition during the oscillatory part of the evolution where higher multipoles are relevant (evidenced independently of the the iteration by the disagreement of the black and grey lines).
Final massless neutrino test - using a different ansatz:
Here we consider 4 cases:
- The default we have been using - a zero ansatz
- Leave the quadrupole at zero, but set the monopole ansatz to its initial (nonzero) value.
- Use an $\ell_{\rm{max}=3}$ solve at the current cosmology as the ansatz
- Same as the above, but at a Planck15 (no massive neutrino) cosmology - as a stand in for the case where we just read a file at fixed cosmology[^2]
All other parameters are fixed to fiducial for the k =0.03 mode and we look at the result after 2 iterations (though see[^1]).
Output:
0.877503 seconds (392.42 k allocations: 156.517 MiB, 4.13% gc time) (M = 8192, Nᵢ = 2), error against full hierarchy is ℓ=0: 0.0009011986947636854, ℓ=2: 0.00013335847779883907
0.916037 seconds (388.85 k allocations: 156.393 MiB, 3.51% gc time) (M = 8192, Nᵢ = 2), error against full hierarchy is ℓ=0: 6.515467702639128e-5, ℓ=2: 8.8881981073732e-6
1.118537 seconds (393.93 k allocations: 156.512 MiB, 3.68% gc time) (M = 8192, Nᵢ = 2), error against full hierarchy is ℓ=0: 1.5106919541527058e-5, ℓ=2: 2.5628684674265316e-6
0.818163 seconds (390.06 k allocations: 156.535 MiB) (M = 8192, Nᵢ = 2), error against full hierarchy is ℓ=0: 1.49779679851396e-5, ℓ=2: 2.544566397998796e-6
#-----------------------
From looking at this:
- Clearly using the constant non-zero initialization for the monopole ansatz is much better than zero, so we should go with that as the default (and never provide a zero ansatz for the monopole). Note that this improves not just the monopole but also the quadrupole.
- Already by the second iteration we have converged when starting with the Bolt cosmology or Planck cosmology $\ell_{\rm{max}=3}$ solution. This is in line with Kamionkowski++, Ji++ papers. Looking at only 1 iteration, the story is the same, but probably this would change somewhat (though really I suspect not very much) for other cosmologies in $\Lambda CDM$. Exotic extensions are another story.
[^1]: There is a systematic offset for the fiducial $M=8192$ in the first oscillation, but going up by a factor of 2 in $M$ removes this. [^2]: Using $\ell_{\rm{max}}=50$ makes no visual difference at the fiducial parameters for this mode.
Some lessons about numerical parameters:
Number of FFT points $M$
- Increasing the value of $M$ always improves the quality of the solution at a fixed number of iterations. It also slightly increases the runtime, but it is unclear to what extent this will actually matter for the full hierarchy-less solve until we have some code for that for photons, massless/massive neutrinos we can run timings for.
- We need a minimum value of $M$ of at least 8192 to get something within a factor of around 10% of the solution at all times within 5 iterations (of course it is worst at the earliest times). Going up by a factor of 2 visually looks to be basically the hierarchy solution. Going to 7 iterations (not shown) gives a similar quality of solution for $M=8192$.
- The biggest time-sink is the number of iterations, so if we can get away with fewer iterations by using a higher value of $M$, we should do that. Again the extent to which this is true may change later when we truncate everything, but I expect since even the truncated hierarchy solve scales like $N^{3}$, where $N$ is the total number of multipoles across all species, this will dominate over reasonable factors of $M$ increase like those considered here.
- There is a systematic offset even for $M=8192$ that is visible even after several iterations in the first oscillation for k=0.03 (and this persists when I look at up to 9 iterations). So to be safe we may want to use more than this so long as the cost of more FFT points is basically negligible.
Switching from hierarchy to FFT evolution
- Switching somewhat too late induces some initial error that gets suppressed as the time evolution continues (see the plot for k=0.30 Mpc with a switch of 10 Mpc and the k=0.03 Mpc plot with a switch of 100 Mpc), but at early times even with more iterations, this error remains. If we want to get the perturbations right at all times, this means we need to use a sufficiently early switch. We can switch later for longer modes and still be ok.
- Switching far too late produces what is basically garbage - an out-of-phase oscillating solution that never converges to the hierarchy solution (see the plot for the 100 Mpc switch for k=0.30 )
Maximum multipoles and initial ansatzs (ansatzes?)
- We can drop to only 4 multipoles for the full solve before switching with effectively no penalty as long as the switch is early enough.
- Using a non-zero ansatz results in a closer approximation to the hierarchy solution at a fixed number of iterations - specifically using the initial value of the monopole rather than zero. Using a previous solve with $\ell_{\rm{max}}=3$ appears to converge in 1-step for either the current cosmology or a nearby (planck15) cosmology - aka we reproduce the results of previous iterative method papers.
Minor observations:
- I've seen this before, but it is interesting that the hierarchy solution has these jagged numerical looking blips in the oscillatory part of the evolution (e.g. around x=-2 for k=0.03) but we see no such features in the hierarchy. This kind of thing persists even at high tolerances, and was worse when I was solving in x-time instead of conformal time.
- Don't read too much into the timings (beyond a factor of several tens of percent) because they are not average timings.
- In thinking about a horizon-entry-dependent switch, we probably want something like 6-10 times $\eta_{\rm{horizon}}$. This is something to follow-up on in a little more detail.
- One other thing that might be interesting is to see $\ell_{\rm{max}} = 2$ - are we really gaining anything (e.g. BC) in terms of one extra multipole early on?
Now that these experiments are done I will move on to massive neutrinos. I won't repeat all these tests but will post a few quick checks that nothing is broken. In principle, massless neutrinos should be the "worst" for the purposes of the iterative convolution integral since they are the least smooth (have the most high-frequency content) of all the neutrinos, but I could be wrong...
Once that's done, I will:
- [x] Move all the FFT iteration tools into another file
- [x] Build the infrastructure for mixing photon iteration with neutrino iteration
- [ ] Make the "sanity check" into a testsuite test with some tolerance
We might want to pick a tolerance rather than a fixed number of iterations in the future, since it will be a huge waste of time for the user if we take 5 iterations when using a good ansatz that would give convergence in 1 step.
These last few commits remove the monopole and quadrupole entirely from the neutrino solution in the truncated hierarchies (per @marius311's correction of my oversight there). This gets rid of these weird singular matrix and dtmin errors (woo!). The massive neutrino iteration experiments are not working from constant/zero ansatz unless I use an insane amount of FFT points, so will need to spend a little time debugging...
So I'm going to ignore it for now, but the conformal time version of the iteration is not working after the first iteration for some reason. When I use the x-evolution for the truncated hierarchy solve there appears to be no problem at all and the results look quite nice. I didn't see anything like this for the massless neutrinos alone, but when massive neutrinos are included something strange is happening. The iteration error does not get smaller in L2 sense (while it does for x iteration).
Here are some results using the x iteration (like the old massless experiments). Here I use a switch of 5.0 Mpc and $M=8192$ with a constant ansatz. Results are not perfect at early times (this gets better either with a later $\eta$ switch or with more $M$).
Here is q1 for massive neutrinos:
Here is q15 as well:
These last couple are a bit challenging to see in log unfortunately
Here also are the same things but with a Planck cosmology (full) hierarchy, but no neutrino mass, initialization:
massless:
massive q1:
massive qend:
Now (finally) moving on to photon iterative unification....
With these changes, unification is complete! But there are some residual numerical issues with the k=1.0 h/Mpc mode I am looking at before moving on to performance (the sanity check passes for the other two modes we have been looking at k=0.03 and 0.3)
The issue I mentioned with high-k mode probably needs to be investigated more, but there is nothing wrong with the iteration procedure - after sufficiently many iterations the result still converges.
Here also is a somewhat clean "experiment" file for the combined iteration where I started to look at performance.
Added a first look at comparing solvers and the full vs truncated hierarchy times in in the "/scripts/combined_experiment_iter.jl" file
Getting an AD error for Rodas5P on the truncated hierarchy solve though, so will need to check that out....
Also AD fails due to the conformal time interpolator in the conformal solver