Bolt.jl icon indicating copy to clipboard operation
Bolt.jl copied to clipboard

Ie

Open jmsull opened this issue 3 years ago • 23 comments

Toward a hierarchy-less version - eventually want to compare speed against hierarchy.

jmsull avatar Feb 05 '22 07:02 jmsull

Codecov Report

Merging #57 (edbaa86) into main (de08e4d) will decrease coverage by 21.71%. The diff coverage is 0.28%.

:mega: This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more

@@             Coverage Diff             @@
##             main      #57       +/-   ##
===========================================
- Coverage   46.09%   24.39%   -21.71%     
===========================================
  Files           7        8        +1     
  Lines         692     1279      +587     
===========================================
- Hits          319      312        -7     
- Misses        373      967      +594     
Impacted Files Coverage Δ
src/Bolt.jl 100.00% <ø> (ø)
src/ie.jl 0.00% <0.00%> (ø)
src/ionization/ionization.jl 22.03% <ø> (-2.56%) :arrow_down:
src/ionization/recfast.jl 91.78% <ø> (-0.08%) :arrow_down:
src/perturbations.jl 0.00% <0.00%> (ø)
src/spectra.jl 0.00% <0.00%> (ø)
src/background.jl 80.00% <28.57%> (-6.37%) :arrow_down:

... and 1 file with indirect coverage changes

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

codecov-commenter avatar Feb 05 '22 07:02 codecov-commenter

About to add a cleaned up first scipt version (ie_single_k.jl) that (finally!) works. It is slow and is only tested for a single relatively large k (~0.03 h/Mpc), but it works!!

In action with a starting guess of zero (for quadrupole parts) on the photon monopole: ie_temp_mono

Upcoming improvements:

  • [ ] clean up the iteration code
  • [x] try different (namely, higher) k
  • [ ] set some scheme for the time integration points that is more efficient than equispaced in x (to go faster)
  • [ ] speed up the iteration (+ better initial guess for Pi)

jmsull avatar Jun 02 '22 20:06 jmsull

The code I pushed has the file I described above, which uses the equations here (but in conformal newton gauge as I wrote out in the notes).

I also turned the RSA off in ie.jl (which is a file that only evolves the low-order moments of the photons and uses the splines for quadrupoles).

Some notes about this last point:

RSA serves two purposes:

  1. Give the solver less computational work to do by reducing the number of ODEs that need to be solved at late times.
  2. Prevent boundary condition reflection blow up of photon perturbations.

The latter is entirely a numerical artefact of the hierarchy - with the ie solution we can properly resolve photon oscillations without the boundary issue. While aesthetically pleasing to say we don't need to make the RSA assumption (for a finite number of multipoles), it is probably desirable from a computational standpoint to continue using RSA. But we no longer absolutely have to to get a correct answer (and I wonder if there are any science cases for late-time radiation-like perturbations...).

jmsull avatar Jun 02 '22 21:06 jmsull

This PR is at a very exciting step! I'm finding your code to be very clear, especially for a first draft.

For my own understanding, I still need to do a bit of derivation to go from the hierarchy to the IE, as you've done in your notes!

xzackli avatar Jun 03 '22 02:06 xzackli

Going to start posting some numerical experiment results (plots/timings/norms) here for the massless neutrino FFT dependence on numerical parameters. (Everything here is in conformal time integration, though I don't find much of a difference between stepping in $\eta$ and $x$ in overall accuracy).

Varying:

  • Full-to-truncated switch $\eta_{\mathrm{switch}}$
  • Number of FFT points $M$
  • Number of iterations $N_{I}$

Also will look at truncating the number of multipoles in the initial solve and using a better initial ansatz, as by default here I will use $\ell_{\nu} = 50$ in the initial full hierarchy and the dumbest ansatz, which is an array of all zeros.

Timings should be taken with a grain of salt because the FFT code is 1. Unoptimized and 2. Still carrying the weight of the full hierarchy for the photons and massive neutrinos, but I think are still useful for a rough indication at least in the tradeoff of $M$ vs $N_{I}$.

jmsull avatar Feb 09 '23 17:02 jmsull

Fiducial values of the main parameters (fixed to this unless otherwise stated): $\eta_{\mathrm{switch}} = 1 \mathrm{Mpc}$, $M=8192$, $N_{I}=5$. And right now this is all for $k=0.03 h/\mathrm{Mpc}$.

Vary $M$ and $N_{I}$ at fixed switch

mslss_k0 03_switch1 1_elmax50_zeroini

Black is the hierarchy. Clearly more M helps. Here is some summary output:

3.914403 seconds (621.16 k allocations: 219.230 MiB, 15.97% gc time) (M = 4096, Nᵢ = 5), error against full hierarchy is ℓ=0: 0.024327736624258293, ℓ=2: 0.004968625571585811

3.337983 seconds (975.75 k allocations: 391.330 MiB, 1.96% gc time) (M = 8192, Nᵢ = 5), error against full hierarchy is ℓ=0: 5.628201856013467e-5, ℓ=2: 9.69840450718242e-6

3.989557 seconds (1.71 M allocations: 736.494 MiB, 5.39% gc time) (M = 16384, Nᵢ = 5), error against full hierarchy is ℓ=0: 9.016324731957222e-6, ℓ=2: 1.6088676485945645e-6

Without any look at the flamegraph it seems like most of the time is not going into the FFT, since increasing M doesn't affect the walltime (which, note, is @time and not @btime), even though it affects the accuracy.

Note here that the error is sum( (iter-hierarchy)^2 /M )

jmsull avatar Feb 10 '23 08:02 jmsull

Here is the same test at higher $k = 0.3 ~h/\mathrm{Mpc}$:

mslss_k0 30_switch1 1_elmax50_zeroini

14.687309 seconds (3.66 M allocations: 420.126 MiB, 0.61% gc time, 14.60% compilation time) (M = 4096, Nᵢ = 5), error against full hierarchy is ℓ=0: 0.0027027526587004207, ℓ=2: 0.0005436380175114952

12.744275 seconds (1.48 M allocations: 456.152 MiB, 0.79% gc time) (M = 8192, Nᵢ = 5), error against full hierarchy is ℓ=0: 1.7713887051272318e-5, ℓ=2: 2.6232322357963993e-6

12.747244 seconds (2.20 M allocations: 799.949 MiB, 1.47% gc time) (M = 16384, Nᵢ = 5), error against full hierarchy is ℓ=0: 6.691494413715124e-6, ℓ=2: 9.531342334122629e-7

Obviously this takes much longer (but we might explore relaxing the tolerance?). Either way convergence results are qualitatively similar (error is actually lower than for lower k).

Again most of the time is probably spent in the truncated ODE solve, and I would imagine dropping the photon and massive neutrino hierarchies will help down the line.

To follow up on this, the time for the solutions of the "late" full hierarchy is 3.097427 seconds (91.47 k allocations: 16.091 MiB) while the "early" time is less than half a second.

Each truncated solve costs around 2.4 seconds 2.357683 seconds (115.19 k allocations: 17.830 MiB)

  • which checks out with the 5 iter time of 12.7ish above.

So we are saving only around 30% of the runtime by truncating the neutrinos. Will revisit these timings again later, but this helps (at least me) get oriented.

jmsull avatar Feb 10 '23 08:02 jmsull

Vary the switch and $M$ at fixed $N_{I}$

Here $N_{I}$ is fixed to 5 and I vary the switch from $1 \mathrm{Mpc}$ to [0.5,10.,100.] $\mathrm{Mpc}$. $M$ is varied as above.

For reference, for this k-mode, the "horizon scale" (for which there is no obvious definition) is where $\frac{k}{\mathcal{H}}=1$ is $\eta=37 \mathrm{Mpc}$. Note that the place where the variation in the monopole transitions from exponential to linear happens after this...

mslss_k0 03_switch0 5_elmax50_zeroini

Output:

2.477748 seconds (640.01 k allocations: 220.874 MiB, 1.51% gc time) (M = 4096, Nᵢ = 5), error against full hierarchy is ℓ=0: 44.97438381582731, ℓ=2: 9.615684393834469

2.272631 seconds (996.48 k allocations: 391.946 MiB, 3.80% gc time) (M = 8192, Nᵢ = 5), error against full hierarchy is ℓ=0: 0.05185301616318226, ℓ=2: 0.010879956079524676

2.579722 seconds (1.72 M allocations: 737.093 MiB, 6.44% gc time) (M = 16384, Nᵢ = 5), error against full hierarchy is ℓ=0: 9.873425689145687e-5, ℓ=2: 1.799027061434055e-5

mslss_k0 03_switch1 0_elmax50_zeroini

Output:

2.211860 seconds (618.90 k allocations: 218.756 MiB, 2.60% gc time) (M = 4096, Nᵢ = 5), error against full hierarchy is ℓ=0: 0.05024428535457524, ℓ=2: 0.010362831120741907

2.343475 seconds (979.42 k allocations: 391.376 MiB, 5.47% gc time) (M = 8192, Nᵢ = 5), error against full hierarchy is ℓ=0: 9.89168854707918e-5, ℓ=2: 1.764483646486915e-5

2.749802 seconds (1.72 M allocations: 736.774 MiB, 5.79% gc time) (M = 16384, Nᵢ = 5), error against full hierarchy is ℓ=0: 9.303373602457604e-6, ℓ=2: 1.617072960746139e-6

mslss_k0 03_switch10 0_elmax50_zeroini

Output:

2.489212 seconds (591.94 k allocations: 217.901 MiB, 1.63% gc time) (M = 4096, Nᵢ = 5), error against full hierarchy is ℓ=0: 1.0948808134474916e-5, ℓ=2: 2.0282281959209817e-6

3.096040 seconds (963.15 k allocations: 390.454 MiB, 3.00% gc time) (M = 8192, Nᵢ = 5), error against full hierarchy is ℓ=0: 9.158093237790724e-6, ℓ=2: 1.7164577184716518e-6

2.870788 seconds (1.70 M allocations: 735.816 MiB, 6.01% gc time) (M = 16384, Nᵢ = 5), error against full hierarchy is ℓ=0: 8.752670467463331e-6, ℓ=2: 1.6362669057618164e-6

mslss_k0 03_switch100 0_elmax50_zeroini

Output:

1.982466 seconds (567.07 k allocations: 216.046 MiB, 2.08% gc time) (M = 4096, Nᵢ = 5), error against full hierarchy is ℓ=0: 6.783373104924727e-5, ℓ=2: 3.2583090867381304e-5

2.086790 seconds (931.32 k allocations: 388.667 MiB, 3.72% gc time) (M = 8192, Nᵢ = 5), error against full hierarchy is ℓ=0: 6.71768974129022e-5, ℓ=2: 3.0038297397818197e-5

2.421192 seconds (1.67 M allocations: 733.976 MiB, 6.74% gc time) (M = 16384, Nᵢ = 5), error against full hierarchy is ℓ=0: 6.763186632413128e-5, ℓ=2: 2.8919112145994086e-5

#--------

Again, here is the same thing for the higher-$k$ mode ($k=0.03~h/\mathrm{Mpc}, "horizon scale" 3.4 Mpc): mslss_k0 30_switch0 5_elmax50_zeroini

Output:

9.759966 seconds (1.10 M allocations: 281.695 MiB) (M = 4096, Nᵢ = 5), error against full hierarchy is ℓ=0: 0.0021914827906542155, ℓ=2: 0.0004347068032540287

8.746645 seconds (1.49 M allocations: 455.735 MiB, 5.16% gc time) (M = 8192, Nᵢ = 5), error against full hierarchy is ℓ=0: 2.4660373189609712e-5, ℓ=2: 2.686193755159691e-6

7.930377 seconds (2.20 M allocations: 799.368 MiB, 1.31% gc time) (M = 16384, Nᵢ = 5), error against full hierarchy is ℓ=0: 8.324277287684964e-6, ℓ=2: 1.0597585399984886e-6

mslss_k0 30_switch1 0_elmax50_zeroini

Output:

8.164170 seconds (1.11 M allocations: 282.128 MiB, 0.80% gc time) (M = 4096, Nᵢ = 5), error against full hierarchy is ℓ=0: 5.9543352472409525e-5, ℓ=2: 7.192616175038779e-6

8.267998 seconds (1.45 M allocations: 453.342 MiB, 1.47% gc time) (M = 8192, Nᵢ = 5), error against full hierarchy is ℓ=0: 1.5593479481400864e-5, ℓ=2: 1.9421464632261884e-6

8.791901 seconds (2.20 M allocations: 798.750 MiB, 1.27% gc time) (M = 16384, Nᵢ = 5), error against full hierarchy is ℓ=0: 6.781303864436087e-6, ℓ=2: 9.661082440228582e-7

mslss_k0 30_switch10 0_elmax50_zeroini

Output:

8.845261 seconds (1.06 M allocations: 279.297 MiB, 0.77% gc time) (M = 4096, Nᵢ = 5), error against full hierarchy is ℓ=0: 3.380758869776408e-5, ℓ=2: 9.897320474102225e-6

8.691568 seconds (1.44 M allocations: 452.829 MiB, 0.70% gc time) (M = 8192, Nᵢ = 5), error against full hierarchy is ℓ=0: 1.416734005629872e-5, ℓ=2: 5.76125185991097e-6

9.181634 seconds (2.18 M allocations: 798.308 MiB, 1.96% gc time) (M = 16384, Nᵢ = 5), error against full hierarchy is ℓ=0: 9.540303946926921e-6, ℓ=2: 4.245814075267681e-6

mslss_k0 30_switch100 0_elmax50_zeroini

Output:

8.330425 seconds (1.05 M allocations: 278.212 MiB, 0.73% gc time) (M = 4096, Nᵢ = 5), error against full hierarchy is ℓ=0: 6.854331049445173e-6, ℓ=2: 2.56038594633846e-6

8.464808 seconds (1.42 M allocations: 451.609 MiB, 0.65% gc time) (M = 8192, Nᵢ = 5), error against full hierarchy is ℓ=0: 6.4147161848718985e-6, ℓ=2: 2.4778285049901822e-6

8.088516 seconds (2.16 M allocations: 796.813 MiB, 2.21% gc time) (M = 16384, Nᵢ = 5), error against full hierarchy is ℓ=0: 6.309153736510417e-6, ℓ=2: 2.4354245288514206e-6

Walltime decreases as we start later (makes sense, less ODE to solve). The error metric also decreases as we start later. At face value this makes no sense, but looking at the plots for later switches (if you squint) you can see that this is because the later starts only have to do the FFT in the easy-to-FFT nearly flat wiggle zone, and I am only measuring the error where we actually do the FFT.

jmsull avatar Feb 10 '23 09:02 jmsull

Hmm I may need to fix that first plot for $k=0.03$ but it should agree quite well for higher $M$

jmsull avatar Feb 10 '23 09:02 jmsull

Here is that first $k=0.03$ plot at higher $M$ where the range of the plot is smaller (though only fair comparison with others is the lowest $M$ here vs highest $M$ in other plots) mslss_k0 03_switch0 5_elmax50_zeroini

jmsull avatar Feb 12 '23 19:02 jmsull

Remainder of the tests outlined above:

Drop the number of multipoles in the initial hierarchy

Here I show $\ell_{\nu} = 3, 10, 50$, all other plots use 50. Main takeaway is that at fixed $M$ there is basically no difference in using fewer multipoles (which physically makes sense for this low a switch time).

Annoyingly this first plot got a little cut off...

mslss_k0 03_switch1 0_elmax31050_zeroini

Outputs:

$\ell_{\rm{max}}=50$:

4.783524 seconds (1.23 M allocations: 415.648 MiB, 2.18% gc time) (M = 8192, Nᵢ = 5), error against full hierarchy is ℓ=0: 9.840033848574503e-5, ℓ=2: 1.828074841420805e-5

4.868371 seconds (1.94 M allocations: 759.623 MiB, 2.08% gc time) (M = 16384, Nᵢ = 5), error against full hierarchy is ℓ=0: 8.84161724083902e-6, ℓ=2: 1.632847426010087e-6

$\ell_{\rm{max}}=10$:

4.877348 seconds (1.23 M allocations: 415.077 MiB, 1.63% gc time) (M = 8192, Nᵢ = 5), error against full hierarchy is ℓ=0: 0.0004217199376935739, ℓ=2: 6.547150731140973e-5

5.011039 seconds (1.96 M allocations: 760.168 MiB, 1.97% gc time) (M = 16384, Nᵢ = 5), error against full hierarchy is ℓ=0: 0.00024554041721549364, ℓ=2: 4.880646789688861e-5

$\ell_{\rm{max}}=3$:

2.178649 seconds (975.66 k allocations: 391.156 MiB, 3.54% gc time) (M = 8192, Nᵢ = 5), error against full hierarchy is ℓ=0: 0.0008982274748046986, ℓ=2: 9.30223852551783e-5

2.497343 seconds (1.72 M allocations: 737.040 MiB, 4.59% gc time) (M = 16384, Nᵢ = 5), error against full hierarchy is ℓ=0: 0.0008648703948309034, ℓ=2: 7.984194701288053e-5

#----------------------- Also showing this one at $k = 0.3 ~h/\mathrm{Mpc}$ since dropping multipoles will affect it differently

mslss_k0 30_switch1 0_elmax31050_zeroini

Output:

$\ell_{\rm{max}}=50$:

8.267421 seconds (1.47 M allocations: 455.798 MiB, 0.66% gc time) (M = 8192, Nᵢ = 5), error against full hierarchy is ℓ=0: 2.1757689130294498e-5, ℓ=2: 3.485634046116244e-6

8.619932 seconds (2.21 M allocations: 800.554 MiB, 1.12% gc time) (M = 16384, Nᵢ = 5), error against full hierarchy is ℓ=0: 6.650277576620795e-6, ℓ=2: 9.456550485422403e-7

$\ell_{\rm{max}}=10$:

9.603415 seconds (1.47 M allocations: 455.069 MiB, 6.76% gc time) (M = 8192, Nᵢ = 5), error against full hierarchy is ℓ=0: 9.358824288586378e-5, ℓ=2: 1.4679040840320153e-5

9.333176 seconds (2.21 M allocations: 800.297 MiB, 1.60% gc time) (M = 16384, Nᵢ = 5), error against full hierarchy is ℓ=0: 7.668981155599474e-5, ℓ=2: 1.2164503995921898e-5

$\ell_{\rm{max}}=3$:

8.938511 seconds (1.46 M allocations: 453.987 MiB, 1.72% gc time) (M = 8192, Nᵢ = 5), error against full hierarchy is ℓ=0: 0.0005319382002153873, ℓ=2: 3.767904234756597e-5

8.686364 seconds (2.20 M allocations: 800.172 MiB, 1.09% gc time) (M = 16384, Nᵢ = 5), error against full hierarchy is ℓ=0: 0.0005162221861296847, ℓ=2: 3.567816655348787e-5

Note that the errors quoted are always against a hierarchy using the ell_max specified (which in retrospect maybe is not the most useful number...)

jmsull avatar Feb 12 '23 21:02 jmsull

So digging into this comparison a little bit more, it seems like when the switch is early enough (as above, at 1 Mpc), the reduced maximum multipole has really no effect on the subsequent evolution. This is exactly what we expect since high multipoles become more important over time.

To understand for what switch this breaks down a little better, I tried the same comparison for $\ell_{\rm{max}}=3,50$ for a switch of 100 Mpc for both k modes. Here gray is the truncated hierarchy at $\ell_{\rm{max}}=3$ and black is the full hierarchy at $\ell_{\rm{max}}=50$

$k = 0.03$:

mslss_k0 03_switch100 0_elmax350_zeroini

Legend is annoyingly covering stuff, but the interesting part of this comparison is before that starts to happen - plus you can see the behavior in the quadrupole.

#-------------------------- $k=0.3$:

mslss_k0 30_switch100 0_elmax350_zeroini

zoomed in:

mslss_k0 30_switch100 0_elmax350_zeroini_zoom

jmsull avatar Feb 12 '23 21:02 jmsull

For the lower-$k$ mode we are still in the situation where the truncation of 3 multipoles is good (since this mode enters the horizon so late). However, for the higher $k$ mode we basically mess everything up since we make the transition during the oscillatory part of the evolution where higher multipoles are relevant (evidenced independently of the the iteration by the disagreement of the black and grey lines).

jmsull avatar Feb 12 '23 21:02 jmsull

Final massless neutrino test - using a different ansatz:

Here we consider 4 cases:

  1. The default we have been using - a zero ansatz
  2. Leave the quadrupole at zero, but set the monopole ansatz to its initial (nonzero) value.
  3. Use an $\ell_{\rm{max}=3}$ solve at the current cosmology as the ansatz
  4. Same as the above, but at a Planck15 (no massive neutrino) cosmology - as a stand in for the case where we just read a file at fixed cosmology[^2]

All other parameters are fixed to fiducial for the k =0.03 mode and we look at the result after 2 iterations (though see[^1]).

mslss_k0 03_switch1 0_elmax3_varyansatz

Output:

0.877503 seconds (392.42 k allocations: 156.517 MiB, 4.13% gc time) (M = 8192, Nᵢ = 2), error against full hierarchy is ℓ=0: 0.0009011986947636854, ℓ=2: 0.00013335847779883907

0.916037 seconds (388.85 k allocations: 156.393 MiB, 3.51% gc time) (M = 8192, Nᵢ = 2), error against full hierarchy is ℓ=0: 6.515467702639128e-5, ℓ=2: 8.8881981073732e-6

1.118537 seconds (393.93 k allocations: 156.512 MiB, 3.68% gc time) (M = 8192, Nᵢ = 2), error against full hierarchy is ℓ=0: 1.5106919541527058e-5, ℓ=2: 2.5628684674265316e-6

0.818163 seconds (390.06 k allocations: 156.535 MiB) (M = 8192, Nᵢ = 2), error against full hierarchy is ℓ=0: 1.49779679851396e-5, ℓ=2: 2.544566397998796e-6

#-----------------------

From looking at this:

  • Clearly using the constant non-zero initialization for the monopole ansatz is much better than zero, so we should go with that as the default (and never provide a zero ansatz for the monopole). Note that this improves not just the monopole but also the quadrupole.
  • Already by the second iteration we have converged when starting with the Bolt cosmology or Planck cosmology $\ell_{\rm{max}=3}$ solution. This is in line with Kamionkowski++, Ji++ papers. Looking at only 1 iteration, the story is the same, but probably this would change somewhat (though really I suspect not very much) for other cosmologies in $\Lambda CDM$. Exotic extensions are another story.

[^1]: There is a systematic offset for the fiducial $M=8192$ in the first oscillation, but going up by a factor of 2 in $M$ removes this. [^2]: Using $\ell_{\rm{max}}=50$ makes no visual difference at the fiducial parameters for this mode.

jmsull avatar Feb 13 '23 00:02 jmsull

Some lessons about numerical parameters:

Number of FFT points $M$

  1. Increasing the value of $M$ always improves the quality of the solution at a fixed number of iterations. It also slightly increases the runtime, but it is unclear to what extent this will actually matter for the full hierarchy-less solve until we have some code for that for photons, massless/massive neutrinos we can run timings for.
  2. We need a minimum value of $M$ of at least 8192 to get something within a factor of around 10% of the solution at all times within 5 iterations (of course it is worst at the earliest times). Going up by a factor of 2 visually looks to be basically the hierarchy solution. Going to 7 iterations (not shown) gives a similar quality of solution for $M=8192$.
  3. The biggest time-sink is the number of iterations, so if we can get away with fewer iterations by using a higher value of $M$, we should do that. Again the extent to which this is true may change later when we truncate everything, but I expect since even the truncated hierarchy solve scales like $N^{3}$, where $N$ is the total number of multipoles across all species, this will dominate over reasonable factors of $M$ increase like those considered here.
  4. There is a systematic offset even for $M=8192$ that is visible even after several iterations in the first oscillation for k=0.03 (and this persists when I look at up to 9 iterations). So to be safe we may want to use more than this so long as the cost of more FFT points is basically negligible.

Switching from hierarchy to FFT evolution

  1. Switching somewhat too late induces some initial error that gets suppressed as the time evolution continues (see the plot for k=0.30 Mpc with a switch of 10 Mpc and the k=0.03 Mpc plot with a switch of 100 Mpc), but at early times even with more iterations, this error remains. If we want to get the perturbations right at all times, this means we need to use a sufficiently early switch. We can switch later for longer modes and still be ok.
  2. Switching far too late produces what is basically garbage - an out-of-phase oscillating solution that never converges to the hierarchy solution (see the plot for the 100 Mpc switch for k=0.30 )

Maximum multipoles and initial ansatzs (ansatzes?)

  1. We can drop to only 4 multipoles for the full solve before switching with effectively no penalty as long as the switch is early enough.
  2. Using a non-zero ansatz results in a closer approximation to the hierarchy solution at a fixed number of iterations - specifically using the initial value of the monopole rather than zero. Using a previous solve with $\ell_{\rm{max}}=3$ appears to converge in 1-step for either the current cosmology or a nearby (planck15) cosmology - aka we reproduce the results of previous iterative method papers.

Minor observations:

  • I've seen this before, but it is interesting that the hierarchy solution has these jagged numerical looking blips in the oscillatory part of the evolution (e.g. around x=-2 for k=0.03) but we see no such features in the hierarchy. This kind of thing persists even at high tolerances, and was worse when I was solving in x-time instead of conformal time.
  • Don't read too much into the timings (beyond a factor of several tens of percent) because they are not average timings.
  • In thinking about a horizon-entry-dependent switch, we probably want something like 6-10 times $\eta_{\rm{horizon}}$. This is something to follow-up on in a little more detail.
  • One other thing that might be interesting is to see $\ell_{\rm{max}} = 2$ - are we really gaining anything (e.g. BC) in terms of one extra multipole early on?

jmsull avatar Feb 13 '23 00:02 jmsull

Now that these experiments are done I will move on to massive neutrinos. I won't repeat all these tests but will post a few quick checks that nothing is broken. In principle, massless neutrinos should be the "worst" for the purposes of the iterative convolution integral since they are the least smooth (have the most high-frequency content) of all the neutrinos, but I could be wrong...

Once that's done, I will:

  • [x] Move all the FFT iteration tools into another file
  • [x] Build the infrastructure for mixing photon iteration with neutrino iteration
  • [ ] Make the "sanity check" into a testsuite test with some tolerance

We might want to pick a tolerance rather than a fixed number of iterations in the future, since it will be a huge waste of time for the user if we take 5 iterations when using a good ansatz that would give convergence in 1 step.

jmsull avatar Feb 13 '23 00:02 jmsull

These last few commits remove the monopole and quadrupole entirely from the neutrino solution in the truncated hierarchies (per @marius311's correction of my oversight there). This gets rid of these weird singular matrix and dtmin errors (woo!). The massive neutrino iteration experiments are not working from constant/zero ansatz unless I use an insane amount of FFT points, so will need to spend a little time debugging...

jmsull avatar Mar 28 '23 05:03 jmsull

So I'm going to ignore it for now, but the conformal time version of the iteration is not working after the first iteration for some reason. When I use the x-evolution for the truncated hierarchy solve there appears to be no problem at all and the results look quite nice. I didn't see anything like this for the massless neutrinos alone, but when massive neutrinos are included something strange is happening. The iteration error does not get smaller in L2 sense (while it does for x iteration).

Here are some results using the x iteration (like the old massless experiments). Here I use a switch of 5.0 Mpc and $M=8192$ with a constant ansatz. Results are not perfect at early times (this gets better either with a later $\eta$ switch or with more $M$).

mssv_k0 03_q0_allconst_xiter_switch5 0

Here is q1 for massive neutrinos: mssv_k0 03_q1log_allconst_xiter_switch5 0

Here is q15 as well: mssv_k0 03_qendlog_allconst_xiter_switch5 0

These last couple are a bit challenging to see in log unfortunately

jmsull avatar Mar 28 '23 19:03 jmsull

Here also are the same things but with a Planck cosmology (full) hierarchy, but no neutrino mass, initialization:

massless: mssv_k0 03_q0_plancknonu_xiter_switch5 0

massive q1: mssv_k0 03_q1log_plancknonu_xiter_switch5 0

massive qend: mssv_k0 03_qendlog_plancknonu_xiter_switch5 0

jmsull avatar Mar 28 '23 20:03 jmsull

Now (finally) moving on to photon iterative unification....

jmsull avatar Mar 28 '23 20:03 jmsull

With these changes, unification is complete! But there are some residual numerical issues with the k=1.0 h/Mpc mode I am looking at before moving on to performance (the sanity check passes for the other two modes we have been looking at k=0.03 and 0.3)

jmsull avatar Apr 04 '23 17:04 jmsull

The issue I mentioned with high-k mode probably needs to be investigated more, but there is nothing wrong with the iteration procedure - after sufficiently many iterations the result still converges.

Here also is a somewhat clean "experiment" file for the combined iteration where I started to look at performance.

jmsull avatar Apr 06 '23 20:04 jmsull

Added a first look at comparing solvers and the full vs truncated hierarchy times in in the "/scripts/combined_experiment_iter.jl" file

Getting an AD error for Rodas5P on the truncated hierarchy solve though, so will need to check that out....

Also AD fails due to the conformal time interpolator in the conformal solver

jmsull avatar Apr 11 '23 19:04 jmsull