firecrown icon indicating copy to clipboard operation
firecrown copied to clipboard

Two-point generators tutorial does not work for real space

Open tilmantroester opened this issue 8 months ago • 7 comments

As pointed out by @arthurmloureiro, adapting the two-point generators tutorial from harmonic to real space gives an error:

import numpy as np
from firecrown.metadata_functions import make_all_photoz_bin_combinations, TwoPointReal

from firecrown.generators.inferred_galaxy_zdist import (
    LSST_Y1_LENS_BIN_COLLECTION,
    LSST_Y1_SOURCE_BIN_COLLECTION,
)

count_bins = LSST_Y1_LENS_BIN_COLLECTION.generate()
shear_bins = LSST_Y1_SOURCE_BIN_COLLECTION.generate()
all_y1_bins = count_bins + shear_bins

all_two_point_xy = make_all_photoz_bin_combinations(all_y1_bins)

thetas = np.geomspace(0.5, 300, 128)
all_two_point_xi = [TwoPointReal(XY=xy, thetas=thetas) for xy in all_two_point_xy]
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[7], line 16
     13 all_two_point_xy = make_all_photoz_bin_combinations(all_y1_bins)
     15 thetas = np.geomspace(0.5, 300, 128)
---> 16 all_two_point_xi = [TwoPointReal(XY=xy, thetas=thetas) for xy in all_two_point_xy]

File <string>:5, in __init__(self, XY, thetas)

File ~/Research/LSST/dev/firecrown/firecrown/metadata_types.py:584, in TwoPointReal.__post_init__(self)
    579     raise ValueError("Thetas should be a 1D array.")
    581 if not measurement_supports_real(
    582     self.XY.x_measurement
    583 ) or not measurement_supports_real(self.XY.y_measurement):
--> 584     raise ValueError(
    585         f"Measurements {self.XY.x_measurement} and "
    586         f"{self.XY.y_measurement} must support real-space calculations."
    587     )

ValueError: Measurements Galaxies.COUNTS and Galaxies.SHEAR_E must support real-space calculations.

I believe the issue is that the there is a specific measurement (SHEAR_E) attached to the shear_bins n(z), which is at best unintuitive, since the n(z) should be the same, whether it's in real or harmonic space. It should also not matter if it's used in GGL, cosmic shear, photometric clustering, etc but I guess at some point some meta information needs to be attached for make_all_photoz_bin_combinations to know what combinations to build.

tilmantroester avatar Apr 08 '25 09:04 tilmantroester

That's because SHEAR_E can only be used by Harmonic Space Measurements. I really do not like that the N(z) metadata needs to know in what that N(z) will be used for.

We need to generate other SRD wrappers so we can do the real space measurements or create them from scratch but then use the correct tracer name that is allowed to be used for real space measurements.

arthurmloureiro avatar Apr 08 '25 09:04 arthurmloureiro

I tried setting up the n(z) and assigning multiple measurements which seems to work:

from firecrown.generators.inferred_galaxy_zdist import  *

def get_lsst_y1_source_bin_collection():
    """Get the LSST Year 1 source bin collection."""
    y1_source_bins = get_y1_source_bins()
    return ZDistLSSTSRDBinCollection(
        alpha=Y1_SOURCE_ALPHA,
        beta=Y1_SOURCE_BETA,
        z0=Y1_SOURCE_Z0,
        bins=[
            ZDistLSSTSRDBin(
                zpl=zpl,
                zpu=zpu,
                sigma_z=y1_source_bins["sigma_z"],
                z=RawGrid1D(values=[0.0, 3.5]),
                bin_name=f"source_{zpl:.1f}_{zpu:.1f}_y1",
                measurements={Galaxies.SHEAR_T, Galaxies.SHEAR_PLUS, Galaxies.SHEAR_MINUS},
            )
            for zpl, zpu in pairwise(y1_source_bins["edges"])
        ],
    )

shear_bins_new = get_lsst_y1_source_bin_collection().generate()

all_y1_bins = count_bins + shear_bins_new

all_two_point_xy = make_all_photoz_bin_combinations(all_y1_bins)

thetas = np.geomspace(0.5, 300, 128)
all_two_point_xi = [TwoPointReal(XY=xy, thetas=thetas) for xy in all_two_point_xy]

Note that it fails if Galaxies.SHEAR_E is in the set even though it physically doesn't matter if the n(z) is for harmonic space or not.

I think a better solution than attaching meta data to the n(z) would be to give some more information to make_all_photoz_bin_combinations on how the bin combinations should be built.

tilmantroester avatar Apr 08 '25 11:04 tilmantroester

The existing values such as LSST_Y1_LENS_BIN_COLLECTION were intended as examples (inspired by the SRD), not as the only way to create bin combinations.

Would it be helpful if we added LSST_Y1_LENS_BIN_COLLECTION_REAL (which does not correspond to anything in the SRD, but which is the natural extrapolation from what is in the SRD for real space analysis)?

We are slightly reluctant to do this because it will perpetuate the misconception that only values pre-defined in Firecrown should be used to set up analysis. This is not at all the intention. Instead, we have provided values we expected to be of fairly broad interest, and the infrastructure for everyone to make whatever combinations are needed for a particular analysis.

marcpaterno avatar Apr 28 '25 20:04 marcpaterno

I think the confusion is due to the fact that the bins and bin combinations know and care about what kind of statistic (real or harmonic space) is being built out of them. This is unintuitive because the tomographic bins and n(z) of the data exist before any statistic is being measured. Since the firecrown statistics are then built using TwoPointReal/TwoPointHarmonic, this is doubly confusing, since the kind of statistic is now defined twice, once implicitly in the bin combinations and once explicitly when building the statistics.

tilmantroester avatar Apr 28 '25 21:04 tilmantroester

@tilmantroester

You're right that conceptually, tomographic bins and redshift distributions exist independently of any particular statistic. However, in practice, once we move to modeling or prediction, we must connect those bins to specific measurements, and those measurements are inherently tied to a statistic (real or harmonic space).

In earlier Firecrown usage (like in the DES Y1 example), this mapping had to be manually handled through naming conventions and explicit branching on types like sacc_data_type. That approach provided flexibility, but it was fragile, verbose, and forced the user to encode modeling logic in terms of string names, introducing opportunities for inconsistency and error.

The current approach avoids this by making the measurement layout explicit through TwoPointXY and TwoPointHarmonic / TwoPointReal. This enables the use of automated factories (TwoPointFactory) that can construct full TwoPoint objects, including source creation and systematics, purely from metadata. This reduces boilerplate and eliminates the need to hard-code type mappings and naming assumptions in user code.

Moreover, in future extensions of the framework (e.g., adding new measurement types), this structure allows for transparent inclusion without requiring users to invent or conform to extended naming conventions. Such conventions can quickly become awkward and hard to maintain, especially when combinatorics grow (e.g., cross-correlations between many probe types).

So while it may seem redundant that the statistic type appears in both the bin combination and the layout object, this redundancy enables a cleaner, scalable, and safer interface for building modeling pipelines.


We are going to post in the Firecrown slack channel a pointer to this issue. If there is a consensus that a change should be made, we can move in this direction.

vitenti avatar May 02 '25 19:05 vitenti

I understand that the meta data on which n(z) goes with which statistic needs to come in somewhere. My concern is that in the example that is most relevant when trying to use firecrown for some simple modelling (the two-point generator tutorial), the bundling of n(z) and statistic happens quietly in the background and causes confusion when approaching the modelling from a physical standpoint. This is mostly a documentation issue I think, and renaming *_BIN_COLLECTION to explicitly include a reference to this being only valid for harmonic space statistics would help to avoid some of the confusion that gave raise to present issue.

A minimal 3x2pt example starting with arrays for the n(z)s, from which some InferredGalaxyZDist are built explicitly with the measurement sets, with make_all_photoz_bin_combinations and producing some model predictions would be useful I believe. Right now all that information is scattered across multiple tutorials.

On a somewhat related note, is it possible to (manually) build a data vector that includes xi_plus, xi_minus, and Cl_EE that share the same systematics? So not necessarily with the automated tools like make_all_photoz_bin_combinations. Such joint analyses of multiple statistics on the same catalogue are useful to check the consistency of different statistics (see e.g. https://arxiv.org/abs/2503.19442)

tilmantroester avatar May 05 '25 12:05 tilmantroester

We will change the names of the constants (and the related functions) in InferredGalaxyZDist.py along the lines of your suggestion.

We have created an issue for the new tutorial you suggest. We have created an issue for the "related note".

marcpaterno avatar May 12 '25 20:05 marcpaterno