
Samples Summary Update


Firstly, the samples_summary.json file contains information on the model and covariance_matrix, both of which are redundant and should be removed:

https://github.com/rhayes777/PyAutoFit/issues/858

This means a samples_summary.json file will read as follows:

    "type": "instance",
    "class_path": "autofit.non_linear.samples.summary.SamplesSummary",
    "arguments": {
        "max_log_likelihood_sample": {
            "type": "instance",
            "class_path": "autofit.non_linear.samples.sample.Sample",
            "arguments": {
                "log_likelihood": -58.113278119343704,
                "log_prior": 0.040446055595358764,
                "weight": 6.169203512077575e-05,
                "kwargs": {
                    "type": "dict",
                    "arguments": {
                        "centre": 49.8273795385382,
                        "normalization": 24.72428980478263,
                        "sigma": 9.98558962854544
                    }
                }
            }
        }
    }
}
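
For reference, a minimal sketch of reading this trimmed file directly with the standard json module (the file path is assumed):

import json

# Minimal sketch: read the trimmed samples_summary.json directly,
# following the nested structure shown above.
with open("samples_summary.json") as f:
    summary = json.load(f)

sample = summary["arguments"]["max_log_likelihood_sample"]["arguments"]

print(sample["log_likelihood"])        # -58.113278119343704
print(sample["kwargs"]["arguments"])   # {'centre': 49.827..., 'normalization': 24.724..., 'sigma': 9.985...}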

The purpose of the SamplesSummary is to provide quick access to key results of the samples, avoiding the potentially expensive calculations (e.g. for many samples and complex models) that would otherwise be needed to access them.

The SamplesSummary itself is defined by this class:

class SamplesSummary(SamplesInterface):
    def __init__(
        self,
        max_log_likelihood_sample: Sample,
        model: AbstractPriorModel,
        covariance_matrix: Optional[np.ndarray] = None,
        log_evidence: Optional[float] = None,
    ):
        """
        A summary of the results of a `NonLinearSearch` that has been run, including the maximum log likelihood sample.

        Parameters
        ----------
        max_log_likelihood_sample
            The parameters from a non-linear search that gave the highest likelihood
        model
            A model used to map the samples to physical values
        covariance_matrix
            The covariance matrix of the samples
        log_evidence
            The log evidence of the fit, if available (e.g. from a nested sampler)
        """
        super().__init__(model=model)
        self._max_log_likelihood_sample = max_log_likelihood_sample
        self.covariance_matrix = covariance_matrix
        self._log_evidence = log_evidence
        self.derived_summary = None

    @property
    def max_log_likelihood_sample(self):
        return self._max_log_likelihood_sample

    @property
    def log_evidence(self):
        return self._log_evidence
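
As a usage sketch (assuming Sample accepts the fields serialized in the JSON above, and that model is an AbstractPriorModel composed elsewhere):

# Usage sketch: build and query a SamplesSummary. Assumes `Sample` takes
# the fields serialized in the JSON above and `model` exists already.
sample = Sample(
    log_likelihood=-58.11,
    log_prior=0.04,
    weight=6.2e-05,
    kwargs={"centre": 49.83, "normalization": 24.72, "sigma": 9.99},
)

summary = SamplesSummary(max_log_likelihood_sample=sample, model=model)

print(summary.max_log_likelihood_sample.log_likelihood)  # -58.11
print(summary.log_evidence)  # None (not passed above)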

This issue is about expanding this file to contain a lot more information on the samples.

Firstly, we should include the median_pdf_sample: the sample corresponding to the median PDF result, which is computed in SamplesPDF:

    @to_instance
    def median_pdf(self) -> List[float]:
        """
        The median of the probability density function (PDF) of every parameter marginalized in 1D, returned
        as a model instance or list of values.
        """
        if self.pdf_converged:
            return [
                quantile(x=params, q=0.5, weights=self.weight_list)[0]
                for params in self.parameters_extract
            ]
        return self.max_log_likelihood(as_instance=False)
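
The quantile call above is PyAutoFit's weighted-quantile helper; a standalone sketch of the same idea (not the library implementation) is:

import numpy as np

def weighted_quantile(x, q, weights):
    # Standalone weighted quantile: sort values, build the normalized
    # cumulative weight function, then interpolate at quantile q.
    x, weights = np.asarray(x, dtype=float), np.asarray(weights, dtype=float)
    sorter = np.argsort(x)
    x, weights = x[sorter], weights[sorter]
    cdf = (np.cumsum(weights) - 0.5 * weights) / np.sum(weights)
    return np.interp(q, cdf, x)

# The median PDF value of a parameter is its q=0.5 weighted quantile.
centres = [49.1, 49.8, 50.2, 50.9]
weights = [0.1, 0.4, 0.4, 0.1]
print(weighted_quantile(centres, q=0.5, weights=weights))  # 50.0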

We should also include errors, which are computed as a list of (lower, upper) tuples via the following function:

    @to_instance
    def errors_at_sigma(
        self, sigma: float, as_instance: bool = True
    ) -> [Tuple, ModelInstance]:
        """
        The lower and upper error of every parameter marginalized in 1D at an input sigma value of its probability
        density function (PDF), returned as a list.

        See values_at_sigma for a full description of how the parameters at sigma are computed.

        Parameters
        ----------
        sigma
            The sigma within which the PDF is used to estimate errors (e.g. sigma = 1.0 uses 0.6826 of the PDF).
        """
        error_vector_lower = self.errors_at_lower_sigma(sigma=sigma, as_instance=False)
        error_vector_upper = self.errors_at_upper_sigma(sigma=sigma, as_instance=False)
        return [
            (lower, upper)
            for lower, upper in zip(error_vector_lower, error_vector_upper)
        ]

The errors depend on the input sigma value (e.g. a larger sigma gives larger errors). I think we should store two lists of error tuples in the samples summary:

  • errors_at_sigma_1
  • errors_at_sigma_3

This means a user can easily check the size of the errors on each parameter.
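
Both proposed entries can be computed from an existing samples object via the method above, e.g.:

# `samples` is assumed to be a SamplesPDF instance from a completed search.
errors_at_sigma_1 = samples.errors_at_sigma(sigma=1.0, as_instance=False)
errors_at_sigma_3 = samples.errors_at_sigma(sigma=3.0, as_instance=False)

# Each is a list of (lower, upper) tuples, one per free parameter.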

We should also include the parameter values at these sigma limits:

    @to_instance
    def values_at_sigma(self, sigma: float) -> [Tuple, ModelInstance]:
        """
        The value of every parameter marginalized in 1D at an input sigma value of its probability density function
        (PDF), returned as a list of (lower, upper) tuples of parameter values.

        For example, if sigma is 1.0, the marginalized values of every parameter at the 15.9% and 84.1% percentiles
        of each PDF are returned.

        This does not account for covariance between parameters. For example, if two parameters (x, y) are degenerate
        whereby x decreases as y gets larger to give the same PDF, this function will still return both at their
        upper values. Thus, caution is advised when using this function to re-perform model-fits.

        This is estimated using the `quantile` function if the samples have converged, by sampling the density
        function at an input PDF %. If not converged, a crude estimate using the range of values of the current
        physical live points is used.

        Parameters
        ----------
        sigma
            The sigma within which the PDF is used to estimate errors (e.g. sigma = 1.0 uses 0.6826 of the PDF).
        """

        if self.pdf_converged:
            low_limit = (1 - math.erf(sigma / math.sqrt(2))) / 2

            lower_errors = [
                quantile(x=params, q=low_limit, weights=self.weight_list)[0]
                for params in self.parameters_extract
            ]
            upper_errors = [
                quantile(x=params, q=1 - low_limit, weights=self.weight_list)[0]
                for params in self.parameters_extract
            ]

            return [(lower, upper) for lower, upper in zip(lower_errors, upper_errors)]

        parameters_min = list(
            np.min(self.parameter_lists[-self.unconverged_sample_size :], axis=0)
        )
        parameters_max = list(
            np.max(self.parameter_lists[-self.unconverged_sample_size :], axis=0)
        )

        return [
            (parameters_min[index], parameters_max[index])
            for index in range(len(parameters_min))
        ]

Again, there will be two entries:

  • values_at_sigma_1
  • values_at_sigma_3
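
For reference, the sigma inputs map to PDF quantiles via the error function, as in the low_limit line of values_at_sigma above:

import math

# As in values_at_sigma: low_limit = (1 - erf(sigma / sqrt(2))) / 2.
for sigma in (1.0, 3.0):
    low_limit = (1 - math.erf(sigma / math.sqrt(2))) / 2
    print(f"sigma={sigma}: quantiles ({low_limit:.4f}, {1 - low_limit:.4f})")

# sigma=1.0: quantiles (0.1587, 0.8413)
# sigma=3.0: quantiles (0.0013, 0.9987)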

Finally, the output.yaml file should allow one to customize the samples_summary.json file:

https://github.com/rhayes777/PyAutoFit/blob/main/autofit/config/output.yaml

max_log_likelihood: true
median_pdf: true
values_at_sigma_1: true
values_at_sigma_3: false
errors_at_sigma_1: true
errors_at_sigma_3: false
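
A sketch of how these settings could gate what is written (the exact mechanism in PyAutoFit may differ; summary_dict and the file handling here are placeholders):

import yaml  # PyYAML

# Sketch: gate the samples_summary.json contents on output.yaml settings,
# using the samples methods quoted above. Names are placeholders.
with open("output.yaml") as f:
    settings = yaml.safe_load(f)

summary_dict = {}

if settings.get("max_log_likelihood", False):
    summary_dict["max_log_likelihood"] = samples.max_log_likelihood(as_instance=False)
if settings.get("errors_at_sigma_1", False):
    summary_dict["errors_at_sigma_1"] = samples.errors_at_sigma(sigma=1.0, as_instance=False)
if settings.get("values_at_sigma_3", False):
    summary_dict["values_at_sigma_3"] = samples.values_at_sigma(sigma=3.0)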

— Jammy2211, Mar 20 '24

This PR https://github.com/rhayes777/PyAutoFit/pull/981 has made some changes to SamplesSummary:

  • Removed covariance_matrix.
  • Removed model (it is instead loaded via the .json in certain circumstances).
  • Added median_pdf.

Let's chat next week about all this...

— Jammy2211, Mar 22 '24

The remaining task is to extend the samples summary with the following quantities:

  • values_at_sigma_1
  • values_at_sigma_3

— Jammy2211, Apr 17 '24