Samples Summary Update
Firstly, the samples_summary.json file contains information on the `model` and `covariance_matrix`, which is redundant and should be removed:
https://github.com/rhayes777/PyAutoFit/issues/858
This means a samples_summary.json file will read as follows:
"type": "instance",
"class_path": "autofit.non_linear.samples.summary.SamplesSummary",
"arguments": {
"max_log_likelihood_sample": {
"type": "instance",
"class_path": "autofit.non_linear.samples.sample.Sample",
"arguments": {
"log_likelihood": -58.113278119343704,
"log_prior": 0.040446055595358764,
"weight": 6.169203512077575e-05,
"kwargs": {
"type": "dict",
"arguments": {
"centre": 49.8273795385382,
"normalization": 24.72428980478263,
"sigma": 9.98558962854544
}
}
}
}
}
The purpose of the SamplesSummary is to provide quick access to key results of the samples, avoiding the need to perform potentially expensive calculations (e.g. for many samples and complex models) to access them.
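For quick inspection the file can also be read without PyAutoFit at all; a minimal sketch, assuming the JSON layout above (the file path is illustrative):

```python
import json

# Load the summary written by a completed search (the path is illustrative).
with open("output/samples_summary.json") as f:
    summary = json.load(f)

# Drill into the nested layout shown above.
sample = summary["arguments"]["max_log_likelihood_sample"]["arguments"]
print(sample["log_likelihood"])       # -58.113...
print(sample["kwargs"]["arguments"])  # {'centre': ..., 'normalization': ..., 'sigma': ...}
```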
The SamplesSummary itself is implemented in the following class:
```python
from typing import Optional

import numpy as np

# `Sample`, `AbstractPriorModel` and `SamplesInterface` are PyAutoFit internals.


class SamplesSummary(SamplesInterface):
    def __init__(
        self,
        max_log_likelihood_sample: Sample,
        model: AbstractPriorModel,
        covariance_matrix: Optional[np.ndarray] = None,
        log_evidence: Optional[float] = None,
    ):
        """
        A summary of the results of a `NonLinearSearch` that has been run, including the maximum log
        likelihood sample.

        Parameters
        ----------
        max_log_likelihood_sample
            The parameters from a non-linear search that gave the highest likelihood.
        model
            A model used to map the samples to physical values.
        covariance_matrix
            The covariance matrix of the samples.
        log_evidence
            The Bayesian log evidence of the fit, if available.
        """
        super().__init__(model=model)

        self._max_log_likelihood_sample = max_log_likelihood_sample
        self.covariance_matrix = covariance_matrix
        self._log_evidence = log_evidence
        self.derived_summary = None

    @property
    def max_log_likelihood_sample(self):
        return self._max_log_likelihood_sample

    @property
    def log_evidence(self):
        return self._log_evidence
```
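For orientation, a hypothetical construction mirroring the JSON above, assuming Sample accepts the serialized fields shown there as keyword arguments (numbers truncated, the `model` object composed elsewhere):

```python
# Hypothetical usage, mirroring the serialized fields in the JSON above.
sample = Sample(
    log_likelihood=-58.11,
    log_prior=0.04,
    weight=6.17e-05,
    kwargs={"centre": 49.83, "normalization": 24.72, "sigma": 9.99},
)

summary = SamplesSummary(
    max_log_likelihood_sample=sample,
    model=model,  # assumed: an AbstractPriorModel composed elsewhere
)

print(summary.max_log_likelihood_sample.kwargs["centre"])  # 49.83
```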
This issue is about expanding this file to contain much more information on the samples.
Firstly, we should include the `median_pdf_sample`: the sample corresponding to the median PDF result, which is computed in SamplesPDF:
```python
@to_instance
def median_pdf(self) -> List[float]:
    """
    The median of the probability density function (PDF) of every parameter marginalized in 1D, returned
    as a model instance or list of values.
    """
    if self.pdf_converged:
        return [
            quantile(x=params, q=0.5, weights=self.weight_list)[0]
            for params in self.parameters_extract
        ]
    return self.max_log_likelihood(as_instance=False)
```
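To make the weighted median concrete, here is a self-contained sketch of the computation that `quantile(q=0.5, ...)` performs (an illustration, not PyAutoFit's exact implementation):

```python
import numpy as np

def weighted_quantile(x, q, weights):
    """Interpolate the weighted CDF of `x` at quantile `q` (a simple sketch)."""
    x = np.asarray(x, dtype=float)
    weights = np.asarray(weights, dtype=float)
    order = np.argsort(x)
    x, weights = x[order], weights[order]
    # Cumulative weights, normalized so the CDF runs from ~0 to 1.
    cdf = (np.cumsum(weights) - 0.5 * weights) / np.sum(weights)
    return np.interp(q, cdf, x)

# Median of weighted samples of a single parameter.
params = [49.1, 49.8, 50.3, 51.0]
weights = [0.1, 0.4, 0.4, 0.1]
print(weighted_quantile(params, 0.5, weights))  # 50.05
```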
We should also include the errors, which are computed as tuples via the following function:
```python
@to_instance
def errors_at_sigma(
    self, sigma: float, as_instance: bool = True
) -> Union[Tuple, ModelInstance]:
    """
    The lower and upper error of every parameter marginalized in 1D at an input sigma value of its probability
    density function (PDF), returned as a list.

    See values_at_sigma for a full description of how the parameters at sigma are computed.

    Parameters
    ----------
    sigma
        The sigma within which the PDF is used to estimate errors (e.g. sigma = 1.0 uses 0.6826 of the PDF).
    """
    error_vector_lower = self.errors_at_lower_sigma(sigma=sigma, as_instance=False)
    error_vector_upper = self.errors_at_upper_sigma(sigma=sigma, as_instance=False)

    return [
        (lower, upper)
        for lower, upper in zip(error_vector_lower, error_vector_upper)
    ]
```
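Conceptually, each error tuple measures the distance from the median to the sigma limits. Assuming `errors_at_lower_sigma` / `errors_at_upper_sigma` subtract in the usual way, the tuples relate to `median_pdf` and `values_at_sigma` like this (made-up numbers):

```python
# Hypothetical reconstruction of the error tuples, assuming the lower / upper
# errors are offsets of the sigma limits from the median (the usual convention).
median = [49.83, 24.72, 9.99]                       # median_pdf values
values = [(48.9, 50.7), (23.8, 25.6), (9.1, 10.9)]  # values_at_sigma(sigma=1.0)

errors_at_sigma = [
    (m - lower, upper - m) for m, (lower, upper) in zip(median, values)
]
print(errors_at_sigma)  # [(0.93, 0.87), (0.92, 0.88), (0.89, 0.91)] up to float rounding
```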
The errors depend on the input sigma value (e.g. a larger sigma gives larger errors). I think we should store two lists of error tuples in the samples summary:
- errors_at_sigma_1
- errors_at_sigma_3
This means a user can easily check the size of the errors on each parameter.
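In samples_summary.json these entries could sit alongside `max_log_likelihood_sample`, e.g. serialized from a dict like this (a hypothetical layout with made-up numbers; the exact schema is open for discussion):

```python
# Hypothetical layout with made-up numbers; the exact schema is open for discussion.
summary_errors = {
    "errors_at_sigma_1": [(0.93, 0.87), (0.92, 0.88), (0.89, 0.91)],
    "errors_at_sigma_3": [(2.71, 2.64), (2.69, 2.72), (2.60, 2.75)],
}
```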
We should also include the values of the parameters at these sigma limits:
```python
@to_instance
def values_at_sigma(self, sigma: float) -> Union[Tuple, ModelInstance]:
    """
    The value of every parameter marginalized in 1D at an input sigma value of its probability density function
    (PDF), returned as two lists of values corresponding to the lower and upper parameter values.

    For example, if sigma is 1.0, the marginalized values of every parameter at the 15.9% and 84.1% percentiles
    of each PDF are returned.

    This does not account for covariance between parameters. For example, if two parameters (x, y) are degenerate
    whereby x decreases as y gets larger to give the same PDF, this function will still return both at their
    upper values. Thus, caution is advised when using this function to re-perform model-fits.

    This is estimated using the `quantile` function if the samples have converged, by sampling the density
    function at an input PDF %. If not converged, a crude estimate using the range of values of the current
    physical live points is used.

    Parameters
    ----------
    sigma
        The sigma within which the PDF is used to estimate errors (e.g. sigma = 1.0 uses 0.6826 of the PDF).
    """
    if self.pdf_converged:
        low_limit = (1 - math.erf(sigma / math.sqrt(2))) / 2

        lower_errors = [
            quantile(x=params, q=low_limit, weights=self.weight_list)[0]
            for params in self.parameters_extract
        ]
        upper_errors = [
            quantile(x=params, q=1 - low_limit, weights=self.weight_list)[0]
            for params in self.parameters_extract
        ]

        return [(lower, upper) for lower, upper in zip(lower_errors, upper_errors)]

    parameters_min = list(
        np.min(self.parameter_lists[-self.unconverged_sample_size :], axis=0)
    )
    parameters_max = list(
        np.max(self.parameter_lists[-self.unconverged_sample_size :], axis=0)
    )

    return [
        (parameters_min[index], parameters_max[index])
        for index in range(len(parameters_min))
    ]
```
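As a concrete check of the sigma-to-quantile conversion above:

```python
import math

# Convert sigma to the lower / upper quantiles of the PDF, as in values_at_sigma.
for sigma in (1.0, 3.0):
    low_limit = (1 - math.erf(sigma / math.sqrt(2))) / 2
    print(sigma, round(low_limit, 4), round(1 - low_limit, 4))
# sigma=1.0 -> quantiles 0.1587 and 0.8413 (68.27% of the PDF enclosed)
# sigma=3.0 -> quantiles 0.0013 and 0.9987 (99.73% of the PDF enclosed)
```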
Again, there will be two entries:
- values_at_sigma_1
- values_at_sigma_3
Finally, the output.yaml file should allow one to customize which quantities are written to the samples_summary.json file:
https://github.com/rhayes777/PyAutoFit/blob/main/autofit/config/output.yaml
```yaml
max_log_likelihood: true
median_pdf: true
values_at_sigma_1: true
values_at_sigma_3: false
errors_at_sigma_1: true
errors_at_sigma_3: false
```
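A minimal sketch of how these flags could gate what is written; this is hypothetical and PyAutoFit's actual config handling may differ (e.g. the flags may sit under a nested section of output.yaml):

```python
import yaml

# Hypothetical sketch of gating the summary output with the flags above.
with open("output.yaml") as f:
    flags = yaml.safe_load(f)

# Placeholder values standing in for the computed summary quantities.
summary_dict = {
    "max_log_likelihood": [49.83, 24.72, 9.99],
    "median_pdf": [49.80, 24.70, 10.00],
    "values_at_sigma_1": [(48.9, 50.7), (23.8, 25.6), (9.1, 10.9)],
    "values_at_sigma_3": [(47.1, 52.5), (22.0, 27.4), (7.4, 12.7)],
}

# Keep only the quantities the user switched on in output.yaml.
filtered = {k: v for k, v in summary_dict.items() if flags.get(k, False)}
```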
This PR https://github.com/rhayes777/PyAutoFit/pull/981 has made some changes to SamplesSummary:

- Removed `covariance_matrix`.
- Removed `model` (and loads it via the `.json` in certain circumstances).
- Added `median_pdf`.
Let's chat next week about all this...
The remaining task is to extend the samples summary with the following quantities:
- values_at_sigma_1
- values_at_sigma_3
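A minimal sketch of what this extension might look like, assuming the tuples are computed once from the parent SamplesPDF object when the summary is created (argument and attribute names are suggestions, not a final API):

```python
from typing import List, Optional, Tuple


class SamplesSummary(SamplesInterface):
    """Sketch: the existing summary extended with the proposed sigma quantities."""

    def __init__(
        self,
        max_log_likelihood_sample: Sample,
        median_pdf_sample: Optional[Sample] = None,
        values_at_sigma_1: Optional[List[Tuple[float, float]]] = None,
        values_at_sigma_3: Optional[List[Tuple[float, float]]] = None,
        errors_at_sigma_1: Optional[List[Tuple[float, float]]] = None,
        errors_at_sigma_3: Optional[List[Tuple[float, float]]] = None,
        log_evidence: Optional[float] = None,
    ):
        # Model handling per PR #981 omitted for brevity.
        self._max_log_likelihood_sample = max_log_likelihood_sample
        self._median_pdf_sample = median_pdf_sample
        # Precomputed via SamplesPDF.values_at_sigma / errors_at_sigma so that
        # reading the summary never requires the full (expensive) sample set.
        self.values_at_sigma_1 = values_at_sigma_1
        self.values_at_sigma_3 = values_at_sigma_3
        self.errors_at_sigma_1 = errors_at_sigma_1
        self.errors_at_sigma_3 = errors_at_sigma_3
        self._log_evidence = log_evidence
```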