Reading values with a missing "E" character in exponential notation
Hello,
I am using FISPACT and noticed that the extract_boundaries_and_values function in gammaspectrum.py, which processes the output, fails on values in exponential format that are missing the "E" character.
gammaspectrum.txt
I added a "patch" based on my Python experience, which is of course not optimal.
I hope this is the correct way to contribute. If it is not welcome, please let me know. If it is welcome but clearly not optimal, I'd be happy to learn a better method.
Best regards
Hi, thanks for finding this issue.
Looking at the code, that could indeed be the case, although I have not yet seen a test case showing this. Could you perhaps share the file that causes the issue, so I can add it as a test case? If the spectrum is sensitive, perhaps you could mutate the numbers but preserve the failing format.
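One way to mutate the numbers while preserving the failing format is to randomize only the mantissa digits and leave the sign, decimal point and E-less exponent untouched. A minimal sketch, assuming the problematic numbers look like `-2.34321-308` (optional sign, digits, a decimal point, digits, then a signed exponent with no "E"); `scramble_fortran_floats` is a hypothetical helper, not part of pypact:

```python
import random
import re

# Hypothetical helper for illustration; not part of pypact.
# Matches an optional sign, a mantissa such as "2.34321", and a signed
# exponent written without the "E" letter, e.g. "-308" or "+012".
# Ordinary E-format numbers such as "1.5E-10" are deliberately not matched.
FORTRAN_FLOAT = re.compile(r"(?<![\w.])([-+]?)(\d+)\.(\d+)([-+]\d+)(?![\w.])")


def scramble_fortran_floats(line, rng):
    """Randomize mantissa digits of E-less floats, keeping their format."""

    def repl(match):
        sign, whole, frac, exponent = match.groups()
        # Same number of digits before and after the decimal point.
        new_whole = "".join(rng.choice("123456789") for _ in whole)
        new_frac = "".join(rng.choice("0123456789") for _ in frac)
        return f"{sign}{new_whole}.{new_frac}{exponent}"

    return FORTRAN_FLOAT.sub(repl, line)


rng = random.Random(42)
print(scramble_fortran_floats("  1.23456-308  -7.5+002", rng))
```

Seeding the generator keeps the anonymized file reproducible, and only the digits change, so the mutated output should still trigger the same parsing failure.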
From your attachment, I assume you're proposing the following fix.
```python
import re

from pypact.output.tags import GAMMA_SPECTRUM_SUB_HEADER
from pypact.util.decorators import freeze_it
from pypact.util.jsonserializable import JSONSerializable

FLOAT_NUMBER = r"[0-9]+(?:\.(?:[0-9]+))?(?:e?(?:[-+]?[0-9]+)?)?"
GAMMA_SPECTRUM_LINE = \
    r"[^(]*\(\s*(?P<lb>{FN})\s*-\s*(?P<ub>{FN})\s*MeV\)\s*(?P<value>{FN})\D*(?P<vr>{FN}).*".format(
        FN=FLOAT_NUMBER,
    )
GAMMA_SPECTRUM_LINE_MATCHER = re.compile(GAMMA_SPECTRUM_LINE, re.IGNORECASE)


@freeze_it
class GammaSpectrum(JSONSerializable):
    """
    The gamma spectrum type from the output
    """

    def __init__(self):
        self.boundaries = []  # TODO dvp: should be numpy arrays (or even better xarrays)
        self.values = []
        self.volumetric_rates = []

    def fispact_deserialize(self, file_record, interval):
        self.__init__()
        lines = file_record[interval]

        def extract_boundaries_and_values(_lines):
            header_found = False
            for line in _lines:
                if not header_found:
                    if GAMMA_SPECTRUM_SUB_HEADER in line:
                        header_found = True
                if header_found:
                    if line.strip() == "":
                        return
                    match = GAMMA_SPECTRUM_LINE_MATCHER.match(line)
                    lower_boundary = float(match.group("lb"))
                    upper_boundary = float(match.group("ub"))
                    value_str = match.group("value")
                    # Proposed patch: re-insert the missing "E" before a negative exponent
                    if "E" not in value_str:
                        splitted_value_str = value_str.split("-")
                        splitted_value_str = [splitted_value_str[0], "E-", splitted_value_str[1]]
                        value_str = "".join(splitted_value_str)
                    value = float(value_str)
                    volumetric_rate_str = match.group("vr")
                    # Same patch applied to the volumetric rate column
                    if "E" not in volumetric_rate_str:
                        splitted_volumetric_rate_str = volumetric_rate_str.split("-")
                        splitted_volumetric_rate_str = [splitted_volumetric_rate_str[0], "E-", splitted_volumetric_rate_str[1]]
                        volumetric_rate_str = "".join(splitted_volumetric_rate_str)
                    volumetric_rate = float(volumetric_rate_str)
                    yield lower_boundary, upper_boundary, value, volumetric_rate

        boundaries = []
        values = []
        volumetric_rates = []
        for lb, ub, v, vr in extract_boundaries_and_values(lines):
            if not boundaries:
                boundaries.append(lb)
            boundaries.append(ub)
            values.append(v)
            volumetric_rates.append(vr)

        if values:
            self.boundaries = boundaries
            self.values = values
            self.volumetric_rates = volumetric_rates
```
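Note that the regular expression itself already accepts the E-less form, because the `e` in `FLOAT_NUMBER` is optional; the captured string reaches `float()` intact and the failure happens only at conversion. A small check, with `FLOAT_NUMBER` copied from the excerpt above (this form typically comes from Fortran's Ew.d edit descriptor, which drops the exponent letter when the exponent magnitude exceeds 99):

```python
import re

# Copied from the gammaspectrum.py excerpt above.
FLOAT_NUMBER = r"[0-9]+(?:\.(?:[0-9]+))?(?:e?(?:[-+]?[0-9]+)?)?"
FLOAT_MATCHER = re.compile(FLOAT_NUMBER, re.IGNORECASE)

for text in ["2.34321E-308", "2.34321-308"]:
    # Both strings are matched in full by the pattern...
    match = FLOAT_MATCHER.fullmatch(text)
    print(text, "matched" if match else "not matched")
    # ...but Python's float() only understands the first one.
    try:
        print("  float() ->", float(text))
    except ValueError as exc:
        print("  float() failed:", exc)
```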
This could work, but I now think we should probably use the existing utility function (get_float in numerical.py) to handle this: https://github.com/fispact/pypact/blob/master/pypact/util/numerical.py#L12
There are already some tests that try to cover this case. Is your failing float an example of one of them? https://github.com/fispact/pypact/blob/master/tests/util/numericaltest.py
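For comparison, a more general conversion splits the mantissa from the trailing signed exponent instead of splitting on every "-". This is a minimal sketch under that assumption; `parse_fortran_float` is a hypothetical name and is not pypact's `get_float`, whose actual implementation may differ:

```python
import re

# Hypothetical helper for illustration; NOT pypact's get_float.
# Mantissa (optional sign, digits, optional decimal part) followed by a
# signed exponent that is missing its "E", e.g. "-2.34321-308".
_FORTRAN_NO_E = re.compile(r"^([-+]?(?:\d+\.?\d*|\.\d+))([-+]\d+)$")


def parse_fortran_float(text):
    """Convert a number string to float, re-inserting a missing 'E'."""
    text = text.strip()
    try:
        # Ordinary Python-readable numbers, including "1.5E-10".
        return float(text)
    except ValueError:
        pass
    # Only the final signed group is treated as the exponent, so a
    # negative mantissa and a positive exponent are both handled.
    match = _FORTRAN_NO_E.match(text)
    if match is None:
        raise ValueError(f"could not convert string to float: {text!r}")
    mantissa, exponent = match.groups()
    return float(f"{mantissa}E{exponent}")


print(parse_fortran_float("-2.34321-308"))  # -2.34321e-308
print(parse_fortran_float("1.0+100"))  # 1e+100
```

Trying plain `float()` first keeps the common case fast and leaves the regular expression as a fallback for the Fortran form only.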
Hi,
Thanks for answering.
I'd rather not send you my files because I don't know to what extent I am allowed to share anything, even with artificial data.
The number format that causes this issue is indeed the Fortran float style "-2.34321-308" (which I didn't know existed until now).
Using the utility function is clearly a better option, since mine would cause issues with negative values in Fortran format. I successfully tested it in my case by replacing the float() calls with get_float() from numerical.py.
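The weakness of splitting on "-" can be reproduced in isolation. The sketch below re-implements the proposed patch as a standalone function (`patched_parse` is a hypothetical name used only here) and feeds it a few inputs: a negative mantissa, a positive exponent, and a lowercase "e", which the case-insensitive regex would also accept:

```python
def patched_parse(value_str):
    # Standalone copy of the proposed patch, for illustration only.
    if "E" not in value_str:
        parts = value_str.split("-")
        parts = [parts[0], "E-", parts[1]]
        value_str = "".join(parts)
    return float(value_str)


for sample in ["2.34321-308", "-2.34321-308", "1.5+010", "1.5e-10"]:
    try:
        print(f"{sample!r:>16} -> {patched_parse(sample)}")
    except (ValueError, IndexError) as exc:
        print(f"{sample!r:>16} -> {type(exc).__name__}: {exc}")
```

Only the first sample converts: the negative mantissa yields the string "E-2.34321", the positive exponent has no "-" to split on (IndexError), and the lowercase exponent becomes "1.5eE-10".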
Thanks
PS: I replaced the float() calls in gammaspectrum.py.
Going to reopen this to fix as suggested.