astropy icon indicating copy to clipboard operation
astropy copied to clipboard

Float value formatting in FITS header

Open saimn opened this issue 7 years ago • 7 comments

I have a FITS header with a float value that gets changed from '8.95' to '8.949999999999999' when saving to a new file (and as a result the comment is truncated):

> /home/conseil/lib/astropy/astropy/io/fits/card.py(976)_format_image()
-> warnings.warn('Card is too long, comment will be truncated.',

(Pdb) self
('ESO TEL AMBI WINDSP', 8.95, '[m/s] Observatory ambient wind speed queri')
(Pdb) output
'HIERARCH ESO TEL AMBI WINDSP = 8.949999999999999 / [m/s] Observatory ambient wind speed queri'
(Pdb) output[:Card.length]
'HIERARCH ESO TEL AMBI WINDSP = 8.949999999999999 / [m/s] Observatory ambient win'
(Pdb) len(output)
93
(Pdb) Card.length
80
(Pdb) value
'8.949999999999999'
(Pdb) self.value
8.95
(Pdb) str(self.value)
'8.95'
(Pdb) self._format_value()
'8.949999999999999'

(Pdb) from astropy.io.fits.card import _format_float
(Pdb) _format_float(self.value)
'8.949999999999999'

This is not highly critical, but it would be better to get the output header same as the input one. astropy.io.fits.card._format_float uses %.16G to convert the string, I don't know what is the best option here (is str(value) a valid option ?). ping @embray

saimn avatar Nov 02 '16 13:11 saimn

Does the FITS format specify the format to be used for floating point values or decimals?

Additionally, would there be any downside to simply using %s instead of %.16G? I'm not too familiar with the project at the moment, but it seems that implementing a fix using %s would be easier than str(value).

celestialorb avatar Nov 18 '16 05:11 celestialorb

Related : https://github.com/spacetelescope/PyFITS/issues/93

A user wrote in with a problem where the representation with which PyFITS was writing a floating point value was causing problems in a program that was reading that FITS file. There's not really a bug in PyFITS here: For one, I think it was a bug in Python that was causing the floating point value's representation to not be rounded to the number of digits he wanted. It was also a problem in the program he was using that it was overly sensitive to the small rounding error. However, his problems would have been alleviated if he could have exactly controlled how PyFITS formatted the value in the header. That is, when he passed in {0:.5e}".format(num) the value was formatted correctly but enclosed in quotes (since PyFITS saw a string and not a float). This could be done by creating a new card image manually and passing it to Card.fromstring, but that's kludgy. A nicer way would be to allow overriding the default format string for a type. For example, when PyFITS gets a float, it formats it with the format string '%.16G'. It could be possible to have an optional format= argument to header.update() (now header.set() in the header rewrite branch) that overrides the default format string. https://aeon.stsci.edu/ssb/trac/pyfits/ticket/78

saimn avatar Dec 21 '17 11:12 saimn

I was going suggest to simply change .16g to 16g. That way, we get up to 16 digits (with trailing zeros removed). For the most common type of number in Python (64 bit floats) that works pretty well.

print('%.16g' % -99.9)
print('%16g' % -99.9)
-99.90000000000001
           -99.9

.16g works well for number of order 1, but for larger numbers there is not enough precision left in float for thepost-period digits.

However, that fails with small numbers.

print('%.16g' % (1e-10 - 1e-18))
print('%16g' % (1e-10 - 1e-18))
9.999999900000001e-11
           1e-10

MMM. Grr.

So, after some experimentation, I think 16 digits is just too much. It happens works for many floats, but not for all. We should just reduce by one digit, then it will work for all 64 bit floats, e.g. for the ones that failed before:

for val in [-99.9, 1234.123, 1/3, 1e-10 - 1e-18]:

    print('%.15g' % val)

-99.9
1234.123
0.333333333333333
9.9999999e-11

hamogu avatar Mar 15 '22 00:03 hamogu

@vstinner pointed me to https://github.com/python/cpython/blob/main/Python/pystrtod.c#L1242 which is the code used by Python to format a float (thanks!). This code tries to find the smallest number giving an exact representation (if I understood correctly), so I think we should just rely on Python instead of using a default format (as you pointed @hamogu .15g seems better tough but it may lead to a small precision loss in some cases, and Python seems to increase the precision if needed up to .17g I think).

saimn avatar Mar 18 '22 21:03 saimn

Our code calculates some floats that are stored in several places, e.g. in database columns and in FITS files (the latter through astropy).

What would be the easiest way to ensure that both these values are identical?

The only thing I can think of is convert every value with Card.fromstring(str(Card("k", value))).value, which seems rather cumbersome.

The database stores floats exactly I believe (lets assume that), so I think it will be necessary to convert all floats to whatever astropy would store in a FITS file. This conversion needs to be done right after calculation, because otherwise the value in memory will not be the same as the value on disk / in the database.

E.g.: if the desired value is: 1234567891234.123456, then this gets truncated to 1234567891234.1235 by Python already (float64 precision), then this gets truncated again to 1234567891234.124 by astropy, which Python parses as a distinct number from the original.

So, like this:

>>> value = 1234567891234.123456
>>> value
1234567891234.1235
>>> Card.fromstring(str(Card("k", value))).value
1234567891234.124

And then do that last line for every value that might end up in a FITS header. There must be a better way.

I haven't been able to determine why the 16 digit accuracy is used. It seems it was this commit: https://github.com/spacetelescope/PyFITS/commit/b0e2e8af8da466f13daaf46416edc1de2d5f3382

Also, it is not clear why the maximum number of characters is 20; introduced in https://github.com/spacetelescope/PyFITS/commit/ee24dd179b1e25b24f4416b68c80ac567c228b19

It seems that my problem would be solved if we went for 17 digits. There is only 16 decimals of information in a float64, but apparently it is not evenly distributed so occasionally there is a need for 17 digits.

hugobuddel avatar Jul 03 '22 20:07 hugobuddel

Numpy also moved to using something similar to python, which uses the smallest possible exact representation of a float. I'd definitely suggest to just remove the attempt to control the format and use python directly.

mhvk avatar Jul 05 '22 08:07 mhvk

While I'd welcome this change, and agree that it should be made, there are two points that should be considered:

  • Chesterton's Fence: There are two limits: 17 digits and 20 characters total. We don't know why these limits are there (at least I don't). Maybe they are there to ensure pyfits (and thus astropy) is highly compatible with some very popular piece of astronomical software (e.g. maybe using more digits would crash that software). Or maybe the limits are there for backwards compatibility with something. It would perhaps be better to know why these limits exist before changing them.
  • This would be a breaking change: FITS files written by the updated version of astropy will have different headers than software with the current (and all the past) version of astropy. Therefore it would probably be best to make this change for a major release.

It would perhaps be useful to make the 'default-to-whatever-Python-does' behavior optional though. Then breaking compatibility with the past or with other software would be at the discretion of the user.

hugobuddel avatar Jul 19 '22 11:07 hugobuddel