fitsio icon indicating copy to clipboard operation
fitsio copied to clipboard

Make sure string column with vec size 1 work

Open esheldon opened this issue 6 months ago • 9 comments

closes #426

This fixes a bug where the shape of the column was not correct.

Adds tests that string and number vec1 column data get returned, even though read as scalar

esheldon avatar May 14 '25 14:05 esheldon

It is a breaking change, in that those columns will now be properly round tripped whereas they were not before.

On the other hand, I consider this a bug in numpy that I couldn't get this to work in 1.X. Seen that way it is breaking in the sense of correcting behavior.

esheldon avatar May 14 '25 19:05 esheldon

Is there a way we can make it work for non-strings?

beckermr avatar May 14 '25 19:05 beckermr

There is no way to set the TDIM for length-1 vector columns, they must be length 2 or more.

I can see hacks working. We could add a header keyword that tells us the situation for a given column so fitsio can reshape the data.

esheldon avatar May 14 '25 19:05 esheldon

My feeling on this specific item is that this specific behavior of FITSIO is so ingrained in people's codes + minds that if we break it, we're going to cause silent or at least very weird bugs. I get that it can be considered a bug, but long-standing bugs that people code around effectively become part of the API.

beckermr avatar May 14 '25 19:05 beckermr

I can see hacks working. We could add a header keyword that tells us the situation for a given column so fitsio can reshape the data.

Right. My feeling is that we

  1. Keep the current behavior before this PR where columns of shape (1,) are cast to scalars for all types.

  2. If we do want to fix this, we fix it for all types AND we bump fitsio to major version 2.

This seems to me to be the only way to do this without causing lots of bad side effects.

beckermr avatar May 14 '25 19:05 beckermr

We could include some of the other big ideas we've discussed in version 2 as well, including the new I/O modes+features.

beckermr avatar May 14 '25 19:05 beckermr

I don't actually know how to workaround the new numpy behavior, will have to figure it out.

esheldon avatar May 14 '25 19:05 esheldon

Can't we just change the output dtype for strings if we find TDIM has 2 elements?

beckermr avatar May 14 '25 19:05 beckermr

There is code reuse for read and write. It seems like different behavior might be needed for read vs write in this case, but need to look into it.

esheldon avatar May 14 '25 19:05 esheldon