fitsio icon indicating copy to clipboard operation
fitsio copied to clipboard

Fitsio works (only) with unaligned data structures

Open olebole opened this issue 8 years ago • 14 comments

On systems like Sparc and (partly) ARM it is required that data pointers are aligned, and also on other architectures unaligned data may lead to a lower performance. Numpy however seems to create non-aligned data structures which cause the unit tests to fail (segfault) under Sparc64 and armhf platforms. Aligning the data in the tests does not work yet, since this creates an unknown column type "V" for table data.

olebole avatar Apr 23 '16 13:04 olebole

I think this might be straightforward, but tedious to implement. It basically comes down to ignoring some of the fields ("V") and repacking the data before creating the table and writing.

esheldon avatar Jul 14 '16 15:07 esheldon

Note writing aligned data already works fine. For writing, references to the individual fields in the structured array are packed into a list, and in the C code I use the following to get a pointer to the data for a given row

data = PyArray_GETPTR1(array, row);

this is fully aware of the alignment, strides, etc.

To make the reading also work I could restructure the C code to work in the same way. I think this would have minimal performance impact, but would be more general, and would support reading into aligned data as well.

esheldon avatar Aug 11 '16 12:08 esheldon

Also note, that arr.dtype.names reports the names of interest, but the descriptor has the extra fields we would ignore

>>> dtype=numpy.dtype([('ra','f8'),
                       ('index','i4'),
                       ('dec','f8'),
                       ('s','S7')],align=True)
>>> xa=numpy.zeros(3, dtype=dtype)


# note this looks different from unaligned, it shows
# the dtype as a dict
>>> xa
array([(0.0, 0, 0.0, ''), (0.0, 0, 0.0, ''), (0.0, 0, 0.0, '')], 
      dtype={'names':['ra','index','dec','s'], 'formats':['<f8','<i4','<f8','S7'], 'offsets':[0,8,16,24], 'itemsize':32, 'aligned':True})

# this is what we want
>>> xa.dtype.names
('ra', 'index', 'dec', 's')

# this contains stuff for padding, which we don't want
>>> xa.dtype.descr
[('ra', '<f8'),
 ('index', '<i4'),
 ('', '|V4'),
 ('dec', '<f8'),
 ('s', '|S7'),
 ('', '|V1')]

esheldon avatar Aug 11 '16 12:08 esheldon

Just wanted to leave a sort of "me too" - on Ubuntu we now can't build python-fitsio for armhf, because our builders have kernels which raise SIGBUS on unaligned access and now the testsuite fails because of this.

iainlane avatar Nov 28 '16 20:11 iainlane

Oho, I forgot to attach the bt (git master, numpy 1.11.1 rc1, cfitsio 3.390) in case this is a different issue to the one you are discussing.

root@ethical-kitten:~/fitsio/build/lib.linux-armv7l-2.7# gdb --args python2.7 -m unittest discover -v
[…]
(gdb) run
Starting program: /usr/bin/python2.7 -m unittest discover -v
Cannot parse expression `.L954 4@r4'.
warning: Probes-based dynamic linker interface failed.
Reverting to original interface.

[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/arm-linux-gnueabihf/libthread_db.so.1".
testAsciiTableWriteRead (fitsio.test.TestReadWrite) ... ok
testBz2Read (fitsio.test.TestReadWrite) ... 
Program received signal SIGBUS, Bus error.
0xf6855f64 in ffi8fi8 (input=0x5f9307, ntodo=1, scale=1, zero=0, output=0xfffe5c10, status=0xfffecdf4) at putcolj.c:1872
1872	putcolj.c: No such file or directory.
(gdb) bt
#0  0xf6855f64 in ffi8fi8 (input=0x5f9307, ntodo=1, scale=1, zero=0, output=0xfffe5c10, status=0xfffecdf4) at putcolj.c:1872
#1  0xf6854f1c in ffpcljj (fptr=0x372320, colnum=8, firstrow=1, firstelem=1, nelem=1, array=0x5f9307, status=0xfffecdf4) at putcolj.c:1428
#2  0xf6845c24 in ffpcl (fptr=0x372320, datatype=81, colnum=8, firstrow=firstrow@entry=1, firstelem=firstelem@entry=1, nelem=1, array=0x5f9307, status=status@entry=0xfffecdf4) at putcol.c:701
#3  0xf69568a4 in PyFITSObject_write_columns (self=<optimized out>, args=<optimized out>, kwds=<optimized out>) at fitsio/fitsio_pywrap.c:1956
#4  0x00068d98 in PyEval_EvalFrameEx ()
#5  0x00062c50 in PyEval_EvalCodeEx ()
#6  0x00068af8 in PyEval_EvalFrameEx ()
#7  0x00062c50 in PyEval_EvalCodeEx ()
#8  0x0006864c in PyEval_EvalFrameEx ()
#9  0x00062c50 in PyEval_EvalCodeEx ()
#10 0x00068af8 in PyEval_EvalFrameEx ()
#11 0x00062c50 in PyEval_EvalCodeEx ()
#12 0x000797dc in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) 

iainlane avatar Nov 28 '16 20:11 iainlane

@olebole I need to ask you some questions about this.

I'm confused how numpy record arrays work at all on these systems if align=False.

On one of these systems, is it possible that the data will be created aligned by default, or does aligned always need to be specified?

If it must be specified, do you know of a generic way to determine if aligned data is needed? If so, then this can be set internally in fitsio.

I'm working on this in a branch called aligned

esheldon avatar Mar 10 '17 16:03 esheldon

In the case of aligned data, one cannot use the faster reading functions.

This results in a significant slowdown, about a factor of two in the case where one is reading all rows and all columns.

So I do think we want to special case this.

esheldon avatar Mar 10 '17 17:03 esheldon

If you look at the aligned branch, I've added some features. Please see below.

If you construct the FITS object with align=True (or use fitsio.read(..., align=True)) then it tries to read into objects with alignment. This is not the default because it slows down reading by as much as a factor of two.

Also in the test you an turn on testing with alignment

fitsio.test.align=True
fitsio.test.test()

(If you know a better way to do that let me know)

I hope this will now pass your tests.

If we can determine that this alignment should be turned on by default on certain systems, we should add that feature, so you don't have to do align=True everywhere.

esheldon avatar Mar 10 '17 20:03 esheldon

actually I figured out a simple way to make both tests with and without alignment run, you don't have to set that yourself now.

esheldon avatar Mar 10 '17 20:03 esheldon

Sorry that I didn't reply (and test) for a long time.

I now uploaded a version containing your changes from the aligned branch (combined here), but they didn't work (fail on ARM/32 bit).

olebole avatar Jul 05 '18 11:07 olebole

Where do we stand on this one?

esheldon avatar Nov 12 '20 14:11 esheldon

With the latest release 1.1.3, I still get bus errors in the test.

olebole avatar Nov 12 '20 15:11 olebole

Do you have any ideas for moving forward?

esheldon avatar Nov 12 '20 15:11 esheldon

I am afraid that I have no idea. I could try to generate another stacktrace, if that helps.

olebole avatar Nov 12 '20 16:11 olebole