UnifyFS
test_write_static dies with floating point exception
I get a floating point exception if I do the following:
$ salloc -N1
$ ./server.sh
$ srun -N1 -n1 ./client/tests/test_write_static -s1 -b1 -t1 -f /tmp/foo
srun: error: catalyst1: task 0: Floating point exception (core dumped)
Where server.sh contains:
#!/bin/bash
export UNIFYCR_META_SERVER_RATIO=1
export UNIFYCR_META_DB_NAME=unifycr_db
export UNIFYCR_CHUNK_MEM=0
basedir=$(dirname "$0")
srun -N 1 -n 1 $basedir/server/src/unifycrd &
GDB backtrace:
Core was generated by `./test_write_static -f /tmp/foobar -b 1 -t 1 -s 1'.
Program terminated with signal 8, Arithmetic exception.
#0 0x000000000040d8e0 in unifycr_split_index (cur_idx=<optimized out>, index_set=<optimized out>, slice_range=<optimized out>) at unifycr-fixed.c:292
292 long cur_slice_start = cur_idx->file_pos / slice_range * slice_range;
Missing separate debuginfos, use: debuginfo-install infinipath-psm-3.3-25_g326b95a_open.1.el7.x86_64 libibverbs-13-7.el7.x86_64 libnl3-3.2.28-4.el7.x86_64 libuuid-2.23.2-43.el7.x86_64 numactl-libs-2.0.9-6.el7_2.x86_64 openssl-libs-1.0.2k-8.el7.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0 0x000000000040d8e0 in unifycr_split_index (cur_idx=<optimized out>, index_set=<optimized out>, slice_range=<optimized out>) at unifycr-fixed.c:292
#1 unifycr_logio_chunk_write (fid=0, pos=390226556, meta=0x0, chunk_id=0, chunk_offset=4241984, buf=0x3f7ff, count=1) at unifycr-fixed.c:392
#2 0x000000000040d559 in unifycr_fid_store_fixed_write (fid=0, meta=0x1742627c, pos=0, buf=0x0, count=4241984) at unifycr-fixed.c:685
#3 0x000000000040711c in unifycr_fd_write (fd=-975858628, pos=<optimized out>, buf=<optimized out>, count=<optimized out>) at unifycr-sysio.c:506
#4 __wrap_pwrite (fd=1, buf=0x1742627c, count=0, offset=0) at unifycr-sysio.c:1446
#5 0x0000000000405126 in __wrap_write (fd=1025, buf=0x1742627c, count=0) at unifycr-sysio.c:833
#6 0x000000000040332b in main (argc=9, argv=0x7fffffffb1a8) at test_write.c:132
It seems this happens when I don't set UNIFYCR_USE_SPILLOVER=1. However, I can avoid it with:
touch $EXTERNAL_DATA_DIR/spill_0_0.log
I think it may still be possible to hit this bug even with the USE_SPILLOVER fixes in place. It happens when unifycr_split_index() is called with slice_range=0, so we should ensure that can't happen or add error handling if it does.
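As a minimal sketch of the error handling suggested above: the helper below reproduces the arithmetic from unifycr-fixed.c:292 but rejects a non-positive slice_range before dividing. The name slice_start_checked, its signature, and the EINVAL return are assumptions made for illustration, not existing UnifyCR code.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical helper (not part of UnifyCR): compute the start offset of
 * the slice that contains file_pos. Returns 0 on success, EINVAL if the
 * arguments would make the division invalid. */
static int slice_start_checked(long file_pos, long slice_range,
                               long *start_out)
{
    if (start_out == NULL) {
        return EINVAL;
    }
    if (slice_range <= 0) {
        /* Integer division by zero is what raised SIGFPE in the report. */
        fprintf(stderr, "invalid slice_range: %ld\n", slice_range);
        return EINVAL;
    }
    /* Same expression as line 292 in the backtrace: round file_pos down
     * to a multiple of slice_range. */
    *start_out = file_pos / slice_range * slice_range;
    return 0;
}
```

With a check like this, a caller such as unifycr_split_index() could return an error to the write path instead of crashing the client.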
I recently encountered this issue while developing the RPC replacement for the mount functionality. The mount command sets the global variable unifycr_key_slice_range from the value the server returns in its response to the mount request. That value is then passed to unifycr_split_index() as the slice_range parameter. Hence, it should be the responsibility of the mount function (unifycrfs_mount) to ensure that this global variable is set to a valid, non-zero value.
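A rough sketch of what validating at mount time could look like, assuming the server's value arrives as a long. The variable example_key_slice_range stands in for unifycr_key_slice_range, and example_apply_mount_response is a hypothetical function; neither the real declaration nor the actual mount RPC handling is shown in this thread.

```c
#include <errno.h>

/* Stand-in for the global unifycr_key_slice_range. The real type is not
 * shown in this thread, so long is an assumption. */
static long example_key_slice_range = 0;

/* Hypothetical: called with the slice range taken from the server's
 * response to the mount request. Returns 0 on success, EINVAL otherwise. */
static int example_apply_mount_response(long server_slice_range)
{
    if (server_slice_range <= 0) {
        /* Fail the mount and leave the global untouched, so a zero value
         * never reaches the index-splitting code. */
        return EINVAL;
    }
    example_key_slice_range = server_slice_range;
    return 0;
}
```

Rejecting the value here means the mount fails with a clear error, rather than the first write dying later with a floating point exception.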
UnifyFS has had several changes to the affected code since this issue was opened. Closing this. Please open a new issue if a related error is encountered.