IndexError when processing (old ?) GENEActiv .bin files

Open cWam-zz opened this issue 1 year ago • 9 comments

Hello, I work with Python 3.8.19 on Windows 10 - 64 bits.

The following command, which I use to process GENEActiv .bin files, fails for some files only:

stepcount "E:\file\directory\GENEActiv_file.bin" -o "E:\output\directory"

Here is the output message:

java.lang.ArrayIndexOutOfBoundsException: Index 1 out of bounds for length 1
    at GENEActivReader.parseBinFileHeader(GENEActivReader.java:221)
    at GENEActivReader.main(GENEActivReader.java:75)
Reading file... Done! (0.16s)
Error: C:\Users\***\AppData\Local\Temp\tmphxr_w9yo\data.npy - The process cannot access the file because it is being used by another process.
Traceback (most recent call last):
  File "C:\Users\***\Anaconda3\envs\stepcount\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\***\Anaconda3\envs\stepcount\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\***\Anaconda3\envs\stepcount\Scripts\stepcount.exe\__main__.py", line 7, in <module>
  File "C:\Users\***\Anaconda3\envs\stepcount\lib\site-packages\stepcount\stepcount.py", line 58, in main
    data, info = read(
  File "C:\Users\***\Anaconda3\envs\stepcount\lib\site-packages\stepcount\stepcount.py", line 730, in read
    data, info = actipy.read_device(
  File "C:\Users\***\Anaconda3\envs\stepcount\lib\site-packages\actipy\reader.py", line 50, in read_device
    data, info = _read_device(input_file, verbose)
  File "C:\Users\***\Anaconda3\envs\stepcount\lib\site-packages\actipy\reader.py", line 220, in _read_device
    info['StartTime'] = t.iloc[0].strftime(strftime)
  File "C:\Users\***\Anaconda3\envs\stepcount\lib\site-packages\pandas\core\indexing.py", line 1103, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "C:\Users\***\Anaconda3\envs\stepcount\lib\site-packages\pandas\core\indexing.py", line 1656, in _getitem_axis
    self._validate_integer(key, axis)
  File "C:\Users\***\Anaconda3\envs\stepcount\lib\site-packages\pandas\core\indexing.py", line 1589, in _validate_integer
    raise IndexError("single positional indexer is out-of-bounds")
IndexError: single positional indexer is out-of-bounds

The first error does not matter: it appears every time, but the files can still be processed. The IndexError, however, stops the process. I noticed that the error does not appear for recent files (collected in 2023) but does appear for old files (collected in 2018), even though the same devices were used to record the data in both years.

cWam-zz avatar Jun 02 '24 23:06 cWam-zz

@cWam-zz Hi. Any chance the file was empty? Do you know the size of the file? Is there a way you can share the file for me to debug?

chanshing avatar Jun 10 '24 10:06 chanshing

@chanshing Thank you for your message. No, the files are not empty. File sizes range from 250 MB to 780 MB. You can find an example of such a file on Zenodo. It is a 260 MB file. Thank you.

cWam-zz avatar Jun 11 '24 05:06 cWam-zz

@cWam-zz Thanks! We will investigate and get back to you. In the meantime, a workaround for you could be to first convert your file to a CSV (using GENEActiv's own parser) then use our tool. Sorry for the inconvenience!

chanshing avatar Jun 11 '24 08:06 chanshing

@chanshing Thank you for your suggestion. I forgot to mention (I don't know if this helps) that I have already read these files with the GENEAread R package without any problems.

cWam-zz avatar Jun 11 '24 22:06 cWam-zz

Thank you @cWam-zz , maybe you can try exporting your file to CSV using that tool.

chanshing avatar Jun 12 '24 04:06 chanshing

Hello, I'm sorry to come back to this issue. I converted a .bin file into a .csv file (1-second epoch) with the following column names: time, x, y, z. But I'm still facing an issue (probably because of the time format?). I ran the following command:

stepcount "E:\***\XXXXXX_left wrist_047218_2018-10-18 11-10-57.csv"

Here is the full output message:

Gravity calibration... Done! (0.08s)
Nonwear detection... Done! (0.21s)
Traceback (most recent call last):
  File "C:\Users\***\Anaconda3\envs\stepcount\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\***\Anaconda3\envs\stepcount\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\***\Anaconda3\envs\stepcount\Scripts\stepcount.exe\__main__.py", line 7, in <module>
  File "C:\Users\***\Anaconda3\envs\stepcount\lib\site-packages\stepcount\stepcount.py", line 58, in main
    data, info = read(
  File "C:\Users\***\Anaconda3\envs\stepcount\lib\site-packages\stepcount\stepcount.py", line 711, in read
    data, info = actipy.process(
  File "C:\Users\***\Anaconda3\envs\stepcount\lib\site-packages\actipy\reader.py", line 153, in process
    data, info_resample = P.resample(data, resample_hz)
  File "C:\Users\***\Anaconda3\envs\stepcount\lib\site-packages\actipy\processing.py", line 34, in resample
    pd.Timedelta(pd.infer_freq(data.index)).total_seconds(),
  File "pandas\_libs\tslibs\timedeltas.pyx", line 1766, in pandas._libs.tslibs.timedeltas.Timedelta.__new__
  File "pandas\_libs\tslibs\timedeltas.pyx", line 649, in pandas._libs.tslibs.timedeltas.parse_timedelta_string
ValueError: unit abbreviation w/o a number

I uploaded the .csv files I used to the same Zenodo repository. As explained, for the time column, I used 1) a string and 2) a "POSIXct" "POSIXt" R class before saving to a csv file. Am I missing or forgetting something about the time format?

cWam-zz avatar Nov 25 '24 00:11 cWam-zz

Hi @cWam-zz It seems that the file only has second-level summaries, which would make it impossible for our models to work (requiring at least 15Hz frequency, ideally more - yours is 1Hz).
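
For anyone who wants to check a CSV before running the tool, here is a minimal sketch of estimating the sampling rate from the time column. It assumes pandas is installed and that the column is named time, as in this thread; the helper estimate_sample_rate and the constant MIN_HZ are hypothetical names for illustration and are not part of stepcount or actipy.

```python
import io

import pandas as pd

MIN_HZ = 15  # minimum sampling rate mentioned in this thread


def estimate_sample_rate(times: pd.Series) -> float:
    """Estimate the sampling rate (Hz) from the median gap between timestamps."""
    t = pd.to_datetime(times)
    gaps = t.diff().dt.total_seconds().dropna()
    return 1.0 / gaps.median()


# Small in-memory sample using the 3-digit timestamps quoted in this thread
# (acceleration values shortened for readability).
csv_text = """time,x,y,z
2018-07-05 09:24:51.000,0.249,-0.245,-1.006
2018-07-05 09:24:51.049,0.198,-0.206,-0.923
2018-07-05 09:24:51.099,0.156,-0.244,-1.083
2018-07-05 09:24:51.150,0.207,-0.202,-0.949
2018-07-05 09:24:51.200,0.267,-0.193,-0.902
"""

df = pd.read_csv(io.StringIO(csv_text))
rate = estimate_sample_rate(df["time"])
print(f"Estimated sampling rate: {rate:.1f} Hz")
if rate < MIN_HZ:
    print(f"Below {MIN_HZ} Hz: too coarse for the step-counting models")
```

The median gap is used instead of the mean so that a few missing or jittered samples do not skew the estimate.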

chanshing avatar Nov 25 '24 12:11 chanshing

Hi @chanshing Thank you for your clarification. I have tested with 20Hz data and it seems to work well.

However, I noticed that every row needs a timestamp with 2 or 3 decimal digits for the seconds. The accepted format can look like this, with 2-digit precision:

"time","x","y","z","id"
2018-07-05 09:24:51.00,0.249348574720771,-0.245585232050615,-1.00636620318577
2018-07-05 09:24:51.04,0.1982820704194,-0.206876760234386,-0.923792019604709
2018-07-05 09:24:51.09,0.156381348941352,-0.2442504571604,-1.08361302008417
2018-07-05 09:24:51.15,0.207447853242723,-0.202872435563742,-0.949097011347291
2018-07-05 09:24:51.20,0.267680140367417,-0.193529011332239,-0.902482552874114

Or like this, with 3-digit precision:

"time","x","y","z"
"2018-07-05 09:24:51.000",0.249348574720771,-0.245585232050615,-1.00636620318577
"2018-07-05 09:24:51.049",0.1982820704194,-0.206876760234386,-0.923792019604709
"2018-07-05 09:24:51.099",0.156381348941352,-0.2442504571604,-1.08361302008417
"2018-07-05 09:24:51.150",0.207447853242723,-0.202872435563742,-0.949097011347291
"2018-07-05 09:24:51.200",0.267680140367417,-0.193529011332239,-0.902482552874114

But I got an error with file contents like:

"time","x","y","z","id"
2018-07-05 09:24:51,0.249348574720771,-0.245585232050615,-1.00636620318577
2018-07-05 09:24:51.05,0.1982820704194,-0.206876760234386,-0.923792019604709
2018-07-05 09:24:51.1,0.156381348941352,-0.2442504571604,-1.08361302008417
2018-07-05 09:24:51.15,0.207447853242723,-0.202872435563742,-0.949097011347291
2018-07-05 09:24:51.2,0.267680140367417,-0.193529011332239,-0.902482552874114

Here is the error encountered with this type of file:

Traceback (most recent call last):
  File "C:\Users\***\Anaconda3\envs\stepcount\lib\runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\***\Anaconda3\envs\stepcount\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "C:\Users\***\Anaconda3\envs\stepcount\Scripts\stepcount.exe\__main__.py", line 7, in <module>
  File "C:\Users\***\Anaconda3\envs\stepcount\lib\site-packages\stepcount\stepcount.py", line 58, in main
    data, info = read(
  File "C:\Users\***\Anaconda3\envs\stepcount\lib\site-packages\stepcount\stepcount.py", line 705, in read
    freq = infer_freq(data.index)
  File "C:\Users\***\Anaconda3\envs\stepcount\lib\site-packages\stepcount\stepcount.py", line 750, in infer_freq
    tdiff = t.to_series().diff()
  File "C:\Users\***\Anaconda3\envs\stepcount\lib\site-packages\pandas\core\series.py", line 2870, in diff
    result = algorithms.diff(self._values, periods)
  File "C:\Users\***\Anaconda3\envs\stepcount\lib\site-packages\pandas\core\algorithms.py", line 1454, in diff
    out_arr[res_indexer] = op(arr[res_indexer], arr[lag_indexer])
TypeError: unsupported operand type(s) for -: 'str' and 'str'

Thanks again for your help.

cWam-zz avatar Dec 05 '24 00:12 cWam-zz

Thanks @cWam-zz, that's a good diagnosis. Yes, decimals are needed for >1Hz data (to show the fractions of a second). For the 3rd scenario, I think the problem is that the very first timestamp has no decimals, so the parser infers that none of the remaining rows will have decimals either.

The following modification should probably work (note that I added .00 to the first timestamp):

"time","x","y","z","id"
2018-07-05 09:24:51.00,0.249348574720771,-0.245585232050615,-1.00636620318577
2018-07-05 09:24:51.05,0.1982820704194,-0.206876760234386,-0.923792019604709
2018-07-05 09:24:51.1,0.156381348941352,-0.2442504571604,-1.08361302008417
2018-07-05 09:24:51.15,0.207447853242723,-0.202872435563742,-0.949097011347291
2018-07-05 09:24:51.2,0.267680140367417,-0.193529011332239,-0.902482552874114
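
If editing the first row by hand is impractical for large files, one option is to pad every timestamp to the same number of decimals before running the tool. The sketch below uses only the Python standard library; pad_fraction and normalize_time_column are hypothetical helper names, not part of stepcount, and the approach assumes timestamps of the form shown above (date, time, optional fractional seconds after a dot). Whether a given output is accepted still depends on the tool; in this thread the uniform 3-digit format was reported to work.

```python
import csv
import io


def pad_fraction(timestamp: str, digits: int = 3) -> str:
    """Return the timestamp with exactly `digits` fractional-second digits.

    Missing digits are filled with zeros; extra digits are truncated (not rounded).
    """
    whole, _, frac = timestamp.strip().partition(".")
    return f"{whole}.{frac[:digits].ljust(digits, '0')}"


def normalize_time_column(src, dst, time_col: str = "time", digits: int = 3) -> None:
    """Copy CSV rows from src to dst, padding the time column of every row."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader)
    idx = header.index(time_col)
    writer.writerow(header)
    for row in reader:
        row[idx] = pad_fraction(row[idx], digits)
        writer.writerow(row)


# In-memory demonstration with the mixed-precision timestamps from this thread
# (acceleration values shortened for readability).
raw = io.StringIO(
    "time,x,y,z\n"
    "2018-07-05 09:24:51,0.249,-0.245,-1.006\n"
    "2018-07-05 09:24:51.05,0.198,-0.206,-0.923\n"
    "2018-07-05 09:24:51.1,0.156,-0.244,-1.083\n"
)
fixed = io.StringIO()
normalize_time_column(raw, fixed)
print(fixed.getvalue())
```

For real files, the same function can be called with two file objects opened with newline="" in place of the StringIO buffers.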

chanshing avatar Dec 05 '24 16:12 chanshing