python-neo

Reading a .plx file fails when the file is larger than 2 GB

Open didi226 opened this issue 1 year ago • 2 comments

Describe the bug
When reading .plx files, an error occurs when the file is larger than 2 GB. Files smaller than 1 GB can be read and used normally.

reader = neo.io.PlexonIO(filename=file_path)

To Reproduce
C:\software\Anaconda\envs\py3.9\lib\site-packages\neo\rawio\plexonrawio.py:91: RuntimeWarning: overflow encountered in long_scalars
  pos += length
C:\software\Anaconda\envs\py3.9\lib\site-packages\neo\rawio\plexonrawio.py:87: RuntimeWarning: overflow encountered in ushort_scalars
  length = bl_header['NumberOfWaveforms'] * bl_header['NumberOfWordsInWaveform'] * 2 + 16
Traceback (most recent call last):
  File "C:\Program Files\JetBrains\PyCharm 2023.3.1\plugins\python\helpers\pydev\pydevd.py", line 1527, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "C:\Program Files\JetBrains\PyCharm 2023.3.1\plugins\python\helpers\pydev\_pydev_imps\_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "D:\code_cloud\new_eeg_deep\scut_eeg_dl\sub_pipeline\src\sub_pipeline\pro_lfp\lfp_read.py", line 9, in <module>
    seg = neo.io.PlexonIO(filename = file_path).read_block().segments[0].events
  File "C:\software\Anaconda\envs\py3.9\lib\site-packages\neo\io\plexonio.py", line 20, in __init__
    BaseFromRaw.__init__(self, filename)
  File "C:\software\Anaconda\envs\py3.9\lib\site-packages\neo\io\basefromrawio.py", line 74, in __init__
    self.parse_header()
  File "C:\software\Anaconda\envs\py3.9\lib\site-packages\neo\rawio\baserawio.py", line 178, in parse_header
    self._parse_header()
  File "C:\software\Anaconda\envs\py3.9\lib\site-packages\neo\rawio\plexonrawio.py", line 90, in _parse_header
    block_pos[bl_type][chan_id].append(pos)
KeyError: 32769

If the error occurs when reading a file that you can't share publicly, please let us know, and we'll get in touch to discuss sharing it privately.

Expected behaviour
.plx files of any size can be read normally.

Environment:

  • Windows
  • Python version 3.9
  • Neo version 0.120, 0.10, and a non-stable version
  • NumPy version

Additional context Add any other context about the problem here.

didi226 avatar Dec 20 '23 02:12 didi226
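
The second warning in the report above (`overflow encountered in ushort_scalars`) says that the block length was computed in unsigned 16-bit arithmetic. The sketch below shows how such a product wraps around once it exceeds 65535. It emulates the fixed-width type with `ctypes` rather than NumPy, the function names are hypothetical, and the input values are chosen only to trigger the wrap. They are not taken from the reporter's file, where the header fields may well have been garbage read from a wrong offset.

```python
import ctypes


def block_length_uint16(n_waveforms, n_words):
    """Block length as if the whole expression were evaluated in uint16.

    Wrapping once at the end gives the same value as wrapping after every
    step, because unsigned arithmetic is modulo 2**16.
    """
    raw = n_waveforms * n_words * 2 + 16
    return ctypes.c_uint16(raw).value  # keeps only the low 16 bits


def block_length_python_int(n_waveforms, n_words):
    """Same formula with unbounded Python integers, so it cannot wrap."""
    return int(n_waveforms) * int(n_words) * 2 + 16


# Hypothetical header values, large enough to exceed the uint16 range.
n_waveforms, n_words = 1, 40000
print(block_length_uint16(n_waveforms, n_words))
print(block_length_python_int(n_waveforms, n_words))
```

Converting the header fields to plain `int` before multiplying is one way to avoid the wrap. Whether that alone is sufficient for this issue is not established at this point in the thread.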

@didi226,

Just following up on this now. Which version of plexon .plx file is this? The io has only been tested for 100-106, so if this is a different version then we might need a test file to figure out the problem.

zm711 avatar Jan 29 '24 13:01 zm711

@didi226,

Just wanted to follow up again. I recently tested our PlexonRawIO with a 3 GB file and had no problem, so I would love to figure out whether this is a version issue or something else.

zm711 avatar Feb 26 '24 13:02 zm711

I am also running into this same issue. I am using neo version 0.13.1 with spike interface. The size of my file is 2.04 gigabytes. I have previously read a plexon file that was 1.47 gigabytes without issues.

Parsing data blocks:  98%|█████████▊| 2147320464/2197169064 [02:18<00:03, 16341604.24it/s]
C:\Users\User\anaconda3\Lib\site-packages\neo\rawio\plexonrawio.py:143: RuntimeWarning: overflow encountered in scalar add
  pos += length
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[2], line 21
     19 #read
     20 recording = BasicSpikeSorting(recording)
---> 21 recording.basic_read_raw_file()
     22 recording.create_probe()
     23 recording.load_probe()

Cell In[1], line 266, in BasicSpikeSorting.basic_read_raw_file(self)
    264     if self.stream_id == "":
    265         self.stream_id = None
--> 266     self.recording = se.read_plexon(self.file_path, stream_id=self.stream_id, stream_name=self.stream_name)
    267 elif self.file_type == self.supported_raw_file_types[22]:
    268     #plexon2
    269     #By default, stream names are strings starting with the first stream being named "Signals 0", then "Signals 1"...
    270     self.stream_name = input("Please provide the name of the stream you are trying to access: ")

File ~\anaconda3\Lib\site-packages\spikeinterface\extractors\neoextractors\plexon.py:34, in PlexonRecordingExtractor.__init__(self, file_path, stream_id, stream_name, all_annotations)
     32 def __init__(self, file_path, stream_id=None, stream_name=None, all_annotations=False):
     33     neo_kwargs = self.map_to_neo_kwargs(file_path)
---> 34     NeoBaseRecordingExtractor.__init__(
     35         self, stream_id=stream_id, stream_name=stream_name, all_annotations=all_annotations, **neo_kwargs
     36     )
     37     self._kwargs.update({"file_path": str(Path(file_path).resolve())})

File ~\anaconda3\Lib\site-packages\spikeinterface\extractors\neoextractors\neobaseextractor.py:187, in NeoBaseRecordingExtractor.__init__(self, stream_id, stream_name, block_index, all_annotations, use_names_as_ids, **neo_kwargs)
    158 def __init__(
    159     self,
    160     stream_id: Optional[str] = None,
   (...)
    165     **neo_kwargs: Dict[str, Any],
    166 ) -> None:
    167     """
    168     Initialize a NeoBaseRecordingExtractor instance.
    169 
   (...)
    184 
    185     """
--> 187     _NeoBaseExtractor.__init__(self, block_index, **neo_kwargs)
    189     kwargs = dict(all_annotations=all_annotations)
    190     if block_index is not None:

File ~\anaconda3\Lib\site-packages\spikeinterface\extractors\neoextractors\neobaseextractor.py:27, in _NeoBaseExtractor.__init__(self, block_index, **neo_kwargs)
     23 def __init__(self, block_index, **neo_kwargs):
     24 
     25     # Avoids double initiation of the neo reader if it was already done in the __init__ of the child class
     26     if not hasattr(self, "neo_reader"):
---> 27         self.neo_reader = self.get_neo_io_reader(self.NeoRawIOClass, **neo_kwargs)
     29     if self.neo_reader.block_count() > 1 and block_index is None:
     30         raise Exception(
     31             "This dataset is multi-block. Spikeinterface can load one block at a time. "
     32             "Use 'block_index' to select the block to be loaded."
     33         )

File ~\anaconda3\Lib\site-packages\spikeinterface\extractors\neoextractors\neobaseextractor.py:66, in _NeoBaseExtractor.get_neo_io_reader(cls, raw_class, **neo_kwargs)
     64 neoIOclass = getattr(rawio_module, raw_class)
     65 neo_reader = neoIOclass(**neo_kwargs)
---> 66 neo_reader.parse_header()
     68 return neo_reader

File ~\anaconda3\Lib\site-packages\neo\rawio\baserawio.py:189, in BaseRawIO.parse_header(self)
    176 """
    177 Parses the header of the file(s) to allow for faster computations
    178 for all other functions
    179 
    180 """
    181 # this must create
    182 # self.header['nb_block']
    183 # self.header['nb_segment']
   (...)
    186 # self.header['spike_channels']
    187 # self.header['event_channels']
--> 189 self._parse_header()
    190 self._check_stream_signal_channel_characteristics()
    191 self.is_header_parsed = True

File ~\anaconda3\Lib\site-packages\neo\rawio\plexonrawio.py:142, in PlexonRawIO._parse_header(self)
    140 bl_type = int(bl_header["Type"])
    141 chan_id = int(bl_header["Channel"])
--> 142 block_pos[bl_type][chan_id].append(pos)
    143 pos += length
    145 # Update tqdm with the number of bytes processed in this iteration

KeyError: 14

AbhiSwamiUConn avatar Jun 24 '24 18:06 AbhiSwamiUConn

@AbhiSwamiUConn,

Any chance you would be willing to share the data, either as a link here or privately? As I wrote, I tested this on a 3 GB file and it worked fine.

Alternatively, could you try to open this in neo directly, and also try to open it using PR #1494?

Finally, what OS, Python version, spikeinterface version, and numpy version are you using?

zm711 avatar Jun 24 '24 19:06 zm711
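
A small script along the following lines can collect the environment details requested above in one go. It is only a sketch: the helper names are hypothetical and the package list is an assumption based on the libraries named in this thread. The size of a C `long` is included because NumPy releases before 2.0 use it as the default integer type, and it is 4 bytes on 64-bit Windows but 8 bytes on 64-bit Linux.

```python
import ctypes
import platform
from importlib import metadata


def package_version(name):
    """Return the installed version of a distribution, or a placeholder."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return "not installed"


def environment_report(packages=("neo", "numpy", "spikeinterface")):
    """Collect OS, Python, C long size, and package versions in a dict."""
    report = {
        "os": f"{platform.system()} {platform.release()}",
        "python": platform.python_version(),
        "c_long_bytes": ctypes.sizeof(ctypes.c_long),
    }
    for name in packages:
        report[name] = package_version(name)
    return report


for key, value in environment_report().items():
    print(f"{key}: {value}")
```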

Also adding @h-mayorquin to this thread since he is working on updating the reader.

zm711 avatar Jun 24 '24 19:06 zm711

I do not know if I am allowed to share the data publicly. I am using Windows 11, Python 3.11.9, spikeinterface version 0.100.7, and numpy version 1.26.4. I opened the file using the current version of neo directly and did not get any errors, but I also did not see any progress bars, even though they should show up by default:

import neo.rawio as nr
nr.PlexonRawIO("My file path")
output: PlexonRawIO: My file path

Am I doing this right? Secondly, will the code in the pull request work given that I am on windows?

AbhiSwamiUConn avatar Jun 24 '24 20:06 AbhiSwamiUConn

That's fine if you can't. You can also email us if you get permission to share it privately as well. Let us know if you'd like that option.

You haven't parsed the file yet. SpikeInterface takes care of that and of activating the progress bar. Neo is lower level, so its default should be to not show the progress bar. To finish your test, do:

import neo.rawio

reader = neo.rawio.PlexonRawIO('my file path')
reader.parse_header()

The header parsing will tell us whether this worked or not. Creating the reader does not actually parse the file. Again, spikeinterface takes care of that step for you.

The PR says apple silicon because originally there was an issue for Macs, but there are also some c-pointer issues that are being worked on as well as some stochastic file reading failures. So maybe @h-mayorquin would also change the name of the PR for better future indexing now that it has more features than just apple silicon support :)

If .parse_header() fails try the same but installing from that PR. :)

zm711 avatar Jun 24 '24 20:06 zm711

But this is a Plexon 1 problem; #1494 is for Plexon 2.

h-mayorquin avatar Jun 24 '24 21:06 h-mayorquin

Thanks Heberto I wasn't paying attention to the plexon version. @AbhiSwamiUConn don't test the PR. Just test the neo.rawio.

zm711 avatar Jun 24 '24 21:06 zm711

I am still running into the same issue. This is the error I got.

Parsing data blocks:  98%|███████████████████████████████████████████████████████████████████████████████████████████████████████████▍  | 2146695568/2197169064 [02:22<00:02, 17483938.28it/s]
C:\Users\user\AppData\Roaming\Python\Python311\site-packages\neo\rawio\plexonrawio.py:143: RuntimeWarning: overflow encountered in scalar add
  pos += length
Traceback (most recent call last):
  File "c:\Users\user\custom-code\test.py", line 3, in <module>
    reader.parse_header()
  File "C:\Users\user\AppData\Roaming\Python\Python311\site-packages\neo\rawio\baserawio.py", line 189, in parse_header
    self._parse_header()
  File "C:\Users\user\AppData\Roaming\Python\Python311\site-packages\neo\rawio\plexonrawio.py", line 142, in _parse_header
    block_pos[bl_type][chan_id].append(pos)
    ~~~~~~~~~^^^^^^^^^
KeyError: 14
Parsing data blocks:  98%|███████████████████████████████████████████████████████████████████████████████████████████████████████████▌  | 2147423568/2197169064 [02:22<00:03, 15028356.85it/s]

AbhiSwamiUConn avatar Jun 24 '24 22:06 AbhiSwamiUConn

Cool, that tells us this is definitely at the Neo level and not in the spikeinterface wrapper. The only way we can work on this is with the data, though. Could you check with your group and see if you can share it with us privately?

Alternatively, you could try to make a PR fixing this, but we do like having test data that fails so we can confirm that a fix actually helps.

zm711 avatar Jun 24 '24 22:06 zm711

Yep, the critical issue for this is data.

h-mayorquin avatar Jun 24 '24 22:06 h-mayorquin

I can give you an answer by tomorrow. I emailed my professor a couple hours ago and have not heard back yet.

AbhiSwamiUConn avatar Jun 24 '24 22:06 AbhiSwamiUConn

My professor has said he is fine with sharing the files. How would you like me to send them? Both github and email have file size limits.

AbhiSwamiUConn avatar Jun 25 '24 13:06 AbhiSwamiUConn

However you feel comfortable, google drive, dropbox, just provide a link for us.

h-mayorquin avatar Jun 25 '24 14:06 h-mayorquin

It can also be an email. My personal email is my user name at gmail, with a dot instead of a hyphen.

h-mayorquin avatar Jun 25 '24 14:06 h-mayorquin

I have sent a sharepoint link through email that contains the files. It should be titled: Plexon files for analysis

AbhiSwamiUConn avatar Jun 25 '24 15:06 AbhiSwamiUConn

Hey, I got your files, but I cannot reproduce your error. Is there a specific file that is generating it?

from pathlib import Path
from neo.rawio.plexonrawio import PlexonRawIO

file_path = Path.home() / "Downloads" / "plexon_larger_than_2GiB_files" / "vrm2905s12u3560f1.plx"
assert file_path.is_file(), f"{file_path} does not exist"

rawio = PlexonRawIO(filename=file_path)
rawio.parse_header()

Works just fine.

h-mayorquin avatar Jun 25 '24 18:06 h-mayorquin

For me, the files "vrm2905s13u3875f1", "vrt2905s12u3560f2", and "vrt2905s13u3875f2" do not work on my end. For me, "vrm2905s12u3560f1" was the only one that worked.

AbhiSwamiUConn avatar Jun 25 '24 18:06 AbhiSwamiUConn
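
To check every shared file in one pass instead of one at a time, a loop like the following could be used. This is a sketch only: `find_failing_files` and `parse_with_neo` are hypothetical helper names, while the `PlexonRawIO(filename=...)` and `parse_header()` calls are the ones used earlier in this thread. The parser is passed in as an argument so the loop can also be tried without neo installed.

```python
from pathlib import Path


def parse_with_neo(path):
    """Parse one .plx file with neo; raises if header parsing fails."""
    # Imported lazily so the helper below can be exercised without neo.
    from neo.rawio.plexonrawio import PlexonRawIO

    PlexonRawIO(filename=path).parse_header()


def find_failing_files(paths, parse=parse_with_neo):
    """Return {path: "ErrorType: message"} for every file that fails."""
    failures = {}
    for path in paths:
        try:
            parse(path)
        except Exception as exc:  # deliberately broad: we only want a report
            failures[str(path)] = f"{type(exc).__name__}: {exc}"
    return failures
```

For example, `find_failing_files(sorted(Path("some_folder").glob("*.plx")))` would return an empty dict if every file in the (hypothetical) folder parses, and otherwise one entry per failing file.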

I tried some of the ones you mention that fail and they work for me in linux. Are you using windows? I might try it from windows

h-mayorquin avatar Jun 25 '24 18:06 h-mayorquin

Yes, I am using windows 11, python 3.11.9, and numpy version 1.26.4.

AbhiSwamiUConn avatar Jun 25 '24 18:06 AbhiSwamiUConn

Ok.

h-mayorquin avatar Jun 25 '24 19:06 h-mayorquin

I reproduced your error on Windows, which probably means it is an overflow error.

h-mayorquin avatar Jun 27 '24 02:06 h-mayorquin

Yep, it seems it is just an overflow caused by differences in integer size between platforms. I tried #1497 on your files and it works for me. Can you test it as well, @AbhiSwamiUConn?

h-mayorquin avatar Jun 27 '24 02:06 h-mayorquin
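
As an illustration of the diagnosis above, here is a minimal sketch of what happens when a file offset is accumulated in a signed 32-bit integer, which is the default integer width for NumPy releases before 2.0 on Windows. It emulates the fixed-width types with `ctypes` instead of NumPy, the function names and the block length are hypothetical, and it is not necessarily how #1497 fixes the reader. The starting offset is the last position shown in the first progress bar posted in this thread.

```python
import ctypes

INT32_MAX = 2**31 - 1  # largest offset a signed 32-bit integer can hold


def advance_int32(pos, length):
    """Advance a file offset as a signed 32-bit integer would (wraps)."""
    return ctypes.c_int32(pos + length).value


def advance_int64(pos, length):
    """Advance a file offset as a signed 64-bit integer would."""
    return ctypes.c_int64(pos + length).value


file_size = 2_197_169_064  # total bytes reported by the progress bar
pos = 2_147_320_464        # last offset reported before the failure
length = 200_000           # hypothetical size of the next data block

print(file_size > INT32_MAX)       # the file extends past the int32 range
print(advance_int32(pos, length))  # wrapped, no longer a valid offset
print(advance_int64(pos, length))  # correct offset
```

Once the offset wraps, the next block header is read from the wrong place. That would be consistent with the seemingly random channel ids in the `KeyError` messages (32769 and 14), and with the same files parsing fine on Linux, where the default integer is 64 bits wide.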

It works. Thank you for your help.

AbhiSwamiUConn avatar Jun 27 '24 14:06 AbhiSwamiUConn