python-neo icon indicating copy to clipboard operation
python-neo copied to clipboard

WaveSurfer File Loading Classes

Open easy-electrophysiology opened this issue 3 years ago • 17 comments

Pull request to add support for WaveSurfer (https://wavesurfer.janelia.org) filetypes. Build around and requires the PyWaveSurfer module written by @boazmohar and @adamltaylor.

This pull request is not finished but have opened to discuss. Next, if this all sounds okay, I write on the neural ensemble list to request upload on the g-node of WaveSurfer filetypes (kindly provided by Boaz and Adam) and write tests for the module.

There were a couple of outstanding questions I had, to ensure the proposed module confirms to the Neo API (thanks BTW for structuring the rawIO API so cleanly and easy to use). These are in the wavesurferrawio.py header but also pasted here:

  • Wavesurfer also has analog output, and digital input / output channels, but here only supported analog input. Is this okay?
  • I believe but am not certain the signal streams field is configured correctly, used AxonRawIO as a guide.
  • each segment (sweep) has it's own timestamp, so I beleive no events_signals is correct (similar to winwcprawio not axonrawio)
  • For now I have just provided the minimal annotations. Would you recommend providing full annotations for this similar to axonrawio?

Best, Joe

easy-electrophysiology avatar Oct 17 '21 11:10 easy-electrophysiology

Hello @easy-electrophysiology! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

Line 13:100: E501 line too long (163 > 99 characters) Line 118:100: E501 line too long (115 > 99 characters)

Line 10:100: E501 line too long (125 > 99 characters) Line 12:100: E501 line too long (127 > 99 characters) Line 39:1: E302 expected 2 blank lines, found 1 Line 65:100: E501 line too long (100 > 99 characters) Line 66:100: E501 line too long (125 > 99 characters) Line 82:100: E501 line too long (115 > 99 characters) Line 113:18: E126 continuation line over-indented for hanging indent

Comment last updated at 2021-11-17 22:30:39 UTC

pep8speaks avatar Oct 17 '21 11:10 pep8speaks

Hi @easy-electrophysiology! Thanks for adding support for the wavesurfer format. The code looks like a good start to me. Regarding your questions:

Wavesurfer also has analog output, and digital input / output channels, but here only supported analog input. Is this okay?

If analog channels are the most important signal type to be used in Neo then we can add a first wavesurfer IO version supporting only those signals.

I believe but am not certain the signal streams field is configured correctly, used AxonRawIO as a guide.

I think you are right in only specifying a single signal stream, as you only have one source file and homogeneous data.

each segment (sweep) has it's own timestamp, so I believe no events_signals is correct (similar to winwcprawio not axonrawio)

Events are for representing special time points during the recording session. If you don't have any type of that data, then you don't need to specify it.

For now I have just provided the minimal annotations. Would you recommend providing full annotations for this similar to axonrawio?

If your data files contains additional information for each signal I would recommend also making these available via Neo using the annotation and array_annotations mechanism.

Regarding the test dataset we can either give you access to the gin repository directly or you just send the test dataset including the README file containing the attribution to me and I will upload it.

When using pywavesurfer in the back, does this load data in a memory mapping fashion or is the complete dataset copied into memory when calling ws.loadDataFile(self.filename, format_string="double")?

JuliaSprenger avatar Oct 18 '21 13:10 JuliaSprenger

Hi @JuliaSprenger thanks a lot for this, apologies for the delay in response.

  1. Thanks I will look into the annotations and discuss with the PyWaveSurfer devs to determine if there is anything important to put here.

  2. That would be great to get access to the gin repository, shall I request on the neural ensemble list?

  3. ws.loadDataFile() uses h5py to open the file with h5py.File(). I have had a look backend on h5py but have not been able to determine how exactly the memory is managed when opened through File(), however it looks well optimised. I see Neo uses h5py in some situations, are there any h5py options that are preffered?

easy-electrophysiology avatar Oct 27 '21 00:10 easy-electrophysiology

Hi again, for adding you as a collaborator to the gin repository I need an email address to invite you. Which one can I use for this?

@samuelgarcia Do you have any strong opinions about how to open h5py files?

JuliaSprenger avatar Oct 27 '21 09:10 JuliaSprenger

Hi thank you for this.

  1. about digital input, it is not suppported directly in rawio. This is something I plan in folowing month. What you can do is detect front and transform this to event but it is not lazy anymore.
  2. about h5py, I don't anderstand the question. normally h5py is totally lazy and load buffer on demand so it should be OK

I will check the code and signalstream soon.

samuelgarcia avatar Oct 28 '21 15:10 samuelgarcia

I have added a new commit, which wraps the PyWaveSurfer module in the io rather than rawio class, using stimfitio as a basis. It is working, though one issue with this is that it would be useful (for our use at least) to have the header, which in this commited version is also made in the io module. However, the header dtypes (_signal_channel_dtype) are imported from rawio.baserawio, so I can imagine this does not conform well to expected use of your rawio/ io classess.

Would it be better to generate the header in the rawio class and then load the data in read_block() of the io class?

Many thanks, Joe

easy-electrophysiology avatar Nov 17 '21 22:11 easy-electrophysiology

Hi Joe. It your file format can be written at rawio level it is better because the io is then wrapped with 5 lines of code and then the io will support lazy mode.

Why did you switch to io level ?

samuelgarcia avatar Nov 18 '21 08:11 samuelgarcia

Thanks for this, I thought as much though the problem is in its current form the pywavesurfer does not support lazy loading and so the entire file must be loaded into memory at once, which I think (?) violates the rawio API.

It would be possible to port the PyWaveSufer module to the rawio class but one benefit of leveraging their module is that it is maintained by the WaveSurfer team and so will be kept up to date by them. More practically, I am not sure I will have the time to impliment this solution for a month or so, but it would be possible.

Alternatively we could liase with the WaveSurfer team and ask whether they are happy to support arguments for lazy loading. Happy to proceed as you see fit.

easy-electrophysiology avatar Nov 18 '21 08:11 easy-electrophysiology

Hi, sorry for long response delay. The best alternative would be to ask pywasvesurfer team to add some lazy reading in their API. No ? Could you contact then ?

samuelgarcia avatar Jan 24 '22 10:01 samuelgarcia

@easy-electrophysiology Any news on this? Tell us if you need more feedback or support with a request towards pywavesurfer.

JuliaSprenger avatar Mar 31 '22 14:03 JuliaSprenger

@easy-electrophysiology : any news on lazy implementation possibiliy ?

samuelgarcia avatar Jun 23 '22 14:06 samuelgarcia

Hi both apologies for the delayed reply. Happy to contact pywavesurfer, though just reveiwing their module and would be great to get your advice on best way to implement lazy loading. Pywavesurfer loads the file into memory at once with h5py. I believe h5py does support lazy loading with memory loaded when data is sliced.

However to convert the pywavesurfer to support lazy loading, as I understand it the module would have to undergo quite a major re-write such that all data scaling etc. is run on each sweep only when loaded into memory (?). Could you see any easier way based on the existing code?

easy-electrophysiology avatar Jun 25 '22 22:06 easy-electrophysiology

Hi, guys. I took a look at this. I looked through the code base of pywavesurfer and I don't think is too hard to modify it for lazy reading.

However, I think that the true challenge is the scaling. Check the following:

If you choose to write your own code for reading data, whether in Matlab or in some other language, please be aware that you must take into account the information in the /header/Acquisition/AnalogScalingCoefficients dataset in the HDF5 file. In the data file, the analog data is stored as signed 16-bit integers ("counts"), exactly as read from the ADC during acquisition. To convert these to analog voltages, it is not sufficient to simply scale these values linearly. Instead, the counts must be transformed according to a cubic polynomial, the coefficients of which are stored in /header/Acquisition/AnalogScalingCoefficients. Simply scaling linearly will result in values that are systematically off by 5–10%. Even if you don't use ws.loadDataFile(), you should ensure that any custom code gives answers identical to the output of ws.loadDataFile() on representative data file

(bold mine) The coefficients are available on the header but you need to apply a polynomial operation to recover the signal. My understanding is that neo (and spikeinterface, @samuelgarcia) only handle linear gains and offsets. If this is correct it seems that this would not be useful. Have you handled something like this before?

If this can be done (save non-linear scaling) maybe we can have a discussion and I can take care of this @samuelgarcia @JuliaSprenger .

h-mayorquin avatar Oct 18 '22 19:10 h-mayorquin

Maybe this non-linear scaling can be taken care of in _get_analogsignal_chunk and not by the standard rawio mechanics. The question is if we can apply a mapping in a lazy way, such that not even the requested chunk is duplicated when applying the scaling. @samuelgarcia Any ideas for this?

JuliaSprenger avatar Oct 19 '22 07:10 JuliaSprenger

For reference, this is the operation that they do to transfrom int16 to float:

  for i in range(0, n_channels):
      scaled_data[i, :] = inverse_channel_scales[i] * np.polyval(np.flipud(scaling_coefficients[i, :]),
                                                                 data_as_ADC_counts[i, :])

data_as_ADC_counts is the int16 data that would be lazy. All the other quantities are in the header and are easy to extract.

https://github.com/JaneliaSciComp/PyWaveSurfer/blob/dc15d74cf70bb3db4501eb6d65049174fffa3398/pywavesurfer/ws.py#L198-L200

h-mayorquin avatar Oct 19 '22 09:10 h-mayorquin

Hi, guys. I took a look at this. I looked through the code base of pywavesurfer and I don't think is too hard to modify it for lazy reading.

Hi @h-mayorquin! Would you be up for looking into the changes required for lazy loading?

JuliaSprenger avatar Apr 03 '23 12:04 JuliaSprenger

Hi, Julia, I realize that I never replied you. I don't have time right now. But if a conversion comes up that requires this I will be able to do it.

h-mayorquin avatar May 11 '23 06:05 h-mayorquin