
NeuralynxIO: partial support for NVT files

Open · alyxmi opened this pull request 1 year ago • 37 comments

Type of Change

  • New feature
  • Bug fix

Description

Added partial support for Neuralynx NVT files (#1463). For NeuralynxIO and NeuralynxRawIO, the x and y pixel coordinates and the animal's head angle from the NVT file are treated as dimensionless analog signals and bundled into a signal stream separate from the NCS stream. NlxHeader can now read NVT files as well. For detailed output, please refer to the Testing section below.

A buggy regex in NlxHeader was also fixed. Previously it could only read a single value after each header key; now it can read multiple values.
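
For illustration, here is a hedged sketch of the kind of pattern involved (this is not the actual NlxHeader regex; the example header line and group names are assumptions based on typical Neuralynx header lines such as '-Resolution 720 480'):

import re

# Illustrative only -- not the actual NlxHeader pattern. A header line holds a
# key followed by one or more whitespace-separated values; capturing the whole
# remainder keeps all of them instead of just the first.
line = "-Resolution 720 480"
match = re.match(r"-(?P<key>\S+)\s+(?P<values>.*)", line)
print(match.group("key"))             # Resolution
print(match.group("values").split())  # ['720', '480']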

Limitations

  • Only supports loading a single NVT file (or a directory with a single NVT file in it) at a time.
  • Only loads the dnextracted_x, dnextracted_y and dnextracted_angle data fields from NVT files. Other potentially useful fields (dwPoints and dntargets) are not yet supported.
  • The NVT file is assumed to belong to the same segment (i.e., to share a common clock/time basis) as the NCS files in the same directory.

As a temporary workaround for the potential problems these limitations might cause, I added an optional ignore_nvt argument to the NeuralynxIO and NeuralynxRawIO classes. When enabled, NeuralynxIO and NeuralynxRawIO behave exactly as they did before my changes. NlxHeader worked without any problems during testing.
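
A minimal usage sketch (the directory path is a placeholder; ignore_nvt is the opt-out flag described above):

from neo.io import NeuralynxIO

# With ignore_nvt=True the reader should skip any NVT file in the directory
# and behave exactly as it did before this change.
reader = NeuralynxIO(dirname="path/to/session", ignore_nvt=True)
block = reader.read_block()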

Testing

The code was tested in accordance with the official contribution guidelines. Additionally, it was tested on my own lab's data. Some examples are listed below.

reader = NeuralynxIO(dir)
print(reader)
nb_block: 1
nb_segment:  [1]
signal_streams: [Stream (rate,#packet,t0): (1600.0, 7537, 16166107179) (chans: 13), Stream (rate,#frame,t0): (29.97, 72286, 16165954061) (chans: 3)]
signal_channels: [CSC2_2, CSC3_2, CSC4_2, CSC5_2 ... CSC1_2 , VT1 , VT1 , VT1]
spike_channels: [chTT1_1#0#0, chTT1_1#1#0, chTT1_1#2#0, chTT1_1#3#0 ... chTT4_1#15#8 , chTT4_1#15#9 , chTT4_1#15#10 , chTT4_1#15#13]
event_channels: [Events event_id=4 ttl=0]

Each of the VT1 signal channels represents one data field (x pixel position, y pixel position, and head angle).

print(reader.read_block().segments[0].analogsignals[1])
[[428. 349.  45.]
 [431. 345.  28.]
 [428. 349.  46.]
 ...
 [436. 334.  87.]
 [435. 334.  88.]
 [436. 334.  87.]] dimensionless

The first column is the x position, the second the y position, and the third the head angle in degrees.
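
As a sketch, the columns can be pulled out into plain arrays (assuming the NVT stream is analogsignals[1], as in the output above):

import numpy as np

sig = reader.read_block().segments[0].analogsignals[1]
x = np.asarray(sig[:, 0]).ravel()      # x pixel position
y = np.asarray(sig[:, 1]).ravel()      # y pixel position
angle = np.asarray(sig[:, 2]).ravel()  # head angle in degrees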

print(reader.read_block().segments[0].analogsignals[0].array_annotations["Resolution"][0])
[720 480]

The video resolution from the NVT file header is passed into array_annotations for easy access. However, because list-type values in array_annotations are buggy, it is stored as a literal string ('[720 480]') rather than a list. Once accessed, it can easily be converted back into a list, for example with ast.literal_eval() from the ast module.
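
A parsing sketch, assuming the annotation is stored exactly as the space-separated string shown above (if the stored string contains commas, ast.literal_eval works on it directly):

res_sig = reader.read_block().segments[0].analogsignals[0]
res_str = res_sig.array_annotations["Resolution"][0]
# '[720 480]' has no commas, so strip the brackets and split on whitespace.
width, height = (int(v) for v in res_str.strip("[]").split())
print(width, height)  # 720 480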

Additional Information

I should be able to make semi-regular contributions to this project. I don't have a good solution in mind yet for including the dwPoints and dntargets data fields, but I should be able to fix the problem with loading multiple NVT files and implement proper segment assignment in the near future.

I opened this pull request despite the above-mentioned limitations because in most cases (at least in my lab and other labs I know of) there is only one NVT file in a given directory, and the files usually share the same time basis (since they likely come from the same experimental session). So the current version should be useful enough in most scenarios. With the ignore_nvt argument, users can also revert to the previous behavior, so even if the new code causes problems, they can be easily avoided.

alyxmi avatar May 26 '24 04:05 alyxmi

Thanks @LEGion-42, we will review soon. :)

EDIT: See also that tests are currently failing.

zm711 avatar May 27 '24 07:05 zm711

pytest somehow skipped all the necessary neuralynxrawio tests for me so I didn't catch these before submitting the PR. Sorry for the oversight. :(

The 4 AssertionErrors are my code working as intended and do not indicate a problem -- the test simply did not account for the fact that I'm now passing behavioral data as signals. The header size of 4 is just the original 1 from the NCS file plus the 3 new ones from the NVT file. I have now updated the test with the proper values.

The KeyError, on the other hand, raises some questions. neuralynx/Cheetah_V5.6.3 is the only directory with an NVT file in the testing dataset, and according to the timestamps, the time range of the NVT file is entirely contained in the first segment. However, while reading the data, NeuralynxRawIO still tries to read the NVT signal channels in the second segment even though they are not there. It seems we are assuming that every signal is present in every segment? Or am I missing something?

I implemented a temporary fix by duplicating the NVT data into every segment, together with a warning message telling users that the NVT file should be loaded independently if there are multiple segments. But I think a proper fix would be to allow different segments to have different channels?

alyxmi avatar May 27 '24 23:05 alyxmi

That is likely what we would do. I guess I don't know NVT/Neuralynx well enough. Segments for Neuralynx can be due to gaps in the signal. So my questions would be: 1) are there no gaps in the NVT that we need to account for? 2) can you realistically have behavioral input of some sort for one segment and not for others? You wouldn't happen to have some easy-to-access docs for the Neuralynx format, would you?

zm711 avatar May 28 '24 17:05 zm711

Again, I'm sorry for continuously failing the tests. For some reason (potentially a bug?) pytest isn't downloading the testing dataset properly on my machine, so I have to rely on running the tests here. :(

As for your questions:

  1. I have no idea. I'm mainly doing computational tasks in the lab, so I don't have a lot of experience running experiments. The data I have access to are all single-segment (NVT, NCS, etc.) per directory, and the NVT file in the testing dataset also does not contain gaps.
  2. I don't think so, but the NVT file in the testing dataset does not align with the NCS files (which is why the error occurred), so I don't know whether this is a realistic scenario I should account for. However, I imagine some improper lab setups could produce data like this. Sadly, I don't have any additional documentation for the Neuralynx format beyond what is available on the official website.

Anyway, I think it is important to have more data for testing. My own data is too clean, and the only dataset on GIN looks to be artificially constructed. I would greatly appreciate it if you could provide more data for me to test on. :)

alyxmi avatar May 29 '24 09:05 alyxmi

@LEGion-42,

Absolutely no worries. G-Node/GIN is hard to get working unless you get lucky with the install; it works best on Linux and less well on Mac and Windows. So no worries. We have the CI just for this case :) I can't get the test files on my laptop, so when I'm away from my workstation I rely on the CI too!

Yeah, we have a couple of Neuralynx users, but it's hard to get test files because they are often human data from the people we work with, which means we have to be careful not to leak health data! So we often have to make artificial data to anonymize things appropriately. I can talk to the rest of the team to see if anyone knows how these files are normally structured, because I could imagine the file providing a small amount of data while all the other segments are just zeros or empty values, so we might need to figure out what to do when there is null data.

zm711 avatar May 29 '24 09:05 zm711

Thank you for your understanding! Now that the tests have passed, I think this is ready for review?

alyxmi avatar May 30 '24 09:05 alyxmi

@LEGion-42, yep. We were just discussing who would be best to review. @PeterNSteinmetz has worked on this the most so if he has the bandwidth we would love his opinions. Otherwise @samuelgarcia said he would take a look as well. We are in the middle of a conference this week so next week will be the earliest for our final review. Hopefully @PeterNSteinmetz can review it!

zm711 avatar May 30 '24 09:05 zm711

So I had an initial look at this and have some questions. @LEGion-42

First, are there some test .nvt files that can be used here and committed to the ephy_testing_data?

What sort of sampling rates are typically used for these files?

There is a prior PR for handling a missing date and time in the header, with test data for it (https://gin.g-node.org/NeuralEnsemble/ephy_testing_data/pulls/116), though neither has been incorporated yet. Perhaps it is time for that test data to be pulled into the GIN repository? @samuelgarcia Or I can do it myself, but I have been trying to follow the recommendation not to do that for one's own data.

While Sam's last set of changes was done on a semi-emergent basis without test files due to an upcoming class, it really is best to have test data and tests in place when adding new features.

PeterNSteinmetz avatar May 30 '24 21:05 PeterNSteinmetz

@PeterNSteinmetz I'll have a talk with the PI tomorrow and see if I can share some of our data. However, as I mentioned before, our data may be too clean to be meaningful for testing. There are no segment breaks or noticeable gaps in the timestamps in any of our data.

As for the sampling rate, since NVT files contain video tracking data, the frame rate is typically around 30 fps.

alyxmi avatar May 31 '24 09:05 alyxmi

@LEGion-42 . Thanks that sounds good. The test files are really just to test that headers are being read properly and records interpreted correctly. Normally we trim them down to a smaller number of records and use a small number of them. If there are no unusual features like gaps in the files, that is fine. The files can be anonymized if needed to remove any information, such as time/date, which might be identifiable to an individual person.

I agree the video tracking data should be in the same segment as per this design. Given the separate sampling frequency it should be in a separate stream.

PeterNSteinmetz avatar May 31 '24 14:05 PeterNSteinmetz

@PeterNSteinmetz I got permission to share our NVT data. However, our NVT files range in size from 30 to 100 MB, which is above the 10 MB file size limit on GIN, and I have no idea how to trim them down. Any suggestions?

alyxmi avatar Jun 07 '24 00:06 alyxmi

@LEGion-42 Great. We don’t need an entire recording set, just perhaps 3-6 files. Since the header parsing is of primary interest, please trim to just header plus 10-20 records each.

If you are on MacOS or another Unix system you can use the ‘truncate’ command to shorten them. I usually add a ‘_trunc’ suffix to the file name before the extension to indicate this has been done.

You can compute the target size as 16384 bytes + number of records x record size. The record sizes for the various file types are available in the document at https://neuralynx.com/_software/NeuralynxDataFileFormats.pdf
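
A hedged Python sketch of the same computation (os.truncate stands in for the shell truncate command; the record size must be looked up per file type in the linked document, and the example file name is a placeholder):

import os

def truncate_neuralynx_file(path, n_records, record_size, header_size=16384):
    # Keep the fixed-size text header plus the first n_records records.
    # record_size must come from the Neuralynx file-format document above,
    # as it differs per file type.
    target_size = header_size + n_records * record_size
    os.truncate(path, target_size)

# e.g. truncate_neuralynx_file("VT1_trunc.nvt", n_records=20, record_size=<from the PDF>)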

If you have the ephy_testing_data checked out, please create an issue at the GIN site. Then add your files to an appropriately named subfolder of Neuralynx.

PeterNSteinmetz avatar Jun 07 '24 01:06 PeterNSteinmetz

@zm711 Zach, what do we need to do to get some of the pull requests for corresponding test data merged over there?

PeterNSteinmetz avatar Jun 11 '24 22:06 PeterNSteinmetz

Hey Peter. Are you talking about the process from the beginning? We have instructions on our repo for how to make a PR on GIN. If you're talking about merging test data, just @ one of us and we can review it and merge it. We like to check that it's not too big and that everything looks to be in order before we merge it. :)

zm711 avatar Jun 12 '24 12:06 zm711

Hi Zach, well, there is this one: https://gin.g-node.org/NeuralEnsemble/ephy_testing_data/pulls/116 . It provides test data for some code that can be used to deal with some of the header issues in this pull request.

PeterNSteinmetz avatar Jun 12 '24 15:06 PeterNSteinmetz

@LEGion-42 If you would like some further assistance with getting the test files ready, just let me know.

PeterNSteinmetz avatar Jun 12 '24 15:06 PeterNSteinmetz

@PeterNSteinmetz Sorry for not being responsive; I was quite busy this past week. I'll try to upload the NVT data today.

alyxmi avatar Jun 14 '24 16:06 alyxmi

@zm711 @PeterNSteinmetz Here are the NVT files. GIN wouldn't let me pass their captcha for some reason (maybe I'm a bot?), so could you upload them instead? Sorry for the trouble. nvts.zip

alyxmi avatar Jun 17 '24 15:06 alyxmi

I am in the Tetons now and can take a look at these about July 2.

Thanks, Peter


PeterNSteinmetz avatar Jun 18 '24 15:06 PeterNSteinmetz

@LEGion-42 These downloaded fine for me. I will try and structure the folder, write some tests, and start a pull request in the next few days. @zm711 We are still waiting on approval of the prior pull request for the ephy testing data https://gin.g-node.org/NeuralEnsemble/ephy_testing_data/pulls/116 and will then need examination of the new pull request for this. This has been a very slow point historically.

PeterNSteinmetz avatar Jul 01 '24 15:07 PeterNSteinmetz

@PeterNSteinmetz, it is merged. I would ping us here on GitHub in your PR (probably a couple of times) so we can review and merge it. Thanks for all your help on these PRs!

zm711 avatar Jul 01 '24 16:07 zm711

Actually @PeterNSteinmetz, that was a broken file. I had to overwrite and erase your PR (gin can be annoying). You submitted a stub on your branch rather than a file, but since I had merged it I had to delete it.

zm711 avatar Jul 01 '24 18:07 zm711

Thanks Zach. @zm711 I am going to move discussion of that dataset over to its pull request at https://gin.g-node.org/NeuralEnsemble/ephy_testing_data/pulls/116 .

PeterNSteinmetz avatar Jul 01 '24 22:07 PeterNSteinmetz

@LEGion-42,

We are still working on getting your test files onto G-Node/GIN, so that is in the works. There was another PR for Neuralynx where we were simplifying the parameters moving forward. That has now been merged, which has caused conflicts in your PR. Would you like to fix those conflicts while we work on adding the test files?

zm711 avatar Jul 24 '24 12:07 zm711

@zm711 I'll take a look on the weekend. Thanks for letting me know.

alyxmi avatar Jul 24 '24 18:07 alyxmi

@LEGion-42 I am about to prepare the push of the shortened .nvt files you provided. But I am wondering whether a directory structure where all 5 .nvt files are in one directory would be good. This would allow testing of passing a directory to the constructor, which should then produce 5 channels, versus an individual file name, which should produce one.

Would such a structure provide any difficulties for your code or tests?

PeterNSteinmetz avatar Aug 05 '24 16:08 PeterNSteinmetz

@PeterNSteinmetz Currently the code only supports one NVT file per directory. Multiple NVT files won't crash the code, but they will cause problems.

alyxmi avatar Sep 12 '24 20:09 alyxmi

@LEGion-42 OK, thanks, good to know. I will move them to the separate directories then before committing them.

PeterNSteinmetz avatar Sep 12 '24 20:09 PeterNSteinmetz

@LEGion-42 How would you like the Readme.txt file to read? Normally we state where the files are from and who created them.

PeterNSteinmetz avatar Sep 12 '24 22:09 PeterNSteinmetz

@LEGion-42 These files have been added to a pull request in the test data https://gin.g-node.org/NeuralEnsemble/ephy_testing_data/pulls/141. Once incorporated there, it should be possible to add some tests to the code to support this addition to the functionality. Thanks for your work on this.

PeterNSteinmetz avatar Sep 21 '24 20:09 PeterNSteinmetz