pynwb icon indicating copy to clipboard operation
pynwb copied to clipboard

[BUG] - Fix update of epoch_tags store

Open TomDonoghue opened this issue 3 years ago • 8 comments

Motivation

This addresses #1449

The way that epoch_tags was updated before didn't play well with strings, which should be a supported input. Since epoch_tags is a set, adding a string gets treated as an iterable, which then separates and adds each character.

This PR fixes this section to check the types of the tags, and deal with them if they are a string.

Note that the string processing line is a direct copy of pynwb.epoch line 47, to make sure strings are processed the same, and that epoch_tags properly supports the possible input of something like 'label1, label2'.

How to test the behavior?

There is example code that shows the problem in #1449. I check that this same code when tested against this PR works as it should.

Checklist

  • [ ] Did you update CHANGELOG.md with your changes?
  • [x] Have you checked our Contributing document?
  • [x] Have you ensured the PR clearly describes the problem and the solution?
  • [x] Is your contribution compliant with our coding style? This can be checked running flake8 from the source directory.
  • [x] Have you checked to ensure that there aren't other open Pull Requests for the same change?
  • [x] Have you included the relevant issue number using "Fix #XXX" notation where XXX is the issue number? By including "Fix #XXX" you allow GitHub to close issue #XXX when the PR is merged.

TomDonoghue avatar Jul 26 '22 19:07 TomDonoghue

Codecov Report

Merging #1521 (ca31f15) into dev (d75cc82) will increase coverage by 0.01%. The diff coverage is 100.00%.

@@            Coverage Diff             @@
##              dev    #1521      +/-   ##
==========================================
+ Coverage   90.52%   90.53%   +0.01%     
==========================================
  Files          25       25              
  Lines        2460     2463       +3     
  Branches      456      458       +2     
==========================================
+ Hits         2227     2230       +3     
  Misses        148      148              
  Partials       85       85              
Flag Coverage Δ
integration 69.87% <75.00%> (-0.01%) :arrow_down:
unit 83.96% <100.00%> (+0.01%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
src/pynwb/epoch.py 88.46% <ø> (ø)
src/pynwb/file.py 88.02% <100.00%> (+0.09%) :arrow_up:

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

codecov[bot] avatar Jul 26 '22 19:07 codecov[bot]

Thanks for the bug fix!

oruebel avatar Jul 30 '22 14:07 oruebel

Merging of this PR is currently blocked because flake8 checks are failing in file not modified by this PR. This appears to be due to tests running a newer version of flake8. I've created #1524 to fix those issues. Once that PR has been merged we can sync this PR with dev and merge.

oruebel avatar Jul 31 '22 17:07 oruebel

Looking at epoch_tags with a closer eye, I do not fully understand why this instance variable exists and whether it works as intended. As far as I can tell, epoch_tags is a cache of the unique tag values in the NWBFile.epochs table. That may be convenient but

  1. it can become out of sync, especially if a user calls nwbfile.epochs.add_interval(...) instead of nwbfile.add_epoch(...), and
  2. it is not populated when loading data from a file -- so it is only cached on write.

Separately, the update method is called on epoch_tags which works only if epoch_tags is a set, but the docval for epoch_tags allows the variable to be a tuple or a list.

@TomDonoghue are you using this field, and if so, how are you using it?

rly avatar Aug 02 '22 22:08 rly

@rly - I am not actively using this field - I stumbled onto this bug a while ago when I was exploring different ways to store event information (I ended up not actively using epochs). As far as I understand how it's working (having not gone too deep into this), I agree with your notes here that epoch_tags is a potentially useful idea, but that it's not guaranteed to be consistent or populated.

In terms of type, I'm not clear on if / how epoch_tags could end up anything other than a set in practice and calling update is done in the current version of the code in both add_epoch and add_epoch_column, so it seems to me that for this the code expects a set and the documentation is perhaps wrong?

TomDonoghue avatar Aug 03 '22 15:08 TomDonoghue

OK. Good to know, thanks.

In NWBFile.__init__: https://github.com/NeurodataWithoutBorders/pynwb/blob/d75cc825fe4aaeae18feeac3cbcb3f70651a4c09/src/pynwb/file.py#L317-L318

epoch_tags is allowed to be a tuple or list, but as you mention, the code expects a set, so at the very least we need to update this to allow only sets and rewrite the docstring.

However, given its limited usefulness and potential confusion, I am inclined to raise a warning when it is used to say that 1) this variable is not stored on disk, 2) it is not populated on read, 3) it may be incorrect if epochs are added not through methods on NWBFile, and 4) that it will be deprecated in a future version of PyNWB. @oruebel @bendichter what do you think?

rly avatar Aug 03 '22 18:08 rly

@rly if epoch_tags is supposed to be a shortcut for getting all unique tags that are present, it sorta feels like this might be better accomplished by a property, something like:

@property
def epoch_tags(self):
    return set(self.epochs.tags[:])

I think the above would address both of the concerns with the current approach? It does read tags every time it's called, but I'm assuming that the list of epochs / tags is unlikely to get so big that this becomes slow / an issue.

TomDonoghue avatar Aug 04 '22 13:08 TomDonoghue

@rly - just wanted to follow up here - is this still relevant / useful, or should I close this PR as out of date?

TomDonoghue avatar Apr 21 '23 19:04 TomDonoghue