spykes
suggested improvements to the API
This is a somewhat long feature request.
I'm really excited about this package and I think it's a major step in the right direction for plotting neural time-series data. However, I think the current API is severely limited, especially with respect to the `features` and `conditions` arguments.
I would love to see an API which bears more similarity to @mwaskom's seaborn.
Seaborn's plotting functionality is amazing. For pretty much all of seaborn's plotting functions, you pass in a pandas DataFrame along with the names of the columns you want plotted against each other. Then you can split the data into colors by passing a column name to the `hue` argument, or break the plot out into an array of subplots where each col/row is defined according to a different factor.
The utility of this approach is clearer in the factorplot examples: https://stanford.edu/~mwaskom/software/seaborn/generated/seaborn.factorplot.html
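To make the idea concrete outside of seaborn itself, here's a minimal pandas sketch (with a made-up trials table) of what passing column names buys you: a `hue`-style split is just a `groupby` on one column of a long-format DataFrame.

```python
import pandas as pd

# A tidy, long-format table: one row per trial, columns are variables.
trials = pd.DataFrame({
    'spike_rate': [12.0, 8.5, 14.2, 7.1],
    'response':   ['go', 'nogo', 'go', 'nogo'],
    'correct':    [True, True, False, True],
})

# seaborn-style hue: partitioning the data is just a groupby on a column name
for response, group in trials.groupby('response'):
    print(response, group['spike_rate'].mean())
```

This is essentially the partitioning a faceting function performs before drawing each hue level or subplot.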
Since I already use seaborn extensively for data exploration, I wrote up a few functions that wrap the neurovis plotting functions in something more seaborn-like so you can see what I mean. Here's a notebook that demonstrates (a) getting the features and conditions into a friendly pandas DataFrame and (b) plotting with seaborn style.
https://gist.github.com/neuromusic/f6fe2c5d7c1101811b5cd497b98a516c
Currently, by treating `features` and `conditions` as separate arguments, neurovis is incredibly limited. For one, the dictionary format of `conditions` is not intuitive. Second, it's hard to compute complex conditions from the dictionary format currently used -- it seems largely limited to performing "AND" operations between factors and to filtering factors with ranges. It is better, I think, to let the user use pandas to construct factors with whatever logic they want.
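For instance, here's a sketch (column names invented for illustration) of the kind of condition logic that's trivial with pandas boolean indexing but awkward to express as AND-of-ranges dictionaries:

```python
import pandas as pd

# Hypothetical trial table; column names are made up for illustration.
trials = pd.DataFrame({
    'reaction_time': [0.21, 0.45, 0.33, 0.80],
    'stimulus':      ['A', 'B', 'A', 'C'],
    'correct':       [True, False, True, True],
})

# Arbitrary boolean logic: (fast AND correct) OR stimulus C --
# an "OR" across factors that a dict-of-ranges can't express.
mask = ((trials['reaction_time'] < 0.4) & trials['correct']) \
       | (trials['stimulus'] == 'C')
selected = trials[mask]
```

The resulting boolean mask (or the filtered frame itself) is the "condition", and the user owns the logic.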
In my example above, I only used "hue" since neurovis already supports this. Along with fixing #22, it would not be too hard to add a `col` or `row` factor. Then, in my example in the gist above, one could quickly plot the same plot as above but break out correct and incorrect trials into rows like so...
```python
trials = events_df[events_df['GoodTrials'] == True]

# plot the PSTH
psthplot('RealCueTimes', trials,
         spike_times, nid=neuron_n,
         hue='response',
         row='correct',
         window=window,
         binsize=100)
```
One could then also address #42 and/or other PSTH-like computations (e.g. Gaussian kernels, Poisson GLM models, etc.) by setting a `style` argument or passing in some standardized estimation function.
See also seaborn's tsplot function https://stanford.edu/~mwaskom/software/seaborn/generated/seaborn.tsplot.html and the discussion of its future: https://github.com/mwaskom/seaborn/issues/896
Wow, terrific suggestions! Our original idea behind `conditions` was to let the user provide whatever partitioning of the trials they prefer into unique conditions. We've only illustrated this using AND operators and ranges.
Having said that, I like your idea of abandoning `dict()` of `dict()` structures and exploiting all the goodness of DataFrames for converting arbitrary trial metadata into conditions.
Thanks for the quick notebooks and demos; we are going to be a bit busy for the rest of this week, but we'll look into your suggestions in more detail very soon!
P.S. this conversation is super pertinent to a universal ephys format for task data with multiple behaviors or stimuli organized across trials.
To be clear, I think the idea of a raster/PSTH plotting package that breaks out unique conditions is fantastic. The tough part with the `dict()` approach is that it puts the burden of implementing partitioning logic on the devs, rather than on the user.
And yes, very relevant to ephys data formats. pandas dataframes are nice for behavioral event data because they can work very similarly to database tables, so (for example) you can maintain all of your intra-trial events in one dataframe with a 'trial-id' column and trial-level data (correct, stimulus class, etc) in another, then perform join operations as needed.
People store data in weird ways, though, often subject to quirks of data acquisition. If the data structures for behavioral and neural analysis are standard, however, it's just a matter of getting the transformation right. And as you can see in my example, it only takes a few lines of pandas code. Over time, this can put pressure on people to store their data in a way that makes the transformation easier.
Anyhow, no hurry on this on my end. I have things working "well enough" for now, but I like where you're headed and might be able to contribute more after my PhD is done later in the summer.
Sounds great! Spread the word and feel free to make more suggestions as you use the package in the meantime.
OK, I found a nice summary of what I meant when I said "seaborn" style: Tidy Data by @hadley http://vita.had.co.nz/papers/tidy-data.pdf
@neuromusic thanks for this!
i'm sold on using pandas for trial info management. before we dive headlong though, we need to think of two related issues:
- use cases where trial data is not rectangular, e.g. some trial types have more attributes than others; dictionaries handle these cases more naturally
- users: we're aiming this at the ephys community, where a typical grad student spends 50% of their time doing experiments. we'd like to keep the entry barrier extremely low to help them transition from matlab to python, and forcing them to learn pandas might turn them away.
curious to hear your thoughts on these factors.
in any case, it's been challenging to carve out more time for this, but we'll get there soon enough.
I'm definitely aware of the "non-rectangular" data issue. For example, in my own research, a "trial" is not my primary observation unit. Rather, subjects are making decisions based upon a sequence of elements presented during a trial. For my analyses, I'm actually locking to elements, not the trial, but for each element I need to keep track of trial-level information (the previous elements, the subject's eventual response, etc.).
I handle this in the same way that I would build a normalized database... I have one pandas DataFrame that represents trial-level information (timestamp, response, trial type, accuracy) and another that represents element information (columns: time, identity, context) with a column that has a trial-index. This acts like a foreign key, letting me join the two DataFrames when I need to "rectangularize" the data.
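A minimal sketch of that foreign-key pattern (all column names invented for illustration):

```python
import pandas as pd

# trial-level table: one row per trial
trials = pd.DataFrame({
    'trial_id': [0, 1],
    'response': ['left', 'right'],
    'correct':  [True, False],
})

# element-level table: one row per within-trial event,
# with trial_id acting as a foreign key into `trials`
elements = pd.DataFrame({
    'trial_id': [0, 0, 1],
    'time':     [0.1, 0.5, 0.2],
    'identity': ['A', 'B', 'A'],
})

# "rectangularize": attach trial-level info to every element
joined = elements.merge(trials, on='trial_id')
```

After the merge, every element row carries its trial's `response` and `correct` values, ready to be used as hue/row/col factors.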
Continuing with the database metaphor, the other approach would be a NoSQL-style structure, where you simply have a list of dictionaries (one per event) and fill in whatever key/value pairs apply to each. In my example, this requires that I know from the start whether I want a trial-level or element-level analysis.
To build a raster or PSTH, we minimally need a list of event times. That's easy.
To split a PSTH into hue/row/col by some factor associated with each event time, there are basically two options. The most intuitive for someone coming from Matlab would be another list of the same length as the list of event times, as you've done with `features`.
```python
times = [1.0, 11.0, 21.0]
color = ['red', 'blue', 'red']
```
The more flexible NoSQL approach would be a list of dictionaries, where each element in the list represents an event, the keys are attributes, and the values are the values of those attributes.
```python
events = [
    dict(time=1.0, color='red'),
    dict(time=11.0, color='blue'),
    dict(time=21.0, color='red', weird=True),
]
```
Conveniently, one can readily construct a pandas DataFrame by passing in either a dict of lists or a list of dicts:
```python
import pandas as pd

# by passing in a dict of lists
df = pd.DataFrame({'times': times, 'color': color})

# by passing in a list of dicts
df = pd.DataFrame(events)
```
In the latter case, pandas will construct a 'weird' column where most values are np.nan.
So one way to leverage pandas while lowering the barrier to entry would be to accept either the "list of dicts" or "dict of lists" form, and convert it to a DataFrame after it's passed in.
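For example, a small normalizing helper could look like this (the name `as_events_frame` is hypothetical, not part of neurovis):

```python
import pandas as pd

def as_events_frame(events):
    """Accept a dict of lists, a list of dicts, or an existing
    DataFrame, and return a pandas DataFrame of events."""
    if isinstance(events, pd.DataFrame):
        return events
    # pd.DataFrame handles both dict-of-lists and list-of-dicts;
    # keys missing from some dicts simply become NaN cells
    return pd.DataFrame(events)

# both input forms yield equivalent tables
df1 = as_events_frame({'time': [1.0, 11.0], 'color': ['red', 'blue']})
df2 = as_events_frame([{'time': 1.0, 'color': 'red'},
                       {'time': 11.0, 'color': 'blue'}])
```

Users coming from Matlab can keep passing plain lists and dicts, while the library internals get a single DataFrame code path.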
@neuromusic we tried to address your suggestions in the latest pull request. Have a look
ooo, awesome. I'll check it out soon