
Cohort inspector

Open thcrock opened this issue 2 years ago • 4 comments

Just putting something together quick for inspecting cohorts.

I added pydantic here as a prototype for using it elsewhere in Triage. I think that when we require a specific dict structure for arguments (which we do all over the place) or return values (which we usually don't, but this PR certainly does), we should be more explicit about what that dict is supposed to contain. Pydantic makes it pretty easy to do this.
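As a rough illustration of the idea (not the code in this PR), a pydantic model can stand in for a loosely specified dict. The `CohortSummary` name and its fields are hypothetical, chosen only to show the pattern; the actual return structure in the PR may differ.

```python
from datetime import date

from pydantic import BaseModel, ValidationError


class CohortSummary(BaseModel):
    """Hypothetical shape of an inspector result for a single as-of-date."""

    as_of_date: date
    num_rows: int
    num_distinct_entities: int


# A plain dict, as might be passed around today, validated into a typed object.
raw = {"as_of_date": "2022-01-01", "num_rows": 3, "num_distinct_entities": 3}
summary = CohortSummary(**raw)

# A dict with the wrong contents fails at construction time instead of
# causing a confusing error somewhere downstream.
try:
    CohortSummary(as_of_date="2022-01-01", num_rows="many", num_distinct_entities=3)
    bad_input_rejected = False
except ValidationError:
    bad_input_rejected = True
```

The model documents which keys are expected and their types, and callers get attribute access (`summary.num_rows`) rather than string keys.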

Anyway, this is just the single-date version. I wanted a check-in to see if we should keep going this direction, and maybe decide what else the interface should be. Logging all of these values maybe?

thcrock, Apr 16 '22 04:04

Thanks, @thcrock -- this seems like a good start to me! The main extension from here is probably specifying the date(s) you want to pass, and one piece that seems like it could be useful to support is to (optionally) start from the temporal config and take something like the last as_of_date or all the as_of_dates in the last training or test matrix. What do you think?

I've been imagining that a use case here might be via a notebook just to provide a little more of an interface, so it might help to add an example notebook that people could start from, for instance along the lines of this one for visualize_chops.

Also, that makes sense to me about using pydantic for cases where we want to be explicit about the structure of something like a dict being passed. I'm not sure about trying to integrate it everywhere to make things more strongly typed, but I'm curious what your thoughts are on that balance.

shaycrk, Apr 19 '22 00:04

@shaycrk Regarding multiple as-of-dates: What would you want to see as the output in that case? I think a histogram from matplotlib would bring this more in line with visualize_chops: the x axis is each as-of-date that you pass in, and the y axis is the # of rows on that date. If you pass in one date, it could just plot the one bar on that histogram, I guess.

If not a graph, what kind of result would you be looking to see for multiple dates? Print the info for each date?

thcrock, Apr 29 '22 04:04

@shaycrk By histogram I just meant bar chart, I guess. I just included an example notebook with such a bar chart. It's just hardcoded and not calling the cohort inspector right now, but if you think that's a good way to visualize it we could make something like that work.
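For reference, a minimal sketch of that kind of chart, assuming the cohort is available as `(entity_id, as_of_date)` pairs. The function names and the example rows are made up for illustration; this is not the notebook's code and does not call the cohort inspector.

```python
from collections import Counter
from datetime import date


def count_rows_by_date(cohort_rows):
    """Count cohort rows per as-of-date from (entity_id, as_of_date) pairs."""
    counts = Counter(as_of_date for _entity_id, as_of_date in cohort_rows)
    # Sort by date so the bars appear in chronological order.
    return dict(sorted(counts.items()))


def plot_cohort_sizes(counts):
    """Draw one bar per as-of-date; the bar height is the number of rows."""
    # Imported here so the counting helper can be used without matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.bar([d.isoformat() for d in counts], list(counts.values()))
    ax.set_xlabel("as-of-date")
    ax.set_ylabel("# of rows in cohort")
    fig.autofmt_xdate()  # tilt the date labels so they do not overlap
    return fig


# Made-up rows, for illustration only.
example_rows = [
    (1, date(2022, 1, 1)),
    (2, date(2022, 1, 1)),
    (1, date(2022, 2, 1)),
]
example_counts = count_rows_by_date(example_rows)
```

Calling `plot_cohort_sizes(example_counts)` in a notebook would render the figure. With a single as-of-date the chart simply has one bar.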

thcrock, Apr 29 '22 05:04

Thanks @thcrock (and sorry for the very slow reply!).

I think something along those lines makes sense, but I might opt for a line chart, more along the lines of audition, rather than bars. Returning a small number of example entity-date pairs might also be helpful here (or just showing in the example notebook how to grab them from the resulting table) to let users double-check the logic or look at characteristics of some example cohort entities.

Certainly well beyond the scope here, but one could imagine a more fully-featured cohort inspector that makes it easy to look at crosstabs of the resulting cohort and how they vary over time.

shaycrk, Jun 21 '22 21:06