cmdstanpy

Feature idea: cache results of `CmdStanMCMC.summary()`

Open amas0 opened this issue 2 months ago • 2 comments

The `CmdStanMCMC.summary()` method runs `stansummary` under the hood, passing the appropriate arguments to the command. In large models with draws from thousands of quantities, computing this summary can be pretty slow. If one wants to reuse the summary results, the output dataframe needs to be stored in a new variable. I think a small quality-of-life feature would be to cache the result of this call within the fit object, so that code which calls `.summary()` more than once on a single fit doesn't re-run the summary (this is something I run into quite often in models I run).

In comparison to the loading we do of draws into memory when accessed via the fit object, the extra memory cost of also storing the summary would be minimal.

Should be an easy implementation if this is something we want to do.

amas0, Dec 04 '25 15:12

This would get messy if users changed the `percentiles` argument between calls, I think. Same if we exposed the `--include_param` argument for filtering down to specific variables (which I thought we already did... whoops)

WardBrian, Dec 04 '25 15:12

Yeah, I had considered this -- I don't think it would be too much of an extra lift to store the arguments of the most recent summary call and recompute if they are not identical? My guess is that the majority of interactions are just calling summary with default args. At least that's how I use it in 99% of cases.

amas0, Dec 04 '25 15:12
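
For illustration, the "store the arguments of the most recent call and recompute if they differ" idea could be sketched roughly as below. This is only a sketch, not how cmdstanpy implements anything: `SummaryCache` is a hypothetical helper name, and the `percentiles` / `sig_figs` parameters and their defaults are assumed to mirror the current `summary()` signature. The wrapped `compute` callable stands in for whatever actually runs `stansummary`.

```python
from typing import Any, Callable, Optional, Sequence, Tuple


class SummaryCache:
    """Hypothetical helper: remembers the result of the most recent summary call."""

    def __init__(self, compute: Callable[..., Any]) -> None:
        # `compute` is whatever actually produces the summary,
        # e.g. a bound `fit.summary` method (assumption).
        self._compute = compute
        self._key: Optional[Tuple[Tuple[int, ...], int]] = None
        self._value: Any = None

    def summary(self, percentiles: Sequence[int] = (5, 50, 95), sig_figs: int = 6) -> Any:
        # Normalize the arguments so [5, 50, 95] and (5, 50, 95) compare equal.
        key = (tuple(percentiles), sig_figs)
        if key != self._key:
            # Arguments differ from the previous call (or this is the first call):
            # recompute and overwrite the single cached entry.
            self._value = self._compute(percentiles=percentiles, sig_figs=sig_figs)
            self._key = key
        return self._value
```

With this shape, repeated calls with default arguments hit the cache, and changing `percentiles` simply triggers one recomputation. One design caveat: the cached object itself is returned, so if it is a dataframe that callers might mutate in place, returning a copy would probably be safer.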