Allow for pymc native samplers to resume sampling from `ZarrTrace`

Open lucianopaz opened this issue 10 months ago • 4 comments

Description

Big PR approaching! This finishes adding the ability for pymc native step methods to resume sampling from an existing trace (as long as it's a ZarrTrace!). This means that you can now continue tuning or sampling from a pre-existing sampling run. For example:

import pymc as pm

# Assumes `model` is an existing pm.Model and `trace` is a ZarrTrace instance
with model:
    # First tuning run
    pm.sample(tune=400, draws=0, trace=trace)

    # Do whatever to decide if you want to continue tuning
    pm.sample(tune=800, draws=0, trace=trace)

    # Switch to sampling
    pm.sample(tune=800, draws=1000, trace=trace)

Another thing is that ZarrTrace's draws_per_chunk setting, combined with a persistent storage backend (like ZipStore or DirectoryStore), makes sampling periodically write the results and the latest sampling state to the store. If sampling crashes, you can load the trace from the existing store with ZarrTrace.from_store and then resume sampling from there.
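To make the chunked-checkpoint idea concrete, here is a toy, standard-library-only sketch. It is not pymc's or ZarrTrace's implementation: the JSON file stands in for a persistent store, and `sample`, `save_checkpoint`, `load_checkpoint` and the `crash_at` parameter are hypothetical names invented for this illustration. The point it shows is that persisting the draws together with the sampler state (current position and RNG state) after each complete chunk is enough to resume after a crash and end up with the same chain as an uninterrupted run.

```python
import json
import math
import os
import random
import tempfile


def log_density(x):
    # Standard normal target, up to an additive constant.
    return -0.5 * x * x


def save_checkpoint(path, draws, x, rng):
    # Write to a temporary file first, then swap it in, so a crash
    # mid-write never leaves a half-written checkpoint behind.
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as fh:
        json.dump({"draws": draws, "x": x, "rng_state": rng.getstate()}, fh)
    os.replace(tmp_path, path)


def load_checkpoint(path):
    with open(path) as fh:
        saved = json.load(fh)
    version, internal, gauss_next = saved["rng_state"]
    rng = random.Random()
    # JSON turns tuples into lists, so rebuild the tuple that setstate expects.
    rng.setstate((version, tuple(internal), gauss_next))
    return saved["draws"], saved["x"], rng


def sample(path, total_draws, draws_per_chunk, seed=0, crash_at=None):
    """Toy random-walk Metropolis that checkpoints after every complete chunk."""
    if os.path.exists(path):
        # Resume from the last persisted chunk.
        draws, x, rng = load_checkpoint(path)
    else:
        draws, x, rng = [], 0.0, random.Random(seed)
    while len(draws) < total_draws:
        if crash_at is not None and len(draws) == crash_at:
            raise RuntimeError("simulated crash")
        proposal = x + rng.gauss(0.0, 1.0)
        log_ratio = log_density(proposal) - log_density(x)
        if rng.random() < math.exp(min(0.0, log_ratio)):
            x = proposal
        draws.append(x)
        if len(draws) % draws_per_chunk == 0:
            save_checkpoint(path, draws, x, rng)
    return draws


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "trace.json")
        try:
            sample(path, total_draws=500, draws_per_chunk=100, crash_at=250)
        except RuntimeError:
            pass  # only the 200 draws from complete chunks were persisted
        draws = sample(path, total_draws=500, draws_per_chunk=100)
        print(len(draws))
```

In this sketch the draws made after the last complete chunk are lost on a crash and simply regenerated on resume, which is the trade-off a chunk size controls: larger chunks mean fewer writes but more repeated work after a failure.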

The only thing I haven't tested yet is a crash scenario: adding an Op that makes pm.sample crash, to check that I can reload the partial results from the store and resume sampling. @ricardoV94 gave me some pointers for that, but I won't be working on this for the rest of the month, so I thought it best to open a draft PR to kick off any discussion and collect feedback.

Related Issue

  • [X] Closes #7503
  • [ ] Related to #

Checklist

Type of change

  • [X] New feature / enhancement
  • [ ] Bug fix
  • [ ] Documentation
  • [ ] Maintenance
  • [ ] Other (please specify):

📚 Documentation preview 📚: https://pymc--7687.org.readthedocs.build/en/7687/

lucianopaz avatar Feb 21 '25 10:02 lucianopaz

Codecov Report

:x: Patch coverage is 91.61426% with 40 lines in your changes missing coverage. Please review. :white_check_mark: Project coverage is 92.92%. Comparing base (0960323) to head (440ca46).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| pymc/sampling/parallel.py | 50.00% | 16 Missing :warning: |
| pymc/backends/zarr.py | 96.64% | 9 Missing :warning: |
| pymc/step_methods/state.py | 88.46% | 6 Missing :warning: |
| pymc/backends/mcbackend.py | 81.81% | 2 Missing :warning: |
| pymc/sampling/population.py | 91.66% | 2 Missing :warning: |
| pymc/step_methods/metropolis.py | 66.66% | 2 Missing :warning: |
| pymc/backends/base.py | 94.11% | 1 Missing :warning: |
| pymc/backends/ndarray.py | 90.00% | 1 Missing :warning: |
| pymc/sampling/mcmc.py | 95.65% | 1 Missing :warning: |

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #7687      +/-   ##
==========================================
- Coverage   92.94%   92.92%   -0.03%     
==========================================
  Files         116      116              
  Lines       18851    19183     +332     
==========================================
+ Hits        17521    17825     +304     
- Misses       1330     1358      +28     

| Files with missing lines | Coverage Δ |
|---|---|
| pymc/backends/__init__.py | 93.75% <100.00%> (+1.06%) :arrow_up: |
| pymc/progress_bar.py | 93.63% <100.00%> (+0.25%) :arrow_up: |
| pymc/step_methods/compound.py | 97.88% <100.00%> (+<0.01%) :arrow_up: |
| pymc/step_methods/hmc/base_hmc.py | 92.30% <100.00%> (+0.05%) :arrow_up: |
| pymc/step_methods/hmc/quadpotential.py | 84.69% <100.00%> (ø) |
| pymc/step_methods/step_sizes.py | 80.95% <100.00%> (ø) |
| pymc/backends/base.py | 89.06% <94.11%> (+0.37%) :arrow_up: |
| pymc/backends/ndarray.py | 80.83% <90.00%> (+0.83%) :arrow_up: |
| pymc/sampling/mcmc.py | 91.47% <95.65%> (+0.09%) :arrow_up: |
| pymc/backends/mcbackend.py | 97.93% <81.81%> (-1.33%) :arrow_down: |

... and 5 more

codecov[bot] avatar May 14 '25 14:05 codecov[bot]

Hello friends, I will be working on closing out this issue/PR today until it is done. I have a few questions about y'all's preferred review workflow:

  1. Should I open a PR against this branch? Or a separate one against main? Or other?
  2. I'm going to start with refactoring these tests to be a little bit more modular and structured. If I push these refactorings to the PR branch as I go, would you like to review them as they are pushed? Or would you like me to save all pushes until I'm done with the refactor and/or rework?
  3. Assuming it's in scope for this PR, I'd like to complete the zarr v3 upgrade, as it is completely blocking me from using this feature in my setup. I also plan to incorporate appropriate error handling etc. to accommodate v2, which should not be a problem since there are clear counterparts between the Store objects in each version for our purposes.
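
As a rough illustration of the v2/v3 store correspondence mentioned in item 3, here is a minimal sketch and not the code proposed for this PR. It assumes that the local on-disk store is `zarr.storage.DirectoryStore` in zarr v2 and `zarr.storage.LocalStore` in zarr v3; the helper names `local_store_class_name` and `open_local_store` are hypothetical.

```python
def local_store_class_name(zarr_version: str) -> str:
    """Map a zarr version string to the name of its local on-disk store class."""
    major = int(zarr_version.split(".")[0])
    if major >= 3:
        return "LocalStore"  # assumed v3 counterpart of DirectoryStore
    if major == 2:
        return "DirectoryStore"
    raise RuntimeError(f"Unsupported zarr version: {zarr_version}")


def open_local_store(path):
    """Open a local store with whichever zarr major version is installed."""
    # Imported lazily so the version mapping above is usable without zarr.
    import zarr
    import zarr.storage

    name = local_store_class_name(zarr.__version__)
    return getattr(zarr.storage, name)(path)
```

Keeping the version check in one small function means the rest of the backend can ask for "a local store" without branching on the zarr version itself.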

In the meantime I can push my changes to my own fork regardless, will share a link after first commit. Feel free to examine if needed.

Thanks in advance!

schlich avatar Aug 04 '25 12:08 schlich

Thanks @schlich! I think that maybe I can rebase to fix the conflicts, merge the PR, and you can take it from there with a fresh new PR on top of main. I don't think that there was much left to do to get this merged, and I won't be able to make significant improvements before the end of August. zarr v3 support would be a great addition. Be sure to mention #7752 when you open your PR.

lucianopaz avatar Aug 04 '25 17:08 lucianopaz

@lucianopaz thanks for getting that patched up! I have it built from git in my local project, haven't tested it closely but so far so good 👍

schlich avatar Aug 06 '25 17:08 schlich