
docs: add tutorial / how-to-guide for embedding time-series data, e.g., using causalCNNs

Open janfb opened this issue 9 months ago • 6 comments

We now have a bunch of new embedding nets to process sequential data, e.g., the CausalCNN or the TransformerEmbedding (see also #1499 #1494 #1512 )

It would be great to add a short how-to-guide for using these embeddings, e.g., using a well-known time-series model like the SIR or Lotka-Volterra.

See also https://github.com/sbi-dev/sbi/pull/1499#issuecomment-2743020418 for inspiration.

janfb avatar Mar 21 '25 13:03 janfb

Hello! Can I work on this, please?

satwiksps avatar Nov 06 '25 02:11 satwiksps

Hello @satwiksps , thanks for offering to work on this 🙏

To get started, please have a look at our existing how-to-guides here: https://sbi.readthedocs.io/en/latest/how_to_guide.html

and at the extended (quite long) tutorial here https://sbi.readthedocs.io/en/latest/advanced_tutorials/04_embedding_networks.html

The how-to-guide should give a short explanation and actionable code so that users can get going quickly. Let's discuss here what the guide for the embedding nets could look like.

Please have a look at our contribution workflow as well, it's here: https://sbi.readthedocs.io/en/latest/contributing.html

let me know if you have any questions :)

janfb avatar Nov 06 '25 06:11 janfb

Thanks, @janfb

I have gone through the existing sbi documentation and tutorials, including the how-to guides, the advanced tutorial on embedding networks, and the contributing guide, to understand the structure, tone, and technical workflow of the docs.

Here’s my plan for this how-to guide:

I’ll add a new Jupyter notebook under docs/how_to_guide/ titled “20_time_series_embedding.ipynb”. The goal is to provide a short, runnable example showing how to handle sequential simulator outputs with the new embedding networks introduced in PRs #1499 (CausalCNNEmbedding) and #1494 (TransformerEmbedding).

The notebook will:

  • Use a simple SIR time-series simulator as the example.
  • Demonstrate how to instantiate and use both CausalCNNEmbedding and TransformerEmbedding for sequential data.
  • Integrate the chosen embedding into an NPE workflow (posterior_nn, simulate_for_sbi, train, and build_posterior).
  • Briefly discuss when to prefer each embedding type (e.g., CNN for local patterns, Transformer for long-range dependencies).
  • Follow the concise, example-oriented style of the existing how-to guides and link to the advanced embedding tutorial for deeper explanation.

I plan to include both embeddings in the same notebook, since they share the same workflow and purpose, and presenting them side-by-side will make it easier for users to compare their usage without repeating setup code. Once the notebook is complete, I’ll add it to the How-to guide index (index.rst) and verify that the docs build correctly with Sphinx.

If you approve I will be happy to continue with this work.

satwiksps avatar Nov 07 '25 15:11 satwiksps

Hi all!

  1. Unless there is a really good reason to do it, I would not use the SIR simulator for a how-to guide (it is too much boilerplate code). I would prefer to just use torch.randn.
  2. Keep the discussion of "when to prefer which embedding type" in the tutorials.

Thanks! Michael

michaeldeistler avatar Nov 07 '25 15:11 michaeldeistler

thanks for the summary and plan @satwiksps, sounds good!

Good point by @michaeldeistler: given that the how-to-guide should be quite concise, we shouldn't use the SIR here. We could really just use a simulator that returns torch.randn(100) or so, to mimic the format of a time series.

We should also note that depending on the dimensionality of the data and the resulting dimensionality of the CNN or Transformer architectures, a GPU could be useful here.

And regarding Michael's other point about discussing what to use when: yes, let's leave that out of the how-to-guide to keep it concise.

janfb avatar Nov 07 '25 16:11 janfb

Thanks, @janfb and @michaeldeistler for guiding me.

I will proceed with those adjustments:

  • I will replace the SIR simulator with a simple function that returns synthetic time series data using torch.randn() to keep the guide minimal and focused on the embedding workflow.
  • The notebook will demonstrate both CausalCNNEmbedding and TransformerEmbedding, showing how to integrate them with posterior_nn and NPE.
  • I will remove the discussion about when to use which embedding type to keep it concise.
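For reference, the minimal stand-in simulator could be as simple as the following sketch (the function name and shapes are placeholders, not final notebook code):

```python
import torch

# Hypothetical stand-in simulator: ignores theta's values and just returns data
# shaped like a 100-step time series, as suggested above.
def simulator(theta):
    return torch.randn(theta.shape[0], 100)

theta = torch.randn(3, 2)  # batch of 3 parameter sets
x = simulator(theta)
print(x.shape)  # torch.Size([3, 100])

# Note: for larger CNN/Transformer embeddings, passing device="cuda" to the
# inference object can speed up training considerably, as mentioned above.
```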

satwiksps avatar Nov 07 '25 16:11 satwiksps