Add additional tutorials

Open adamjstewart opened this issue 1 year ago • 8 comments

Issue

We are preparing for a TorchGeo tutorial at AGU and need to greatly expand our existing list of tutorials. This issue lists the tutorials that still need to be added and tracks progress towards completion.

General requirements:

  • Accessible: tutorials should require no prior knowledge of ML or RS
  • End-to-end: complete training and inference pipelines
  • Well-tested: all tutorials must be tested in CI to ensure they remain up-to-date
  • Resource-efficient: CI necessitates toy datasets that can be quickly downloaded
  • Linter-approved: all notebooks should pass our ruff style checks

Fix

The current plan is to completely rewrite all of our existing tutorials and organize them as follows:

  • [x] Getting Started: general overview of DL/RS/TorchGeo, links to all other tutorial sections #2439
    • [x] Introduction to PyTorch: datasets, train-val-test splits, PyTorch, training, evaluation #2440
    • [x] Introduction to Geospatial Data: challenges of RS data, CRS, projections, resolution #2446
    • [x] Introduction to TorchGeo: types of datasets, purpose of samplers #2457
  • [x] Basic Usage: targeted towards the ML crowd, more focused on training and evaluation #2439
    • [ ] Datasets: NonGeo/Geo/Raster/Vector/Intersection/Union, dataset splitters #2455
    • [ ] Samplers: GeoSampler, ROI, train vs evaluation samplers #2455
    • [x] Transforms: how to perform preprocessing, data augmentation, spectral indices, etc.
    • [x] Models: how to use models from timm, torchvision, and SMP, how to load pre-trained models, torch.hub, etc.
    • [x] Lightning: purpose of data modules and trainers, examples for classification, regression, etc.
    • [ ] CLI: command-line interface and experimentation, reproducibility and best practices
  • [x] Case Studies: end-to-end workflows, targeted towards the RS crowd, more focused on inference #2439
    • [ ] Land cover mapping: for agriculture #2449
    • [ ] Change detection: for building damage assessment?
    • [ ] Instance segmentation: for field boundary detection? @burakekim
    • [x] Hydrology: #960
    • [ ] Time series: after #2382 is complete
    • [ ] Any other ideas?
  • [x] Customization: how to write your own datasets and contribute them back #2439
    • [x] NonGeoDataset #2451
    • [x] RasterDataset
    • [x] Data modules: Geo, NonGeo #2452
    • [ ] Transforms
    • [ ] Models: @nilsleh
    • [ ] Trainers: #1897

In this design, there will only be 4 sections on the sidebar, but each one will expand when clicked on, listing all available tutorials. This will allow a growing number of tutorials without cluttering the docs. We will also move the tutorials above the API reference.

Nov 19 '24 13:11 adamjstewart

I created my land cover classification Barlow Twins model on WorldView-3 imagery for my PhD thesis. Did all of the coding in TorchGeo. Felt so much more relaxed with TorchGeo doing the heavy lifting.

Nov 21 '24 02:11 kaushikCanada

While not directly related -- I remember spending time trying to understand when and where normalization is applied to the datasets. It might not require a tutorial, but clarification in the documentation would be helpful. Let me know if there is a better place to share this suggestion.

Nov 22 '24 14:11 burakekim

Re tutorial preparation: Count me in!

I am open to topics beyond land-cover mapping and can work with FTW since I have already spent some time familiarizing myself with it. I would like to focus on its instance segmentation labels and show how they can be useful in real-life applications. But I feel the storyline might not be very striking, if that is what we are going for -- likely something like: here's an inference tile, here are the instance segmentation masks, and here are some stats.

cc: @calebrob6 -- not sure if the FTW folks plan to do something like this already

Nov 22 '24 15:11 burakekim

I agree, we should clarify the normalization thing, you're not the only one who has told me that. Let's briefly mention that in the Lightning tutorial, and then I'll mention it in more detail in the Custom Data Modules tutorial. We have actually talked about changing the default to be no normalization (mean=0, std=1), but let's save that for 0.7.0, not 0.6.2.
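To make the `mean=0, std=1` default concrete: per-channel normalization computes `(x - mean) / std`, so those values leave the data unchanged. A minimal sketch in plain Python, assuming made-up pixel values and statistics; the `normalize` helper is hypothetical and is not TorchGeo's API:

```python
def normalize(pixels, mean, std):
    """Apply (x - mean) / std to every value in one channel."""
    return [(x - mean) / std for x in pixels]


# Made-up values for a single channel (illustrative only)
channel = [0.0, 1200.0, 2400.0]

# mean=0, std=1 is the identity: the data passes through unchanged
identity = normalize(channel, mean=0.0, std=1.0)

# Hypothetical dataset statistics shift and rescale the same values
scaled = normalize(channel, mean=1200.0, std=1200.0)
```

Here `identity` equals `channel`, while `scaled` is centered on zero, which is why it matters to know where and when a data module applies its statistics.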

Would love to have an Instance Segmentation tutorial, but first we need an InstanceSegmentationTask, which will also need to wait for 0.7.0. So not for AGU, but for future tutorials, yes please!

Any specific sections you would like to start working on? I can sign you up.

Nov 23 '24 10:11 adamjstewart

Re normalization, for future reference: https://github.com/microsoft/torchgeo/issues/1780#issuecomment-2181369211 and https://github.com/microsoft/torchgeo/issues/1841

As for another tutorial that I can start working on right away, this could be it: "Lightning: purpose of data modules and trainers, examples for classification, regression, semantic segmentation, etc." The description seems quite open-ended -- what exactly do you have in mind for this item? My first impression is that we demonstrate how to construct trainers for different tasks and provide a high-level overview of their outputs. Or is it more about showing how the tasks are structured, down to the source code?

Nov 23 '24 15:11 burakekim

The description seems quite open-ended -- what exactly do you have in mind for this item?

The Lightning tutorial should answer the following questions:

  • What is PyTorch Lightning?
  • What is a data module?
  • What is a trainer/task/thingy?
  • How do you combine them?
  • How to specify custom loggers, callbacks, etc.?

Honestly, our current Lightning tutorial isn't horrible and might be mostly sufficient. It's the other sections I'm more worried about.

No need to show source code for any tutorial except the Customization and Contributing section.

Nov 23 '24 18:11 adamjstewart

Also https://github.com/microsoft/torchgeo/pull/1897

Nov 27 '24 18:11 robmarkcole

#1897 is currently reliant on other code that will be released in 0.7.0, but we might be able to remove that and get it merged in time for 0.6.2.

Nov 28 '24 09:11 adamjstewart

I'm pretty happy with the current state of our tutorials now. We'll be adding several more for TorchGeo 1.0 time series support, but we have a separate issue to track that. Let's close this now.

Mar 20 '25 10:03 adamjstewart