
Suggestion: Skip the Data Cleaning by using Artifacts

Open ParadaCarleton opened this issue 2 years ago • 4 comments

The big block of data-cleaning operations at the start of the tutorials kind of breaks up the flow, and IMO makes the tutorials seem more confusing. Maybe we should pre-clean these datasets, then include these cleaned datasets as (lazily installed, since most users won't want them) artifacts in Turing, which would let us skip the cleaning steps?

ParadaCarleton, Aug 20 '21 22:08

I kind of like including them just because they show the whole workflow -- and in some cases the cleaning matters a great deal. Plus, adding artifacts complicates an already complex workflow.

cpfiffer, Aug 20 '21 23:08

> I kind of like including them just because they show the whole workflow -- and in some cases the cleaning matters a great deal. Plus, adding artifacts complicates an already complex workflow.

I don't disagree that it's an important part of the workflow; I just think it's best to keep tutorials on cleaning data separate from tutorials on things like, say, Gaussian processes. We can link to tutorials on things like MLDataUtils and DrWatson at the top of each introduction. Ideally, every tutorial should focus on one topic and do it well, so that users can find tutorials that quickly cover what they don't know, without mixing in subjects they've already learned. For instance, the Stan manual rarely includes data cleaning; its sections are usually narrowly focused on a single topic. As for artifacts, I don't believe loading them should be especially difficult. From the user's end, the code should look something like:

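using Pkg.Artifacts
# "dataset" must match an entry in the package's Artifacts.toml;
# the string macro resolves to the artifact's directory on disk
dataset_path = artifact"dataset"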
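
For the maintainer side, a minimal sketch of how a pre-cleaned dataset could be registered as an artifact might look like the following. It assumes only the `Pkg.Artifacts` standard library; the file name `cleaned.csv`, its contents, and the temporary `Artifacts.toml` location are made up for illustration. A hosted, lazily downloaded artifact would also need download information in its entry, which is not shown here.

```julia
using Pkg.Artifacts

# Hypothetical location; a real package keeps Artifacts.toml next to Project.toml.
artifacts_toml = joinpath(mktempdir(), "Artifacts.toml")

# Create a content-addressed artifact in the local depot and write a
# made-up, already-cleaned dataset into it.
tree_hash = create_artifact() do dir
    write(joinpath(dir, "cleaned.csv"), "x,y\n1.0,2.0\n3.0,4.0\n")
end

# Record the artifact under the name "dataset" so it can be looked up by name.
bind_artifact!(artifacts_toml, "dataset", tree_hash; force = true)

# Resolve the on-disk directory, as artifact"dataset" would inside a package.
dataset_path = artifact_path(tree_hash)
println(readdir(dataset_path))
```

A tutorial would then read the cleaned file from `dataset_path` and go straight to the modelling code.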

ParadaCarleton, Aug 21 '21 02:08

A good compromise could be putting that setup code in collapsible code chunks (maybe collapsed by default)? Same for e.g. the full manifest that's at the bottom of the tutorial pages.
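
As a rough illustration of what a collapsed chunk could look like, the sketch below assumes the tutorials are rendered with Quarto, where `#| code-fold` and `#| code-summary` are cell options written as comments at the top of a code cell. The data and the cleaning step are hypothetical placeholders, not taken from any existing tutorial.

```julia
#| code-fold: true
#| code-summary: "Show data cleaning"

# Hypothetical raw data: `missing` marks an unrecorded measurement.
raw = [(id = 1, height = 1.72), (id = 2, height = missing), (id = 3, height = 1.65)]

# Drop incomplete rows so the modelling code that follows only sees clean data.
cleaned = filter(row -> !ismissing(row.height), raw)
```

With these options the cell still runs when the page is built, but readers see only the summary line unless they expand it.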

JasonPekos, Apr 06 '24 03:04

> A good compromise could be putting that setup code in collapsible code chunks (maybe collapsed by default)? Same for e.g. the full manifest that's at the bottom of the tutorial pages.

@shravanngoswamii, can you give this suggestion a try, too?

yebai, May 25 '24 10:05