lab.js Feature request: draw stratifically

Open jonathon-love opened this issue 5 years ago • 20 comments

hi felix,

we often want to specify, say, 20 trials, and then replicate these a number of times, say 5, and then draw randomly. that way each participant gets each trial 5 times.

at the moment we have to do this replication outside of labjs, and import the data set, but it would be nice if this were another option alongside 'draw with replacement'.

jonathon

Jan 31 '19 01:01 jonathon-love

Hej Jonathon,

great to hear from you, and thanks a lot for your suggestion! I totally see the need for this, and would be glad to implement it (I'm currently working on randomizing individual columns in a loop grid independently).

Am I right in assuming that the main change would be to vary the order of repeat-then-sample to sample-then-repeat (and shuffle)? Or would you be looking for a blocked design where you would group together different sets of stimuli, and shuffle within each group?

All of this is easy to implement, as you no doubt know -- if you have any ideas regarding how to represent this in the UI, I'd be all ears, because that's the tricky bit. I'm dreaming of a general-purpose design specification language / UI, but I'm not quite there yet ;-).

All the best, and thanks again!

-Felix

Jan 31 '19 17:01 FelixHenninger

hey,

wait, i'd describe this as repeat then sample, :P i.e. repeat the trials X times then shuffle.
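A minimal sketch of this repeat-then-shuffle idea in plain JavaScript. The helper names `shuffle` and `repeatThenShuffle` are made up for illustration and are not part of the lab.js API; the trial objects are placeholders.

```javascript
// Fisher-Yates shuffle; returns a new array and leaves the input untouched.
function shuffle(items, rng = Math.random) {
  const result = items.slice();
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [result[i], result[j]] = [result[j], result[i]];
  }
  return result;
}

// Repeat every trial `times` times, then shuffle the combined list.
// Hypothetical helper, not a lab.js function.
function repeatThenShuffle(trials, times, rng = Math.random) {
  const repeated = [];
  for (let r = 0; r < times; r++) {
    repeated.push(...trials);
  }
  return shuffle(repeated, rng);
}

// Usage: 20 placeholder trials, each shown 5 times, in random order.
const trials = Array.from({ length: 20 }, (_, i) => ({ stimulus: `item-${i + 1}` }));
const sequence = repeatThenShuffle(trials, 5);
console.log(sequence.length); // prints 100
```

Every participant would see each of the 20 trials exactly 5 times, with no constraint on where the repetitions fall in the sequence.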

naively, you could represent these different strategies as a list box:

- draw without replacement
- draw with replacement
- draw with replacement, stratified

but it sounds like you've bigger plans in this area.

jonathon

Jan 31 '19 22:01 jonathon-love

Hej Jonathon,

thanks for your clarification! I've been mulling this over, and trying to figure out a good implementation -- I'm slow on the uptake though, this week, so if you'll bear with me once more and let me pick your impeccable sense for UI/UX, here are a couple of thoughts/questions:

1. If I understand your proposal correctly, the process would be Initial loop grid -> sample completely at random (so no groups or strata, right?) -> Repeat subset (to fill n repetitions) -> shuffle. Is that correct?
2. In lab.js, the sample and repeat functions are the same, depending on the target number of repetitions (if it's larger than the number of rows in the grid, things are repeated, or sampled otherwise). So this proposal would in effect just do a double sample, e.g. Initial grid -> sampleRows(n < N) -> sampleRows(n > N) -> shuffle. So maybe we should just offer more than one sampling step? (though I guess more than two wouldn't make sense)
3. I'm currently working on a feature to (optionally) shuffle some table columns independently. So another idea I had was to add a filter step, making the process grid -> shuffleColumns -> filter -> sample -> shuffleRows. This would force users to misuse the filter to implement sample, e.g. by shuffling one column that provides n values that pass the filter independently of the others (and then filtering by that value). The advantage of this would be that it might be a more general solution, and could make it easier to implement block designs (because you could add a block variable and filter by it). This idea feels like it might be more generally useful, but I'm still not entirely sure.
4. Finally, I was wondering whether it might be useful to make the entire sampling transparent to users, and allow them to re-jigger individual steps. I think that would be cool, but I'm not yet convinced that the advantage is worth the effort.
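The filter idea from the third point can be sketched roughly as follows. This is an illustration under assumptions, not lab.js code: `sampleViaFilter`, the `include` marker column, and the `block` variable are all hypothetical names.

```javascript
// Fisher-Yates shuffle; returns a new array and leaves the input untouched.
function shuffle(items, rng = Math.random) {
  const result = items.slice();
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [result[i], result[j]] = [result[j], result[i]];
  }
  return result;
}

// Sample n rows by "misusing" a filter: build a marker column in which
// n values pass, shuffle that column independently of the other columns,
// then keep only the rows whose marker passes.
// If n exceeds the number of rows, every row passes.
function sampleViaFilter(grid, n, rng = Math.random) {
  const markers = shuffle(grid.map((_, i) => i < n), rng);
  return grid
    .map((row, i) => ({ ...row, include: markers[i] }))
    .filter(row => row.include);
}

// Usage with a hypothetical grid that also carries a block variable.
const stimuli = [
  { word: 'red', block: 'A' },
  { word: 'green', block: 'A' },
  { word: 'blue', block: 'B' },
  { word: 'yellow', block: 'B' },
];
const subset = sampleViaFilter(stimuli, 2); // two rows chosen at random
const blockA = stimuli.filter(row => row.block === 'A'); // block design via the same filter step
console.log(subset.length, blockA.length); // prints "2 2"
```

The filtered rows keep their original grid order, so a final row shuffle would still be needed, as in the proposed pipeline.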

Do you have thoughts on this? Any ideas to help me get unstuck would be massively appreciated. Best,

-Felix

Feb 01 '19 17:02 FelixHenninger

Oh, and one more thing: All of this is, of course, already possible in code. I've implicitly assumed that we're talking about a UI addition, but if you're preparing a study, and are just looking for a snippet to do the sampling, please let me know, and I can put one together super-quick!

Feb 01 '19 17:02 FelixHenninger

haha,

i'm reading and re-reading the above, and only vaguely getting it :P but yup, randomising columns within trials is something i have to write code to do at the moment, so something i'll be looking forward to.

so i've actually tried implementing the stratified sampling in code, but hit a snag. we've got 20 trials defined in the loop, but we want a lot more trials than that, so i specify a large value in the Samples option, and my code reads its value and replicates the 20 trials enough times to produce that many trials.

however, i don't want to draw with replacement - but labjs forces me to use replacement because it thinks i only have 20 trials. :)

i assume i can solve this by specifying the number of trials in the code, but the beauty of labjs, is that less technical folks customise the experiment later on. so being able to use that Samples option is very user friendly.

my solution for now is to tell the people customising this experiment to replicate these 20 trials in a spreadsheet or similar, before importing.

jonathon

Feb 02 '19 05:02 jonathon-love

Hej Jonathon,

sorry for the late response -- I think I'm starting to get it now (sorry, I'm especially slow on the uptake these days 🤪 This has been a long semester).

You're right that we don't currently allow oversampling without replacement -- that's mostly because I couldn't decide what that would look like: In your scenario, you'll probably want to sample a multiple of the available grid rows, but what if that's not the case? (say if there are 10 repetitions defined in the grid, but a user asks us to sample 15) Should we throw an error? Sample the last five at random? Via round-robin? Both as options?

I can't commit to a tight timeframe, but with the term break on the horizon, I think this should be doable in the near future. If you could help me get a clearer picture of what the UI would look like, that'd be super-helpful!

Cheers,

-Felix

Feb 05 '19 21:02 FelixHenninger

> This has been a long semester

summer break over here! we're living the dream! (actually, it has been a bit too hot).

> In your scenario, you'll probably want to sample a multiple of the available grid rows, but what if that's not the case? (say if there are 10 repetitions defined in the grid, but a user asks us to sample 15) Should we throw an error? Sample the last five at random? Via round-robin? Both as options?

i think requesting 15 from 10 is a pretty marginal situation, and i'd just sample the last 5 at random - i wouldn't let something so marginal complicate the UI.

the simplest UI i can think of, is replacing the 'sample with replacement' checkbox with a listbox with the options:

- sample with replacement
- sample without replacement
- sample with stratified replacement

cheers

Feb 05 '19 22:02 jonathon-love

> summer break over here! we're living the dream!

😫😉

Ok, I think we're on the same page now, thanks! One more thing: Are you certain that stratification is the right term for this? (I think this was what threw me off originally -- I had imagined that it would involve splitting the table into groups and then sampling from each to the same degree).

Feb 05 '19 22:02 FelixHenninger

certainly not sure that stratification is the right term :)

Feb 05 '19 22:02 jonathon-love

Hej Jonathon, sorry for the long radio silence! I got somewhat side-tracked over the past weeks due to new features and associated deadlines (we have audio support now!), but I've now gotten around to implementing this feature.

[Screenshot: lab.js builder, 2019-02-28]

I've been ~thinking~ dithering about the terms to use in both the UI and the API, and if you have a moment for feedback, you could greatly speed up the process ;-). Right now, the options (and their API keys) are:

- No sampling (`undefined`)
- Draw without replacement (`draw`)
- Draw with replacement (`draw-replace`)
- Round-robin (`round-robin`)
- Draw without replacement, then start over (`draw-repeat` or `shuffle-then-sample`)

I'm especially uncertain about the distinction between the draw and draw-replace keys, and whether the last option is described clearly enough. (also, do people know the term round-robin? Maybe 'Go through repetitions sequentially, then start over' makes more sense?)
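For readers unsure about the terms, here is a rough sketch of how round-robin and drawing with replacement could behave. It is one interpretation of the option labels above, not the lab.js implementation; `roundRobin` and `drawWithReplacement` are hypothetical names.

```javascript
// Round-robin: go through the rows in order and start over
// once every row has been used.
function roundRobin(rows, n) {
  if (rows.length === 0) return [];
  return Array.from({ length: n }, (_, i) => rows[i % rows.length]);
}

// Draw with replacement: every draw picks any row with equal probability,
// so a row can come up several times (or not at all).
function drawWithReplacement(rows, n, rng = Math.random) {
  if (rows.length === 0) return [];
  return Array.from({ length: n }, () => rows[Math.floor(rng() * rows.length)]);
}

console.log(roundRobin(['a', 'b', 'c'], 7).join(' ')); // prints "a b c a b c a"
console.log(drawWithReplacement(['a', 'b', 'c'], 7).length); // prints 7
```

Round-robin is fully deterministic, whereas drawing with replacement gives no guarantee about how often each row appears.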

If you can spare a few minutes, I'd love to hear what you think?

-Felix

Feb 28 '19 19:02 FelixHenninger

hey,

i wonder if people wouldn't find Trial order more intuitive than Sampling? then rather than No sampling it could be In order?

i think 'draw without replacement' and 'draw with replacement' are reasonably well understood concepts, and are easily google-able. you could add a link beside each option if you wanted.

round robin on the other hand, i've no idea what that could be :P (and it doesn't seem to be easily google-able ... well, perhaps it is, but i use duckduckgo, it's not duckduckgo-able at least).

but this is looking good.

(i've been meaning to follow up this new 'drag columns into rows' thing too ... because i'm a little stumped as to how that works).

Mar 01 '19 03:03 jonathon-love

> I wonder if people wouldn't find Trial order more intuitive than Sampling? then rather than No sampling it could be In order?

Ah, that's an interesting point! So basically right now, I think about shuffling and sampling as two separate (and independent) options: You can choose to shuffle your loop, and then subsample it.

So you're suggesting to combine both into one, and choose between shuffling and sequential processing when the left field is left empty, and drawing (with or without replacement) or sequential processing if a number of samples is set?

> (i've been meaning to follow up this new 'drag columns into rows' thing too ... because i'm a little stumped as to how that works)

Ah, thanks for your feedback! The idea is that you can shuffle groups of columns, so that the groups are shuffled independently, and the data within each group of columns is shuffled together. You can control the groups of columns by dragging them into the same row -- by default, everything is shuffled as one group. The idea is that this might be useful for creating orthogonal manipulations.

Again, if you have any ideas on how to do this, I'd love to hear them -- I did a workshop this week, and my impression was that people got it when it was explained, but it isn't immediately clear.

Mar 01 '19 18:03 FelixHenninger

> So you're suggesting to combine both into one, and choose between shuffling and sequential processing when the left field is left empty, and drawing (with or without replacement) or sequential processing if a number of samples is set?

yeah, it's just an idea. coming up with terms for everything might make it more difficult.

> the data within each group of columns is shuffled together

so i initially thought it meant that if A, B, C were in the same row, that these would be shuffled within the row, and then the rows would be shuffled, i.e.

|   | A | B | C |
|---|---|---|---|
| 1 | 1 | 2 | 3 |
| 2 | 7 | 8 | 9 |

might become:

|   | A | B | C |
|---|---|---|---|
| 2 | 8 | 7 | 9 |
| 1 | 3 | 1 | 2 |

but i assumed this wasn't the case, because by default it put everything in the same row (and shuffling all the values within a row seemed like the exception, rather than the rule). is this how it works?

Mar 02 '19 01:03 jonathon-love

Regarding the shuffling/sampling distinction: I very much like the idea of simplifying the UI here, though I wonder if there aren't edge cases where the distinction is useful, e.g. you might want to sample from all of the rows so that every stimulus is equiprobable, but then shuffle so that blocks of stimuli don't occur (every row is represented n times, but there are no restrictions on where in the final design, e.g. instead of [3, 2, 1, 1, 3, 2, 1, 2, 3] which you would get from continuous sampling until the grid is done, and then starting over, you might want [2, 1, 3, 3, 3, 2, 1, 2, 1], which is the same, but with a final shuffle step).

Sorry for bothering you with these minutiae, this kind of stuff keeps me up at night.

Regarding the column groups, I think I've explained it wrong: If A, B and C are in the same row, the result is exactly the same as the overall shuffle. However, if you put A and B in one group, and C in the other, the rows in C will be shuffled independently of the rest.

In this case, you might end up with a result like this:

|   | A | B | C |
|---|---|---|---|
| 1 | 1 | 2 | 9 |
| 2 | 7 | 8 | 3 |

The groups still shuffle vertically, and never swap data between columns horizontally, if that makes any sense.
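To make the column-group behaviour concrete, here is a small sketch in plain JavaScript. It is only an illustration of the description above, not the lab.js implementation; `shuffleColumnGroups` and its `groups` argument are hypothetical.

```javascript
// Fisher-Yates shuffle; returns a new array and leaves the input untouched.
function shuffle(items, rng = Math.random) {
  const result = items.slice();
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [result[i], result[j]] = [result[j], result[i]];
  }
  return result;
}

// Shuffle groups of columns independently: columns within one group keep
// their row-wise pairing, while each group gets its own row permutation.
// Values only move vertically, never between columns.
function shuffleColumnGroups(rows, groups, rng = Math.random) {
  const result = rows.map(row => ({ ...row }));
  for (const group of groups) {
    // One random permutation of row indices per group.
    const order = shuffle(rows.map((_, i) => i), rng);
    order.forEach((sourceIndex, targetIndex) => {
      for (const column of group) {
        result[targetIndex][column] = rows[sourceIndex][column];
      }
    });
  }
  return result;
}

// Usage: A and B form one group, C another.
const table = [
  { A: 1, B: 2, C: 3 },
  { A: 7, B: 8, C: 9 },
];
const regrouped = shuffleColumnGroups(table, [['A', 'B'], ['C']]);
console.log(regrouped);
```

In every possible result, 1 stays paired with 2 and 7 with 8, while the values 3 and 9 in column C can land in either row.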

Does that clarify things?

Mar 02 '19 13:03 FelixHenninger

> though I wonder if there aren't edge cases where the distinction is useful

yeah, for sure.

> Does that clarify things?

ah! got it. i misunderstood the purpose of it, and so i was totally seeing it through the wrong lens. i assumed it was for shuffling columns within a trial (which is something i want to do, which is why that's what i was looking for :P)

Mar 05 '19 04:03 jonathon-love

Hej, thanks! I can see why you'd want to swap data between columns; let's look at that next, shall we? 😉

Regarding the sampling options, I spent some time cleaning up the code (and squashing the massive number of commits I'd accumulated), and it's now online (as of 529cec78c66abce64227720dee94b83387028f06), and visible at http://labjs-beta.netlify.com/ . If you have a moment, I'd love to hear what you think, and whether the update makes sense to you! There's also some technical documentation around this, if you'd like to take a look.

Sorry again that this took so long, and thanks a lot for your patience!

Mar 11 '19 18:03 FelixHenninger

ah yeah, this is nice. i wasn't completely sure what 'sampled without replacement (in blocks)' meant, and had to go to the documentation, and i inferred that it corresponded to draw-shuffle based on it being in the third position.

i think that referring to the docs will be necessary for most people here (and i'd say that will be unavoidable, just because it's hard to sum these strategies up in few words), so it might be worth using more jargon-y labels, (i.e. draw-shuffle) so it's easier to match them up with the docs.

(or perhaps the docs could be more explicit draw-shuffle [sampled without replacement (in blocks)])

but yeah, coming up with labels for things is always a dog.

cheers

Mar 14 '19 04:03 jonathon-love

Hej Jonathon, thanks for your feedback!

> i inferred that it corresponded to draw-shuffle based on it being in the third position.

This is why this discussion is super-helpful: sampled without replacement (in blocks) is the version without the final shuffle -- the idea is that it will create blocks when oversampling, e.g. [1, 2, 3, 2, 3, 1] (so all entries have to be exhausted before the first is repeated).
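The difference between the blocked variant and the variant with a final shuffle might be sketched like this. It is a minimal illustration under assumptions, not the lab.js source; `drawInBlocks` is a hypothetical helper name.

```javascript
// Fisher-Yates shuffle; returns a new array and leaves the input untouched.
function shuffle(items, rng = Math.random) {
  const result = items.slice();
  for (let i = result.length - 1; i > 0; i--) {
    const j = Math.floor(rng() * (i + 1));
    [result[i], result[j]] = [result[j], result[i]];
  }
  return result;
}

// Draw without replacement; once every row has been used,
// reshuffle and start over. The last pass may be cut short.
function drawInBlocks(rows, n, rng = Math.random) {
  const result = [];
  while (result.length < n && rows.length > 0) {
    const block = shuffle(rows, rng); // a fresh permutation of all rows
    result.push(...block.slice(0, n - result.length));
  }
  return result;
}

// Blocked: all entries are exhausted before any repeats, e.g. [1, 2, 3, 2, 3, 1].
const blocked = drawInBlocks([1, 2, 3], 6);
// Final shuffle: same counts per entry, but no block structure.
const unblocked = shuffle(blocked);
console.log(blocked, unblocked);
```

When the requested number is not a multiple of the grid size (say 15 draws from 10 rows), the truncated final block amounts to drawing the last five rows at random without replacement.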

I totally see how all of this is annoyingly confusing, so thanks again for bearing with me! I'm now working on some inline docs in form of a hint next to the sampling UI, and I'll revise the library documentation, and add some more user-side docs, too. I think the best way of explaining this is probably by mapping use cases onto options.

As an early draft of the inline documentation, do you think the following makes sense?

[Screenshot: draft of the inline documentation, 2019-03-14]

Cheers,

-Felix

Mar 14 '19 19:03 FelixHenninger

oh yeah, that's very nice. you've used 'if all exhausted' in 'Sample without replacement' and 'If oversampling' in 'Sample w/o replacement (in blocks)' ... i think 'if all exhausted' is clearer, so maybe use that in both.

jonathon

Mar 15 '19 01:03 jonathon-love

Awesome, thanks again for your feedback! The hint is online as of d7c40013a47366600f283f0de5ac4f1715ed6386, and I'll continue to put together docs for this before we push it to release.

Cheers, -F

Mar 15 '19 14:03 FelixHenninger
