shell-novice

Lack of theming and exercises makes the lesson disjointed

Open · smangham opened this issue 1 year ago • 3 comments

How could the content be improved?

The lesson is introduced, conceptually, as a realistic research project analysing data files. However, it then almost immediately pivots into doing fairly abstract and arbitrary work on thesis.txt, extracts of Little Women, random gene sequences of fictional creatures... and these files are scattered across a bunch of subdirectories. The lesson makes very little use of the actual data.

The exercises are also quite abstract, and focus heavily on multiple-choice questions of the form "Look at this example directory tree", rather than making use of the actual directory trees in the data we have learners download.

I think it'd flow a lot better if:

  1. Basic shell scripts were introduced very early - possibly straight after basics like using wildcards on the command line.
  2. Then, a lot of the multiple choice exercises could be replaced with 'write a shell script that...' which used the actual data directories in the material - so people can poke around and explore to find the answer if they don't know.
  3. Tools were then introduced with use cases for the actual data - e.g.
    • Using 'find' to get a subset of files
    • Using grep to extract a particular ID/date/time of record from that file
    • Using cut to select a particular column
    • Using loops to repeat this for a particular set of parameters
    • Using shell script inputs to allow the user to specify the column
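A minimal sketch of how those steps could combine in one small script. The function name extract_records, the *.csv file pattern and the comma-separated layout are all assumptions for illustration, not part of the lesson's data; cut splits naively on commas, so quoted fields containing commas would break it.

```shell
#!/bin/sh
# Hypothetical sketch: print one column from the records matching an ID,
# across every CSV file found under a data directory.
# Usage (after sourcing): extract_records DATA_DIR RECORD_ID COLUMN
extract_records() {
    local data_dir="$1"   # directory to search (assumed layout)
    local record_id="$2"  # ID/date/time string to look for
    local column="$3"     # column number chosen by the user (script input)

    # find: select the subset of files we care about (sorted for stable order)
    # loop: repeat the same steps for each of those files
    find "$data_dir" -type f -name '*.csv' | sort | while IFS= read -r file; do
        # grep -F: keep lines containing the ID as a fixed string
        # cut: keep only the requested comma-separated column
        grep -F -e "$record_id" "$file" | cut -d ',' -f "$column"
    done
}
```

Making the column an argument rather than hard-coding it is what lets learners reuse the same script for a different question about the same data.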

There's a lot of use of wc, sort, head -n and tail -n, but I don't think they're that likely to be part of real pipelines. If selecting specific lines is required, then sed -n is the realistic option, whilst head and tail should be introduced for their typical use of peeking at files.
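To make the comparison concrete, here is a sketch of both ways of pulling out lines START to END of a file. The function names lines_with_sed and lines_with_head_tail, and the file names in the comments, are hypothetical.

```shell
#!/bin/sh
# Sketch comparing two ways of printing lines START..END of a file.
# Usage: lines_with_sed FILE START END

# sed -n turns off automatic printing; "START,ENDp" prints only that range.
lines_with_sed() {
    sed -n "${2},${3}p" "$1"
}

# head/tail equivalent: keep the first END lines, then start output at
# line START of that (tail -n +K begins printing at line K).
lines_with_head_tail() {
    head -n "$3" "$1" | tail -n "+$2"
}

# The typical "peeking" uses of head and tail, for contrast:
#   head -n 5 data.csv   # first five lines, e.g. to check a header
#   tail -n 5 run.log    # last five lines, e.g. to see how a run ended
```

The sed version states the range directly, whereas the head/tail version needs a second process and some arithmetic in the reader's head.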

smangham, Oct 30 '24 10:10

From my perspective, if a learner has never seen the shell before, let alone heard about scripting, then introducing shell scripts early would greatly increase their cognitive load. Learners are increasingly used to cloud-based solutions where the filesystem is abstracted away, and without a mental model of it, writing scripts in early exercises would be a big ask in my experience. With simple, abstract exercises, a learner doesn't have to focus on the data itself and can instead focus on running commands, getting used to typing at the prompt, and interpreting the output.

Similarly, the multiple-choice exercises are intended to provide formative assessment for both the instructor and the learners.

Again, I feel the goal of the lesson is to introduce novice learners to what is often a completely alien environment, not to get them writing shell scripts from the outset.

froggleston, Oct 30 '24 12:10

Thanks for your feedback. A more coherent narrative could be helpful, though using data from a variety of fields is good, because the Software Carpentry curriculum helps people develop software in a variety of fields. Shell scripts are useful, but getting comfortable with a command-line editor takes a bit of time, and so would make learning more challenging. wc and sort are helpful in processing data. sed would require a fuller introduction to regular expressions.
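As a small illustration of wc and sort doing real processing work, here is a sketch of two hypothetical helpers (count_records and tally_column are made-up names) that assume comma-separated files with a single header line.

```shell
#!/bin/sh
# Hypothetical helpers; assume comma-separated files with one header line.

# Number of data records: drop the header (tail -n +2), count lines.
# tr strips the padding that some wc implementations add.
count_records() {
    tail -n +2 "$1" | wc -l | tr -d ' '
}

# Frequency table of one column, most common value first:
# sort groups identical values so uniq -c can count them,
# then sort -rn orders the table by those counts.
tally_column() {
    tail -n +2 "$1" | cut -d ',' -f "$2" | sort | uniq -c | sort -rn
}
```

For example, tally_column survey.csv 2 would list how often each value of the second column occurs (survey.csv being a placeholder name).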

Possibly relevant further reading is Data Science at the Command Line.

bkmgit, Nov 12 '24 18:11

  1. Interesting suggestion! I agree with @froggleston that it is probably better to wait before introducing scripts, as I think opening and closing scripts to edit them, and then running them, will be hard for new learners who are just getting started with the Unix shell. It is a really interesting idea, though, since it would create better documentation of the commands used from the start. It would really reinforce documentation best practices, but may be better suited to subsequent shell lessons, for example the shell-extras incubator lesson or other intermediate lessons that may be developed.
  2. Without switching to scripts this isn't as relevant. I will add that I think the abstract nature of the exercises is deliberate, so that learners have to think through the problem first instead of trying it out. The exercises contain a mix of both: some where learners have the data and can play with it, and others where they have to think about the problem conceptually before trying anything.
  3. I think these would be great changes. Always good to include common use cases as authentic examples in the lesson.

sstevens2, Jun 18 '25 16:06

Just ran this again and had exactly the same issues. Using data from a variety of fields: great, absolutely agree with that. But the structure would make more sense as examples of using a technique in one context (e.g. chemistry data, survey results, population statistics), then looping back into the framing narrative of a marine biology pipeline. I would still argue that very obviously non-research uses, like mythical beasts, are less accessible.

This is particularly the case in the "Finding Things" episode, where almost none of the material is even tangentially research-related (apart from the animals example), and it doesn't hook back into the narrative like some of the other episodes do. It ends up falling entirely on the instructor to give their own examples of why any of this is useful, e.g. the very common use of sifting for specific lines in large output files, even though the framing narrative centres on a slow data-processing program. It'd be more helpful to show a use case like:

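# run the pipeline, saving normal output and error messages to one file
bash do-stats.sh > do-stats.txt 2>&1
# sift the saved output for lines mentioning errors
grep error do-stats.txt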

Also, fair enough if there's consensus that shell scripts should be delivered later on, but I just don't think introducing loops before them is consistent with typical shell use. Loops are clunky and very easy to get wrong when typed at the prompt. The material itself calls that out, but then offers the ! syntax for repeating commands from history as the answer, which IMO risks promoting bad habits: anything you are repeatedly rerunning should be a script.
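For comparison, a loop moved into a script might look like the sketch below. The file name summarise.sh and the function summarise_files are hypothetical; the script takes the files to process as command-line arguments ("$@"), so rerunning it never means retyping or recalling the loop from history.

```shell
#!/bin/sh
# Hypothetical script, e.g. saved as summarise.sh.
# Usage: sh summarise.sh data/*.csv

summarise_files() {
    for datafile in "$@"; do
        # wc -l < file prints just the count; tr removes any padding
        lines=$(wc -l < "$datafile" | tr -d ' ')
        echo "$datafile: $lines lines"
    done
}

# Pass the script's own arguments through to the loop.
summarise_files "$@"
```

Editing and rerunning the file is also easier to get right than rewriting a multi-line loop at the prompt.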

Perhaps my problems stem from teaching the course to new PhD students who haven't yet encountered the problems it is supposed to address, so they aren't seeing grep and going "Aha!". But when we've run it for 2nd/3rd-years and postdocs, I don't think they've gotten more out of the haikus than they would from an out-of-domain example.

smangham, Sep 29 '25 13:09