Inclusion of multiple tables in Introduction's data dictionary is potentially confusing
What is the problem?
At the time learners reach the data dictionary table, they'll only have imported the surveys table, so it can be confusing (even if called out explicitly in the instructions and in the far-right-hand column) that it includes columns from all three tables. I'd suggest breaking out the plots and species column information into separate tables and moving those under "Challenge: Import the plots and species tables".
Location of problem (optional)
https://github.com/datacarpentry/sql-ecology-lesson/blob/c05a678fedd5968cb520aea154df4c3fb27bf93a/episodes/00-sql-introduction.md?plain=1#L204
(happy to submit a PR if this seems worthwhile to other folks)
Hi @dmolesUC! Thanks for the suggestion. I tend to agree that the current organization is confusing and think that moving the table after the import of the other two files makes sense, perhaps with a very brief introduction explaining what the table(s) are meant to convey. One thing the current structure does that would be nice to retain in some form is indicate which fields are shared between two tables. Any thought on retaining that if the tables are split?
@adamancer Good observation. I think we could keep the current table structure, but limit ourselves to just showing what other tables share that field, something like:
surveys
Field Data Type Motivation Also in day INTEGER … plot_id INTEGER … plots plots
Field Data Dype Motivation Also in plot_id INTEGER … surveys plot_type TEXT …
Given that we're going to go on and teach joins later in the lesson, we might even want to go a little farther and name that last column in a way that subtly calls out that plot_id sort of "belongs to" the plots table, something like:
surveys
Field Data Type Motivation References day INTEGER … plot_id INTEGER … plots plots
Field Data Dype Motivation Referenced by plot_id INTEGER … surveys plot_type TEXT …
That said, right now the episode on joins doesn't really talk about things like foreign keys, so maybe that's deeper than we want to go.
It's probably something I would talk about when I was teaching the joins episode, though -- noting that the examples all use ID columns, all named the same in both tables, but you could also do something like
SELECT *
FROM surveys
JOIN plots
ON surveys.month = plots.plot_id;
even if it's nonsensical in this case.
@dmolesUC Apologies for the delay! I would lean toward "Also in" for the heading. I think it's enough at this point to highlight the shared fields without getting into what is referencing what.
If you're still interested in pursuing this, please submit a pull request and I'll be happy to look into implementing it.
@adamancer done! https://github.com/datacarpentry/sql-ecology-lesson/pull/386