sql-ecology-lesson icon indicating copy to clipboard operation
sql-ecology-lesson copied to clipboard

Inclusion of multiple tables in Introduction's data dictionary is potentially confusing

Open dmolesUC opened this issue 9 months ago • 5 comments

What is the problem?

At the time learners reach the data dictionary table, they'll only have imported the surveys table, so it can be confusing (even if called out explicitly in the instructions and in the far-right-hand column) that it includes columns from all three tables. I'd suggest breaking out the plots and species column information into separate tables and moving those under "Challenge: Import the plots and species tables".

Location of problem (optional)

https://github.com/datacarpentry/sql-ecology-lesson/blob/c05a678fedd5968cb520aea154df4c3fb27bf93a/episodes/00-sql-introduction.md?plain=1#L204

dmolesUC avatar Feb 26 '25 18:02 dmolesUC

(happy to submit a PR if this seems worthwhile to other folks)

dmolesUC avatar Feb 26 '25 18:02 dmolesUC

Hi @dmolesUC! Thanks for the suggestion. I tend to agree that the current organization is confusing and think that moving the table after the import of the other two files makes sense, perhaps with a very brief introduction explaining what the table(s) are meant to convey. One thing the current structure does that would be nice to retain in some form is indicate which fields are shared between two tables. Any thought on retaining that if the tables are split?

adamancer avatar Feb 26 '25 18:02 adamancer

@adamancer Good observation. I think we could keep the current table structure, but limit ourselves to just showing what other tables share that field, something like:

surveys

Field Data Type Motivation Also in
day INTEGER
plot_id INTEGER plots

plots

Field Data Dype Motivation Also in
plot_id INTEGER surveys
plot_type TEXT

Given that we're going to go on and teach joins later in the lesson, we might even want to go a little farther and name that last column in a way that subtly calls out that plot_id sort of "belongs to" the plots table, something like:

surveys

Field Data Type Motivation References
day INTEGER
plot_id INTEGER plots

plots

Field Data Dype Motivation Referenced by
plot_id INTEGER surveys
plot_type TEXT

That said, right now the episode on joins doesn't really talk about things like foreign keys, so maybe that's deeper than we want to go.

It's probably something I would talk about when I was teaching the joins episode, though -- noting that the examples all use ID columns, all named the same in both tables, but you could also do something like

SELECT *
FROM surveys
JOIN plots
ON surveys.month = plots.plot_id;

even if it's nonsensical in this case.

dmolesUC avatar Mar 05 '25 00:03 dmolesUC

@dmolesUC Apologies for the delay! I would lean toward "Also in" for the heading. I think it's enough at this point to highlight the shared fields without getting into what is referencing what.

If you're still interested in pursuing this, please submit a pull request and I'll be happy to look into implementing it.

adamancer avatar Apr 02 '25 18:04 adamancer

@adamancer done! https://github.com/datacarpentry/sql-ecology-lesson/pull/386

dmolesUC avatar Apr 03 '25 18:04 dmolesUC