sql-ecology-lesson Inclusion of multiple tables in Introduction's data dictionary is potentially confusing

What is the problem?

At the time learners reach the data dictionary table, they'll only have imported the surveys table, so it can be confusing (even if called out explicitly in the instructions and in the far-right-hand column) that it includes columns from all three tables. I'd suggest breaking out the plots and species column information into separate tables and moving those under "Challenge: Import the plots and species tables".

Location of problem (optional)

https://github.com/datacarpentry/sql-ecology-lesson/blob/c05a678fedd5968cb520aea154df4c3fb27bf93a/episodes/00-sql-introduction.md?plain=1#L204

Feb 26 '25 18:02 dmolesUC

(happy to submit a PR if this seems worthwhile to other folks)

Feb 26 '25 18:02 dmolesUC

Hi @dmolesUC! Thanks for the suggestion. I tend to agree that the current organization is confusing and think that moving the table after the import of the other two files makes sense, perhaps with a very brief introduction explaining what the table(s) are meant to convey. One thing the current structure does that would be nice to retain in some form is indicate which fields are shared between two tables. Any thought on retaining that if the tables are split?

Feb 26 '25 18:02 adamancer

@adamancer Good observation. I think we could keep the current table structure, but limit ourselves to just showing what other tables share that field, something like:

surveys

Field Data Type Motivation Also in

day INTEGER …

plot_id INTEGER … plots

plots

Field Data Dype Motivation Also in

plot_id INTEGER … surveys

plot_type TEXT …

Field	Data Type	Motivation	Also in
day	INTEGER	…
plot_id	INTEGER	…	plots

Field	Data Dype	Motivation	Also in
plot_id	INTEGER	…	surveys
plot_type	TEXT	…

Given that we're going to go on and teach joins later in the lesson, we might even want to go a little farther and name that last column in a way that subtly calls out that plot_id sort of "belongs to" the plots table, something like:

surveys

Field Data Type Motivation References

day INTEGER …

plot_id INTEGER … plots

plots

Field Data Dype Motivation Referenced by

plot_id INTEGER … surveys

plot_type TEXT …

Field	Data Type	Motivation	References
day	INTEGER	…
plot_id	INTEGER	…	plots

Field	Data Dype	Motivation	Referenced by
plot_id	INTEGER	…	surveys
plot_type	TEXT	…

That said, right now the episode on joins doesn't really talk about things like foreign keys, so maybe that's deeper than we want to go.

It's probably something I would talk about when I was teaching the joins episode, though -- noting that the examples all use ID columns, all named the same in both tables, but you could also do something like

SELECT *
FROM surveys
JOIN plots
ON surveys.month = plots.plot_id;

even if it's nonsensical in this case.

Mar 05 '25 00:03 dmolesUC

@dmolesUC Apologies for the delay! I would lean toward "Also in" for the heading. I think it's enough at this point to highlight the shared fields without getting into what is referencing what.

If you're still interested in pursuing this, please submit a pull request and I'll be happy to look into implementing it.

Apr 02 '25 18:04 adamancer

@adamancer done! https://github.com/datacarpentry/sql-ecology-lesson/pull/386

Apr 03 '25 18:04 dmolesUC