census-with-r-book
Speedier database import
This looks great, so happy to see it all put together!
At lunch with @dtburk, we talked about how slow it sounded to import the 1910 full count data in section 11.2.2.
I was curious, so I figured I'd report back, but please don't feel like you need to add another caveat about workflow. Also, I remain mystified by database administration, so there may be drawbacks I'm unaware of. Plus, DBeaver is a nice thing to show to help people get comfortable with databases.
Those caveats aside, in my experimentation, I found chunked reading to be a lot faster than DBeaver's import. On my machine it was probably closer to 2.5 hours through DBeaver, but this takes ~15 minutes, loading only 10k rows at a time (it also works with both CSV and fixed-width extracts).
library(DBI)
library(ipumsr)

# conn is an open DBI connection; the "ipums" schema must already exist.
# Each chunk is appended to the census1910 table as it is read.
read_ipums_micro_chunked(
  "usa_00004.xml",
  IpumsSideEffectCallback$new(function(x, pos) {
    dbWriteTable(
      conn, Id(schema = "ipums", table = "census1910"), x, append = TRUE
    )
  }),
  chunk_size = 10000  # the default; rows read per chunk
)
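For anyone trying the snippet above: `conn` needs to be an open DBI connection, and the `ipums` schema has to exist before the first chunk is written. A minimal sketch of that setup, assuming PostgreSQL via the RPostgres package (the database name and host here are hypothetical placeholders):

library(DBI)
library(RPostgres)

# Hypothetical connection details; substitute your own dbname/host/credentials
conn <- dbConnect(
  RPostgres::Postgres(),
  dbname = "census",
  host = "localhost"
)

# Create the schema that the chunked import writes into
dbExecute(conn, "CREATE SCHEMA IF NOT EXISTS ipums")

Any DBI backend that supports schemas (e.g. RPostgres) should work the same way; for backends without schema support, you can drop the Id() wrapper and write to a plain table name.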
Thanks @gergness! As your work is a significant part of the book I really appreciate your feedback.
This is an excellent suggestion, and I think it's worth a mention in the book. I'm not a DBA myself either, which is why I like including the DBeaver section; it's my go-to for browsing databases, and non-DBA users are the general target in that section. However, a 2.5-hour load time is going to be unreasonable for some readers, so your example makes a lot of sense, probably in a callout box in that section. I'll include it in the next batch of edits.