cancensus
GPT 4 Cancensus
Is there a possibility to get something similar to https://censusgpt.com/ for cancensus?
In theory yes, that's an interesting demo. In practice I think this needs some thought.
- Accuracy is going to be an issue, and this implementation displays the SQL so people can verify that it interprets the query correctly. That's not optional: the risk that questions get misinterpreted is real. But most people using this won't be able to verify the accuracy, and it also asks users to understand parts of the database structure. What does the 'acs_census_data' table contain? 1-year or 5-year ACS? From what year? What geographic level does the 'city' column correspond to? Metro area? Municipality?
- We already have a crisis of people quoting statistics without understanding what they mean. What income concept was used here? Individual income, household income, census family income, economic family income, ... ? What is included in "total" income, e.g. capital gains? There is extremely low data literacy in media and "expert" statements. Democratizing access to summary statistics is great in principle, but it needs to be accompanied by a better understanding of the underlying context. Otherwise it's just more noise.
- Ideally I would love to see something like this as a stepping stone into analysis. One option is to translate to `dplyr` verbs instead of SQL, which would facilitate this, and is possibly more readable than SQL for the uninitiated.
- Linking this up to definitions of concepts, and automatically flagging issues (say with comparability of concepts), could help guide and strengthen the interpretation of results.
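
To make the `dplyr` point a bit more concrete, here is a minimal sketch of what a generated query could look like as `dplyr` verbs next to the equivalent SQL. The `census` data frame, its column names, and its values are made-up placeholders for illustration only; they are not real census figures and not the output of any cancensus call.

```r
library(dplyr)

# Hypothetical toy data standing in for a census extract (values are invented)
census <- data.frame(
  region = c("A", "B", "C", "D"),
  population = c(1000, 2500, 400, 1800),
  median_household_income = c(70000, 82000, 65000, 91000),
  stringsAsFactors = FALSE
)

# SQL version of the question
# "which two regions with more than 500 people have the highest income?":
#   SELECT region, median_household_income
#   FROM census
#   WHERE population > 500
#   ORDER BY median_household_income DESC
#   LIMIT 2;

# The same question expressed as dplyr verbs, one step per line
top_regions <- census %>%
  filter(population > 500) %>%              # WHERE
  arrange(desc(median_household_income)) %>% # ORDER BY ... DESC
  select(region, median_household_income) %>% # SELECT
  head(2)                                    # LIMIT

print(top_regions)
```

Each verb maps to one step of the question, and the column name (`median_household_income` here) still has to spell out which income concept is meant, so a user can carry the pipeline straight into further analysis.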
I will think about this more, and maybe start playing around with some ways to implement this. But it will have to wait a bit until I get less busy.