bigmetadata
Fix Names and Descriptions for Brazil, Canada, and Australia
The naming of variables for Australia, Brazil, and Canada is unreadable. The names and the descriptions need to be fixed. For example, try to find total population in Australia. There is "Total (Persons)" and then several choices for "Persons Total Total". The names are terrible and the descriptions do not clearly explain the differences.
John had an external contractor write the metadata for Australia, Brazil, and Canada, and I never worked with them or checked their work until now.
When I brought in France and the European Union, I tried to "automate" the process based on existing metadata... The results are not ideal. For example, there are two "Number of Tax Households" for France, and there's no indication how they are different. Actually, they are probably duplicates. One came from the declared income table and the other came from the disposable income table. I believe something similar happened for Australia, Brazil, and Canada; it's really hard to write an automated process to write the metadata for census variables.
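To make the duplicate-name problem concrete, here is a minimal sketch of how ambiguous names could be flagged automatically. It assumes the column metadata can be exported as a list of records with `id`, `country`, and `name` fields; that record layout, the helper `find_ambiguous_names`, and the sample ids are all hypothetical and are not the actual bigmetadata schema.

```python
from collections import defaultdict


def find_ambiguous_names(columns):
    """Return {(country, normalized name): [ids]} for names shared by 2+ columns."""
    groups = defaultdict(list)
    for col in columns:
        # Normalize case and whitespace so near-identical names collide.
        normalized = " ".join(col["name"].lower().split())
        groups[(col["country"], normalized)].append(col["id"])
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Hypothetical records for illustration only; the ids are made up.
columns = [
    {"id": "fr.declared_income.tax_households", "country": "fr",
     "name": "Number of Tax Households"},
    {"id": "fr.disposable_income.tax_households", "country": "fr",
     "name": "Number of tax households"},
    {"id": "fr.population.total", "country": "fr",
     "name": "Total Population"},
]

ambiguous = find_ambiguous_names(columns)
for (country, name), ids in ambiguous.items():
    print(country, name, ids)
```

A report like this would only list candidates. Deciding whether two columns are true duplicates or just need clearer names and descriptions would still be a manual review.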
@michellemho was anybody other than John in contact with that contractor? Shouldn't we tell them about the issues with their data and ask them to improve it? Although "the data is there", if it's not usable I think we should tell them so they can improve it.
@juanignaciosl, I think only John was in contact with the contractor (SparkGeo). He told me that our point of contact is Will Cadell and that he emailed Stuart and Will to connect them. I can reach out to talk with Will and his team (Steve and Alain were contributors) about how they approached the ETL process and how we can make it better. Those guys were on the #obs-etl-help Slack channel asking questions, but the history of that channel seems to be gone. To be honest, it might be faster and easier if I tried fixing these variables myself. But I'd like to talk with SparkGeo anyway to learn whether there was any thought-process behind which tables, variables, and geographies were included and what they thought the pain points of the ETL process were.
I'll review those schemas and the related ETL ASAP in order to understand what we should do, but it won't happen as soon as I'd like.
- How urgent is it?
- Is there anything that I can do to help you with this right now?
- I think it's urgent. I've gotten at least three requests in the last week from within CARTO about the poorly named variables.
- I just need a go-ahead OK that I should try and fix this!
- Ok. It would be great if you channeled those requests through GitHub issues. That way we can know what people need.
- I won't reject any help coming from you :-) If you can and it's actually important, please do it. Nevertheless, please keep us posted about the progress and changes. Not for the sake of command and control, but so that we can learn how to do it. cc @ethervoid @javitonino
I guess that means changes to the ETL in the bigmetadata repo, doesn't it?
Great! I'll write up and link the three requests I got.
https://github.com/CartoDB/bigmetadata/issues/167 https://github.com/CartoDB/bigmetadata/issues/172