Fix Names and Descriptions for Brazil, Canada, and Australia

Open michellemho opened this issue 7 years ago • 7 comments

The naming of variables for Australia, Brazil, and Canada is unreadable. The names and the descriptions need to be fixed. For example, try to find total population in Australia. There is "Total (Persons)" and then several choices for "Persons Total Total". The names are terrible and the descriptions do not clearly explain the differences.

John had an external contractor write the metadata for Australia, Brazil, and Canada, and I never worked with them or checked their work until now.

michellemho avatar Jun 14 '17 20:06 michellemho

When I brought in France and the European Union, I tried to "automate" the process based on existing metadata... The results are not ideal. For example, there are two "Number of Tax Households" variables for France, and there's no indication of how they differ. Actually, they are probably duplicates: one came from the declared income table and the other came from the disposable income table. I believe something similar happened for Australia, Brazil, and Canada; it's really hard to write an automated process to write the metadata for census variables.

michellemho avatar Jun 14 '17 20:06 michellemho

@michellemho was anybody other than John in contact with that contractor? Shouldn't we tell them about the issues with their data and ask them to improve it? Although "the data is there", if it's not usable I think that we should tell them about it so they can improve it.

juanignaciosl avatar Jun 15 '17 05:06 juanignaciosl

@juanignaciosl, I think only John was in contact with the contractor (SparkGeo). He told me that our point of contact is Will Cadell and that he emailed Stuart and Will to connect them. I can reach out to talk with Will and his team (Steve and Alain were contributors) about how they approached the ETL process and how we can make it better. Those guys were on the #obs-etl-help Slack channel asking questions, but the history of that channel seems to be gone. To be honest, it might be faster and easier if I tried fixing these variables myself. But I'd like to talk with SparkGeo anyway to learn whether there was any thought process behind which tables, variables, and geographies were included, and what they thought the pain points of the ETL process were.

michellemho avatar Jun 15 '17 15:06 michellemho

I'll review those schemas and the related ETL ASAP in order to understand what we should do, but it won't happen as soon as I'd like.

  1. How urgent is it?
  2. Is there anything that I can do to help you with this right now?

juanignaciosl avatar Jun 15 '17 16:06 juanignaciosl

  1. I think it's urgent. I've gotten at least three requests in the last week from within CARTO about the poorly named variables.

  2. I just need the go-ahead to try to fix this!

michellemho avatar Jun 15 '17 18:06 michellemho

  1. Ok. It would be great if you channeled those requests through GitHub issues. That way we can know what people need.
  2. I won't reject any help coming from you :-) If you can and it's actually important, please do it. Nevertheless, please keep us posted about the progress and changes. Not for the sake of command and control, but so that we can learn how to do it. cc @ethervoid @javitonino

I guess that means changes to the ETL in the bigmetadata repo, doesn't it?

juanignaciosl avatar Jun 16 '17 06:06 juanignaciosl

Great! I'll write up and link the three requests I got.

https://github.com/CartoDB/bigmetadata/issues/167 https://github.com/CartoDB/bigmetadata/issues/172

michellemho avatar Jun 16 '17 14:06 michellemho