ozunconf18

Australian Data redux

Open Lingtax opened this issue 7 years ago • 15 comments

At the 2017 unconference, some of us worked on making Australian data more accessible. This led to ozflights and ozroaddeaths.

Maybe we can do more of this this year? What data are you interested in / do you think might be useful?

I found this data on Liquor and Gambling in Victoria. Maybe we could find things like this nationally and stitch them together?

Lingtax avatar Nov 08 '18 00:11 Lingtax

Absolutely! This sounds great! #4 discusses an idea for an Australian babynames pkg (or rather, regular names).

Some ideas:

  • ozflights and ozroaddeaths

    • Update with current data
    • Explore options for improving automation for updating them
    • expand on / improve documentation / vignettes
  • ozgambling + ozliquor

    • create an API pkg or clean up and combine current data you linked
    • potentially try and link with other data (ozroaddeaths? I'm not sure if the resolution of the data are compatible, though)
  • Explore the city of Melbourne data - or other cities around Australia?

  • Explore the datagovau package

  • Provide a central listing of existing data packages for Australia

njtierney avatar Nov 08 '18 01:11 njtierney

There's been work on building this Australian national list of open data sources: Knowledge Network. It doesn't look like it has an API as such, but at least you can use the search tool to check out multiple providers and datasets in one go.

peggynewman avatar Nov 08 '18 02:11 peggynewman

Ah, awesome! Thanks @peggynewman!

njtierney avatar Nov 08 '18 02:11 njtierney

Had a bit of a test and it seems data.gov.au covers some, but not all, of the data.vic.gov.au datasets.

Lingtax avatar Nov 09 '18 00:11 Lingtax

It would be good to develop a package that allows people to access the ABS data through their SDMX API. http://www.abs.gov.au/websitedbs/D3310114.nsf/home/absstat has more details. There is the rsdmx package that assists with this, but you need to know what data is available and how to query it.

danwwilson avatar Nov 09 '18 03:11 danwwilson

Is there enough meta-information from the data.gov.au site to auto-generate a data package from nominated datasets?

i.e. the user nominates some datasets, then gets R to:

  • create package skeleton
  • download the data
  • create a README from the meta info/JSON

Generating the 'ozdeaths' package could be almost a one-liner!

Extra tools for

  • pinging the datasource to see if there's been an update.
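
A minimal sketch of two of the steps above (building a README from the metadata, and checking whether the source has been updated), written in Python purely for illustration rather than as the R tooling this comment has in mind. The metadata field names (`title`, `notes`, `license_title`, `metadata_modified`, `resources`) are assumptions about what a portal's JSON might contain, and `render_readme` and `needs_update` are hypothetical helper names, not part of any existing package.

```python
from datetime import datetime


def render_readme(meta: dict) -> str:
    """Build README text from one dataset's metadata record (assumed field names)."""
    lines = [f"# {meta['title']}", "", meta.get("notes", "").strip(), ""]
    lines.append(f"Licence: {meta.get('license_title', 'unknown')}")
    lines.append(f"Last modified: {meta.get('metadata_modified', 'unknown')}")
    lines.append("")
    lines.append("## Files")
    for res in meta.get("resources", []):
        lines.append(f"- {res['name']} ({res['format']}): {res['url']}")
    return "\n".join(lines) + "\n"


def needs_update(local_modified: str, remote_modified: str) -> bool:
    """True when the portal's ISO timestamp is newer than the packaged copy's."""
    return datetime.fromisoformat(remote_modified) > datetime.fromisoformat(local_modified)


# Made-up record for illustration only; not real portal output.
example_meta = {
    "title": "Example road deaths dataset",
    "notes": "Illustrative description text.",
    "license_title": "CC BY 4.0",
    "metadata_modified": "2018-11-01T09:30:00",
    "resources": [
        {"name": "deaths.csv", "format": "CSV", "url": "https://example.org/deaths.csv"},
    ],
}

readme = render_readme(example_meta)
```

In practice the record would come from whatever metadata the portal exposes, and the stored timestamp would live alongside the packaged data so the "ping" step only has to compare two strings.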

coolbutuseless avatar Nov 19 '18 12:11 coolbutuseless

I think this is a great idea. I was thinking about conducting an audit of the available open data, to report on how open it is in practice. I recently requested some data from QLD and they provided it as a PDF. Supposedly that is open data.

jesse-jesse avatar Nov 21 '18 02:11 jesse-jesse

I think @djnavarro agrees with you

Lingtax avatar Nov 21 '18 05:11 Lingtax

I started looking at the QLD gambling data a while ago, but we didn't get too far: https://github.com/RedNigel/Queensland-gaming-machines

We could combine it with the Vic gambling data.

jesse-jesse avatar Nov 21 '18 10:11 jesse-jesse

I'd also be interested in coming up with an accessibility score: take a random sample of datasets from the data.gov.au websites, score the selected datasets, and then write a report back to the Australian government. Or maybe just do this for QLD and report back to the Digital Innovation Team.
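
A rough sketch of how the sample-then-score idea could work, in Python for illustration. The point values in `FORMAT_POINTS` are placeholder assumptions, not agreed criteria, and the dataset records are made up; a real score would need richer criteria than file format alone.

```python
import random

# Hypothetical points per file format: higher means easier to load for analysis.
FORMAT_POINTS = {"CSV": 3, "JSON": 3, "XLSX": 2, "PDF": 0}


def sample_datasets(datasets: list, k: int, seed: int) -> list:
    """Draw a reproducible random sample of k datasets using a fixed seed."""
    rng = random.Random(seed)
    return rng.sample(datasets, k)


def accessibility_score(dataset: dict) -> int:
    """Score a dataset by its best available format.

    Formats missing from FORMAT_POINTS get 1; a dataset with no files gets 0.
    """
    return max(
        (FORMAT_POINTS.get(fmt.upper(), 1) for fmt in dataset["formats"]),
        default=0,
    )


def mean_score(datasets: list) -> float:
    """Average accessibility score over a list of datasets."""
    return sum(accessibility_score(d) for d in datasets) / len(datasets)


# Made-up catalogue entries for illustration only.
catalogue = [
    {"name": "a", "formats": ["PDF"]},
    {"name": "b", "formats": ["CSV", "PDF"]},
    {"name": "c", "formats": ["xlsx"]},
    {"name": "d", "formats": []},
]
sampled = sample_datasets(catalogue, k=2, seed=2018)
```

Fixing the seed means anyone can redraw the same sample and check the reported scores.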

The fact that the https://data.qld.gov.au/ website has a section for "developers" and not for "data scientists" makes me feel like QLD is missing the point a little and needs some guidance.

jesse-jesse avatar Nov 21 '18 10:11 jesse-jesse

There are 131 PDF datasets on the QLD open data portal!!!!

jesse-jesse avatar Nov 21 '18 10:11 jesse-jesse

The accessibility score idea is gold, @jesse-jesse

Are there standard metrics we could use, or would we need to derive these? If we derive our own, I'd be keen to lock down our criteria early and register them in some timestamped way to protect against arguments of cherrypicking/target shifting.
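
One possible way to lock criteria down, sketched in Python under the assumption that they can be written as a simple dictionary: compute a SHA-256 fingerprint of the criteria and post it somewhere timestamped (a commit or a comment on this issue, say). The criteria names and weights below are placeholders, and `criteria_fingerprint` is a hypothetical helper.

```python
import hashlib
import json


def criteria_fingerprint(criteria: dict) -> str:
    """SHA-256 hex digest of the criteria, serialised in a canonical form."""
    # Sorted keys and fixed separators make the digest independent of key order.
    canonical = json.dumps(criteria, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Placeholder criteria for illustration only.
criteria_v1 = {"machine_readable": 3, "documented": 2, "open_licence": 2}
fingerprint = criteria_fingerprint(criteria_v1)
```

Any later change to the criteria produces a different digest, so a re-evaluation has to be published as a new, visible version.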

Lingtax avatar Nov 21 '18 10:11 Lingtax

I haven't looked for any metrics yet. I am sure we could find some. Good idea to lock them down. I think they should be able to be re-evaluated, but the re-evaluation should be transparent. The unconf could also be a good place to vet the accessibility score. We could do a first draft in the morning and then review it at lunch or morning tea and get the input of others.

jesse-jesse avatar Nov 21 '18 10:11 jesse-jesse

Ohhh... I got one: https://www.ands-nectar-rds.org.au/fair-tool Worth considering as a starting point, as it links to some actionable goals and there are efforts to promote these criteria: https://ardc.edu.au/planning/events/top-10-fair-data-things-global-sprint

Lingtax avatar Nov 21 '18 10:11 Lingtax

Seems like https://nationalmap.gov.au/about.html by terria.io is the latest and greatest central source. It's mostly new to me.

mdsumner avatar Nov 22 '18 01:11 mdsumner