Add Dasymetric Function

Open bransonf opened this issue 4 years ago • 18 comments

Function, data, and test are not explicitly ready for merge to master, but opening this PR as a way to discuss and further develop the dasymetric function.

Outstanding:

  • [ ] Implement Different Weights for Extensive (Sum vs Total)
  • [ ] Implement Intensive Calculation (Mean, intermediate area dependent?)
  • [ ] The Building Footprints are very large (6MB .rda). Move them to an external repo and fetch them with a web request? Or ship smaller sample data?
  • [ ] Review potential edge cases
  • [ ] Full pass at Documentation
  • [ ] Full pass at Testing (Contingent on test Building data)
  • [ ] Test with Other Intermediate Geometries (Land Use, etc)
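For reference on the first outstanding item, here is a minimal base R sketch of how "sum" and "total" weights could differ for an extensive variable. It assumes the distinction mirrors the one in the existing area-weighted interpolation (the denominator is either the area that overlaps the targets, or all of the area in the source feature), with building-footprint area standing in for plain area. The table, the numbers, and every variable name are hypothetical, and this is not the code in the branch.

```r
# Hypothetical intersection table for ONE source tract with 1000 residents.
# Each row is the building-footprint area (m^2) that falls in one target cell.
pieces <- data.frame(
  target_id = c("A", "B"),
  bldg_area = c(300, 100),
  stringsAsFactors = FALSE
)
source_pop <- 1000

# Building area in the whole tract, including 100 m^2 that no target covers
source_bldg_area <- 500

# "sum": the denominator is only the building area that overlaps a target,
# so the full source count is always allocated
w_sum <- pieces$bldg_area / sum(pieces$bldg_area)

# "total": the denominator is all building area in the source tract,
# so the share outside every target is left unallocated
w_total <- pieces$bldg_area / source_bldg_area

est_sum <- source_pop * w_sum
est_total <- source_pop * w_total

data.frame(target_id = pieces$target_id, est_sum, est_total)
```

With these made-up numbers the "sum" estimates add back up to the source count, while the "total" estimates fall short by the share of building area that lies outside every target.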

bransonf avatar May 04 '20 23:05 bransonf

While using this dev branch I wasted an inordinate amount of time because I had not checked my intermediate shapefile for geometries that weren't POLYGON or MULTIPOLYGON.

Would adding the following check add too much overhead to aw_dasymetric?

if(any(!st_geometry_type(nyc_buildings_tidy) %in% c("POLYGON", "MULTIPOLYGON"))){
  stop('The `intermediate` shapefiles must contain only `POLYGON` and `MULTIPOLYGON` geometries. Remove other geometries before passing to aw_dasymetric')
}

charliejhadley avatar Jun 03 '20 14:06 charliejhadley

hey @charliejhadley - first, thanks for testing out the development branch. This actually might be worth exploring for the validation process more generally. Can you open an issue referring to this and referencing the ar_validate() function?

chris-prener avatar Jun 03 '20 16:06 chris-prener

@charliejhadley Thanks for trying out the dasymetric function, and thank you for the solution you provided. It does not add any considerable overhead, and certainly will alleviate some headaches for others. I went ahead and added this for all of the sf arguments.
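As a rough illustration of applying the same check to every sf argument, a small helper along these lines would do. This is only a sketch, not the code that went into the branch: `check_polygon_types()` is a hypothetical name, and it takes a plain character vector of geometry types (for example `as.character(sf::st_geometry_type(x))`) so that it can run without sf installed.

```r
# Hypothetical helper, not part of areal.
# geom_types: character vector of geometry types, e.g.
#             as.character(sf::st_geometry_type(intermediate))
# arg:        name of the argument being checked, used in the error message
check_polygon_types <- function(geom_types, arg) {
  allowed <- c("POLYGON", "MULTIPOLYGON")
  bad <- setdiff(unique(geom_types), allowed)

  if (length(bad) > 0) {
    stop(
      sprintf(
        "`%s` must contain only POLYGON and MULTIPOLYGON geometries; found: %s",
        arg, paste(bad, collapse = ", ")
      ),
      call. = FALSE
    )
  }

  invisible(TRUE)
}

# Passes silently
check_polygon_types(c("POLYGON", "MULTIPOLYGON", "POLYGON"), "target")

# Fails with a message that names the offending argument and types
result <- tryCatch(
  check_polygon_types(c("POLYGON", "LINESTRING"), "intermediate"),
  error = function(e) conditionMessage(e)
)
```

Calling it once each for target, source, and intermediate keeps the error message specific about which input needs cleaning.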

If you run into any other problems or identify an edge case, please let us know. Thanks again, we're really excited to develop this.

bransonf avatar Jun 03 '20 19:06 bransonf

Thanks for adding this, @bransonf!

@charliejhadley - I just opened an issue on your behalf to make a more permanent switch in how we validate data!

chris-prener avatar Jun 04 '20 17:06 chris-prener

Thanks for doing all the work on this, I wanted to check it wouldn't add too much overhead before making a PR.

charliejhadley avatar Jun 05 '20 09:06 charliejhadley

thanks @charliejhadley!

chris-prener avatar Jun 05 '20 17:06 chris-prener

hey @joshdavids - this is the branch where we're discussing the technique. Have you installed from a repo before as opposed to CRAN?

chris-prener avatar Jun 05 '20 17:06 chris-prener

@chris-prener -- yes, I've installed from repos, but it's been a while, so a refresher would be great.

joshdavids avatar Jun 05 '20 17:06 joshdavids

and, thanks for having me onboard, everyone.

joshdavids avatar Jun 05 '20 18:06 joshdavids

no worries @joshdavids - you want to use remotes::install_github("bransonf/areal") since @bransonf is using a fork of the areal repo and not a branch...

chris-prener avatar Jun 05 '20 18:06 chris-prener

@chris-prener, awesome, thank you. Is there anything in particular you want me to take a stab at testing first?

joshdavids avatar Jun 05 '20 19:06 joshdavids

@bransonf - do you want to let @joshdavids know how to test the function?

chris-prener avatar Jun 05 '20 20:06 chris-prener

Sure, and thanks for testing this @joshdavids

You'll need some source sf data with an extensive field (meaning counts, rather than proportions or percentages).

You'll need an intermediate sf object. I've tested only building footprints so far, but this should work with land use (zoning) data as well.

And then you'll need a target, grid squares or hexagons for example.

That's it, and the documentation should be pretty clear as to the order of arguments. (target, source, intermediate)
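To make the roles of the three inputs concrete, here is a minimal base R sketch of the arithmetic behind an extensive dasymetric estimate. It assumes the spatial intersection has already been done and summarised into a table, and that each source count is spread in proportion to the building area in each overlap. The table, the counts, and all names are hypothetical; this is not the internals of aw_dasymetric.

```r
# Hypothetical intersection table: one row per (source, target) overlap, with
# the intermediate (building) area in m^2 that falls inside that overlap.
pieces <- data.frame(
  source_id = c("s1", "s1", "s2", "s2"),
  target_id = c("t1", "t2", "t2", "t3"),
  bldg_area = c(200, 600, 50, 150),
  stringsAsFactors = FALSE
)

# Extensive variable (a count) for each source feature
source_pop <- c(s1 = 800, s2 = 400)

# Weight = share of the source's building area that falls in each overlap
area_by_source <- tapply(pieces$bldg_area, pieces$source_id, sum)
pieces$weight <- pieces$bldg_area / as.numeric(area_by_source[pieces$source_id])

# Allocate each source count to its overlaps, then sum within each target
pieces$pop_part <- pieces$weight * as.numeric(source_pop[pieces$source_id])
target_pop <- aggregate(pop_part ~ target_id, data = pieces, FUN = sum)

target_pop
```

Because the weights within each source sum to one here, the target estimates add up to the combined source count.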

If you have trouble finding any of these data, here are suggestions:

  • Source - You can use tidycensus to get population counts or some other count variable

  • Intermediate - You can find labeled building footprints here. These are just the Microsoft AI Building Footprints with county/zip etc. added, which makes it easier to filter for your area. Be warned: unless you have a ton of RAM, R may fail to handle these large files.

  • Target - You can generate square/hexagon grids with the ar_tessellate function

I'm not familiar with your experience in R, so if I skipped anything important, please let me know.

bransonf avatar Jun 05 '20 20:06 bransonf

@bransonf -- all sounds good. I've had a lot of experience working with R (never writing packages, but using it daily in analysis for the last 5ish years) and a fair bit of more recent experience working with sf. The workflow you describe makes sense. Could you point me to where the documentation is housed? I'm having trouble finding it.

joshdavids avatar Jun 08 '20 16:06 joshdavids

@joshdavids Sorry for the confusion. The only documentation currently is the function annotations: ?aw_dasymetric. Or you can read the source.

The only required arguments are target, source, intermediate and extensive. The first 3 are sf objects, and the last is a character vector.
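If it helps while testing, a quick pre-flight check on the extensive argument could look like the sketch below. `check_extensive()` is a hypothetical helper, not a function in areal; it only assumes that source behaves like a data frame (sf objects do) and that the extensive columns should be numeric counts.

```r
# Hypothetical helper, not part of areal.
# source:    a data frame (an sf object also qualifies)
# extensive: character vector of column names holding counts
check_extensive <- function(source, extensive) {
  stopifnot(is.data.frame(source), is.character(extensive), length(extensive) > 0)

  missing_cols <- setdiff(extensive, names(source))
  if (length(missing_cols) > 0) {
    stop("Not found in `source`: ", paste(missing_cols, collapse = ", "),
         call. = FALSE)
  }

  # Check columns one at a time so a sticky geometry column is never included
  is_num <- vapply(extensive, function(nm) is.numeric(source[[nm]]), logical(1))
  if (!all(is_num)) {
    stop("Not numeric in `source`: ", paste(extensive[!is_num], collapse = ", "),
         call. = FALSE)
  }

  invisible(TRUE)
}

# Plain data frame standing in for the source sf object
src <- data.frame(
  tract = c("a", "b"),
  total_pop = c(1200, 850),
  stringsAsFactors = FALSE
)

check_extensive(src, "total_pop")
```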

bransonf avatar Jun 08 '20 19:06 bransonf

@bransonf -- great, all sounds good. I'll get to testing shortly. Looking forward.

joshdavids avatar Jun 09 '20 00:06 joshdavids

hi all, sorry for the really long delay -- working on a paper for the Transportation Research Board conference due Aug. 1. Would it still be useful to you all for me to test in August when things free up?

joshdavids avatar Jul 22 '20 13:07 joshdavids

@joshdavids No worries, Chris and I have been incredibly busy with covid related work anyway. Whenever you have a chance to test this will be useful to us. There's no rush, we'll probably get back to this sometime in the fall.

bransonf avatar Jul 22 '20 16:07 bransonf