areal icon indicating copy to clipboard operation
areal copied to clipboard

Add Pycnophylactic Function

Open bransonf opened this issue 4 years ago • 10 comments

Function is not ready for merge to master, but opening PR for discussion/further development. This is on a separate branch of my fork, as not to conflict with #27

This PR will resolve #1 as well

Sorry if I haven't seemed active lately. It took me about a week to understand and implement Tobler's Pycnophylactic method. It's pretty rad though!

Same considerations, we need to handle errors for user input, edge cases, etc. I still need to verify that this is the absolute correct implementation of this algorithm.

We will need to add dependencies for raster and fasterize but these are imminent anyway for an implementation of the 3-class dasymetric regression.

Still investigating ways to speed this up, although it is already faster than the pycno package. Also need to add a convergence parameter, likely stop <- max(m) * 10^(-converge) where max(m) is the largest value in the matrix.

TO DOs

  • [ ] Handle NSE or Remove
  • [ ] Speed up by setting Immutable Indices (Rather than computing in each for)
  • [ ] Implement Convergence Parameter
  • [ ] Add Verbose for diagnosis/stats on convergence?
  • [ ] Handle Errors in User Input
  • [ ] Check for Edge Cases
  • [ ] Better Documentation
  • [ ] Full Test Coverage

bransonf avatar Jun 09 '20 02:06 bransonf

nice @bransonf!

chris-prener avatar Jun 09 '20 18:06 chris-prener

is there a problem with returning NaN values?

chris-prener avatar Jun 14 '20 17:06 chris-prener

@chris-prener Short answer: yes.

Long answer: In the edge case that we divide by 0, NaN is produced. When we correct based on NaN every cell in the source inherits NaN. The smoothing function relies on the mean of adjacent cells, and inevitably a NaN gets introduced and all the cells of that source become NaN as well. In a sufficient number of iterations, all of the data becomes NaN. This will also ruin the arithmetic to check for convergence, meaning the while statement never breaks.

It's a very specific edge case, but one that results in an infinite loop.

bransonf avatar Jun 15 '20 17:06 bransonf

Gotcha... what if we assign a population of .1?

chris-prener avatar Jun 19 '20 19:06 chris-prener

@chris-prener This is the correction factor, so the only actual solution is correct <- ifelse(is.nan(correct), 0, correct)

The real issue here is actually if some data had a negative value originally, in which case it would lead to the infinite loop. Basically a serious punishment for a not-unlikely data error. (We could add an error handler for supplying negative populations, but there could theoretically be an edge case still not handled by this alone)

bransonf avatar Jun 19 '20 19:06 bransonf

I think error handling is the way to go then... though I'd like to know how it would be possible for the error handling to miss it?

chris-prener avatar Jun 19 '20 23:06 chris-prener

@chris-prener So there are two possible edge cases. (I'm sparing some technical details here)

There is one in which there are negative populations. We can catch this with an error because there shouldn't be negative populations as a matter of theory. If all the cells of a source end up negative after the smoothing process, they will become zeros. Hence, the sum of the cells in the source will be 0, and produce NaN.

But, there is another case in which the user provides populations that are 0, which is valid as a matter of theory. So we can't catch these with an error handler. In the correct set of conditions (Namely, this region is surrounded by more 0 sum regions) all the cells of the source will remain zeros. Same issue, division by zero.

bransonf avatar Jun 20 '20 21:06 bransonf

We could add an argument for 0 handling - by default it errors, but allow users to explicitly override

chris-prener avatar Jun 22 '20 17:06 chris-prener

@chris-prener I don't think there's any reason the user should have to intervene. 0s are perfectly valid from a theoretical perspective. Besides that, it's a very specific situation containing 0s that would trigger the edge case.

I've committed changes that prevent the edge case from producing an error.

bransonf avatar Jun 22 '20 23:06 bransonf

OK sounds good - happy to follow your lead on this!

chris-prener avatar Jun 22 '20 23:06 chris-prener