esda
add start of local partial moran statistics
This adds my partial and auxiliary-regressive local Moran statistics, forthcoming in the Annals. This still needs:
- [x] tests
- [x] example notebook
Codecov Report
:x: Patch coverage is 96.90722% with 6 lines in your changes missing coverage. Please review.
:white_check_mark: Project coverage is 82.8%. Comparing base (051c715) to head (4f13851).
:warning: Report is 1 commit behind head on main.
| Files with missing lines | Patch % | Lines |
|---|---|---|
| esda/moran_local_mv.py | 97.3% | 5 Missing :warning: |
| esda/significance.py | 85.7% | 1 Missing :warning: |
Additional details and impacted files
@@ Coverage Diff @@
## main #279 +/- ##
=======================================
+ Coverage 82.0% 82.8% +0.8%
=======================================
Files 25 26 +1
Lines 3538 3730 +192
=======================================
+ Hits 2902 3088 +186
- Misses 636 642 +6
| Files with missing lines | Coverage Δ | |
|---|---|---|
| esda/__init__.py | 100.0% <100.0%> (ø) | |
| esda/significance.py | 94.1% <85.7%> (-1.5%) | :arrow_down: |
| esda/moran_local_mv.py | 97.3% <97.3%> (ø) | |
Now that the paper has hit the Annals, I need to get this merged. I've resumed stubbing out the tests, and they should be simple to finalize. I'm hopeful that we can get this released in the next esda iteration.
I think that, after writing another two classes in this frame, it would help to do a similar RFC on an API for ESDA-style interfaces that we can agree upon for the future. I think we need to start developing a more consistent interface, with attributes that have less specificity to the implementation of the class (p_sim vs. p_norm; I vs. associations_ vs. Gi, etc.).
I hope to have this RFC written for the development meeting next week.
Not sure why the CI is failing on the pyogrio+geodatasets read?
OK, it looks like the geodatasets download is malformed, so the pyogrio or fiona read fails for one or more workers. When that happens, the process fails for test_moran_local_mv.py, and we get the discrepancy between the runners. Maybe a fixture will solve it?
> OK, it looks like the geodatasets download is malformed
The issue is that multiple workers are trying to download and save the same file at the same time. We need to pull the data ahead of time to make that work, as we do in libpysal: https://github.com/pysal/libpysal/blob/bef1e81aa35c9372e91a142e9fa5c8c0c44c55ee/.github/workflows/unittests.yml#L64-L76
I think that's one useful solution! I think I would rather do this as a pytest.fixture, however, since that allows us to specify the scope of the fixture, and for pytest then to understand that scope.
:param scope:
The scope for which this fixture is shared; one of ``"function"``
(default), ``"class"``, ``"module"``, ``"package"`` or ``"session"``.
This parameter may also be a callable which receives ``(fixture_name, config)``
as parameters, and must return a ``str`` with one of the values mentioned above.
See :ref:`dynamic scope` in the docs for more information.
Using the module-level fixture, this passes now.
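For reference, here is roughly what that module-scoped fixture looks like; the fixture name and dataset name are illustrative, not necessarily what the test suite actually uses:

```python
# conftest.py -- sketch of a module-scoped fixture that fetches the geodatasets
# file once per module and hands the GeoDataFrame to every test that requests it,
# so the download happens before any test tries to read the file.
import geopandas
import pytest
from geodatasets import get_path


@pytest.fixture(scope="module")
def guerry():
    # get_path() downloads into the local cache on first call and returns the
    # cached path on later calls; reading here keeps file I/O out of the tests.
    return geopandas.read_file(get_path("geoda guerry"))
```

Tests then just take `guerry` as an argument and share the single read per module.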
We should probably define the datasets you're linking to in libpysal/tests/__init__.py as pytest.fixtures at the package level.
Yeah, that is one option. I am not a big fan of it, as tests with a ton of fixtures are a pain to debug if you want to re-run them in ipython or a notebook, but it is one way of solving this.
OK! I'm fine with that, too. I like fixtures, since they can both encapsulate the parametrize step and compose/depend correctly across parallel jobs. So, fine to keep what's in libpysal anyway, if we're cool using fixtures here.
> So, fine to keep what's in libpysal anyway, if we're cool using fixtures here.
I'm fine with either, I just find working with fixtures a bit opaque occasionally, and a long sequence of them is a super pain to replicate in a notebook.
OK, this should be ready to review in full/merge. The example notebook is complete.
This commits the estimators in the same convention as the existing Moran estimators, but uses a slightly different convention for describing the statistic outputs. I would like to use that as the basis for new classes in esda, and hope to have that by the next dev meeting (again 😄 )
Tests were all passing in 29b231, and new failures are in unrelated code in topo. Not sure the origin yet.
> Tests were all passing in 29b231, and new failures are in unrelated code in topo. Not sure the origin yet.
It's an incompatibility between shapely and numpy 2.1, released yesterday.
> OK, this should be ready to review in full/merge. The example notebook is complete.
> This commits the estimators in the same convention as the existing Moran estimators, but uses a slightly different convention for describing the statistic outputs. I would like to use that as the basis for new classes in esda, and hope to have that by the next dev meeting (again 😄 )
I'm looking forward to the discussion. One question is how closely the plan is to track scikit-learn conventions, as the implementation in moran_local_mv.py does not follow the "parameters with trailing underscore _ are not to be set inside the __init__ method" rule.
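For concreteness, the convention being referenced looks roughly like this (hypothetical class, not the actual moran_local_mv code):

```python
# Sketch of the scikit-learn estimator convention: __init__ stores configuration
# only, and estimated quantities get a trailing underscore and are attached in
# fit(), never inside __init__.
class SketchEstimator:
    def __init__(self, permutations=999):
        self.permutations = permutations  # configuration only, no data

    def fit(self, X, connectivity):
        # estimated attributes are created here, after the data are seen
        self.associations_ = self._statistic(X, connectivity)
        return self

    def _statistic(self, X, connectivity):
        ...  # placeholder for the actual computation
```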
A point for discussion at the dev meeting.
This is looking really nice!
I've updated the API to follow that resource (I think?). Only configuration options are set in the init.
When I mentioned using this PR as a prototype for a new spatial statistics API, I meant exactly this!
One thing I'd like feedback on here is how much data to cache. Here, for example, in the Partial_Moran_Local() class, I'm caching three (n_samples, n_features) matrices (self.D, self.R, self._partials_), plus the (n_samples, n_permutations, n_variables) matrix self._rlmos_.
Generally, sklearn classes avoid caching the input data if possible, but this means I'd need to remove the properties and just assign the attributes we want to keep directly. For naming, I'm thinking:
- `association_`/`associations_`: the global/local measure of spatial association that the statistic expresses.
- `labels_`: the "label", if any, for common local statistics. Here, I'm thinking just the quadrant for Moran-style local covariance statistics, or hotspot/coldspot for the Getis-Ord statistics.
- `significance_`/`significances_`: local p-values that are generated according to an option passed in the `__init__()`. These will generally be calculated in the `fit()` method, if we want to avoid caching data on the object itself.
- `reference_distribution_`: if permutation inference is used and the random values are cached, this should be where the simulations are stored.
- `partials_`: for any cross-product style statistic, this should record the two vectors that are crossed before calculating a global statistic, or elementwise-multiplied before calculating a local statistic. This is to facilitate visualization later, as well as allow for new labelling in a `.transform()` method.
Also, not quite sure how to handle the fact that you have one statistic for the global but many for the local? I would like to have one name for both, but it feels wrong to use the plural for the global. I could see using the singular (association_, significance_) for everybody, but then labels_ sticks out as odd to me.
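To make the proposal concrete, a local statistic under this scheme might expose something like the following (hypothetical class; only the attribute names matter, the computations are stubbed):

```python
import numpy


class SketchLocalStatistic:
    """Hypothetical local statistic using the proposed attribute names."""

    def __init__(self, permutations=999, keep_simulations=True):
        self.permutations = permutations
        self.keep_simulations = keep_simulations

    def fit(self, X, connectivity):
        n = len(X)
        # local measures of spatial association, one per observation
        self.associations_ = numpy.zeros(n)
        # categorical label per observation (e.g. quadrant, hot/cold spot)
        self.labels_ = numpy.zeros(n, dtype=int)
        # local p-values, computed according to the inference option from __init__
        self.significances_ = numpy.ones(n)
        # the vectors that are elementwise-multiplied for cross-product statistics
        self.partials_ = numpy.zeros((n, 2))
        if self.keep_simulations:
            # simulated statistics used for permutation inference
            self.reference_distribution_ = numpy.zeros((n, self.permutations))
        return self
```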
Hmm... there are YAML parsing errors here. Did I fail to resolve a merge conflict correctly?
yes, I think I did. fixing...
This is also now passing, and I've given the notebook a proofread.
Could @knaaptime @jGaboardi @martinfleis re-review?
@ljwolf Did you verify the docs build locally?
@jGaboardi I've done so now. We talked about the following in the call:
- [x] switch to `CamelCase`
- [x] proofread/edit notebook
- [x] move to singular nouns (`association_`, `significance_`, `reference_distribution_`) when these would look strange for univariate statistics, but keep plurals (`partials_`, `labels_`) for others
- [x] add to `__init__.py`
- [x] render docs and update gallery
But, I've now started moving the implementation to the new significance engine from #281, so I will need to update the tests. Then, this is ready to ship!
@ljwolf ping me after updating tests and I will give a final look as soon as I am able.
For some reason, the W and the Graph are not the same! Most tests pass. I'm investigating...
The graphs are not the same because the default in Graph.build_contiguity() switched from Rook to Queen weights over the time this was developed. Fixed now.
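For anyone hitting the same mismatch: pinning the contiguity rule explicitly on both paths avoids depending on the library default (a sketch, with parameter names as in current libpysal and an illustrative dataset):

```python
import geopandas
from geodatasets import get_path
from libpysal import weights
from libpysal.graph import Graph

gdf = geopandas.read_file(get_path("geoda guerry"))  # illustrative dataset

# Request queen contiguity explicitly on both the W and the Graph paths so the
# two structures agree regardless of what the default happens to be.
w_queen = weights.Queen.from_dataframe(gdf)
g_queen = Graph.build_contiguity(gdf, rook=False)
```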
awesome work @ljwolf !
OK! should be fine to review now. Thanks for your patience!
Time to merge?
Yes please!