
Import GeoWave Indexing Code

Open lossyrob opened this issue 7 years ago • 7 comments

Supersedes #18

This PR imports the GeoWave "index" subproject into SFCurve. With this, representatives of all of the GeoWave, GeoMesa and GeoTrellis indexing code will be in this repository. GeoTrellis plans to move to using this indexing by default for its 2.0 release.

All namespaces have been moved to org.locationtech.sfcurve.geowave.

With all of our indexing code under one roof, we can then move to the phase where we try to bring it all together under one-api-to-rule-them-all. Pie in the sky? Maybe ☁️

The code in this PR is set for IP review by an Eclipse CQ https://dev.eclipse.org/ipzilla/show_bug.cgi?id=14322

lossyrob avatar Sep 22 '17 22:09 lossyrob

https://dev.eclipse.org/ipzilla/show_bug.cgi?id=14322 is approved!

pomadchin avatar Mar 05 '18 20:03 pomadchin

@pomadchin sweet! Looks like there's a quick merge conflict to resolve. Can you or @lossyrob knock that out?

jnh5y avatar Mar 05 '18 20:03 jnh5y

@jnh5y sure!

pomadchin avatar Mar 05 '18 20:03 pomadchin

@jnh5y it's probably ready for your review

pomadchin avatar Mar 06 '18 16:03 pomadchin

I'm rethinking our approach here.

GeoWave's initial IP contribution has been approved by the IP team at Eclipse, so the IP considerations are no longer an issue. However, this PR has gotten pretty stale in the meantime, as GeoWave's geowave-core-index project, which this pulls from, has had several commits since.

The original purpose of this PR was to get GeoWave indexing code into SFCurve so we could (A) rely on the GeoWave indexing code in GeoTrellis and (B) gain a dependency on SFCurve from GeoTrellis, which would push the SFCurve story along.

Considering the effort it would take to keep this SFCurve GeoWave code in sync with the geowave-core-index project, it seems like it might be smarter to forgo (B) and accomplish (A) by simply relying on the geowave project in GeoTrellis. I'm of the mind to close this PR and move GeoTrellis to rely on geowave-core-index for this functionality.

Thoughts? @echeipesh @rfecher @jnh5y

lossyrob avatar Mar 19 '18 10:03 lossyrob

Agreed, it really only works if each of the projects depends on the same baseline; otherwise they just end up being divergent forks. To that end, either geowave needs to change its dependency to sfcurve, or geotrellis needs to change its dependency to geowave-core-index. I'm game for either, although currently the option with the fewest moving parts, and the easiest for us in the short term, is to just support geowave-core-index, with the idea that you can file index concerns in geowave's issues all the same. We can evolve that fairly easily at any point if it ends up making sense to move it over to sfcurve. Or do you think we should have more immediate plans to move it over?

There's a significant change coming in geowave-core-index immediately following our 0.9.7 release later this week. I talked to @echeipesh about it a couple of months ago. It will involve packages changing to org.locationtech.geowave.index, in addition to fairly major code rework, primarily geared around separating the well-sorted portion of an indexing approach from the specifically unsorted portion (so separate SortIndexStrategy and PartitionIndexStrategy). Previously, we had the indexing approach serve as a bi-directional map between multi-dimensional coordinates/bounds and single-dimensional keys/ranges, but that was a bit too simplistic a model because of the concerns of hotspotting (that is, the need to distribute reads/writes equally across nodes). With what you have here, we combined the partition and sort concepts together into a CompoundIndexStrategy.

But separating them out ends up being cleaner for hbase, accumulo, and bigtable. For example, for things like keeping statistics, you end up wanting to keep a histogram of sort keys binned by partition keys. Moreover, it is really essential when the key-value store explicitly has a place for a "hash" or "partition" key separated out from the "range" or "sort" key, like Cassandra and DynamoDB. For key-value stores like these it is even more helpful to keep the sorted and unsorted portions explicitly separate within the underlying "index strategy."
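As a rough illustration of the partition/sort split described above, here is a minimal Python sketch. It is not GeoWave's API: the function names (`sort_key`, `partition_key`, `index_entry`, `histogram_by_partition`), the use of a Z-order curve for the sort key, and the CRC32 hash for the partition key are all assumptions chosen only to show the idea.

```python
from collections import defaultdict
import zlib


def sort_key(x: int, y: int, bits: int = 16) -> int:
    """Hypothetical sort key: interleave the bits of x and y (Z-order)."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)       # x bits go to even positions
        key |= ((y >> i) & 1) << (2 * i + 1)   # y bits go to odd positions
    return key


def partition_key(record_id: str, num_partitions: int = 4) -> int:
    """Hypothetical partition key: a stable hash that spreads writes out."""
    return zlib.crc32(record_id.encode("utf-8")) % num_partitions


def index_entry(record_id: str, x: int, y: int, num_partitions: int = 4):
    """Keep the unsorted (partition) and sorted (sort) parts separate."""
    return partition_key(record_id, num_partitions), sort_key(x, y)


def histogram_by_partition(entries, bucket_width: int):
    """Count sort keys per fixed-width bucket, binned by partition key."""
    hist = defaultdict(lambda: defaultdict(int))
    for part, sort in entries:
        hist[part][sort // bucket_width] += 1
    return hist


entries = [index_entry(f"rec-{i}", i, i * 2) for i in range(10)]
stats = histogram_by_partition(entries, bucket_width=64)
```

Because the two keys stay separate, the pair maps directly onto stores with an explicit hash/range key layout, and a combined key for a purely sorted store can still be built by prefixing the sort key with the partition key.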

rfecher avatar Mar 19 '18 13:03 rfecher

I'm fine with the code moving over whenever. Generally, I'm for moving common pieces out as projects like that. Admittedly, that does require more coordination.

For the short term, it might make sense for GeoTrellis to depend on a released version of the GeoWave indexing code. Meanwhile, the GeoWave folks can sort out the changes in the pipeline; once those are ironed out, they can contribute to SFCurve. As that happens, we can cut a quick release so that GW and GT can depend on the moved code.

Does that work for everyone?

jnh5y avatar Mar 19 '18 15:03 jnh5y