Multi-Thread or Map/Reduce Statistics Management Tool: Add and Delete statistics.

Open rwgdrummer opened this issue 8 years ago • 0 comments

Forcing the re-computation of all statistics for an adapter is unnecessarily costly for the most likely scenario: adding a new statistic. The current tool, StatsCompositionTool, recomputes all statistics and runs single-threaded.

Options include: (1) Delete and recompute select statistics. The use case is that something has changed, invalidating the current statistics; it could be a data store image concern, an algorithm concern, etc. (2) Compute new statistics.

In all cases, statistics computations 'can' be selective. Data Adapters provide a list of statistics. By default, all statistics are computed. A wrapper should intercept the request for statistic IDs from the adapter, filtering out those statistics not present in the selected set.
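The filtering wrapper described above could look roughly like the following. This is a minimal sketch, not GeoWave's actual API: `StatsProvider`, `FilteringStatsProvider`, and the string statistic IDs are hypothetical stand-ins for whatever the data adapter exposes. An empty selection is assumed to mean "no filter", which preserves the default of computing everything.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class SelectiveStatsSketch {

    /** Hypothetical stand-in for the part of a data adapter that lists its statistic IDs. */
    interface StatsProvider {
        List<String> getSupportedStatisticIds();
    }

    /** Wrapper that intercepts the ID request and drops statistics outside the selected set. */
    static final class FilteringStatsProvider implements StatsProvider {
        private final StatsProvider delegate;
        private final Set<String> selected; // empty means "no filter": compute everything

        FilteringStatsProvider(StatsProvider delegate, Set<String> selected) {
            this.delegate = delegate;
            this.selected = (selected == null) ? Collections.<String>emptySet() : selected;
        }

        @Override
        public List<String> getSupportedStatisticIds() {
            List<String> all = delegate.getSupportedStatisticIds();
            if (selected.isEmpty()) {
                return all; // default behaviour: all statistics are computed
            }
            List<String> kept = new ArrayList<>();
            for (String id : all) {
                if (selected.contains(id)) {
                    kept.add(id);
                }
            }
            return kept;
        }
    }

    public static void main(String[] args) {
        // Hypothetical adapter exposing three statistics.
        StatsProvider adapter = () -> Arrays.asList("COUNT", "BBOX", "TIME_RANGE");
        Set<String> selection = new LinkedHashSet<>(Arrays.asList("BBOX"));
        StatsProvider onlyBbox = new FilteringStatsProvider(adapter, selection);
        System.out.println(onlyBbox.getSupportedStatisticIds()); // prints [BBOX]
    }
}
```

Because the wrapper implements the same interface as the thing it wraps, the code that drives the computation would not need to know whether a selection is in effect.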

When recomputing existing statistics, the prior image of the statistics must be removed from the statistics store.

Select query methods have already been altered to support a 'scan' callback. One such callback allows creation of a set of statistic instances.
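To make the two preceding points concrete, here is a rough sketch of a selective recomputation: the prior image of one statistic is removed, then a scan callback rebuilds it while other statistics stay untouched. Every name here (`ScanCallback`, `StatsStore`, `CountingCallback`, `recompute`) is hypothetical and only illustrates the flow; the real scan callback lives on branch GEOWAVE-218 and may look quite different. The in-memory list stands in for an actual scan.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class StatsRecomputeSketch {

    /** Hypothetical callback invoked once for every entry a scan returns. */
    interface ScanCallback<T> {
        void entryScanned(T entry);
    }

    /** Hypothetical in-memory stand-in for the statistics store (statistic ID -> value). */
    static final class StatsStore {
        private final Map<String, Long> values = new HashMap<>();

        void put(String statId, long value) { values.put(statId, value); }
        void remove(String statId) { values.remove(statId); }
        Long get(String statId) { return values.get(statId); }
    }

    /** Callback that builds one statistic instance (a simple row count) during the scan. */
    static final class CountingCallback implements ScanCallback<String> {
        private long count;

        @Override
        public void entryScanned(String entry) { count++; }

        long result() { return count; }
    }

    /** Recomputes a single statistic, leaving all other stored statistics untouched. */
    static void recompute(StatsStore store, String statId, List<String> rows) {
        store.remove(statId); // the prior image must go before the new value is written
        CountingCallback callback = new CountingCallback();
        for (String row : rows) { // stands in for a scan that feeds the callback
            callback.entryScanned(row);
        }
        store.put(statId, callback.result());
    }

    public static void main(String[] args) {
        StatsStore store = new StatsStore();
        store.put("COUNT", 99L);    // stale value to be replaced
        store.put("BBOX_AREA", 7L); // not selected, so it should survive
        recompute(store, "COUNT", Arrays.asList("row1", "row2", "row3"));
        System.out.println(store.get("COUNT") + " " + store.get("BBOX_AREA")); // prints 3 7
    }
}
```

In a multi-threaded or Map/Reduce variant, each worker would presumably hold its own callback instance and the partial results would be merged before the single write to the store; that merge step is not shown here.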

geowave-accumulo/src/main/java/mil/nga/giat/geowave/accumulo/util/StatsTool.java (branch GEOWAVE-218, pending merge to master) provides the basis for the improvement. The tool deletes all statistics for a given adapter and recomputes them. The issues with the tool are evident:

(1) The tool does not work in Map/Reduce.
(2) Forcing the re-computation of all statistics for an adapter is unnecessarily costly for the most likely scenario: adding a new statistic.

That being said, the tool demonstrates the use of StatsCompositionTool as a callback function for recomputation.

Branch GEOWAVE-218 also contains the Scanner Callback, which is critical to this effort. Until branch GEOWAVE-218 is merged, any work on this task should branch off of #249.

rwgdrummer, Mar 29 '16 15:03