Implements reservoir sampler randomly sampling stream of features

Open daniel-j-h opened this issue 7 years ago • 0 comments

For #7. Work in progress.

This changeset implements "reservoir sampling", a randomized online algorithm for uniformly sampling k items from a stream of n items, where n is not known in advance. We can use this to randomly sample e.g. k building features in the osmium handlers without having to store all features first or make two passes.
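
For reference, here is a minimal sketch of the idea (the classic "Algorithm R" variant), not necessarily the code in this changeset. The class name `ReservoirSampler`, its `push` method, and the `capacity` and `rng` parameters are assumptions made for illustration.

```python
import random


class ReservoirSampler:
    """Keeps a uniform random sample of up to `capacity` items from a stream.

    Hypothetical helper for illustration; names are not taken from the changeset.
    """

    def __init__(self, capacity, rng=None):
        if capacity <= 0:
            raise ValueError("capacity must be positive")

        self.capacity = capacity
        self.reservoir = []
        self.seen = 0  # number of items pushed so far
        self.rng = rng if rng is not None else random.Random()

    def push(self, item):
        self.seen += 1

        # Fill phase: the first `capacity` items are always kept.
        if len(self.reservoir) < self.capacity:
            self.reservoir.append(item)
            return

        # Replacement phase: the i-th item is kept with probability capacity / i,
        # and evicts a uniformly chosen slot when it is kept.
        slot = self.rng.randrange(self.seen)
        if slot < self.capacity:
            self.reservoir[slot] = item


# Usage: sample 10 items from a stream without knowing its length up front.
sampler = ReservoirSampler(capacity=10, rng=random.Random(0))
for feature in range(1000):
    sampler.push(feature)

sample = sampler.reservoir
```

Memory stays at O(k) no matter how long the stream is, and each item costs one random draw. In an osmium handler, the callback that currently collects features would presumably call `sampler.push(feature)` instead of appending to a list.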

Tasks:

  • [ ] Hook up to osmium handlers
  • [ ] Let users pass the number of samples to draw

Refs:

  • https://en.wikipedia.org/wiki/Reservoir_sampling
  • https://www.paypal-engineering.com/2016/04/11/statistics-for-software/#dipping_into_the_stream

daniel-j-h, Jun 13 '18 00:06