Michael McCandless

210 comments by Michael McCandless

> > With segrep, every replica can search accurate point-in-time snapshots of the index, unlike ES with its inefficient document replication today where every replica is searching "slightly" different point-in-time...

> Is segment number the only factor in deciding which replica to promote? Do we need to take shard allocations, node load, etc. into consideration? Do Primaries handle...

> With segrep, CPU and IO consumption on the Primary is much higher than on a Replica. Do we need to provide a new balancing algorithm for master node to take Primary shard...

+1, I've hit this limitation before too (working on Lucene's sources!). It'd be great if `forbiddenapis` had more granularity ... maybe it does and we just don't know how @uschindler?

I have not looked closely at this PR but it sounds very useful (enabling ICU transformations pre-tokenization), looks like the requested change from @uschindler was addressed, and `precommit` looks happy!...

Do you know how large your Taxonomy index is (how many unique `FacetLabel`s)? In your application, are all three arrays being allocated (`parents`, `siblings` and `children`)? That triples the memory...
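To make the "triples the memory" point concrete, here is a rough back-of-the-envelope sketch. It assumes each of the three arrays holds one 4-byte `int` per taxonomy ordinal and ignores JVM object headers and any chunking; the function name `taxonomy_array_bytes` and the example sizes are hypothetical, not part of Lucene.

```python
INT_BYTES = 4  # a Java int is 4 bytes


def taxonomy_array_bytes(num_ordinals: int, arrays_allocated: int = 3) -> int:
    """Estimate heap used by the parallel int arrays (parents, siblings, children).

    Assumption: one int per ordinal per array; object headers are ignored.
    """
    if num_ordinals < 0:
        raise ValueError("num_ordinals must be non-negative")
    if not 1 <= arrays_allocated <= 3:
        raise ValueError("arrays_allocated must be 1, 2 or 3")
    return num_ordinals * INT_BYTES * arrays_allocated


if __name__ == "__main__":
    # Hypothetical taxonomy sizes, for illustration only.
    for n in (1_000_000, 10_000_000, 100_000_000):
        only_parents = taxonomy_array_bytes(n, arrays_allocated=1)
        all_three = taxonomy_array_bytes(n)
        print(f"{n:>11,} ordinals: parents only {only_parents / 2**20:8.1f} MiB, "
              f"all three {all_three / 2**20:8.1f} MiB")
```

Under these assumptions the cost grows linearly with the number of unique labels, so knowing whether `siblings` and `children` are actually allocated matters most for very large taxonomies.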

These arrays are in general a crazy costly part of using taxonomy facets ... we should explore more efficient alternatives. E.g. if the Lucene user is only using a single...

> Take your time and enjoy the complexity of setting this up! ;-)

LOL! OK I will try to test this @uschindler :)

OK, thank you @uschindler and @rmuir for helping me debug the tricky setup! I ran this `perf.py` using `luceneutil`:

```
import sys
sys.path.insert(0, '/l/util/src/python')
import competition

if __name__ == '__main__':
    ...
```

Also, here are the heap JFR results for `base`:

```
PROFILE SUMMARY from 2423 events (total: 94219M)
  tests.profile.mode=heap
  tests.profile.count=30
  tests.profile.stacksize=1
  tests.profile.linenumbers=false
PERCENT       HEAP SAMPLES  STACK
9.76%         9191M         org.apache.lucene.util.FixedBitSet#<init>()
6.93%         6527M         ...
```