Adrien Grand

139 comments by Adrien Grand

@zacharymorn @gsmiller If we try to do everything in a single PR, I worry that this will become very hard to review. I wonder if we should split by replacing...

This PR is mixing a force-merge-specific setting with natural merges. Can you give more context about the problem that you are trying to solve? Is setting `deletesPctAllowed` to a low-ish...

If I read this line correctly, it says that large segments (more than 50% of the maximum segment size) shouldn't be merged unless both the percentage of deletes of the segment...

If you look at `BaseMergePolicyTestCase` and `TestTieredMergePolicy`, we actually have tests that simulate merges in order to verify that things like the maximum percentage of deletes work correctly (see `BaseMergePolicyTestCase#doTestSimulateUpdates`...
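The invariant such simulations check can be illustrated with a toy Python model. Everything in it is hypothetical: `Segment`, `toy_merge` and `simulate_updates` are illustrative names. The merge policy is a deliberately naive stand-in that rewrites the segment with the most deletes until the index-wide deletes percentage drops back under `deletes_pct_allowed`. It is neither how `TieredMergePolicy` selects merges nor what `BaseMergePolicyTestCase#doTestSimulateUpdates` actually does.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Segment:
    max_doc: int        # docs in the segment, deleted ones included
    del_count: int = 0  # deleted docs not yet reclaimed by a merge

    @property
    def live(self) -> int:
        return self.max_doc - self.del_count


def deletes_pct(segments: List[Segment]) -> float:
    """Index-wide percentage of deleted docs."""
    total = sum(s.max_doc for s in segments)
    if total == 0:
        return 0.0
    return 100.0 * sum(s.del_count for s in segments) / total


def toy_merge(segments: List[Segment], deletes_pct_allowed: float) -> None:
    """Hypothetical policy: rewrite the segment with the most deletes until the
    index-wide deletes percentage is back under the limit."""
    while deletes_pct(segments) > deletes_pct_allowed:
        worst = max(segments, key=lambda s: s.del_count)
        worst.max_doc -= worst.del_count  # rewriting drops the deleted docs
        worst.del_count = 0
    segments[:] = [s for s in segments if s.max_doc > 0]  # drop empty segments


def simulate_updates(rounds: int, initial_docs: int, docs_per_flush: int,
                     deletes_pct_allowed: float,
                     seed: int = 0) -> Tuple[List[Segment], List[float]]:
    """Repeatedly update random docs and record deletes_pct after each merge."""
    assert 0 < docs_per_flush <= initial_docs
    rng = random.Random(seed)
    segments = [Segment(max_doc=initial_docs)]
    history: List[float] = []
    for _ in range(rounds):
        # Each update deletes the old copy of a doc wherever it lives...
        for _ in range(docs_per_flush):
            candidates = [s for s in segments if s.live > 0]
            seg = rng.choices(candidates, weights=[s.live for s in candidates])[0]
            seg.del_count += 1
        # ...and the new copies are flushed as a fresh segment.
        segments.append(Segment(max_doc=docs_per_flush))
        toy_merge(segments, deletes_pct_allowed)
        history.append(deletes_pct(segments))
    return segments, history
```

Running `simulate_updates(200, 1000, 10, 20.0)` keeps every recorded percentage at or below 20, and the live doc count stays at 1000 because each update deletes one doc and adds one. A real test would instead plug in the actual merge policy and assert the same bound, which is where a policy that ignores deletes on large segments would show up.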

> The last two are optimization techniques not mentioned in the paper I think? To be honest I didn't read the paper recently so it's possible I diverged a bit...

> I cherry-picked your commit and pushed to this branch / PR to further explore the changes and their effect, hope that's ok.

Of course!

> I also tried to...

@mikemccand I'll try to do it overnight as I have a terrible uplink. FWIW the file I have locally is `enwiki-20130102-lines.txt`, not the `enwiki-20100302-pages-articles-lines.txt` file that luceneutil refers to.

Actually, you don't need nightlyBench.py; you can use the standard Python script. I think the following should work to test out on larger documents:

- Download https://home.apache.org/~mikemccand/enwiki-20130102-lines.txt.lzma and put it...
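If the downloaded `.lzma` archive has to be decompressed by hand to obtain a plain `enwiki-20130102-lines.txt`, Python's standard `lzma` module can stream it without loading the whole file into memory. A minimal sketch: `decompress_line_docs` and `count_lines` are hypothetical helper names, and because the comment is cut off before saying where the file should go, the paths are placeholders, not luceneutil's expected location.

```python
import lzma
import shutil


def decompress_line_docs(src_path: str, dst_path: str, chunk_size: int = 1 << 20) -> None:
    """Stream-decompress a .lzma (or .xz) line-docs file to plain text."""
    # In read mode lzma.open auto-detects the container format, so both
    # legacy .lzma ("alone") files and .xz files are accepted.
    with lzma.open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        shutil.copyfileobj(src, dst, chunk_size)


def count_lines(path: str) -> int:
    """Cheap sanity check: line-docs files hold one document per line."""
    with open(path, "rb") as f:
        return sum(1 for _ in f)
```

For example, `decompress_line_docs("enwiki-20130102-lines.txt.lzma", "enwiki-20130102-lines.txt")` writes the plain-text file next to the archive.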

> in the jira ticket you had suggested to use BMM for top-level (flat?) boolean query only. Do you think this will need to be fixed? I opened this JIRA...
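Assuming BMM here means block-max MaxScore, the reason it fits a flat disjunction best is that its core step reasons about the clauses of a single OR at once. Clauses are sorted by their maximum score. The longest prefix whose maximum scores sum to less than the minimum competitive score is "non-essential", because a doc matching only those clauses can never be competitive. The block-max variant uses per-block upper bounds for the current window instead of global maxima. This is a generic sketch of that partition step with a hypothetical function name, not Lucene's scorer.

```python
from typing import List, Sequence, Tuple


def partition_clauses(max_scores: Sequence[float],
                      min_competitive: float) -> Tuple[List[int], List[int]]:
    """Split disjunction clauses into (non_essential, essential) index lists.

    max_scores[i] is an upper bound on clause i's score: a global maximum for
    plain MaxScore, or the bound for the current block with block-max MaxScore.
    """
    order = sorted(range(len(max_scores)), key=lambda i: max_scores[i])
    non_essential: List[int] = []
    total = 0.0
    for i in order:
        # Stop once matching these clauses alone could reach the threshold.
        if total + max_scores[i] >= min_competitive:
            break
        total += max_scores[i]
        non_essential.append(i)
    essential = order[len(non_essential):]
    return non_essential, essential
```

With `partition_clauses([3.0, 1.0, 4.0, 2.0], 3.5)` the clauses bounded by 1.0 and 2.0 (indices 1 and 3) are non-essential, so only clauses 0 and 2 need to drive iteration. An empty essential list means no remaining doc can be competitive. For nested boolean queries the per-clause bounds are those of whole sub-queries, which is one reason restricting the optimization to top-level flat disjunctions is simpler.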