
Reconsider compaction planner default queue / group setup

Open dlmarion opened this issue 2 years ago • 2 comments

Currently compactors are started for the following groups for the ITs: default, accumulo_meta, user_small, and user_large. This is due to the default values for the planner properties. The root and metadata tables are configured by default to use the compactors in the accumulo_meta group. The default planner is configured to send compactions less than 128MB to the compactors in the user_small group. I think that compactors in the default and user_large groups are not used during the ITs.
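To make the routing described above concrete, here is a minimal sketch of it in Java. This is not Accumulo's planner or dispatcher code: the class `GroupRouting` and the method `selectGroup` are hypothetical names made up for illustration, and treating 128MB as 128 * 1024 * 1024 bytes is an assumption. It only models the two rules stated above (root and metadata tables go to accumulo_meta, other compactions under 128MB go to user_small) and leaves out the default group.

```java
public class GroupRouting {
    // Assumed threshold: "128MB" from the description, read as 128 * 1024 * 1024 bytes.
    static final long SMALL_MAX_BYTES = 128L * 1024 * 1024;

    // Hypothetical helper, not an Accumulo API: picks a compactor group name
    // from the table name and the total size of the files being compacted.
    static String selectGroup(String tableName, long compactionBytes) {
        // Root and metadata tables use the dedicated group.
        if (tableName.equals("accumulo.root") || tableName.equals("accumulo.metadata")) {
            return "accumulo_meta";
        }
        // Everything else is split by size at the threshold.
        return compactionBytes < SMALL_MAX_BYTES ? "user_small" : "user_large";
    }

    public static void main(String[] args) {
        System.out.println(selectGroup("accumulo.metadata", 10L));          // accumulo_meta
        System.out.println(selectGroup("it_table", 64L * 1024 * 1024));     // user_small
        System.out.println(selectGroup("it_table", 512L * 1024 * 1024));    // user_large
    }
}
```

Under this model, a test that only writes a small amount of data to its own tables would only ever reach user_small, which is consistent with the suspicion that the user_large compactors sit idle during the ITs.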

I think we should configure the default planner to just use the default group and remove the user_small and user_large groups from the default property value. I think it might be more valuable to have 1 compactor in the accumulo_meta group for root and metadata compactions and 1 compactor in the default group as the default configuration. We can increase the number of compactors in the default group to service compactions for tables created during the ITs easily for the ITs that need it.

dlmarion, Dec 14 '23 13:12

With the changes in #4033, the default config could be a single compaction service with two queues, one for small and one for large compactions. All tables, including the accumulo.* tables, would be configured to use this single service by default.

The default compaction service config may move from the Java default properties class to the documentation and the accumulo.properties file that ships with the tarball. If this change is made, the Java code would define no compaction services by default. However, tables would still have default config for selecting a compaction service, so only having to define a single default compaction service in the default config might be nice.

I am not advocating for anything in this message, just thinking out loud. One thing I struggle with is finding a balance between a simple config that gets a small Accumulo instance running and what's good for large instances. I lean towards making the default config suited to small clusters and making config recommendations for large clusters easily discoverable. One way to achieve this is to make the default compaction service config in the tarball's accumulo.properties super simple, and to link from comments in accumulo.properties to documentation on scaling Accumulo that covers how to dedicate resources for things like accumulo.root and accumulo.metadata.

keith-turner, Dec 14 '23 19:12

Issue #3981 describes moving the default compaction config. So it may be good to get #3981 and #4033 done, and then use this issue to work out what we want the default config to be.

keith-turner, Dec 14 '23 20:12

This was resolved by #4299

dlmarion, Apr 17 '24 11:04