Survey which performance properties have associated metrics.
Accumulo has many configuration properties that enable runtime performance tuning. When tuning a production system, it would be helpful if each of these properties had the following, where feasible:
- An associated set of metrics that can be used to make informed decisions about tuning the property.
- Documentation that explains, for each performance property, what its associated metrics are.
This already exists for some performance properties. A survey of what is missing or incomplete can help identify areas for improvement. The survey should identify at least the following for each performance property:
- Performance property name.
- Existing associated metrics, if any.
- Assessment of the need for additional metrics.
- Existing documentation that links the property to its metrics.
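To keep the survey consistent across properties, each entry could be recorded in a fixed shape. Below is a minimal Python sketch, assuming a hypothetical `PropertySurveyEntry` record whose fields mirror the four survey items; `None` marks an item that has not been surveyed yet, so incomplete entries can be listed. The record, its field names, and `incomplete_entries` are illustrative only and are not part of Accumulo.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PropertySurveyEntry:
    """One survey row (hypothetical structure mirroring the four survey items)."""
    property_name: str
    # Metric names already tied to the property; None means "not surveyed yet".
    existing_metrics: Optional[List[str]] = None
    # Free-text assessment of the need for more metrics; None means "not assessed yet".
    needs_additional_metrics: Optional[str] = None
    # Documentation that links the property to its metrics; None means "not checked yet".
    documentation_links: Optional[List[str]] = None

    def missing_fields(self) -> List[str]:
        """Return the survey items that still have to be filled in."""
        missing = []
        if self.existing_metrics is None:
            missing.append("existing_metrics")
        if self.needs_additional_metrics is None:
            missing.append("needs_additional_metrics")
        if self.documentation_links is None:
            missing.append("documentation_links")
        return missing


def incomplete_entries(entries: List[PropertySurveyEntry]) -> List[str]:
    """Names of properties whose survey entry is not yet complete."""
    return [e.property_name for e in entries if e.missing_fields()]


# Usage: freshly created entries have every item still open.
entries = [PropertySurveyEntry("gc.cycle.delay"), PropertySurveyEntry("table.file.max")]
print(incomplete_entries(entries))
```

An empty list (as opposed to `None`) can then record "surveyed, and nothing exists", which keeps "not yet checked" distinct from "checked and missing".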
I looked through the 2.1.0-SNAPSHOT properties and opened #2271 after noticing that many of the properties that could use metrics were thread pool properties. Beyond those, I found that the following properties need further exploration to see whether they could benefit from metrics to help tune them. Some of these may not make sense for metrics; I included any that I thought might. I plan to work down the list, investigating each property individually and opening a separate issue for each one where metrics do make sense.
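Thread pool size properties are a natural fit for metrics: a queue that keeps growing while every thread is busy can suggest an undersized pool, while a pool whose threads are mostly idle may be larger than needed. The following Python sketch illustrates the kind of gauges such a pool could expose. It is a hypothetical wrapper around the standard library's `ThreadPoolExecutor`, not Accumulo's Java thread pools or its metrics implementation; the class name `InstrumentedPool` and its counters are assumptions made for illustration.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class InstrumentedPool:
    """Hypothetical pool wrapper that counts queued, running and completed tasks."""

    def __init__(self, max_workers: int):
        self._executor = ThreadPoolExecutor(max_workers=max_workers)
        self._lock = threading.Lock()
        self.queued = 0     # submitted but not yet started
        self.active = 0     # currently running
        self.completed = 0  # finished, whether successfully or with an exception

    def submit(self, fn, *args, **kwargs):
        with self._lock:
            self.queued += 1

        def run():
            # Move the task from "queued" to "active" once a worker picks it up.
            with self._lock:
                self.queued -= 1
                self.active += 1
            try:
                return fn(*args, **kwargs)
            finally:
                with self._lock:
                    self.active -= 1
                    self.completed += 1

        return self._executor.submit(run)

    def snapshot(self) -> dict:
        """Point-in-time values that a metrics system could poll as gauges."""
        with self._lock:
            return {"queued": self.queued, "active": self.active, "completed": self.completed}

    def shutdown(self) -> None:
        self._executor.shutdown(wait=True)


# Usage: run a few tasks, then read the gauges.
pool = InstrumentedPool(max_workers=2)
futures = [pool.submit(pow, 2, n) for n in range(5)]
results = [f.result() for f in futures]
pool.shutdown()
print(results, pool.snapshot())
```

Sampling `snapshot()` periodically, rather than only at shutdown, is what would show whether the queue drains or grows under load.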
- compaction.coordinator.compaction.finalizer.check.interval
- compaction.coordinator.compactor.dead.check.interval
- compaction.coordinator.message.size.max
- compactor.message.size.max
- gc.candidate.batch.size
- gc.cycle.delay
- manager.startup.tserver.avail.max.wait
- manager.startup.tserver.avail.min.count
- table.bloom.load.threshold
- table.bulk.max.tablets
- table.cache.block.enable
- table.cache.index.enable
- table.durability
- table.file.max
- table.scan.max.memory
- tserver.cache.data.size
- tserver.cache.index.size
- tserver.cache.summary.size
- tserver.files.open.idle
- tserver.memory.maps.max
- tserver.wal.max.age
- tserver.wal.max.referenced
- tserver.total.mutation.queue.max
- tserver.slow.filepermit.time
- tserver.slow.flush.time
- tserver.server.message.size.max
- tserver.scan.files.open.max
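Since the plan is to investigate these properties one at a time, it may help to batch the work by component. A small Python sketch that uses only the property names listed here and groups them by their first dotted segment; the helper name `group_by_prefix` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Property names copied from the list above.
PROPERTIES = [
    "compaction.coordinator.compaction.finalizer.check.interval",
    "compaction.coordinator.compactor.dead.check.interval",
    "compaction.coordinator.message.size.max",
    "compactor.message.size.max",
    "gc.candidate.batch.size",
    "gc.cycle.delay",
    "manager.startup.tserver.avail.max.wait",
    "manager.startup.tserver.avail.min.count",
    "table.bloom.load.threshold",
    "table.bulk.max.tablets",
    "table.cache.block.enable",
    "table.cache.index.enable",
    "table.durability",
    "table.file.max",
    "table.scan.max.memory",
    "tserver.cache.data.size",
    "tserver.cache.index.size",
    "tserver.cache.summary.size",
    "tserver.files.open.idle",
    "tserver.memory.maps.max",
    "tserver.wal.max.age",
    "tserver.wal.max.referenced",
    "tserver.total.mutation.queue.max",
    "tserver.slow.filepermit.time",
    "tserver.slow.flush.time",
    "tserver.server.message.size.max",
    "tserver.scan.files.open.max",
]


def group_by_prefix(names: List[str]) -> Dict[str, List[str]]:
    """Group property names by their first dotted component (e.g. 'gc', 'tserver')."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        groups[name.split(".", 1)[0]].append(name)
    return dict(groups)


# Usage: print how many properties fall under each component.
for component, names in sorted(group_by_prefix(PROPERTIES).items()):
    print(f"{component}: {len(names)}")
```

Grouping by the first segment keeps `compactor.*` separate from `compaction.*`; merging those two would be a manual judgment call.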