accumulo
External Compaction Progress is inaccurate
Describe the bug
When an external compactor reports progress, it reports the number of entries written as a percentage of the total number of entries.
However, when a compaction job includes bulk-imported files, those files report 0 as their number of entries. The compactor still counts every entry it writes, including entries that came from those files, so the reported percentage can exceed 100%.
As a result, the progress percentage is inaccurate.
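The arithmetic behind the bug can be illustrated with a minimal sketch. It assumes, as described above, that progress is entries written divided by the sum of the per-file estimated entries. The class and method names are hypothetical and are not Accumulo's actual code.

```java
import java.util.List;

/** Hypothetical illustration of estimate-based compaction progress; not Accumulo's code. */
public class ProgressExample {

  /** Progress as a percentage of the summed per-file estimates (bulk-imported files report 0). */
  static double progressPercent(long entriesWritten, List<Long> estimatedEntriesPerFile) {
    long total = 0;
    for (long estimate : estimatedEntriesPerFile) {
      total += estimate;
    }
    if (total == 0) {
      return 0.0; // avoid dividing by zero when no file has an estimate
    }
    return 100.0 * entriesWritten / total;
  }

  public static void main(String[] args) {
    // File 1 has an estimate of 1,000 entries. File 2 was bulk imported, so its estimate
    // is 0 even though it actually holds 1,000 entries.
    List<Long> estimates = List.of(1_000L, 0L);
    // Once all 2,000 entries are written, progress is reported as 200%.
    System.out.println(progressPercent(2_000, estimates)); // prints 200.0
  }
}
```

The denominator only reflects files with a nonzero estimate, while the numerator counts every entry written, so any bulk-imported input pushes the result past 100%.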
Versions (OS, Maven, Java, and others, as appropriate):
- Affected version(s) of this project: 2.1
To Reproduce
Steps to reproduce the behavior:
- Trigger an external compaction job against a table that contains files which were bulk imported
- Review the compaction-coordinator log (or the Running Compactions section of the Monitor's External Compactions page) and observe percentages greater than 100% being reported while the job status is still "In Progress"
Expected behavior
The compaction progress percentage should be accurate and should never exceed 100%.
Additional context
The estimated number of entries comes from Bulk.FileInfo: https://github.com/apache/accumulo/blob/33894e69979afc70efca448ea31fb29ac73288f3/core/src/main/java/org/apache/accumulo/core/clientImpl/bulk/Bulk.java#L107
Fixing the progress bar likely requires changing the bulk import code to set the estimated number of entries correctly. If the bulk import code always sets that value to 0, the field provides little benefit.
Alternatively, this could be solved by excluding entries that come from bulk-imported files from the progress calculation.
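The exclusion approach suggested above can be sketched as follows. This is a hypothetical illustration, not Accumulo's code: it assumes the compactor can count entries read per input file, because written entries are merged (and possibly transformed by iterators) and may not map back to a single input file. Only files with a nonzero estimate contribute to both the numerator and the denominator.

```java
import java.util.Map;

/** Hypothetical sketch: progress that ignores entries from files without an estimate. */
public class ExcludeUnestimatedProgress {

  /**
   * @param estimatedEntries per-file estimates, keyed by file name (0 for bulk-imported files)
   * @param entriesRead per-file counts of entries consumed so far, keyed by file name
   */
  static double progressPercent(Map<String, Long> estimatedEntries, Map<String, Long> entriesRead) {
    long estimatedTotal = 0;
    long countedRead = 0;
    for (Map.Entry<String, Long> e : estimatedEntries.entrySet()) {
      long estimate = e.getValue();
      if (estimate > 0) {
        estimatedTotal += estimate;
        // Count progress only for files that also contribute to the denominator.
        countedRead += entriesRead.getOrDefault(e.getKey(), 0L);
      }
    }
    return estimatedTotal == 0 ? 0.0 : 100.0 * countedRead / estimatedTotal;
  }

  public static void main(String[] args) {
    Map<String, Long> estimates = Map.of("A.rf", 1_000L, "I.rf", 0L); // I.rf was bulk imported
    Map<String, Long> read = Map.of("A.rf", 500L, "I.rf", 1_000L);
    System.out.println(progressPercent(estimates, read)); // prints 50.0
  }
}
```

One trade-off of this design: a compaction whose inputs are all bulk-imported would report 0% until it finishes, since none of its files contribute to the calculation.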
I can work on this
When the estimated number of entries is zero, the compactor process could open the RFile index and sum the entries for the range covered by the compaction. I think this would be a minimal change for 2.1.x, whereas modifying bulk import to compute the estimated entries would probably be a much larger change for 2.1.x.
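The index-based fallback proposed above can be sketched with a simplified, hypothetical index model: each index entry records the last row of a data block and the number of entries in that block. The `IndexEntry` type and the method names are assumptions for illustration and are not Accumulo's RFile API. The estimate sums every block that overlaps the row range `(startRow, endRow]`, and it is used only when the recorded estimate is zero.

```java
import java.util.List;

/** Hypothetical sketch of estimating entries from an RFile-style index; not Accumulo's API. */
public class IndexEntryEstimate {

  /** One index entry: the last row in a data block and how many entries the block holds. */
  static final class IndexEntry {
    final String lastRow;
    final long numEntries;

    IndexEntry(String lastRow, long numEntries) {
      this.lastRow = lastRow;
      this.numEntries = numEntries;
    }
  }

  /**
   * Sums entries of blocks overlapping the row range (startRow, endRow]; null means unbounded.
   * Blocks that straddle a range boundary are counted in full, so this can overestimate.
   */
  static long estimateEntries(List<IndexEntry> index, String startRow, String endRow) {
    long sum = 0;
    String prevLastRow = null; // rows of the current block lie in (prevLastRow, lastRow]
    for (IndexEntry e : index) {
      boolean endsAfterStart = startRow == null || e.lastRow.compareTo(startRow) > 0;
      boolean beginsBeforeEnd =
          endRow == null || prevLastRow == null || prevLastRow.compareTo(endRow) < 0;
      if (endsAfterStart && beginsBeforeEnd) {
        sum += e.numEntries;
      }
      prevLastRow = e.lastRow;
    }
    return sum;
  }

  /** Uses the recorded estimate when present, otherwise falls back to the index sum. */
  static long effectiveEstimate(
      long recordedEstimate, List<IndexEntry> index, String startRow, String endRow) {
    return recordedEstimate > 0 ? recordedEstimate : estimateEntries(index, startRow, endRow);
  }

  public static void main(String[] args) {
    List<IndexEntry> index = List.of(
        new IndexEntry("c", 100), new IndexEntry("f", 200), new IndexEntry("m", 300));
    // Range (d, g] overlaps blocks (c, f] and (f, m], so the estimate is 200 + 300.
    System.out.println(effectiveEstimate(0, index, "d", "g")); // prints 500
  }
}
```

Counting straddling blocks in full keeps the logic simple; since the result only feeds a progress display, a modest overestimate seems preferable to reading data blocks to get an exact count.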