
PHOENIX-7593: Enable CompactionScanner for flushes

Open sanjeet006py opened this issue 7 months ago • 4 comments

sanjeet006py avatar Apr 29 '25 06:04 sanjeet006py

Not related to this PR, but as a general improvement: this method should not be named isEmptyColumn(), because it performs no empty-column check. All it checks is whether the given cell's CF and CQ match the ones passed in:

    public static boolean isEmptyColumn(Cell cell, byte[] emptyCF, byte[] emptyCQ) {
        return CellUtil.matchingFamily(cell, emptyCF, 0, emptyCF.length) &&
               CellUtil.matchingQualifier(cell, emptyCQ, 0, emptyCQ.length);
    }

We should remove the above utility because HBase CellUtil already provides exactly the same check:

  public static boolean matchingColumn(final Cell left, final byte[] fam, final byte[] qual) {
    return matchingFamily(left, fam) && matchingQualifier(left, qual);
  }

(worth doing as separate Jira/PR though)
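
To illustrate why the two helpers are interchangeable, here is a minimal, self-contained sketch. It does not use HBase: `SimpleCell` and the comparison helpers below are hypothetical stand-ins that only model family/qualifier byte comparison, and the "0" / "_0" values are illustrative. In the real change, call sites would presumably just switch from isEmptyColumn(cell, cf, cq) to CellUtil.matchingColumn(cell, cf, cq).

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class MatchingColumnSketch {

    /** Hypothetical stand-in for an HBase Cell: only family and qualifier bytes. */
    static final class SimpleCell {
        final byte[] family;
        final byte[] qualifier;

        SimpleCell(byte[] family, byte[] qualifier) {
            this.family = family;
            this.qualifier = qualifier;
        }
    }

    // Stand-in for a family comparison taking (buffer, offset, length).
    static boolean matchingFamily(SimpleCell cell, byte[] buf, int offset, int length) {
        return Arrays.equals(cell.family, 0, cell.family.length, buf, offset, offset + length);
    }

    // Stand-in for a qualifier comparison taking (buffer, offset, length).
    static boolean matchingQualifier(SimpleCell cell, byte[] buf, int offset, int length) {
        return Arrays.equals(cell.qualifier, 0, cell.qualifier.length, buf, offset, offset + length);
    }

    // Same shape as the helper quoted above: despite the name, it only compares CF and CQ.
    static boolean isEmptyColumn(SimpleCell cell, byte[] emptyCF, byte[] emptyCQ) {
        return matchingFamily(cell, emptyCF, 0, emptyCF.length)
            && matchingQualifier(cell, emptyCQ, 0, emptyCQ.length);
    }

    // Same contract as a whole-array matchingColumn(cell, fam, qual).
    static boolean matchingColumn(SimpleCell cell, byte[] fam, byte[] qual) {
        return matchingFamily(cell, fam, 0, fam.length)
            && matchingQualifier(cell, qual, 0, qual.length);
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] cf = bytes("0");   // illustrative column family
        byte[] cq = bytes("_0");  // illustrative column qualifier

        SimpleCell sameColumn = new SimpleCell(bytes("0"), bytes("_0"));
        SimpleCell otherColumn = new SimpleCell(bytes("0"), bytes("NAME"));

        // Both helpers should agree on every cell, since both compare the full arrays.
        for (SimpleCell cell : new SimpleCell[] {sameColumn, otherColumn}) {
            System.out.println(isEmptyColumn(cell, cf, cq) + " == " + matchingColumn(cell, cf, cq));
        }
    }
}
```

Because the existing helper always passes offset 0 and the full array length, it reduces to a whole-array comparison, which is the same contract as the three-argument matchingColumn.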

virajjasani avatar Apr 29 '25 20:04 virajjasani

(worth doing as separate Jira/PR though)

Created JIRA: https://issues.apache.org/jira/browse/PHOENIX-7597

sanjeet006py avatar Apr 30 '25 07:04 sanjeet006py

@sanjeet006py Can you also do a perf study to rule out any performance degradation this might introduce in the flush path? We have metrics at the regionserver level, like hbase.regionserver.FlushTime, and at the per-table level, like hbase.regionserver.Namespace_default_table_<TABLENAME>_metric_flushTime_95th_percentile.

tkhurana avatar Apr 30 '25 16:04 tkhurana

@tkhurana @virajjasani the perf analysis is done: https://docs.google.com/document/d/1oQzEMP4LXOFxLHlKt1SZ5uvRLd3Vk90x39gn1hVBn0Y/edit?tab=t.0#heading=h.32xuccojgowv. Overall, I see that enabling CompactionScanner for flushes adds some overhead (as expected), but not enough to cause a performance degradation. Thanks.

sanjeet006py avatar May 09 '25 06:05 sanjeet006py