Unused WALs may never be GCed

Open keith-turner opened this issue 6 years ago • 1 comments

I suspect there may be a failure case where unused write-ahead logs (WALs) are never GCed. The Accumulo GC gets the list of write-ahead logs from ZooKeeper. Tablet servers create new write-ahead logs in DFS and then advertise them in ZooKeeper. If a tablet server dies between creating a WAL in DFS and advertising it in ZK, then that WAL may never be GCed.

It's possible the GC could periodically (say, once a day) do the following:

  • Get list of all WALs in HDFS
  • Get list of all WALs in ZK
  • Get list of live tservers
  • Delete WALs that are in HDFS but not in ZK and whose tserver is dead.

The reason I suggested doing this infrequently is to avoid extra load on DFS when this will normally find nothing.
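
A minimal sketch of the check described above, written as plain set logic rather than against any real Accumulo, HDFS, or ZooKeeper API. The function names `owning_tserver` and `find_orphaned_wals` are hypothetical, and the sketch assumes each WAL path has the form `.../wal/<host+port>/<uuid>`, so the owning tserver can be read from the parent directory name. The three inputs stand in for the three listings in the steps above.

```python
from pathlib import PurePosixPath


def owning_tserver(wal_path):
    # Assumption: WAL paths look like .../wal/<host+port>/<uuid>,
    # so the parent directory name identifies the tserver.
    return PurePosixPath(wal_path).parent.name


def find_orphaned_wals(wals_in_dfs, wals_in_zk, live_tservers):
    """Return WALs present in DFS, absent from ZK, whose tserver is dead."""
    orphaned = []
    for path in sorted(wals_in_dfs):
        if path in wals_in_zk:
            # Advertised in ZK: the normal GC path already tracks it.
            continue
        if owning_tserver(path) in live_tservers:
            # The tserver is alive and may be about to advertise this WAL.
            continue
        orphaned.append(path)
    return orphaned


if __name__ == "__main__":
    # Made-up example data for illustration only.
    dfs = {
        "/accumulo/wal/host1+9997/a",
        "/accumulo/wal/host2+9997/b",
        "/accumulo/wal/host3+9997/d",
    }
    zk = {"/accumulo/wal/host2+9997/b"}
    live = {"host3+9997"}
    for wal in find_orphaned_wals(dfs, zk, live):
        print("candidate for deletion:", wal)
```

Skipping WALs that belong to a live tserver is what keeps the check from deleting a log that was just created and is about to be advertised.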

I noticed this while looking into #949 and #1005

keith-turner, Mar 01 '19 18:03

You might want to check that Network Time Protocol is installed across the cluster properly.

jzgithub1, Mar 07 '19 14:03

Closing this out since there has been no activity in over 3 years. It can be reopened if this is still a problem.

cshannon, Dec 03 '22 15:12