Unused WALs may never be GCed
I suspect there may be a failure case where unused write-ahead logs (WALs) are never GCed. The Accumulo GC gets the list of write-ahead logs from ZooKeeper. Tablet servers create new write-ahead logs in DFS and then advertise them in ZooKeeper. If a tablet server dies between creating a WAL in DFS and advertising it in ZK, the GC never learns about that WAL, so it may never be GCed.
It's possible the GC could periodically (say, once a day) do the following:
- Get list of all WALs in HDFS
- Get list of all WALs in ZK
- Get list of live tservers
- Delete WALs that are in HDFS but not in ZK and whose tserver is dead.
The reason I suggested doing this infrequently is to avoid extra load on DFS when this will normally find nothing.
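As a rough illustration of the last step, here is a minimal sketch in plain Java. It does not use any Accumulo, HDFS, or ZooKeeper APIs: the class name `OrphanedWalSketch`, the method `findOrphanedWals`, and the example paths are all hypothetical, and it assumes the three lists have already been fetched and that each WAL found in HDFS can be mapped to the tserver that created it.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class OrphanedWalSketch {

  /**
   * Returns the WAL paths that exist in DFS, are not advertised in ZooKeeper,
   * and belong to a tserver that is not in the live set.
   *
   * walsInDfs maps a WAL path to the tserver that created it (an assumption of
   * this sketch; how that owner is determined is not specified here).
   */
  static Set<String> findOrphanedWals(Map<String, String> walsInDfs,
                                      Set<String> walsInZk,
                                      Set<String> liveTservers) {
    Set<String> orphans = new TreeSet<>();
    for (Map.Entry<String, String> entry : walsInDfs.entrySet()) {
      String path = entry.getKey();
      String owner = entry.getValue();
      // Skip anything a live tserver may still be about to advertise.
      if (!walsInZk.contains(path) && !liveTservers.contains(owner)) {
        orphans.add(path);
      }
    }
    return orphans;
  }

  public static void main(String[] args) {
    // Hypothetical data: four WALs in DFS, two advertised in ZK, ts3 is dead.
    Map<String, String> walsInDfs = Map.of(
        "wal/ts1+9997/a", "ts1+9997",
        "wal/ts2+9997/b", "ts2+9997",
        "wal/ts3+9997/c", "ts3+9997",
        "wal/ts3+9997/d", "ts3+9997");
    Set<String> walsInZk = Set.of("wal/ts1+9997/a", "wal/ts3+9997/d");
    Set<String> liveTservers = Set.of("ts1+9997", "ts2+9997");

    // prints [wal/ts3+9997/c]
    System.out.println(findOrphanedWals(walsInDfs, walsInZk, liveTservers));
  }
}
```

In this example `b` is left alone even though it is not in ZK, because its tserver is still alive and may simply not have advertised it yet.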
I noticed this while looking into #949 and #1005
You might want to check that Network Time Protocol is installed across the cluster properly.
Closing this out, as there has been no activity in over 3 years. It can be reopened if this is still a problem.