alluxio icon indicating copy to clipboard operation
alluxio copied to clipboard

Fix sync directory meta repeatedly

Open secfree opened this issue 2 years ago • 22 comments

What changes are proposed in this pull request?

Fix the issue: A file with a lastSync time newer than its directory causes the meta sync of the directory triggered repeatedly

Why are the changes needed?

Fix https://github.com/Alluxio/alluxio/issues/15265

Does this PR introduce any user facing changes?

No

secfree avatar Apr 07 '22 07:04 secfree

Hi @secfree, thanks for your contribution!

In order for us to evaluate and accept your PR, we ask that you sign a contribution license agreement. It's all electronic and will take just minutes.

alluxio-bot avatar Apr 07 '22 07:04 alluxio-bot

You did it @secfree!

Thank you for signing the Contribution License Agreement.

alluxio-bot avatar Apr 07 '22 07:04 alluxio-bot

Thanks for the pull request. I wonder if a better solution would be to track the oldest time any of the children have been synced and update the root last sync path time with that? I agree in the current solution sync is being called to often, but switching this to true as in your pull request may case the files to be synced to little, i.e. a file could become as old as 2x the sync time given that the root directory sync time could be updated just before the sync time on the file became too old, then the root directory will not be synced again for the full sync timeout.

tcrain avatar Apr 13 '22 17:04 tcrain

@tcrain thanks for your review.

I wonder if a better solution would be to track the oldest time any of the children have been synced and update the root last sync path time with that?

A good point. Just to confirm: update the directory's last sync time to the oldest time of its children when listStatus triggered, right?

switching this to true as in your pull request may case the files to be synced to little

Yes, you are right. Generally "read file" follows "list directory" in our current use cases, and "read file" updates the file's last sync time. So it is better than syncing directory meta every time.

secfree avatar Apr 14 '22 02:04 secfree

Yes, my initial thought would be for the function to return the last sync time, instead of true/false. Then the root directory will update its sync time with the oldest time of the values returned.

tcrain avatar Apr 14 '22 18:04 tcrain

Yes, my initial thought would be for the function to return the last sync time, instead of true/false. Then the root directory will update its sync time with the oldest time of the values returned.

I'm not sure how the parent sync time should be associated with the children sync time because in many corner cases these two are separate. For example I do chown on the parent dir and that's irrelevant to the children sync time. I'm thinking if we try to assume a relationship between parent sync time and children sync time, we get into the swamp of handling all parent-children scenarios.

Another thought I'm having is, if you return a timestamp from processSyncPath(), it is hard to distinguish SKIPPED vs FAILED without using hacky logic.

Due to my reasons above, I prefer using an enum for the return value. WDYT @tcrain ?

jiacheliu3 avatar Apr 21 '22 06:04 jiacheliu3

May I know if there is any update from your discussions? @jiacheliu3 @tcrain

secfree avatar Apr 28 '22 08:04 secfree

May I know if there is any update from your discussions? @jiacheliu3 @tcrain

I think let's start with defining an enum and returning that in processSyncPath() instead of a boolean. If a timestamp makes more sense, I can add that in a separate PR in the near future. @secfree Please let me know if that's confusing :)

jiacheliu3 avatar May 06 '22 01:05 jiacheliu3

May I know if there is any update from your discussions? @jiacheliu3 @tcrain

I think let's start with defining an enum and returning that in processSyncPath() instead of a boolean. If a timestamp makes more sense, I can add that in a separate PR in the near future. @secfree Please let me know if that's confusing :)

@jiacheliu3 Thanks for your reply. If using enum, may I know how will you update the "last sync time" of the directory if some files under the directory are skipped?

secfree avatar May 06 '22 02:05 secfree

May I know if there is any update from your discussions? @jiacheliu3 @tcrain

I think let's start with defining an enum and returning that in processSyncPath() instead of a boolean. If a timestamp makes more sense, I can add that in a separate PR in the near future. @secfree Please let me know if that's confusing :)

@jiacheliu3 Thanks for your reply. If using enum, may I know how will you update the "last sync time" of the directory if some files under the directory are skipped?

IMO I think the parent's last sync time should not be affected by children last sync time. For example the user performs a full sync on /dir, then after a while force-checks one individual file /dir/file(by -f). I think the parent's last sync time should not be refreshed by the child.

When a child is checked, UfsSyncPathCache.shouldSyncPath() checks parents of this child. So if the user performs a full sync on /dir, then checks /dir/file, the latter sync will be skipped correctly.

So if some files under the directory are skipped during the sync, that should count as a successful sync in InodeSyncStream.sync(), and we update the last sync time on the directory correctly.

What do you think? Did I miss anything?

jiacheliu3 avatar May 06 '22 02:05 jiacheliu3

So if some files under the directory are skipped during the sync, that should count as a successful sync in InodeSyncStream.sync(), and we update the last sync time on the directory correctly.

Then the solution will have the same effect as this PR, and it has the issue mentioned by tcrain "a file could become as old as 2x the sync time given that the root directory sync time could be updated just before the sync time on the file became too old, then the root directory will not be synced again for the full sync timeout." Right?

I think the parent's last sync time should not be refreshed by the child.

"track the oldest time any of the children" does not trigger parent "last sync time" refreshing by the child. It updates "last sync time" of the parent referring to children's "last sync time".

When a child is checked, UfsSyncPathCache.shouldSyncPath() checks parents of this child. So if the user performs a full sync on /dir, then checks /dir/file, the latter sync will be skipped correctly.

It seems that this is not implemented yet, it needs this feature https://github.com/Alluxio/alluxio/issues/15486

secfree avatar May 06 '22 03:05 secfree

So if some files under the directory are skipped during the sync, that should count as a successful sync in InodeSyncStream.sync(), and we update the last sync time on the directory correctly.

Then the solution will have the same effect as this PR, and it has the issue mentioned by tcrain "a file could become as old as 2x the sync time given that the root directory sync time could be updated just before the sync time on the file became too old, then the root directory will not be synced again for the full sync timeout." Right?

Great call! I think the trouble is from the fact that the last sync time does not consider if the sync is recursive. The user may trigger a single-path ls or a recursive one on the directory and the TS gets updated the same way. I have the feeling that we have many different scenarios to consider. The only way to make 100% sure is to list every case and we scrutinize them all. There should be <10 cases so I'd think it's still manageable and worthwhile.

I think the parent's last sync time should not be refreshed by the child.

"track the oldest time any of the children" does not trigger parent "last sync time" refreshing by the child. It updates "last sync time" of the parent referring to children's "last sync time".

Sure you can do that. I'm just a bit concerned about too many new objects flying around and about the performance. Also a bit careful reasoning is needed, when you distinguish a recursive sync and a single-path sync. You can totally go ahead and implement that, but the performance of the sync will very much impact the total performance of Alluxio.

When a child is checked, UfsSyncPathCache.shouldSyncPath() checks parents of this child. So if the user performs a full sync on /dir, then checks /dir/file, the latter sync will be skipped correctly.

It seems that this is not implemented yet, it needs this feature #15486

This is in https://github.com/Alluxio/alluxio/blob/f8755551bf777edcf101ca00ecb036586090b98a/core/server/master/src/main/java/alluxio/master/file/meta/UfsSyncPathCache.java#L103 I haven't looked at #15846 thoroughly but that can just be a bug in UfsSyncPathCache.

jiacheliu3 avatar May 06 '22 05:05 jiacheliu3

I think the parent's last sync time should not be refreshed by the child.

"track the oldest time any of the children" does not trigger parent "last sync time" refreshing by the child. It updates "last sync time" of the parent referring to children's "last sync time".

I'm just a bit concerned about too many new objects flying around and about the performance. Also a bit careful reasoning is needed, when you distinguish a recursive sync and a single-path sync. You can totally go ahead and implement that, but the performance of the sync will very much impact the total performance of Alluxio.

I also concern the performance, and I am not clear that if there are problems to use the oldest "last sync time" of the FIRST level children for recursive mode.

Now in my case, this issue(sync directory meta repeatedly) made listStatus on Alluxio have about 10x time cost than on HDFS. The performance won't be worse.

I think I can implement the the idea based on the oldest last sync time first and compare the performance. I will also check if it works in recursive mode. @jiacheliu3 is this ok?

When a child is checked, UfsSyncPathCache.shouldSyncPath() checks parents of this child. So if the user performs a full sync on /dir, then checks /dir/file, the latter sync will be skipped correctly.

It seems that this is not implemented yet, it needs this feature #15486

This is in ... but that can just be a bug in UfsSyncPathCache

Thanks for your help. I will verify it.

secfree avatar May 06 '22 06:05 secfree

@secfree UfsSyncPathCache is a Map<Path, SyncTime> where SyncTime actually just contains the last sync time and last recursively sync time. I think we can just use that as the return value of processSyncPath().

Currently processSyncPath() returns:

true -> path synced
false -> path skipped or sync interrupted or sync failed

We can change that to:

SyncTime.UNSYNCED -> skipped; also if the sync is interrupted, we can also return this
A real SyncTime where the sync and recursive-sync time are managed -> sync success, recursively or not
If the sync failed, you can define another static constant like SyncTime.FAILED=Long.MIN_VALUE if that makes it easier. It's not very easy to throw exceptions with threadpool.submit()

It may not be straightforward how to find the smallest value and update the parent last sync time accordingly. I prefer not to keep all results and sort them later, because imagine you have a flat directory with 1 million files... If you can keep an atomic variable for the parent sync time and update that in every sync job, that can be a good start.

To better avoid contentions, I randomly imagine there's some data structure like LongAdder, where the variable can have multiple versions, and those versions are merged on the getter. But you need close to zero contention when updating that, instead of using CAS. That would be an optimal data structure when you need a global min from concurrent threads. Not sure if you can use a ConcurrentHashMap<ThreadId, LocalMin> and we find the final minimal when the sync finishes.

jiacheliu3 avatar May 06 '22 08:05 jiacheliu3

I have a different idea:

  1. processSyncPath() keeps returning a boolean value
    • true -> path synced or skipped
    • false -> sync interrupted or sync failed
  2. processSyncPath() also updates "last sync time" of the path if path sync succeed. This relates to #15486
  3. InodeSyncStream.sync() checks failedSyncPathCount. If failedSyncPathCount=0, it gets the min value of "last sync time" of its children cached in UfsSyncPathCache(just iterate children without sorting).

What do you think about this? @jiacheliu3

secfree avatar May 06 '22 09:05 secfree

I found a strange point.

For the listStatus operation

  1. InodeSyncStream.sync() calls syncInodeMetadata() for the directory before calling processSyncPath() for its children
  2. syncInodeMetadata() calls syncExistingInodeMetadata() which may do prefetchChildren(), this gets the latest metadata for its direct children but does not update children's last sync time
  3. processSyncPath() checks last sync time to decide if it needs to do syncInodeMetadata() for the children

It seems that step 3 is not needed if prefetchChildren() happened at step 2, right? @jiacheliu3 @tcrain

If we can skip step 3, then we do not need to care children's last sync time.

secfree avatar May 15 '22 04:05 secfree

I updated the code of this PR. Now it updates children's meta if already prefetched. This solution can resolve this issue and should not import new issues. Please help review @jiacheliu3 @tcrain , thank you

secfree avatar May 15 '22 11:05 secfree

I have a different idea:

  1. processSyncPath() keeps returning a boolean value

    • true -> path synced or skipped
    • false -> sync interrupted or sync failed
  2. processSyncPath() also updates "last sync time" of the path if path sync succeed. This relates to listStatus should update files' lastSyncTime under the directory for HDFS #15486

  3. InodeSyncStream.sync() checks failedSyncPathCount. If failedSyncPathCount=0, it gets the min value of "last sync time" of its children cached in UfsSyncPathCache(just iterate children without sorting).

What do you think about this? @jiacheliu3

On your points:

  1. processSyncPath() keeps returning a boolean value I think a enum is much better than a boolean in being more explicit in the state. Actually even if there are 2 states, I prefer State.Success and State.Failed to a boolean.
  1. processSyncPath() also updates "last sync time" of the path if path sync succeed. I didn't find that. Could you point me to it? By the fact that I didn't find it in a quick search, that alone means the logic is too complicated and needs a refactor...
  1. InodeSyncStream.sync() checks failedSyncPathCount. I don't find that in your code change, is that still applicable?

jiacheliu3 avatar May 24 '22 09:05 jiacheliu3

I found a strange point.

For the listStatus operation

  1. InodeSyncStream.sync() calls syncInodeMetadata() for the directory before calling processSyncPath() for its children
  2. syncInodeMetadata() calls syncExistingInodeMetadata() which may do prefetchChildren(), this gets the latest metadata for its direct children but does not update children's last sync time
  3. processSyncPath() checks last sync time to decide if it needs to do syncInodeMetadata() for the children

It seems that step 3 is not needed if prefetchChildren() happened at step 2, right? @jiacheliu3 @tcrain

If we can skip step 3, then we do not need to care children's last sync time.

I think processSyncPath is on the dir, while prefetchChildren is on the children? Why do we not need 3 when we have 2? Could you elaborate?

jiacheliu3 avatar May 24 '22 09:05 jiacheliu3

On your points:

InodeSyncStream.sync() checks failedSyncPathCount.

I don't find that in your code change, is that still applicable?

Please ignore this idea, it is already outdated. I should have deleted it. Sorry for it.

processSyncPath() also updates "last sync time" of the path if path sync succeed.

I didn't find that. Could you point me to it? By the fact that I didn't find it in a quick search, that alone means the logic is too complicated and needs a refactor...

I planed to do this with #15486 together, please also ignore this.

secfree avatar May 24 '22 09:05 secfree

I think processSyncPath is on the dir, while prefetchChildren is on the children? Why do we not need 3 when we have 2? Could you elaborate?

If you turn on debug log you can find that: listStatus on a directory will

  1. prefetch the metadata of all children first
  2. Then call processSyncPath for each children concurrently

That is the idea of my latest PR. If listStatus already prefetched metadata, we can update children's metadata without checking their last sync time.

secfree avatar May 24 '22 09:05 secfree

Could you try to make the code change more explained, in both code and comments?

Thanks for your suggestions. I will try my best and go ahead if you agree with the basic idea of this PR: set sync=true when hasCache=true. Please help confirm if this makes sense based on the comments I just made. Thank you. @jiacheliu3

secfree avatar May 24 '22 09:05 secfree

As the changes and discussions of this PR have been incorporated into #16081, I close this one.

secfree avatar Sep 03 '22 10:09 secfree