
yas3fs behavior when async background upload fails after 3 attempts

Open bitsofinfo opened this issue 10 years ago • 9 comments

Piggybacking on https://github.com/danilop/yas3fs/issues/17

You noted that yas3fs by default attempts to upload the file to S3 three times in the background after committing it to the local cache (and reporting to the writer/caller that the write succeeded).

However, what is the behavior if all 3 attempts fail? Does yas3fs delete the locally cached file?

If not, could the following options be exposed?

a) deleteCachedFileOrphanAfterUploadFile, i.e. an option to purge the locally cached file if the S3 upload process has exhausted all retries

b) Some sort of option to log locally a list of all files (paths) that were written OK to the local cache but failed to upload to S3. This would permit integration with calling applications, which could consult this file to clean up metadata that now points to orphaned files (i.e. files that yas3fs reported as written OK locally, but that failed to truly reach S3 in the background)

bitsofinfo avatar Apr 28 '14 15:04 bitsofinfo

I see your point, but if yas3fs is trying to upload a file to S3 it is because it is a new or updated file, so deleting it means you lose any possibility of recovery. I would prefer (after the 3rd failure) to wait for some time (e.g. 5 or 15 minutes) and then try again. What do you think?

danilop avatar Apr 30 '14 07:04 danilop

Yes, I think waiting an additional (configurable) period of time before a second retry cycle would be good. So, different levels of "retry cycles" with a "give up" behavior:

a) retry config = numberOfUploadAttempts = N, sleepMSTime = N

b) retry cycle config = numberOfCycles = N, sleepMSTime = N

c) retryCycleExhaustedAction = {deleteCachedFileOrphan=true, uploadFailureLog=/path/to/log/file?, otherOptionB?}

So personally I would configure this such as:

a) retryConfig: { numberOfUploadAttempts = 3, sleepMSTime = 30000 } // 30s

b) retryCycleConfig: { numberOfCycles = 2, sleepMSTime = 600000 } // 10 min

c) retryCycleExhaustedAction = { deleteCachedFileOrphan=true, uploadFailureLog=/path/to/log/file, localBackupDir=/path/to/dir/to/move/orphan/to }
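
The proposed two-level retry could be sketched as below (a minimal sketch; the function and parameter names mirror the hypothetical config above and are not actual yas3fs options):

```python
import time

def upload_with_retry_cycles(upload, attempts=3, attempt_sleep=30,
                             cycles=2, cycle_sleep=600):
    """Try upload() up to attempts*cycles times, pausing between tries.

    Raises the last exception once all cycles are exhausted, so the caller
    can apply the retryCycleExhaustedAction (delete orphan, log, move to
    a backup dir).
    """
    last_exc = None
    for cycle in range(cycles):
        for attempt in range(attempts):
            try:
                return upload()
            except Exception as e:
                last_exc = e
                time.sleep(attempt_sleep)  # short pause between attempts
        time.sleep(cycle_sleep)  # longer pause between retry cycles
    raise last_exc
```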

Format of the uploadFailureLog might just be something as simple as listing the files that failed and their local paths, for recovery via manual/automated action in the localBackupDir.

The format of this uploadFailureLog file should be pretty clean/straightforward/simple. In my use case I would likely ingest it via something like logstash and ship it off to an event system etc.

Thoughts?

bitsofinfo avatar Apr 30 '14 13:04 bitsofinfo

Any thoughts on this idea?

bitsofinfo avatar May 12 '14 22:05 bitsofinfo

Bigger architecture change?... Add a --with-plugin-file option that loads a subclass of YAS3FSPlugin; each of these plugins is a wrapper for YAS3FS methods (decorated w/ @withplugin).

This also means the methods in YAS3FS should be broken up a bit more, i.e. do_on_s3, do_on_s3_now, do_cmd_on_s3_now_w_retries, do_cmd_on_s3_now, and perhaps do_delete_on_s3_now, do_copy_on_s3_now, do_set_c_from_file_on...

For this scenario the yas3fs method would be decorated:

@withplugin
def do_cmd_on_s3_now_w_retries(self, ...):
  last_exception = None
  for i in range(self.retries):
    try:
      pub = self.do_cmd_on_s3_now(...)
      return pub
    except Exception as e:
      last_exception = e
  raise last_exception
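
For reference, a minimal sketch of what the proposed @withplugin decorator could look like (the `self.plugin` attribute and the lookup-by-method-name convention are assumptions here, not existing yas3fs API):

```python
import functools

def withplugin(fn):
    """If the instance carries a plugin exposing a method with the same
    name as fn, let the plugin wrap fn; otherwise call fn directly."""
    @functools.wraps(fn)
    def wrapper(self, *args, **kwargs):
        plugin = getattr(self, 'plugin', None)
        hook = getattr(plugin, fn.__name__, None) if plugin else None
        if hook:
            # The plugin method receives the original function and
            # returns a replacement wrapper (as in MyYAS3FSPlugin below).
            return hook(fn)(self, *args, **kwargs)
        return fn(self, *args, **kwargs)
    return wrapper
```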

and the plugin would be (MyYas3fsPlugin.py)

import os

from yas3fs.YAS3FSPlugin import YAS3FSPlugin

class MyYAS3FSPlugin(YAS3FSPlugin):
  def do_cmd_on_s3_now_w_retries(self, fn):
    def wrapper(*args, **kargs):
      try:
        return fn(*args, **kargs)
      except Exception as e:
        # do failover here, e.g....
        path = args[1][1]
        action = args[2]

        if args[1][0] == 'upload':
          cache_file = args[0].cache.cache.get_cache_filename(path)
          cache_stat = os.stat(cache_file)
          emailCacheFile(cache_file)  # some notification hook of your own

        return args[2]  # pub
    return wrapper

it would be run as

yas3fs ... --with-plugin-file MyYas3fsPlugin.py

ewah avatar Jun 13 '14 17:06 ewah

What's the diff between do_on_s3 vs do_on_s3_now?

sync vs async exec?

bitsofinfo avatar Jun 13 '14 17:06 bitsofinfo

do_on_s3 adds commands to the S3 queue...

do_on_s3_now runs the commands.

ewah avatar Jun 13 '14 17:06 ewah

Yes, the _now command is executed immediately, the other one adds it to a queue for async execution.
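
In outline, the split is the classic producer/consumer queue pattern (a sketch of the pattern only, not the actual yas3fs internals):

```python
import queue
import threading

s3_queue = queue.Queue()

def do_on_s3(cmd):
    """Enqueue the command; the caller returns immediately (async)."""
    s3_queue.put(cmd)

def do_on_s3_now(cmd, results):
    """Execute the command immediately (sync)."""
    results.append(cmd())

def s3_worker(results):
    """Background thread draining the queue via do_on_s3_now.
    A None sentinel shuts the worker down."""
    while True:
        cmd = s3_queue.get()
        if cmd is None:
            break
        do_on_s3_now(cmd, results)
        s3_queue.task_done()
```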

danilop avatar Jun 16 '14 08:06 danilop

Minor, but potentially consider renaming some of the methods so the behavior is clearer by just the method name.

bitsofinfo avatar Jun 18 '14 16:06 bitsofinfo

We use yas3fs to archive files by tarring them up onto the mounted filesystem, and if that succeeds we remove the files. With the current behavior, data loss is very likely if we lose write access to the bucket (or never had it). I think there are a few opportunities for improvement. 1) Mark files in the cache (perhaps use a different directory in the cache) for pending writes, then move them to the read cache after they are successfully written; then try to upload them at some point in the future, at least on a subsequent mount. 2) If all writes are failing, stop accepting writes by returning an I/O error or permission denied to the filesystem write.

I would like to add that I tried to work around this problem by using --s3-num 0. Unfortunately when I remove the bucket permissions the yas3fs mounted filesystem still accepts writes.

# while date > /mnt/testbucket/writetest; do echo -n .; sleep 5; done
.........................

I would expect it to work more like this:

# while date > /root/writetest; do echo -n .; sleep 5; done
-bash: /root/writetest: Permission denied

bdurrow avatar Jan 01 '16 02:01 bdurrow