amazon-glacier-cmd-interface

Filesystem support using fuse

Open offlinehacker opened this issue 13 years ago • 18 comments

Implement support for a FUSE filesystem. This won't be a full-blown filesystem, just a structure of the form vault/filename. We put file metadata in SimpleDB.

Depends on: #25, #26

offlinehacker avatar Aug 28 '12 14:08 offlinehacker

What do you mean by FUSE support? Making a new plugin for FUSE?

That might be the case, though I imagine that to make filesystem support via FUSE I first need to add support for SimpleDB so that I could instantly generate directory structure (region/vault/archivefile).
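As a rough illustration of the region/vault/archivefile layout mentioned above, here is a minimal sketch that groups metadata records into a tree and lists one level of it. The function names `build_tree` and `list_dir` are hypothetical, and the records are assumed to be plain dicts carrying the `region`, `vault` and `filename` attributes; how they are fetched from SimpleDB is left out.

```python
def build_tree(records):
    """Group metadata records into {region: {vault: [filename, ...]}}."""
    tree = {}
    for rec in records:
        vaults = tree.setdefault(rec["region"], {})
        vaults.setdefault(rec["vault"], []).append(rec["filename"])
    return tree


def list_dir(tree, path):
    """Return the sorted entries under '/', '/region' or '/region/vault'."""
    parts = [p for p in path.split("/") if p]
    node = tree
    for part in parts:
        # Only regions and vaults are directories; archive files are leaves.
        if not isinstance(node, dict) or part not in node:
            raise KeyError(path)
        node = node[part]
    return sorted(node)


# Hypothetical sample records, shaped like the metadata discussed in this thread.
records = [
    {"region": "us-east-1", "vault": "backups", "filename": "home.tar"},
    {"region": "us-east-1", "vault": "backups", "filename": "etc.tar"},
    {"region": "eu-west-1", "vault": "photos", "filename": "2012.zip"},
]
tree = build_tree(records)
```

A FUSE `readdir` handler could then answer from `list_dir(tree, path)` without contacting Glacier at all.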

uskudnik avatar Aug 28 '12 14:08 uskudnik

If I imagine correctly, you want to use SimpleDB for storing directory structure. I don't think that dependency on two services would be the right way.

Wouldn't it be simpler to store the directory structure inside vault names? For example dir-subdir-subsubdir-file. Vault names can be 255 chars long. This way you won't be able to store long file names, but you would usually use Glacier for backup archives, which shouldn't generate such long names. You could even transcode directory names to support UTF-8 (the characters Glacier allows are a–z, A–Z, 0–9, '_' (underscore), '-' (hyphen), and '.' (period)).
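One possible way to do the transcoding, as a sketch only: join path components with '-' and escape every byte outside a–z, A–Z, 0–9 and '.' as `_XX` (hex), so that '-' and '_' inside a name stay unambiguous. The escape scheme and the function names are assumptions made for illustration, not something the project implements.

```python
import string

SAFE = set(string.ascii_letters + string.digits + ".")
MAX_LEN = 255  # vault name length limit mentioned above


def encode_component(name):
    """Escape every byte outside SAFE as _XX (uppercase hex)."""
    out = []
    for byte in name.encode("utf-8"):
        ch = chr(byte)
        out.append(ch if ch in SAFE else "_%02X" % byte)
    return "".join(out)


def path_to_name(path):
    """Turn 'dir/subdir/file' into a name using only allowed characters."""
    parts = [p for p in path.split("/") if p]
    name = "-".join(encode_component(p) for p in parts)
    if len(name) > MAX_LEN:
        raise ValueError("encoded name exceeds %d characters" % MAX_LEN)
    return name


def name_to_path(name):
    """Reverse path_to_name."""
    parts = []
    for comp in name.split("-"):
        raw = bytearray()
        i = 0
        while i < len(comp):
            if comp[i] == "_":
                # An underscore always introduces a two-digit hex escape.
                raw.append(int(comp[i + 1:i + 3], 16))
                i += 3
            else:
                raw.append(ord(comp[i]))
                i += 1
        parts.append(raw.decode("utf-8"))
    return "/".join(parts)
```

Note that escaping makes names longer (each non-ASCII character costs at least six characters), which eats into the 255-character budget quickly.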

offlinehacker avatar Aug 28 '12 15:08 offlinehacker

Well, at this time I'm storing all the data about the file in SimpleDB (see below).

While I agree that relying on two services might not be the best plan, I think it's the best in-the-cloud solution. While you can save information in vault names, you can only save archive information into archive description, which you would have to either manually cache off-line or wait for 4h to retrieve (at least if I recall correctly, but would have to check, maybe you can get that instantaneously).

Saving archive filenames into vault names would not be a very long term solution :)

ATM I've already implemented half of the SimpleDB solution, but I will give it a bit more thought later, when I have to rewrite the core calls to boto once Glacier support lands in its develop branch.

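    import datetime

    # Attributes stored for each uploaded archive; the values come from
    # the surrounding upload code.
    file_attrs = {
        'region': region,
        'vault': vault,
        'filename': filename,
        'archive_id': archive_id,
        'location': location,
        'description': description,
        'date': str(datetime.datetime.now()),
        'hash': sha256hash,
    }

    # The filename is used as the SimpleDB item name.
    domain.put_attributes(filename, file_attrs)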

uskudnik avatar Aug 28 '12 15:08 uskudnik

Looks like I missed that part of the code, nice job ;)

offlinehacker avatar Aug 28 '12 15:08 offlinehacker

According to Amazon’s documentation it’s not possible to update an archive’s description:

After you upload an archive, you cannot update its content or its description. The only way you can update the archive content or its description is by deleting the archive and uploading another archive.

But, according to Amazon’s FAQ you can at least retrieve the archive’s description without downloading the whole archive:

You can request a vault inventory as either a JSON or CSV file and will contain details about the archives within your vault including the size, creation date and the archive description (if you provided one during upload).

sonicdoe avatar Aug 28 '12 16:08 sonicdoe

Yes, you can request the vault inventory (which I think should be returned fast) using this and then get the result using this, but judging by the description, I don't know if that is useful:

About the Vault Inventory

Amazon Glacier prepares an inventory for each vault periodically, every 24 hours. When you initiate a job for a vault inventory, Amazon Glacier returns the last inventory it generated, which is a point-in-time snapshot and not realtime data. You might not find it useful to retrieve a vault inventory for each archive upload. However, suppose you maintain a database on the client-side associating metadata about the archives you upload to Amazon Glacier. Then, you might find the vault inventory useful to reconcile information, as needed, in your database with the actual vault inventory.

What it actually means is that your information can be up to 24 hours old, and I don't think that is good. At the same time, they say you can check it against your database, detect possible inconsistencies and repair them.

You still need a database, so the idea of using SimpleDB is very good. If we put data in the cloud, why not metadata ;)
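The reconciliation check could look roughly like the sketch below. It assumes the inventory was retrieved as JSON with an `ArchiveList` of entries carrying an `ArchiveId`, and that the local database is available as a dict keyed by archive ID; the function name `reconcile` and the shape of the result are made up for illustration.

```python
import json


def reconcile(local_records, inventory_json):
    """Compare locally tracked archive IDs with a vault inventory document."""
    inventory = json.loads(inventory_json)
    remote_ids = {entry["ArchiveId"] for entry in inventory["ArchiveList"]}
    local_ids = set(local_records)
    return {
        # The inventory is a snapshot, so keep its date next to the result.
        "inventory_date": inventory.get("InventoryDate"),
        # In the vault but unknown to the local database.
        "missing_locally": sorted(remote_ids - local_ids),
        # Known locally but absent from the snapshot: either deleted
        # remotely or uploaded after the snapshot was taken.
        "not_in_inventory": sorted(local_ids - remote_ids),
    }


# Hypothetical sample data.
sample_inventory = json.dumps({
    "InventoryDate": "2012-08-28T12:00:00Z",
    "ArchiveList": [
        {"ArchiveId": "id-1", "ArchiveDescription": "home.tar"},
        {"ArchiveId": "id-2", "ArchiveDescription": "etc.tar"},
    ],
})
local = {"id-2": {"filename": "etc.tar"}, "id-3": {"filename": "new.tar"}}
report = reconcile(local, sample_inventory)
```

Because the snapshot may be up to 24 hours old, entries under `not_in_inventory` should be compared against their upload date before being treated as lost.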

offlinehacker avatar Aug 28 '12 17:08 offlinehacker

Exactly! :)

Inventory-retrieval information is now also stored in SimpleDB (though the integrity check/update is not there yet), and search works with the LIKE operator instead of =.
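For readers unfamiliar with SimpleDB select expressions, a LIKE-based search might be built along these lines. This is a sketch, not the code in the repository: the helper names, the domain name `glacier` and the choice of a prefix match are assumptions.

```python
def quote_name(name):
    """Quote a domain or attribute name with backticks."""
    return "`" + name.replace("`", "``") + "`"


def quote_value(value):
    """Quote a string value, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def build_search_query(domain, attribute, term):
    """Build a select expression matching values that start with term.

    Note: a '%' inside term is not escaped and acts as a wildcard.
    """
    return "select * from %s where %s like %s" % (
        quote_name(domain),
        quote_name(attribute),
        quote_value(term + "%"),
    )


query = build_search_query("glacier", "filename", "backup")
```

The resulting string would then be handed to whatever select call the SimpleDB client exposes.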

uskudnik avatar Aug 28 '12 18:08 uskudnik

One could still cache metadata until it's refreshed.

domenkozar avatar Aug 28 '12 19:08 domenkozar

SDB cache is instantly updated (keeping in mind eventual consistency of SDB).

uskudnik avatar Aug 28 '12 19:08 uskudnik

Here is one example for FUSE. It shouldn't be really hard to implement this, but first we need full support for SimpleDB, including a sync option. Also, the core functionality (creating and deleting vaults, uploading and downloading archives, ... and bookkeeping) should be put in a new class. I know it adds another layer, but it's sane, since we implement additional functionality.
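To make the idea of such a layer concrete, here is a minimal sketch of a class that wraps upload and delete and keeps the metadata store in step. Everything in it is hypothetical: `GlacierBookkeeper`, `FakeGlacier` and `InMemoryStore` are invented names, and the two fakes only stand in for the real Glacier calls and the SimpleDB domain so the example runs on its own.

```python
import datetime


class InMemoryStore:
    """Stand-in for a SimpleDB domain (hypothetical)."""

    def __init__(self):
        self.items = {}

    def put_attributes(self, name, attrs):
        self.items[name] = dict(attrs)

    def get_attributes(self, name):
        return self.items[name]

    def delete_attributes(self, name):
        self.items.pop(name, None)


class FakeGlacier:
    """Stand-in for the Glacier calls (hypothetical)."""

    def __init__(self):
        self.archives = {}
        self._counter = 0

    def upload_archive(self, vault, data, description):
        self._counter += 1
        archive_id = "archive-%d" % self._counter
        self.archives[(vault, archive_id)] = data
        return archive_id

    def delete_archive(self, vault, archive_id):
        del self.archives[(vault, archive_id)]


class GlacierBookkeeper:
    """Core operations plus bookkeeping in one place."""

    def __init__(self, glacier, store, region):
        self.glacier = glacier
        self.store = store
        self.region = region

    def upload(self, vault, filename, data, description=""):
        archive_id = self.glacier.upload_archive(vault, data, description)
        # Record the metadata only after the upload has succeeded.
        self.store.put_attributes(filename, {
            "region": self.region,
            "vault": vault,
            "filename": filename,
            "archive_id": archive_id,
            "description": description,
            "date": str(datetime.datetime.now()),
        })
        return archive_id

    def delete(self, filename):
        attrs = self.store.get_attributes(filename)
        self.glacier.delete_archive(attrs["vault"], attrs["archive_id"])
        self.store.delete_attributes(filename)


keeper = GlacierBookkeeper(FakeGlacier(), InMemoryStore(), "us-east-1")
first_id = keeper.upload("backups", "home.tar", b"data", "nightly backup")
```

Both the command-line tool and a FUSE front end could call a class like this, so the bookkeeping lives in one place.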

offlinehacker avatar Sep 11 '12 20:09 offlinehacker

No, it's not, but is FUSE really core functionality for a command-line tool?

I do see the benefits of using the same cache and the same settings... But I'm not sure whether this makes sense, since we are using boto underneath for all the calls and a bunch of settings should be read from the boto settings anyway.

(By no means am I disregarding the idea, I would just like to check that we won't be mixing in too much stuff).

uskudnik avatar Sep 11 '12 20:09 uskudnik

Whether we then put FUSE here or in another project is not that important, because we have to implement quite a few things first.

The problem is that boto is not going to implement bookkeeping support, so we should add another layer (another class) anyway. Currently I'm working on #25, and then I will write some tests for the newly created class.

offlinehacker avatar Sep 11 '12 20:09 offlinehacker

Sounds reasonable enough to me. Cool, and I'll add cache updating when I get operational.

uskudnik avatar Sep 11 '12 20:09 uskudnik

I'd do this for https://code.google.com/p/s3ql/, all other projects have smaller potential.

domenkozar avatar Sep 15 '12 13:09 domenkozar

Thanks! It looks like we're going to end up with 3 separate projects (glacier-cmd-interface, glacier-sdb-wrapper and FUSE integration), this is going to be fun :D

offlinehacker avatar Sep 15 '12 14:09 offlinehacker

Also read https://groups.google.com/forum/?fromgroups#!topic/s3ql/24GBR0OgTnY

domenkozar avatar Sep 15 '12 14:09 domenkozar

We have to see if our WIP model fits their model, or if we can shape it so that it fits.

offlinehacker avatar Sep 15 '12 20:09 offlinehacker

There is hope:

  • http://lwn.net/Articles/548102/
  • http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/commit/?id=01e9d11a3e79035ca5cd89b035435acd4ba61ee1

domenkozar avatar Jun 23 '13 08:06 domenkozar