pyfilesystem2
Support for a FileSystem using SQL DB as a backend
I've been using pyfilesystem2 a lot in my API projects, and it has been great: it handles many useful cases.
One thing I have been thinking about is whether pyfilesystem2 could support an RDBMS as a backend. This may not be ideal, but it has some benefits:
Requirement:
In some of my projects, I need a reliable storage mechanism that is available across multiple machines (I run multiple API servers for high-availability and fault-tolerant setups).
Current solutions
Right now I have to provision NFS, S3, or SSHFS-style storage for these deployments, but each option has drawbacks:
- On AWS, S3 still has some latency, which is not ideal.
- SSHFS does not give me the high availability I need.
- NFS is doable, but in some of my deployments (non-cloud environments) it is an additional infrastructure effort to procure and set up.
Proposed solution
Most API servers already have an RDBMS where we store information, and for small files this is a reasonable approach for quick access and updates. Ref: https://www.microsoft.com/en-us/research/publication/to-blob-or-not-to-blob-large-object-storage-in-a-database-or-a-filesystem/ This could use SQLAlchemy as a dependency to abstract away database-specific differences.
Note: This could also be a third-party filesystem that I create when I get some free time. But I thought I'd post it here in case others involved in this project are interested, or already have workarounds available.
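As a rough illustration of the proposal, here is a minimal sketch of a path-to-BLOB store. It assumes a single hypothetical `files` table with `path` and `data` columns and uses Python's built-in `sqlite3` module in place of SQLAlchemy so the example runs without extra dependencies. The `BlobStore` class is hypothetical; its `readbytes`/`writebytes` names only echo pyfilesystem2's methods and do not implement its `FS` interface.

```python
import sqlite3


class BlobStore:
    """Hypothetical minimal store mapping file paths to BLOBs in one table."""

    def __init__(self, database: str = ":memory:") -> None:
        self.conn = sqlite3.connect(database)
        # One row per file: the path is the key, the contents are a BLOB.
        with self.conn:
            self.conn.execute(
                "CREATE TABLE IF NOT EXISTS files "
                "(path TEXT PRIMARY KEY, data BLOB NOT NULL)"
            )

    def writebytes(self, path: str, data: bytes) -> None:
        # Insert or overwrite (SQLite syntax); the `with` block commits.
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO files (path, data) VALUES (?, ?)",
                (path, data),
            )

    def readbytes(self, path: str) -> bytes:
        row = self.conn.execute(
            "SELECT data FROM files WHERE path = ?", (path,)
        ).fetchone()
        if row is None:
            raise FileNotFoundError(path)
        return bytes(row[0])

    def exists(self, path: str) -> bool:
        row = self.conn.execute(
            "SELECT 1 FROM files WHERE path = ?", (path,)
        ).fetchone()
        return row is not None

    def remove(self, path: str) -> None:
        with self.conn:
            cur = self.conn.execute("DELETE FROM files WHERE path = ?", (path,))
        if cur.rowcount == 0:
            raise FileNotFoundError(path)
```

Pointing every API server at the same shared database server (rather than the in-memory SQLite used here) is what would provide the multi-machine access described above; with SQLAlchemy, the same table could be declared with a `LargeBinary` column.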
Interesting idea... but as pyfilesystem provides a "full" filesystem API, my gut feeling is that this kind of approach would have a lot of corner cases, and it'll be a lot of work to sort them all out. (Fortunately, pyfilesystem also includes a reasonably comprehensive test suite.) So good luck if you do decide to tackle this :smiley:
I think the idea has been floated before. Not sure if any work has been done on that.
It's certainly doable. But @lurch is correct: you can probably get something up and running surprisingly quickly, but it will take longer to sort out the details and make it behave identically to other filesystems.
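To make the corner-case concern concrete: a flat path-to-BLOB table has no notion of directories, but a filesystem API does (for example, writing a file into a missing parent directory should fail, and `listdir` should return only direct children). The sketch below is a hypothetical `TreeStore` that models directories explicitly with an `is_dir` flag. It uses the built-in `sqlite3` module and Python's built-in exceptions, whereas pyfilesystem2 raises its own error types from `fs.errors`, and it covers only a handful of the cases a real backend would need.

```python
import posixpath
import sqlite3


class TreeStore:
    """Hypothetical store that models directories explicitly (sketch only)."""

    def __init__(self) -> None:
        self.conn = sqlite3.connect(":memory:")
        with self.conn:
            self.conn.execute(
                "CREATE TABLE nodes ("
                "path TEXT PRIMARY KEY, is_dir INTEGER NOT NULL, data BLOB)"
            )
            # The root directory always exists.
            self.conn.execute("INSERT INTO nodes VALUES ('/', 1, NULL)")

    @staticmethod
    def _norm(path: str) -> str:
        # Force an absolute path and collapse "//", "/./" and "/../".
        return posixpath.normpath("/" + path.lstrip("/"))

    def _kind(self, path: str):
        # Returns "dir", "file", or None if nothing exists at `path`.
        row = self.conn.execute(
            "SELECT is_dir FROM nodes WHERE path = ?", (path,)
        ).fetchone()
        if row is None:
            return None
        return "dir" if row[0] else "file"

    def _require_parent_dir(self, path: str) -> None:
        parent = posixpath.dirname(path)
        if self._kind(parent) != "dir":
            raise FileNotFoundError(f"parent directory missing: {parent}")

    def makedir(self, path: str) -> None:
        path = self._norm(path)
        if self._kind(path) is not None:
            raise FileExistsError(path)
        self._require_parent_dir(path)
        with self.conn:
            self.conn.execute("INSERT INTO nodes VALUES (?, 1, NULL)", (path,))

    def writebytes(self, path: str, data: bytes) -> None:
        path = self._norm(path)
        if self._kind(path) == "dir":
            raise IsADirectoryError(path)
        self._require_parent_dir(path)
        with self.conn:
            self.conn.execute(
                "INSERT OR REPLACE INTO nodes VALUES (?, 0, ?)", (path, data)
            )

    def listdir(self, path: str) -> list:
        path = self._norm(path)
        kind = self._kind(path)
        if kind is None:
            raise FileNotFoundError(path)
        if kind != "dir":
            raise NotADirectoryError(path)
        # Scans every row for simplicity; a real backend would store and
        # index a parent column instead.
        rows = self.conn.execute("SELECT path FROM nodes WHERE path != '/'")
        return sorted(
            posixpath.basename(p) for (p,) in rows if posixpath.dirname(p) == path
        )
```

Even this small sketch has to decide on path normalization, overwrite semantics, and error types, which illustrates why matching other filesystems exactly takes longer than the first prototype.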
You might be interested in dCache (but beware the AGPL license).
I second this idea. Something like that would be useful to me as well.
"It's certainly doable. But @lurch is correct, you can probably get something up and running surprisingly quickly, but it will take longer to sort out the details and make it behave identically to other filesystems."
Most people probably don't need the details (I personally don't; reading and writing files would be enough for me). I don't know exactly how PyFileSystem works, so I don't know whether it's possible to support this without the details.
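On whether a read/write-only subset is feasible: pyfilesystem2's guide to implementing filesystems lists a small set of essential methods (`getinfo`, `listdir`, `makedir`, `openbin`, `remove`, `removedir`, `setinfo`), and file contents are accessed through file-like objects returned by `openbin`. One hedged way to produce such objects on top of a table like the one proposed is to buffer the whole file in memory and write it back when the file is closed. In the sketch below, the `files` table, `open_blob`, and `_WriteBackFile` are hypothetical, only the modes `rb` and `wb` are supported, and this is not pyfilesystem2's own implementation.

```python
import io
import sqlite3


class _WriteBackFile(io.BytesIO):
    """In-memory buffer that saves its contents to the table when closed."""

    def __init__(self, conn: sqlite3.Connection, path: str) -> None:
        super().__init__()
        self._conn = conn
        self._path = path

    def close(self) -> None:
        if not self.closed:
            # Persist the whole buffer in one statement, then close normally.
            with self._conn:
                self._conn.execute(
                    "INSERT OR REPLACE INTO files (path, data) VALUES (?, ?)",
                    (self._path, self.getvalue()),
                )
        super().close()


def open_blob(conn: sqlite3.Connection, path: str, mode: str = "rb") -> io.BytesIO:
    """Hypothetical read/write-only entry point supporting 'rb' and 'wb'."""
    if mode == "rb":
        row = conn.execute(
            "SELECT data FROM files WHERE path = ?", (path,)
        ).fetchone()
        if row is None:
            raise FileNotFoundError(path)
        return io.BytesIO(row[0])
    if mode == "wb":
        return _WriteBackFile(conn, path)
    raise ValueError(f"unsupported mode: {mode!r}")
```

Buffering the whole file keeps the sketch simple and fits the "small files" use case from the original proposal, but large files would need chunked or streaming BLOB access instead.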