
Explain the relationship between MFS, IPFS, and UnixFS.

Open hsanjuan opened this issue 5 years ago • 15 comments

Tracking a user request at https://github.com/ipfs/go-ipfs/issues/7084#issuecomment-608552812

URL of the page in question:

Maybe: https://docs.ipfs.io/guides/concepts/mfs/

What's wrong with this page?

See the confusion in the thread above.

Related: how to manually create and modify unixfs directories (it has come up several times in the last week).

hsanjuan, Apr 03 '20 22:04

The doc says:

Because files in IPFS are content-addressed and immutable, they can be complicated to edit. Mutable File System (MFS) is a tool built into IPFS that lets you treat files like you would a normal name-based filesystem — you can add, remove, move, and edit MFS files and have all the work of updating links and hashes taken care of for you.

So, apparently, I can run things like:

$ ipfs files mkdir /foo
$ ipfs files ls -l /foo
$ echo hello | ipfs files write --create /foo/world
$ ipfs files ls -l /foo # hash has changed
$ ipfs files read /foo/world

That's nice. Somewhere, there is a mapping from path names to hashes. The documentation says nothing about how this mapping is established or where it is stored.

My first thought was that ipfs files maintains a local key/value store that eventually gets turned into a Merkle DAG. But apparently ipfs files refers to a rooted tree stored under the hash one gets from ipfs files stat /. Apparently, this root node can be changed with ipfs files chcid. None of that is obvious from the (minimal) documentation.

My suggestion would be to update the documentation to something like this:

"The ipfs files commands allow a Merkle DAG rooted at a given CID to be manipulated somewhat like a file system. Subcommands of ipfs files mirror common UNIX commands. While traditional UNIX commands modify stateful data structures, their ipfs files equivalents simply return a new root CID that reflects the altered file system state; the original state is still available under the original CID. The current root is stored ___. Note that concurrent usage is/is not possible due to serializing operations/race conditions (pick one).

The ipfs files commands are a simple set of convenience functions that keep track of file system state by keeping track of the root CID; but the Merkle DAG they generate under the root CID is identical to one created with the basic commands.

Paths used by ipfs files are indistinguishable from regular UNIX paths, so you cannot tell by looking at a path whether it refers to a file in the UNIX namespace or a file in the ipfs files namespace. The paths used by ipfs files are also different from, and incompatible with, IPFS paths mounted on /ipfs or /ipns; paths in /ipfs always refer to CIDs, and paths in /ipns always refer to entities where name resolution has been set up, while ipfs files paths always refer to whatever CID root is currently in effect."
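The suggested wording above can be illustrated with a toy model. The script below is a hypothetical sketch, not how IPFS actually computes CIDs: real IPFS chunks file data, encodes UnixFS nodes as protobufs and wraps multihashes in CIDs, whereas here a "hash" is just a SHA-256 hex digest and a directory's hash is the digest of its sorted `name hash` lines. What it does show is the property being discussed: editing /foo/world yields new hashes for the file, for /foo and for the root, the old root is still reproducible, and "keeping track of file system state" reduces to remembering which root is current.

```shell
#!/bin/sh
# Hypothetical toy model of content addressing, NOT how IPFS builds CIDs:
# a "hash" is a plain SHA-256 hex digest, and a directory's hash is the
# digest of its sorted "name hash" lines.
set -eu

# Hash of a file's content.
file_hash() { printf '%s' "$1" | sha256sum | cut -d' ' -f1; }

# Hash of a directory whose entries are passed as "name hash" arguments.
dir_hash() { printf '%s\n' "$@" | LC_ALL=C sort | sha256sum | cut -d' ' -f1; }

# State 1: /foo/world contains "hello".
foo_v1=$(dir_hash "world $(file_hash hello)")
root_v1=$(dir_hash "foo $foo_v1")

# "Edit" /foo/world: nothing is overwritten. The changed file gets a new
# hash, so its parent /foo and the root get new hashes too.
foo_v2=$(dir_hash "world $(file_hash 'hello again')")
root_v2=$(dir_hash "foo $foo_v2")

# The only mutable state is which root counts as current.
current_root=$root_v2

echo "old root: $root_v1"
echo "new root: $root_v2"
```

Running it prints two different roots. Nothing belonging to the old root was overwritten to produce the new one, which is why the original state stays addressable under its original hash.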

This probably needs a lot more elaboration and clarity, but at least it identifies some points that need addressing.

I think one issue is that the semantics of ipfs files, and the choice to have four different namespaces that can't be distinguished syntactically, are what make the documentation complex. The best way to improve the documentation and create a better user experience might be to change the commands themselves.

For example, consider this:

ipfs files provides a set of commands similar to basic POSIX file system commands (cp, rm, ls, ln, mount, etc.). These operate on the regular file system with IPFS subtrees mounted on particular locations. Such mounts can be established with ipfs files mount <cid> <path>. The current mount table can be returned with ipfs files mount [--format=json], and the CID corresponding to a particular path can be returned with ipfs files cidof <path>. The mount table is kept in a database at ~/.ipfs/mounts. The ipfs files operations lock any database records that are modified by an operation (e.g., the mount that includes the destination of a cp operation, but not the source mount), so that concurrent accesses from multiple scripts are serialized. The database location can be overridden with the IPFS_FILES_MOUNTS environment variable. For the construction of large directory trees with IPFS, consider using the ipfs maketree <json> command instead, which parallelizes the construction of the Merkle DAG.

ipfs maketree <json> takes a JSON description of a directory tree. The JSON file contains mappings of the form {filepath: ..., destpath: ...} and {entries: [...]} for files and directories. The ipfs maketree command will copy over all files to a destination tree and return the CID for the destination tree. Instead of filepath:, users can also specify cid: (incorporated directly), linkpath: (only CIDs are added, but the original file is used as the underlying storage), and linkurl: (only CIDs are added, and the data at the given URL is used as the underlying storage).

tmbdev, Apr 16 '20 05:04

Two quick notes (agreeing on most otherwise):

Apparently, this root node can be changed with ipfs files chcid.

According to the cli docs, this only changes the CID version or hash function of the root node of a given path.

The documentation says nothing about how this mapping is established or where it is stored.

I would be against providing implementation details in user documentation. How that happens does not (or should not) affect the usage of the feature.

hsanjuan, Apr 16 '20 08:04

I would be against providing implementation details in user documentation. How that happens does not (or should not) affect the usage of the feature.

I'd say where the state is stored is both user-visible and affects the usage. For example, if it's stored in the file system, it is subject to backup, restore, version control, concurrent access, storage on network file systems etc. (i.e., after a local file system restore, a directory on IPFS created with these tools would seem to revert as well, an unexpected behavior for a distributed file system).

tmbdev, Apr 18 '20 18:04

There is no difference between that or any ipfs related data, everything is in ~/.ipfs.

(i.e., after a local file system restore, a directory on IPFS created with these tools would seem to revert as well, an unexpected behavior for a distributed file system).

Did you get the idea that somehow MFS is backed up for you somewhere else so that it would keep state even if you reverted your IPFS repo?

hsanjuan, Apr 20 '20 08:04

There is no difference between that or any ipfs related data, everything is in ~/.ipfs.

MFS could store state in memory/a daemon, in the current working directory, or in IPNS, all places IPFS also already stores state. All of those could be reasonable choices for something like MFS, depending on which use cases you have in mind.

tmbdev, Apr 23 '20 17:04

@tmbdev thanks, I see how being more detailed there can help users.

@johnnymatthews I think we have enough feedback to actually write a good guide on MFS.

Do you think this should still be under concepts/MFS, or a different content location? I volunteer myself to write it.

hsanjuan, Apr 28 '20 14:04

Yeah /concepts/mfs works well for the URL. The title of the page should be Mutable File-Systems (MFS), as should the sidebar nav item.

johnnymatthews, Apr 28 '20 19:04

@hsanjuan do you still have the bandwidth to write this doc, or should I open it up for a bounty?

johnnymatthews, Jul 16 '20 19:07

@hsanjuan does the updated MFS document explain this issue better, i.e. the distinction between MFS and UnixFS?

realChainLife, Aug 16 '20 14:08

@johnnymatthews there has been a big update to this file (c5ed3ecc1e4d46b0e59a9c0718b239d3d432cca4). It added examples using JavaScript (which I'm not sure are helpful at all, as I would expect Go CLI examples). It still fails to explain how anything works with regard to MFS, although it does a better job with UnixFS.

If I were to work on this, I'd throw most of the MFS subsection away, keeping some small CLI example or two at the end for the usual workflow.

hsanjuan, Aug 25 '20 17:08

Correct, c5ed3ec was a bounty project completed by @realChainLife (in this thread). Using Go-IPFS or JS-IPFS is a debate for another time, but having any examples at all is a good thing. I'm assuming you no longer have time to write the changes you'd prefer to see on the page. Would you be able to list a few bullet points for what you'd like to see, though?

johnnymatthews, Aug 26 '20 14:08

@johnnymatthews I could try to complete this issue if this makes sense?

alexmmueller, Apr 30 '21 07:04

Assigned to you @alexmmueller! Drop any questions you've got in here.

johnnymatthews, May 05 '21 18:05

Are you still working on this one @alexmmueller?

johnnymatthews, Sep 14 '21 13:09

Re: Current docs about "File systems and IPFS": MFS and UnixFS

My profile: Someone new to web3 and IPFS

Reading the docs

The docs do a good job of explaining MFS and UnixFS respectively. I came away with a vague understanding that MFS is the "file system", which uses the "format" UnixFS. [U1]

However, I was still unsure whether my understanding [U1] was actually correct, because of these two passages in the docs (both describe how linking is handled for me):

MFS: https://docs.ipfs.io/concepts/file-systems/#mutable-file-system-mfs

MFS files and have all the work of updating links and hashes taken care of for you.

UnixFS: https://docs.ipfs.io/concepts/file-systems/#unix-file-system-unixfs

[...] so it needs metadata to link all its blocks together. UnixFS is a protocol-buffers-based format [...]

Suddenly I wondered: "If both handle linking, are they both describing a file system, and thus are they alternatives?" (Yes, I'm aware that the definition of UnixFS says it is a format, but the "FS" in its name didn't help make this separation clear.)

Wish to reduce confusion

It would have helped me if this "hierarchy" between MFS and UnixFS had been made clearer with a simple diagram, like

#251

     +---------+
     |   MFS   |   File system.
     +---------+
     |  UnixFS |   Files and directories.
     +---------+
     |   DAG   |   Nodes.
     +---------+
     |  Block  |   Stream of bits.
     +---------+

or a sentence like

https://github.com/ipfs/go-ipfs/issues/5051#issuecomment-393453908

[...] Unixfs is a format. Mfs is the virtual filesystem tree, and the files api is an api interface that gives you filesystem operations over unixfs files/directories backed by mfs.
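The diagram and the quote above can be read as a stack, and a small model may make the layering concrete. The script below is a hypothetical sketch with plain files, not IPFS's real storage format or API: a temporary directory stands in for the blockstore, a "directory block" is just a list of `name hash` lines, and a single file stands in for wherever MFS records its current root. The `ipfs files cp /ipfs/<cid>` mention in the comments is only an analogy.

```shell
#!/bin/sh
# Hypothetical toy sketch of the layering, NOT IPFS's real data model:
#   Block      -> bytes stored in a directory under their SHA-256 digest
#   DAG/UnixFS -> a directory block made of "name hash" link lines
#   MFS        -> one mutable pointer naming the current root block
set -eu

store=$(mktemp -d)     # hypothetical stand-in for the blockstore
pointer=$(mktemp)      # hypothetical stand-in for the MFS root record
trap 'rm -rf "$store" "$pointer"' EXIT

# put: store stdin as a block named by its digest and print the digest.
put() {
  tmp=$(mktemp)
  cat > "$tmp"
  h=$(sha256sum < "$tmp" | cut -d' ' -f1)
  mv "$tmp" "$store/$h"
  printf '%s\n' "$h"
}

# MFS starts out pointing at an empty directory block.
printf '' | put > "$pointer"

# Add a file the way "ipfs add" would: it becomes a block outside MFS.
file=$(printf 'hello\n' | put)

# Link it into the tree (compare "ipfs files cp /ipfs/<cid> /world"):
# a new directory block is written, the file block is reused, and
# only the pointer changes.
old_root=$(cat "$pointer")
new_root=$( { cat "$store/$old_root"; printf 'world %s\n' "$file"; } | LC_ALL=C sort | put )
printf '%s\n' "$new_root" > "$pointer"

echo "MFS root: $(cat "$pointer")"
```

The design choice shows up in the block count: the file's block is stored once and is reachable both by its own hash and from the MFS tree, the old root block is still there, and the only mutable piece of state is the pointer.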

Final words

Generally, the docs are well structured and written. I like all the references and links to tutorials you are giving. This is just a tiny nitpick!

tamagosante, Jan 28 '22 11:01