[RFC] Route I/O accesses through a "virtual" file system (singleton option)

Open p12tic opened this issue 3 years ago • 0 comments

This huge PR is a refactoring that routes I/O accesses through a "virtual" file system. It is currently just a proof of concept that shows what changes to the codebase one of the implementation options would require. I'm happy to adjust the PR in whatever way the reviewers deem necessary, including a full rewrite.

This PR is a sibling of https://github.com/alicevision/AliceVision/pull/1191 and implements the singleton option of the design. The "Rationale" and "Possible options" sections are identical in both PRs.

This PR currently includes a couple of other PRs that are in the process of being merged into the develop branch.

Rationale

The rationale for it is as follows. Currently AliceVision is implemented as a set of executables that share data between themselves using the filesystem. This has the following issues:

  • multiple executables are problematic because mobile operating systems severely lock down applications; iOS, for example, only allows an application to have a single process. This means that in order to use AliceVision there, the components need to be linked into a single executable.
  • sharing data through the filesystem is inefficient, especially since the amount of uncompressed data is nontrivial, which makes I/O overhead significant. If the components are linked into a single executable, it becomes possible to share data in memory instead.

This PR attempts to tackle the second problem. The design of AliceVision is built around the filesystem, and changing this is basically impossible. This is worked around by abstracting filesystem access itself. The end result is that AliceVision code does not really know whether it is talking to a real filesystem or not. This makes it possible to route, for example, all accesses to a certain directory to a location in memory, thus completely avoiding disk access in the cases we care about while not impacting other AliceVision code.

Note that it is not possible to emulate this design by mounting a ramdisk on the filesystem because mobile operating systems don't allow this.
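
To illustrate the routing idea, here is a minimal sketch that keeps everything under one directory prefix in memory and sends all other paths to the real disk. It is not code from this PR: RoutingStore, its members and the "/mem" prefix are hypothetical names chosen for the example, and it uses only the C++17 standard library.

```cpp
#include <cstddef>
#include <fstream>
#include <map>
#include <optional>
#include <sstream>
#include <string>
#include <utility>

// Hypothetical sketch: RoutingStore and its members are illustrative names,
// not classes from the PR.
class RoutingStore {
public:
    explicit RoutingStore(std::string memoryRoot)
        : memoryRoot_(std::move(memoryRoot)) {}

    // True if path is memoryRoot itself or lies below it
    // ("/mem/a" matches root "/mem", "/memory/a" does not).
    bool isMemoryBacked(const std::string& path) const {
        const std::size_t n = memoryRoot_.size();
        if (path.compare(0, n, memoryRoot_) != 0) return false;
        return path.size() == n || path[n] == '/';
    }

    // Memory-backed paths never touch the disk; other paths use a real file.
    bool write(const std::string& path, const std::string& data) {
        if (isMemoryBacked(path)) {
            memoryFiles_[path] = data;
            return true;
        }
        std::ofstream out(path, std::ios::binary);
        out << data;
        return out.good();
    }

    // Returns the file contents, or std::nullopt if the file does not exist.
    std::optional<std::string> read(const std::string& path) const {
        if (isMemoryBacked(path)) {
            const auto it = memoryFiles_.find(path);
            if (it == memoryFiles_.end()) return std::nullopt;
            return it->second;
        }
        std::ifstream in(path, std::ios::binary);
        if (!in) return std::nullopt;
        std::ostringstream contents;
        contents << in.rdbuf();
        return contents.str();
    }

private:
    std::string memoryRoot_;
    std::map<std::string, std::string> memoryFiles_;
};
```

Calling code only ever passes a path, so it cannot tell which backend served the request. That is the property the abstraction relies on.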

Possible options

In terms of code architecture, filesystem abstraction can be implemented in two ways:

  • multiple filesystem instances passed explicitly to the location of use. The advantage of this approach is that it's completely flexible. One part of the code may use a filesystem with different settings than another. E.g. each pipeline can use its own filesystem instance, which ensures that after the pipeline completes and the filesystem is destroyed, all memory is released. The disadvantage is that we need to lug the filesystem instance around to all points of use, which impacts a lot of code.
  • a singleton filesystem instance accessed globally. The advantage of this approach is that the code needs fewer changes. The disadvantage is that a single filesystem instance does not allow efficient cleanup, and it becomes possible to accidentally leak memory by not deleting memory-backed files.
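
The difference between the two options can be sketched as follows. This is an illustration only: MemoryFs, globalFs and the run* functions are hypothetical names, not taken from either PR.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical in-memory file store; not a class from either PR.
class MemoryFs {
public:
    void write(const std::string& path, std::string data) {
        files_[path] = std::move(data);
    }
    bool remove(const std::string& path) { return files_.erase(path) > 0; }
    std::size_t fileCount() const { return files_.size(); }

private:
    std::map<std::string, std::string> files_;
};

// Option 1: the instance is passed explicitly to every point of use.
void runStepExplicit(MemoryFs& fs) {
    fs.write("/pipeline/step/out.bin", "data");
}

std::size_t runPipelineExplicit() {
    MemoryFs fs;  // owned by this pipeline run
    runStepExplicit(fs);
    return fs.fileCount();
}  // fs is destroyed here, so all memory-backed files are released

// Option 2: one global instance, reachable from anywhere.
MemoryFs& globalFs() {
    static MemoryFs instance;  // lives until the program exits
    return instance;
}

void runStepSingleton() {
    // No extra parameter is needed, but the file stays in memory
    // until some code removes it explicitly.
    globalFs().write("/pipeline/step/out.bin", "data");
}
```

With the singleton, forgetting the explicit remove call is exactly the accidental leak described above. With explicit instances, cleanup follows from scope.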

This PR

This PR implements the second approach, a single global filesystem instance (https://github.com/alicevision/AliceVision/pull/1191 implements the first approach).

The design is relatively simple.

  • There is a vfs namespace that contains wrappers for the functions and classes available in the boost::filesystem namespace.

  • There are two additional types to do the actual I/O: vfs::istream and vfs::ostream. They implement the std::istream and std::ostream interfaces and effectively mimic std::fstream. vfs::ostream also has several functions that make it easier to port code based on the C FILE I/O APIs.

The PR currently implements only the part where vfs acts as just a wrapper around boost::filesystem and the real iostream APIs. As a result, the risk of breakage is relatively low. The PR is split into a large number of relatively small, self-contained commits, so if bugs are found later, it will be easy to bisect the problems and figure out the exact changes causing them.
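
A pass-through output stream of this kind could look roughly like the sketch below. It is not the PR's implementation: the vfs_sketch namespace and the writef member are hypothetical, and the real vfs::ostream may be structured differently. The sketch derives from std::ostream, forwards everything to a std::filebuf, and adds one fprintf-style helper.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <fstream>
#include <ostream>
#include <string>
#include <vector>

namespace vfs_sketch {  // hypothetical namespace, not the PR's vfs

class ostream : public std::ostream {
public:
    explicit ostream(const std::string& path,
                     std::ios_base::openmode mode = std::ios_base::out)
        : std::ostream(nullptr) {
        const bool opened = buf_.open(path, mode | std::ios_base::out) != nullptr;
        rdbuf(&buf_);  // attaching the buffer also clears the error state
        if (!opened) setstate(std::ios_base::failbit);
    }

    // fprintf-style helper (hypothetical name). Formats into a temporary
    // buffer and writes the result through the normal stream path.
    int writef(const char* format, ...) {
        std::va_list args;
        va_start(args, format);
        std::va_list argsCopy;
        va_copy(argsCopy, args);
        const int needed = std::vsnprintf(nullptr, 0, format, args);
        va_end(args);
        if (needed < 0) {
            va_end(argsCopy);
            setstate(std::ios_base::failbit);
            return needed;
        }
        std::vector<char> buffer(static_cast<std::size_t>(needed) + 1);
        std::vsnprintf(buffer.data(), buffer.size(), format, argsCopy);
        va_end(argsCopy);
        write(buffer.data(), needed);
        return needed;
    }

private:
    std::filebuf buf_;  // real file today; could be swapped for a memory buffer
};

}  // namespace vfs_sketch
```

Because the class is a std::ostream, existing code using operator<< keeps working unchanged. Only the stream buffer would have to change to redirect the data to memory.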

Porting

Porting code to use the virtual file system is mostly mechanical. All uses of functions coming from boost::filesystem should be changed to refer to their equivalents in the vfs namespace. This is pretty much the same as a plain text replacement of boost::filesystem with vfs. All instances of std::fstream and friends should be replaced with vfs::istream or vfs::ostream as applicable, which only changes the type of the variable. FILE-based code is a little harder to port, but vfs::ostream implements equivalents of fprintf and fwrite, so this is also a straightforward replacement of function calls.
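
As an illustration of such a port, the sketch below shows a small function after the replacement, with the original form kept in a comment. The vfs namespace defined here is only a stand-in built on the C++17 standard library so that the example compiles on its own. The PR's vfs wraps boost::filesystem and has its own stream classes. writeReport is a hypothetical function.

```cpp
#include <filesystem>
#include <fstream>
#include <string>

// Stand-in for the PR's vfs namespace (assumption: thin pass-through
// wrappers, here over std::filesystem instead of boost::filesystem).
namespace vfs {
using path = std::filesystem::path;
using ostream = std::ofstream;
using istream = std::ifstream;
inline bool exists(const path& p) { return std::filesystem::exists(p); }
inline bool create_directories(const path& p) {
    return std::filesystem::create_directories(p);
}
}  // namespace vfs

// Before porting, the body would have read:
//   namespace fs = boost::filesystem;
//   if (!fs::exists(dir)) fs::create_directories(dir);
//   std::ofstream out((dir / "report.txt").string());
//
// After porting, only the namespace and the stream type change.
bool writeReport(const vfs::path& dir, const std::string& text) {
    if (!vfs::exists(dir)) vfs::create_directories(dir);
    vfs::ostream out((dir / "report.txt").string());
    out << text;
    return out.good();
}
```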

Given that this PR will cause merge conflicts after being merged, I volunteer to rebase and fix any code that this PR affects negatively for several months after the PR is merged.

p12tic avatar Jul 23 '22 18:07 p12tic