[RFC] Route I/O accesses through a "virtual" file system (multiple-instance option)

Open p12tic opened this issue 3 years ago • 0 comments

This huge PR is a refactoring that routes I/O accesses through a "virtual" file system. It is currently just a proof of concept that shows which changes to the codebase one of the options for implementing this feature would require. I'm eager to adjust the PR in whatever way the reviewers deem necessary, including a full rewrite.

This PR is a sibling of https://github.com/alicevision/AliceVision/pull/1194, which implements the singleton option of the design. The "Rationale" and "Possible options" sections are identical in both PRs.

Rationale

The rationale for it is as follows. Currently AliceVision is implemented as a set of executables that share data between themselves using the filesystem. This has the following issues:

  • multiple executables are problematic because mobile operating systems severely lock down applications; iOS, for example, only allows an application to have a single process. This means that in order to use AliceVision, the components need to be somehow linked into a single executable.
  • sharing data through the filesystem is inefficient, especially since the amount of uncompressed data is nontrivial and I/O overheads become more significant. If components are linked into a single executable it becomes possible to easily share data using memory.

This PR attempts to tackle the second problem. The design of AliceVision is built around the filesystem, and changing this is basically impossible. This is worked around by abstracting filesystem access itself. The end result is that AliceVision code does not really know whether it is talking to a real filesystem or not. This makes it possible to route, e.g., all accesses to a certain directory to a location in memory, thus completely avoiding disk access in the cases we care about while not impacting other AliceVision code.

Note that it is not possible to emulate this design by mounting a ramdisk on the filesystem because mobile operating systems don't allow this.

Possible options

In terms of code architecture, filesystem abstraction can be implemented in two ways:

  • multiple filesystem instances passed explicitly to the location of use. The advantage of this approach is that it is completely flexible: one part of the code may use a filesystem with different settings than another. E.g. each pipeline can use its own filesystem instance, which ensures that all memory is released once the pipeline completes and the filesystem is destroyed. The disadvantage is that we need to lug the filesystem instance around to all points of use, which impacts a lot of code.
  • a singleton filesystem instance accessed globally. The advantage of this approach is that the code needs fewer changes. The disadvantage is that a single filesystem instance does not allow efficient cleanup, and it becomes possible to accidentally leak memory by not deleting memory-backed files.
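To make the cleanup argument for the first option concrete, here is a minimal sketch. MemoryFilesystem, run_stage and run_pipeline are hypothetical names invented for this illustration and are not part of the PR; the sketch only assumes that a memory-backed filesystem owns its file contents.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>

// Hypothetical memory-backed filesystem (illustration only, not vfs::filesystem).
class MemoryFilesystem {
public:
    void write_file(const std::string& path, std::string data) {
        files_[path] = std::move(data);
    }
    bool exists(const std::string& path) const {
        return files_.count(path) != 0;
    }
    std::size_t bytes_used() const {
        std::size_t total = 0;
        for (const auto& entry : files_) total += entry.second.size();
        return total;
    }
private:
    std::map<std::string, std::string> files_;  // path -> file contents
};

// The instance is passed explicitly to every point of use.
void run_stage(MemoryFilesystem& fs, const std::string& name) {
    fs.write_file("/cache/" + name, std::string(1024, 'x'));
}

// Each pipeline owns its own instance, so nothing outlives the pipeline.
std::size_t run_pipeline() {
    MemoryFilesystem fs;
    run_stage(fs, "features");
    run_stage(fs, "matches");
    assert(fs.exists("/cache/features"));
    return fs.bytes_used();  // fs and all its files are destroyed on return
}
```

With a global singleton, the two cache files would stay in memory until some code remembered to delete them; here they are released as soon as run_pipeline returns.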

This PR

This PR implements the first approach of multiple filesystem instances (https://github.com/alicevision/AliceVision/pull/1194 implements the second approach).

The design is as follows:

There is a vfs::filesystem type. It contains pretty much all functions that are found in the boost::filesystem namespace. Creating a directory is thus done as follows:

// fs is an instance of vfs::filesystem
fs.create_directory(path);

There are also vfs::path, vfs::directory_iterator, vfs::directory_entry and similar types. They work pretty much like their Boost equivalents.
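As a rough illustration of the wrapper idea, the sketch below forwards a few calls to the real filesystem. It is an assumption-laden stand-in: it uses std::filesystem instead of boost::filesystem to stay dependency-free, and the sketch namespace, its filesystem class and round_trip are names made up here, not the PR's actual vfs types.

```cpp
#include <cassert>
#include <filesystem>

namespace sketch {

using path = std::filesystem::path;

// Hypothetical stand-in for vfs::filesystem: free functions become members,
// so a different backend could later be swapped in behind the same interface.
class filesystem {
public:
    bool create_directory(const path& p) {
        return std::filesystem::create_directory(p);
    }
    bool exists(const path& p) const {
        return std::filesystem::exists(p);
    }
    bool remove(const path& p) {
        return std::filesystem::remove(p);
    }
};

// Usage: create and remove a scratch directory through the wrapper.
inline bool round_trip(filesystem& fs, const path& dir) {
    fs.remove(dir);  // start from a clean state; a missing path is not an error
    const bool created = fs.create_directory(dir);
    assert(fs.exists(dir));
    return fs.remove(dir) && created;
}

}  // namespace sketch
```

Call sites only ever see the member functions, which is what allows a memory-backed implementation to replace the forwarding one without further changes to the callers.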

It is expected that vfs::filesystem instances are only created at the top level of the call stack: in main() or at the beginning of the aliceVision_main functions. As a result, many functions accept a vfs::filesystem& fs argument as an additional parameter. A non-const reference has been chosen as the best of the available options (reasoning in [1]).

There are two additional types to do the actual IO: vfs::istream and vfs::ostream. They implement std::istream and std::ostream interfaces and effectively mimic std::fstream. vfs::ostream also has several functions to make it easier to port code based on C FILE IO APIs.
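The following sketch shows one way such a stream type could look. It is not the PR's vfs::ostream: sketch_ostream, print_formatted and contents are hypothetical names, and the stream writes to an in-memory buffer (rather than wrapping a real file stream, as the PR currently does) so that the example runs without touching the disk.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stream that is usable anywhere a std::ostream& is expected.
class sketch_ostream : public std::ostream {
public:
    sketch_ostream() : std::ostream(nullptr) {
        rdbuf(&buffer_);  // attach the buffer once it has been constructed
    }

    // fprintf-like helper to ease porting of FILE-based code (assumed name).
    int print_formatted(const char* format, ...) {
        assert(format != nullptr);
        va_list args;
        va_start(args, format);
        va_list args_copy;
        va_copy(args_copy, args);
        // First pass: compute the required length without writing anything.
        const int needed = std::vsnprintf(nullptr, 0, format, args_copy);
        va_end(args_copy);
        if (needed < 0) {
            va_end(args);
            return needed;
        }
        std::vector<char> text(static_cast<std::size_t>(needed) + 1);
        std::vsnprintf(text.data(), text.size(), format, args);
        va_end(args);
        write(text.data(), needed);  // forward to the normal ostream path
        return needed;
    }

    std::string contents() const { return buffer_.str(); }

private:
    std::stringbuf buffer_;
};
```

Because the type derives from std::ostream, existing operator<< code keeps working unchanged, while former fprintf calls can be mapped onto the formatting helper.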

The PR currently implements only the part where it acts as just a wrapper around boost::filesystem and the real iostream APIs. As a result, the risk of breakage is relatively low. The PR is split into a large number of relatively small, self-contained commits, so if bugs are eventually found, it will be easy to bisect the problems and figure out the exact changes causing them.

Porting

Porting code to use the virtual file system is conceptually simple, but it touches a lot of code. All uses of functions coming from boost::filesystem should be changed to refer to their equivalents in the vfs::filesystem class. This means that an instance of vfs::filesystem has to be passed through the entire call stack. Many functions will get an additional parameter and some classes will get an additional member, which increases the complexity of the code.
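A before/after sketch of this kind of change is shown below. All names (SketchFilesystem, prepareOutputFolderOld, prepareOutputFolder, SketchExporter) are hypothetical, and std::filesystem stands in for boost::filesystem so that the example is self-contained.

```cpp
#include <cassert>
#include <filesystem>

namespace stdfs = std::filesystem;

// Hypothetical stand-in for vfs::filesystem that forwards to the real disk.
struct SketchFilesystem {
    bool exists(const stdfs::path& p) const { return stdfs::exists(p); }
    bool create_directories(const stdfs::path& p) {
        return stdfs::create_directories(p);
    }
};

// Before: free functions from the filesystem library are called directly.
bool prepareOutputFolderOld(const stdfs::path& folder) {
    if (stdfs::exists(folder)) return true;
    return stdfs::create_directories(folder);
}

// After: the same logic, with the filesystem passed in as an extra parameter.
bool prepareOutputFolder(SketchFilesystem& fs, const stdfs::path& folder) {
    assert(!folder.empty());
    if (fs.exists(folder)) return true;
    return fs.create_directories(folder);
}

// A class that touches files gains a reference member instead.
class SketchExporter {
public:
    explicit SketchExporter(SketchFilesystem& fs) : fs_(fs) {}
    bool prepare(const stdfs::path& folder) {
        return prepareOutputFolder(fs_, folder);
    }
private:
    SketchFilesystem& fs_;  // must outlive the exporter
};
```

The body of the function barely changes; the cost is in the signature, which every caller up the stack has to accommodate.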

All instances of std::fstream and friends should be replaced with vfs::istream and vfs::ostream as applicable. This is also a trivial replacement that changes only the type of the variable. It's a little harder to port FILE-based code, but vfs::ostream implements equivalents of fprintf and fwrite, so this is also a mechanical replacement of function calls.

Given that this PR will cause merge conflicts after being merged, I volunteer to rebase any code that this PR affects negatively for several months after the PR is merged.


[1] Passing vfs::filesystem by value would mean that the underlying data is managed either by shared ownership or by a hidden member reference. The former has a performance cost at every call site; the latter is effectively the same as passing by reference in terms of memory safety. Passing vfs::filesystem as a const reference was decided against because const is additional noise, does not represent the actual semantics, and has the same memory safety as a regular reference. Since the intention is that vfs::filesystem outlives the full AliceVision pipeline, the chance of dangling references is likely small.

p12tic avatar Jul 22 '22 01:07 p12tic