Rough sketch: WASI manifests
The following is a very rough sketch for a WASI manifest design. This sketch takes the approach of encoding a manifest in WebAssembly custom sections. One of the goals here is just to get people thinking about what a manifest should be able to do and how it should work.
The basic idea here is to allow WASI-using modules to "request" certain capabilities (aka file descriptors) be pre-opened and provided to them. fd_prestat_get could then provide information about them.
The manifest section
This would be a WebAssembly custom section, which would be sequenced somewhere around the end of a module. For simplicity, this proposal encodes entries in the manifest as a sequence of varuint32 length + UTF-8 content strings. Requests consist of several strings, terminated by the string ".".
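The string encoding described above can be sketched as follows. This is a minimal illustration, assuming `varuint32` means the unsigned LEB128 encoding used elsewhere in the WebAssembly binary format. The function names are made up for this example, and the custom section's own header (id, size, section name) is left out.

```python
def encode_varuint32(value):
    """Encode an unsigned 32-bit integer as LEB128 (assumed varuint32 form)."""
    if not 0 <= value < 2**32:
        raise ValueError("value out of range for varuint32")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_request(strings):
    """Encode one request: length-prefixed strings, then the "." terminator."""
    out = bytearray()
    for text in list(strings) + ["."]:
        data = text.encode("utf-8")
        out += encode_varuint32(len(data)) + data
    return bytes(out)


def decode_requests(payload):
    """Decode a manifest payload back into a list of requests."""
    requests, current, pos = [], [], 0
    while pos < len(payload):
        # Read the varuint32 length prefix.
        length, shift = 0, 0
        while True:
            byte = payload[pos]
            pos += 1
            length |= (byte & 0x7F) << shift
            shift += 7
            if not byte & 0x80:
                break
        text = payload[pos:pos + length].decode("utf-8")
        pos += length
        if text == ".":
            requests.append(current)
            current = []
        else:
            current.append(text)
    if current:
        raise ValueError("last request is missing its '.' terminator")
    return requests
```

Several requests can simply be concatenated, since each one carries its own terminator.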
Resource requests
A resource request starts with an identifier of the type of resource being requested:
| Name | Description |
|---|---|
| directory | a directory, which may contain files, directories, or other things, preopened |
| socket | openable sockets (network or otherwise, stream or otherwise), not pre-opened |
directory resources
For directory resources, the next string is the logical name, which is a UTF-8 string. Note that this name is just a key, rather than being an actual file path. Mapping the key to a path is implementation-specific and may require input from the end user. There are no significant path separators, such as '/', in this name.
The following names are recognized and implementations may choose to streamline the user experience for these paths:
| Name |
|---|
| Desktop |
| Documents |
| Downloads |
| Music |
| Pictures |
| Videos |
This may be optionally followed by additional attribute strings:
| Name | Description |
|---|---|
| list | can get a listing of this directory's contents (aka readdir) |
| write | can create or rename files |
write depends on list.
Examples:
| Resource request | Meaning |
|---|---|
| directory\|Pictures\|list | can list and read the contents of the user's Pictures folder |
| directory\|logs\|write | can write to log files (but not see the names of existing log files) |
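As a rough illustration, a directory request could be parsed like this once its strings have been decoded. The class and function names are hypothetical, and the sketch treats the logical name as an opaque key. It only checks that the attributes are known and does not try to settle how `write` relates to `list`.

```python
from dataclasses import dataclass

# Names from the table of recognized directories above.
WELL_KNOWN_NAMES = {"Desktop", "Documents", "Downloads", "Music", "Pictures", "Videos"}
ATTRIBUTES = {"list", "write"}


@dataclass(frozen=True)
class DirectoryRequest:
    name: str              # logical key, not a file path
    attributes: frozenset  # subset of ATTRIBUTES
    well_known: bool       # True if the host may streamline the prompt


def parse_directory_request(strings):
    """Parse the strings of one request (without the "." terminator)."""
    if len(strings) < 2 or strings[0] != "directory":
        raise ValueError("not a directory request")
    name, attrs = strings[1], strings[2:]
    unknown = set(attrs) - ATTRIBUTES
    if unknown:
        raise ValueError(f"unknown attributes: {sorted(unknown)}")
    return DirectoryRequest(name, frozenset(attrs), name in WELL_KNOWN_NAMES)
```

How the resulting key is mapped to a real path stays with the implementation, as described above.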
socket resources
For socket resources, the next string is the name, which is a UTF-8 string, interpreted as a key, in the same way as directory resource names.
The next string specifies the domain, which is one of the following:
| Name | Description |
|---|---|
| ip | IPv4 or IPv6 |
The next string specifies the type, which is one of the following:
| Name | Description |
|---|---|
| stream | connection-based socket |
| datagram | connectionless socket |
This is followed by a mode, which is one of the following:
| Name | Description |
|---|---|
| listen | listen for incoming connections or packets |
| connect | initiate a connection (for stream) or set a default destination (for datagram); see below for details |
listen sockets
For listen socket resources, the next string is the arity, which is one of the following:
| Name | Description |
|---|---|
| single | exactly one address will be provided by the implementation |
| multiple | the implementation may provide multiple addresses |
This is optionally followed by a port suggestion, which for ip sockets is ":" followed by either a number, an IANA port name, or "*".
Examples:
| Resource Request | Meaning |
|---|---|
| socket\|ip\|datagram\|listen:ntp | Request a socket address that may be listened on for network datagrams, and suggest port 123 |
| socket\|ip\|stream\|listen:8080 | Request a socket address that may be listened on for network streams, and suggest port 8080 |
| socket\|ip\|stream\|listen:* | Request a socket address that may be listened on for network streams, and suggest allowing any port |
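A port suggestion for ip sockets could be interpreted along these lines. This is only a sketch: the function name is hypothetical, and the small `IANA_PORTS` table is an illustrative stand-in for a real lookup in the IANA service name registry.

```python
# Illustrative subset only; a real implementation would consult the full registry.
IANA_PORTS = {"ntp": 123, "http": 80, "https": 443}


def parse_port_suggestion(text):
    """Turn a suggestion such as ":8080", ":ntp" or ":*" into a port number or "*"."""
    if not text.startswith(":"):
        raise ValueError("a port suggestion starts with ':'")
    value = text[1:]
    if value == "*":
        return "*"  # any port
    if value.isascii() and value.isdigit():
        port = int(value)
        if port > 65535:
            raise ValueError("port number out of range")
        return port
    if value in IANA_PORTS:
        return IANA_PORTS[value]
    raise ValueError(f"unknown port name: {value!r}")
```

Since this is a suggestion, the implementation remains free to provide a different port than the one returned here.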
connect sockets
For connect socket resources, an optional destination suggestion may follow.
Destination description for ip sockets starts with an address set, which is either CIDR notation or a domain name in which components may be replaced by "*" to indicate that any name at that level is to be permitted. For IPv6 addresses, the CIDR notation is enclosed in brackets ("[" and "]"). It is followed by ":" and either a port number, an IANA port name, or "*".
Examples:
| Resource Request | Meaning |
|---|---|
| socket\|ip\|stream\|connect\|*.example.com:20 | Suggest allowing connecting to ftp ports on hosts under example.com |
| socket\|ip\|stream\|connect\|[2001:4860:4860::8888/125]:80 | IPv6 CIDR notation with brackets, single port |
| socket\|ip\|datagram\|connect\|10.0.0.0/24:* | Suggest allowing sending datagrams to any port on addresses in 10.0.0.* |
Note that WASI does not currently support sendto, so datagram sockets can't send packets to non-default addresses.
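The destination syntax for connect sockets could be handled roughly like this, using Python's standard `ipaddress` module for the CIDR part. The function names are hypothetical, and the sketch assumes that "*" matches exactly one domain name component, which is one possible reading of "any name at that level".

```python
import ipaddress


def parse_destination(text):
    """Split a destination suggestion into (address set, port string)."""
    if text.startswith("["):
        # Bracketed IPv6 CIDR, e.g. "[2001:db8::/32]:80".
        end = text.index("]")
        address, rest = text[1:end], text[end + 1:]
        if not rest.startswith(":"):
            raise ValueError("expected ':' and a port after ']'")
        port = rest[1:]
    else:
        address, sep, port = text.rpartition(":")
        if not sep:
            raise ValueError("expected ':' and a port")
    try:
        address_set = ipaddress.ip_network(address)
    except ValueError:
        # Not CIDR notation: treat it as a domain pattern, one label per item.
        address_set = address.lower().split(".")
    return address_set, port


def host_matches(address_set, host):
    """Check one host name or IP address against a parsed address set."""
    if isinstance(address_set, list):
        labels = host.lower().split(".")
        return len(labels) == len(address_set) and all(
            pattern == "*" or pattern == label
            for pattern, label in zip(address_set, labels)
        )
    try:
        return ipaddress.ip_address(host) in address_set
    except ValueError:
        return False  # a host name never matches a CIDR range in this sketch
```

The port part is returned as a string and can be interpreted with the same rules as a listen port suggestion.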
To make all this work:
The networking features here assume the addition of functions to create/bind/listen/connect sockets.
Hi! I landed here randomly and am probably missing a lot of context, but I have spent a lot of time working on permissions models for capability systems (e.g. for sandstorm.io).
It seems like a lot of systems that have tried to put permissions requests into "manifests" to be fulfilled at install or startup time have not worked out very well. E.g. Android has been slowly and painfully moving away from install-time permissions towards dynamic permissions requests made at runtime.
Are there any plans to support a dynamic permissions (aka "powerbox") model in WASI?
For example: Instead of having the app request permissions to the Pictures folder upfront, could the app dynamically say "I need a picture" (or "I need a folder to store pictures") and have the system display a picker to the user where they can choose which picture (or folder) to use? And then, when the user chooses something, a capability for their choice is returned to the app?
Again, sorry if I totally missed the broader context here; I haven't had the chance to look closely at WASI (though I'm excited that it is capability-based).
@kentonv This is great feedback! I myself have not designed a manifest system like this before, so it's great to hear from people who have.
I think WASI will be used in both interactive and non-interactive use cases, and you're likely right that the static manifest concept may not be well-suited for interactive use cases. So Music, Pictures, etc. aren't great examples. But I do think it's worth thinking about this concept for non-interactive use cases.
Powerbox APIs sound like a good idea, and should work really well within WASI's capabilities system. I don't think a hypothetical powerbox_open is a replacement for path_open; it's just a different function for a different purpose. And with a modular API, we can allow runtimes that don't support interactivity to omit these functions.
For non-interactive use cases, I'd argue that almost every kind of external resource that an app might need to access is also something that you'd want to be configurable at deploy time.
For example, a server app may need access to a database. You probably want to be able to deploy the exact same wasm file to a staging environment, followed by a production environment, using different databases.
So, this suggests that the "manifest" (which sounds like it is specified at compile time) should only list the abstract capabilities that the app needs, but should not specify exactly what to bind them to. It could say that it needs a capability to make TCP connections to the database, but won't specify the exact address. When the app is deployed, the operator makes the choice of what to bind these to. Maybe they do this through a powerbox-like UI, or maybe through a config file, or maybe even through command-line arguments that are parsed and turned into capabilities by the WASI runtime before starting the app.
In some cases it might be useful for the app to provide suggested bindings (e.g. suggest the database bind to db.example.com:1234) but I think the operator should always be able to override these. E.g. for testing purposes you always want to be able to do dependency injection.
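To make the idea concrete, here is a minimal sketch of deploy-time binding in which the operator's choice always wins over the app's suggestion. The function name, the dictionary layout, and the staging host name are all hypothetical.

```python
def resolve_bindings(requested, operator_bindings):
    """Bind each abstract capability name to a concrete resource.

    requested: capability name -> suggested binding from the app, or None
    operator_bindings: capability name -> binding chosen by the operator
    """
    resolved, unbound = {}, []
    for name, suggestion in requested.items():
        if name in operator_bindings:
            resolved[name] = operator_bindings[name]  # operator override
        elif suggestion is not None:
            resolved[name] = suggestion               # fall back to the suggestion
        else:
            unbound.append(name)
    if unbound:
        raise LookupError(f"no binding for: {', '.join(sorted(unbound))}")
    return resolved


# The same requests deployed to two environments.
requested = {"database": "db.example.com:1234", "logs": None}
staging = resolve_bindings(
    requested, {"database": "staging-db.example.com:1234", "logs": "/tmp/logs"}
)
production = resolve_bindings(requested, {"logs": "/var/log/app"})
```

The same mechanism covers dependency injection in tests: the test harness plays the operator and binds every name to a fake.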
This manifest seems to me to be a solution in search of a problem.
- Some apps will need dynamic policies. For them, a manifest is too restrictive.
- Other apps will want many of these parameters to be runtime configuration.
- Engines will likely need to define their own custom resource types. This makes manifests an implementation detail.
- Description of things like sockets gets complex very quickly, especially when adding native encryption support. I don't think we can cover all the needed options.
This sounds like a wasi-level imports section. We should model it after that - names and types, allowing the binder to decide what goes in each box. For example, on command line we could link the imports to flags.
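A rough sketch of that command-line linking, using Python's `argparse`: each declared import (a name and a type) becomes a required flag, and the binder decides what goes in each box. The `run-module` program name and the import list are hypothetical.

```python
import argparse


def build_parser(imports):
    """Create one required command-line flag per declared (name, type) import."""
    parser = argparse.ArgumentParser(prog="run-module")
    for name, kind in imports:
        parser.add_argument(
            "--" + name,
            required=True,
            metavar=kind.upper(),
            help=f"binding for the {kind} import {name!r}",
        )
    return parser


# Hypothetical imports declared by a module: names and types only.
IMPORTS = [("database", "socket"), ("logs", "directory")]

args = build_parser(IMPORTS).parse_args(
    ["--database", "db.example.com:1234", "--logs", "/var/log/app"]
)
bindings = vars(args)  # import name -> value supplied by the binder
```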
I've done some more reflection on this topic. Let me sketch what I think are the likely deployment scenarios.
In interactive applications, dynamic permissions are much preferred. Android's mistake here is well known.
In locally run daemons or services, the purpose of such a manifest is to protect the application by minimizing the privileges granted to it at runtime (think: SELinux). The goal is to defend against arbitrary code execution attacks. But I don't think the manifest can actually protect us against that. So I don't see value here. I'm open to being wrong.
In cloud applications (think FaaS), we generally have three entities: an execution node, a scheduler and the binary. The binary needs to communicate its requirements to the scheduler; a manifest could be helpful here. Likewise, the node needs to communicate its available runtime policies to the scheduler. If the manifest format could also communicate these, then great. The scheduler has to determine that the node's policies are a superset of the binary's requirements. A manifest can work well in this scenario. My concern is that it is highly context specific.
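The scheduler's check can be sketched as a plain set comparison. This assumes requirements and policies are expressed in the same vocabulary (here, tuples of strings) and compared exactly, with no wildcard handling. The node names and function names are hypothetical.

```python
def missing_capabilities(requirements, node_policies):
    """Requirements of the binary that the node's policies do not cover."""
    return set(requirements) - set(node_policies)


def eligible_nodes(requirements, nodes):
    """Names of nodes whose policies are a superset of the requirements."""
    return sorted(
        name
        for name, policies in nodes.items()
        if not missing_capabilities(requirements, policies)
    )


requirements = {("socket", "ip", "stream", "connect"), ("directory", "logs", "write")}
nodes = {
    "node-a": {("socket", "ip", "stream", "connect"), ("directory", "logs", "write")},
    "node-b": {("socket", "ip", "stream", "connect")},
}
```

With these inputs, only `node-a` qualifies, and `missing_capabilities` reports the directory request as the reason `node-b` does not.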
Are there any direct use cases for the manifest in mind? If not, can we define them and see if a manifest is really the best solution?
Just to be clear, "permissions" in WebAssembly are simply defined as "allowed access to certain functions with certain arguments", correct? I don't see anything else in the security section detailing further requirements.
Edit: there could also be "rights" associated with files, i.e. fd_fdstat_set_rights and __wasi_rights_t
If such is the case, shouldn't we simply have a corresponding <func>_hp(*<func args>) -> bool (or similar) for every function where a permission may be required, where hp stands for "has permission"? This would allow a module to check for various permissions in (for example) the module start function, or any other module-defined function, and then report them in a way the importer/caller can analyze and determine if there are issues.
This solves multiple problems:
- It is extremely simple (from an implementation point of view)
- It enables dynamic permissions
- It enables modules to report all missing permissions, rather than finding missing permissions one at a time.
Disadvantages:
- Each module must check the permissions of its dependencies (recursively) and know how to report them.
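A minimal sketch of the `_hp` idea, written in Python for brevity rather than as real WASI imports. The `SketchHost` class, its grant set, and the simplified argument lists are hypothetical, and `sock_connect` stands in for a networking function that WASI would still need to add.

```python
class SketchHost:
    """Hypothetical host that answers has-permission queries from a fixed grant set."""

    def __init__(self, granted):
        self._granted = set(granted)

    def path_open_hp(self, directory, path):
        # Simplified: permission is granted per logical directory.
        return ("path_open", directory) in self._granted

    def sock_connect_hp(self, address, port):
        return ("sock_connect", address, port) in self._granted


def report_missing(host, needed_calls):
    """Return every call that would be denied, instead of stopping at the first."""
    missing = []
    for func, args in needed_calls:
        has_permission = getattr(host, func + "_hp")
        if not has_permission(*args):
            missing.append((func, args))
    return missing


# What a module's start function might check before doing any work.
host = SketchHost({("path_open", "logs")})
needed = [
    ("path_open", ("logs", "app.log")),
    ("path_open", ("Pictures", "cat.png")),
    ("sock_connect", ("db.example.com", 1234)),
]
missing = report_missing(host, needed)
```

Here `missing` lists both the Pictures access and the database connection in one pass, which is the "report all missing permissions" benefit named above.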
Also: I think that static permissions are more the domain of build systems / package managers to verify at compile time that your permissions make some amount of sense. I don't really think they belong in the wasi standard.
I just realized: my proposal has no way of specifying what permissions a dependency is allowed to access... I assume dependency-by-dependency permissions are controlled by the wasi interpreter, not the "importer", correct?