
Allow mk*temp on WASI

Open RReverser opened this issue 4 years ago • 17 comments

Currently, the corresponding code for mkstemp, mkostemp and mkdtemp is commented out, so any C code relying on those functions fails to compile: https://github.com/WebAssembly/wasi-libc/blob/5ccfab77b097a5d0184f91184952158aa5904c8d/libc-top-half/musl/include/stdlib.h#L116-L120

The comment says it's done because WASI doesn't have temporary directories. However (correct me if I'm wrong), unlike tmpfile, those functions don't need the concept of a system-wide temporary directory.

They accept an explicit filename template, and just generate a unique filename based on it. As such, there's no reason they shouldn't work on WASI.

For example, it should be perfectly possible to create temporary files using a template like /mounted-folder/tmp-XXXXXX, where /mounted-folder is a valid preopen dir.
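A minimal sketch of the usage described here, written against a POSIX host (the functions are not exposed by wasi-libc at the commit linked above). The helper name make_temp_in and the tmp- prefix are illustrative assumptions, not part of any library; under WASI, dir would have to be a preopened directory.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: create a uniquely named file inside `dir`.
 * On success, returns an open read/write descriptor and leaves the
 * generated path in `path_out`. Returns -1 on failure. */
int make_temp_in(const char *dir, char *path_out, size_t path_len) {
    assert(dir != NULL && path_out != NULL);

    /* Build the template "<dir>/tmp-XXXXXX"; mkstemp replaces the Xs. */
    int n = snprintf(path_out, path_len, "%s/tmp-XXXXXX", dir);
    if (n < 0 || (size_t)n >= path_len) {
        return -1; /* template did not fit in the buffer */
    }
    return mkstemp(path_out);
}
```

Nothing in the helper refers to a system-wide temporary directory: the caller decides where the file lives, and is responsible for closing the descriptor and unlinking the path afterwards.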

RReverser avatar Jan 29 '21 23:01 RReverser

Even though mkstemp etc. don't hard-code a temporary directory, common usage of them does hard-code a path, often /tmp. If we provide this feature, programs using it may implicitly depend on being passed a preopen dir for /tmp (or /) in the common case. Can you say more about how you'd expect to use this feature?

sunfishcode avatar Jan 30 '21 05:01 sunfishcode

@sunfishcode I'm personally okay with programs using a hardcoded /tmp/... path (and that's the case for the program I'm trying to compile), as I can easily map such a path to a custom temporary dir via the WASI runtime.

RReverser avatar Jan 30 '21 13:01 RReverser

I'm looking to balance the desires of people using WASI today, with the desire to avoid building up a compatibility burden that may delay or even prevent the really cool things that WASI should enable in the future.

We had a presentation at the last WASI meeting on resources and handles. In future meetings, we'll be exploring how to use those to let programs describe the kinds of resources they need, so if they want the ability to create temporary files, they ask for it. Making programs be more explicit about the permissions they need, instead of just assuming they have implicit access to everything, is a big part of what will enable WASI programs to run in interesting new kinds of computing environments, and to allow WASI APIs to be virtualized. Once we have things like this, we can look at C APIs like mkstemp and the use cases that need them and figure out how best to use our new toolset to solve them.

At a glance, the Unix concept of storing temporary files in an all-users-readable-and-writeable directory has not aged well. Even if we use mkstemp which mostly solves the problem of creating such files safely, there are several problems. Programs can still be exposed to other users on the system if they aren't careful. Programs can leak temporary files if they're killed before they can clean up after themselves. This latter problem has led systems to have daemons that periodically remove files in /tmp, however those daemons don't prevent /tmp from filling up, and they have the side effect of introducing race conditions where a program's temporary files can be removed while the program is still running and expecting them to be available. And in WASI, there's also a concern about programs using files in /tmp as a side channel to communicate with other programs.

There are ways to mitigate some of these problems, but some of the mitigations break some real-world mkstemp use cases. Before we as an ecosystem take on support for these use cases, I'd like us to think through what we're committing to.

One alternative we should consider would be an API that looks more like Linux's memfd_create. This isn't exactly mkstemp because it doesn't provide programs with the name of the file, which some mkstemp use cases need. However, such an API has a much smaller footprint on the outside environment, so it's worth investigating whether that would be sufficient.

sunfishcode avatar Jan 30 '21 16:01 sunfishcode

> This latter problem has led systems to have daemons that periodically remove files in /tmp, however those daemons don't prevent /tmp from filling up

mkstemp requires code to explicitly remove the file once it's done with it, rather than relying on it being auto-removed at some later point, as you would with some other APIs.
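As a sketch of that contract on a POSIX host: the caller creates the file, uses it, and removes it itself. The helper name with_scratch_file is a hypothetical chosen for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: write `len` bytes to a scratch file created from
 * the template `tmpl` (which must end in "XXXXXX"), then delete the file.
 * Returns 0 on success, -1 on any failure. */
int with_scratch_file(char *tmpl, const void *data, size_t len) {
    assert(tmpl != NULL);

    int fd = mkstemp(tmpl); /* tmpl now holds the generated name */
    if (fd < 0) {
        return -1;
    }

    int rc = 0;
    if (write(fd, data, len) != (ssize_t)len) {
        rc = -1;
    }
    close(fd);

    /* Nothing removes the file automatically: cleanup is explicit. */
    if (unlink(tmpl) != 0) {
        rc = -1;
    }
    return rc;
}
```

If the process is killed between mkstemp and unlink, the file stays behind, which is the leak scenario discussed elsewhere in this thread.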

I don't see why mkstemp is any different from any lower-level I/O APIs for reading / writing files. The only difference between it and normal file creation is the convenient random name generation, but other than that it's working at the same abstraction level.

I know I'm not involved in WASI decision making, but my personal stance is that, as long as an API itself doesn't violate existing security boundaries, it shouldn't be WASI's job to decide what developers might want, but instead to ensure portability for existing software where possible, and to let the end user of the WASI runtime provide configs to make those apps work with security boundaries in mind.

E.g. a program that tries to create files in /tmp won't work by default anyway, until the user maps that path to some reasonable dir of their choice according to their constraints.

In this regard, mkstemp doesn't violate any such boundaries, and operates on the same namespace as any other file reading / creation APIs, so I don't see why it would be treated specially and not allowed to work.

RReverser avatar Jan 30 '21 16:01 RReverser

It is the application's job to clean up after itself, but if applications actually did this job, tmp cleaner daemons / cron jobs wouldn't be a thing, and they are a thing. And the situation will be worse on WASI, where we probably won't even have signal handlers, which is how many applications attempt to clean up after themselves.

Temporary files differ from normal files in that normal file names are almost always derived from user inputs. If cc writes to t.o, the user can be expected to know that and to clean up t.o when they don't need it anymore. But if cc writes to /tmp/adsf1324.s to pass it to as and is abruptly terminated, it will leak the file, and neither the user nor the system is in a position to clean it up promptly.

Compatibility with existing code is important. I've personally put a lot of effort into this, and will continue to do so. But thinking about how WASI interacts with the outside world is also important if we want WASI to grow into new areas. "Files" encompass several different distinct use cases, from persistent storage, to temporary buffers passed between applications, to software build artifacts which are somewhat transient but live longer than the programs that create them, to private storage used internally to an application. The security implications are different for each.

Would an API like memfd_create be sufficient for your use case? I recognize that you'll likely need to modify some code to use it, but I'm curious if that's something feasible for you? If not, the reasons why it's not may also be interesting, and may help us determine what kind of functionality we ultimately need.

sunfishcode avatar Jan 30 '21 18:01 sunfishcode

This sounds to me like it is not really a WASI question at all. WASI provides a non-POSIX capability-based filesystem API. wasi-libc provides emulation of a more traditional POSIX filesystem on top of this. This is one of the primary purposes of wasi-libc, right, to provide a compat layer? It sounds like adding mkstemp to wasi-libc (or rather not excluding it) is purely a userspace decision about which libc utility functions to expose. Given that we already emulate a filesystem, I see no reason not to provide such utility functions on top of it. I think the same argument can be made for any purely userspace piece of libc code.

If an embedder wants to map /tmp to an ephemeral memfs or something that has the same lifetime as the instance, that is up to the embedder. If the embedder wants to map /tmp to some more permanent place, that is likewise up to the embedder. That is true today regardless of whether we include these helper utility functions in wasi-libc or not. I think there are plenty of userspace programs that will use /tmp regardless of whether mkstemp is available.

If we want to exclude libc functions based on whether they are outdated, or insecure, or encourage bad practice we won't have much left at the end of the day.

sbc100 avatar Jan 30 '21 18:01 sbc100

To be sure, WASI libc includes functions like strcat even though one could fill multiple blog posts with its problems. We're not just taking things away gratuitously.

The things WASI libc currently excludes are all related to WASI's relationship with the outside world. These are the kinds of things that affect what environments WASI can run in, what its security stance is, how WASI programs can be connected to other WASI programs, and so on.

It's tempting to "leave it to the embedder". But for things that affect whether programs work or not, this treats WASI more like a framework from which ABIs can be built, rather than an ABI in its own right. Customizing WASI into your own ABI may be what's most convenient for your needs in the short term, but it undermines WASI's potential.

Temporary files are just filesystem calls, but /proc and /dev are also just filesystem calls. Ultimately, just because Unix traditionally unified things under the same syscalls doesn't mean they have the same impact on portability, security, and virtualizability.

To be sure, I'm not saying we can't ever do temporary files or even mkstemp. I'm saying that even though Unix traditionally treats this like normal files, they nonetheless have security considerations, and we should think carefully about how we want it to work before we tie our hands with compatibility.

sunfishcode avatar Jan 30 '21 19:01 sunfishcode

I cannot see posix shortcuts like this being good for wasi, as it is not intended to replicate posix. Nor would I expect anyone to imagine ported code should just work at all. If it does because the effort started with posix to get things going, that's nice. But the objective should not be to enable porting mistaken code (in this case, code that fails to clean up) simply because it's easier to port. Sure, I'm selfish about my time like anyone, but you do not build systems to last based on current ease of use. Posix has precisely that objective; wasi doesn't, is the way I see it. Of course, we are all free to look at the tradeoff differently.

Of course, wasi should enable using them easily enough, which was a good portion of the discussion this past week (as @sunfishcode mentioned).

It would be interesting to hear a good argument for the objective of enabling mktemp using a specific example, however. (The abstract idea of me not having to rewrite code for this one item isn't persuasive to me.)

I could totally be persuaded if I saw a concrete example that was difficult to enable without substantial time investment.

squillace avatar Jan 30 '21 20:01 squillace

> Temporary files are just filesystem calls, but /proc and /dev are also just filesystem calls.

@sunfishcode I feel like you're mixing up special Unix namespaces with a generic filesystem API that isn't tied to any of those.

There is nothing special about /tmp aside from convention.

Moreover, even though it's used as an example above, it's not even a requirement for mkstemp. mkstemp can be, and often is, used just as well for creating files in the current working dir or its subfolder, and it seems strange to take that away when it doesn't do anything different from what fopen + rand does, just in a single convenience helper.

> I cannot see posix shortcuts like this being good for wasi as it is not intended to replicate posix.

@squillace As @sbc100 said, this is not a WASI question, but a wasi-libc one. And wasi-libc is intended to replicate a POSIX-like layer on top of WASI where possible.

RReverser avatar Jan 30 '21 20:01 RReverser

> There is nothing special about /tmp aside from convention.

... in Unix.

But, Unix isn't the only goal. We want to do more with WASI than just run Unix programs in Unix environments.

So, we need to allow ourselves to think about the actual impacts on security and virtualizability, rather than limit ourselves to how Unix thinks about them.

/proc and /dev are examples of how "leave it to the embedder" and "it's just WASI APIs" aren't automatically inert with respect to WASI's goals.

Temporary files are different in terms of how Unix implements them, but also aren't automatically inert with respect to WASI's goals. Once you start talking about building programs that implicitly expect to be passed capabilities for things that aren't related to user paths, and which they may expect to use to pass data to other programs, and which they could potentially use to surreptitiously pass data to other programs, and which they may expect to have outlive the programs that create them, you're talking about new surface area, even if Unix doesn't think so.

I'm not saying "no mkstemp ever". I'm saying, we need to think about temporary files in the context of the new tools that WebAssembly is giving us. It also helps us to look at use cases in more detail than just "existing code wants this" because while there is always a place for maximal compatibility tools, potentially including mkstemp, we may be able to provide better solutions for users that only need a subset of the functionality or are willing to do some amount of porting work.

sunfishcode avatar Jan 30 '21 22:01 sunfishcode

@RReverser I get it (the wasi-libc distinction). My viewpoint is that "POSIX-like" is the phrase that's doing too much work here as you (originally at least) expressed it. I, too, am not saying "no mkstemp ever", but I also do want to think hard about which elements are brought in and how close we get to posixisms that enable people to unthinkingly wander down the wrong wasi path. I should say also that your arguments are not unreasonable prima facie, but they assume that we should bring in code that makes tons of assumptions as-is. In particular, as @sunfishcode says:

> and which they may expect to use to pass data to other programs, and which they could potentially use to surreptitiously pass data to other programs, and which they may expect to have outlive the programs that create them, you're talking about new surface area, even if Unix doesn't think so.

Doing so would definitely have the effect of enabling rapid movement of code to wasi, it's true, but immediately that code would be assuming a full posix environment that brings tons of potential side effects. Worth it? Maybe it is! So it's a good issue to think about. I'm of the position currently that it is not.

squillace avatar Jan 30 '21 23:01 squillace

> ... in Unix.

Why just Unix? Or, more generally, why do we keep going back to discussing folders in the filesystem (whether /tmp, /proc, or something else) when really it's just a filesystem API like any other that accepts a full path and doesn't care about the destination?

It doesn't have any additional semantics, nor does it rely on POSIX filesystem mount points; it just creates files with randomised names under the given path, which is perfectly portable.

It doesn't restrict where such files are created, nor does it attach any new "temporary file" semantics to files created via this API - they are just regular file handles, like any other, so discussing semantics of /tmp or temporary files seems completely out of scope of the API.

RReverser avatar Jan 31 '21 01:01 RReverser

> and which they may expect to have outlive the programs that create them

About this in particular: that would be perfectly fine, and, in fact, expected on any other system.

I don't understand where [in the scope of this discussion] the expectation of "magical clean-up" of files created via this API is coming from, because it's not the case on any of the systems it's available on. The code must explicitly call unlink if it wants to delete such files.

To reiterate: it's literally doing what you can do via fopen, no more, no less, just with randomized filename. No new semantics attached to either the paths or the file descriptors.
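To make the comparison concrete, here is a rough sketch of how a mkstemp-like helper can be built from ordinary file creation plus a random name. It is an illustration under stated assumptions, not the musl or wasi-libc implementation: the name sketch_mkstemp, the retry limit, and the use of rand() are all hypothetical choices, and real implementations use a better randomness source.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical stand-in for mkstemp: replace the trailing "XXXXXX" in
 * `tmpl` with random characters and create the file exclusively.
 * Returns an open descriptor, or -1 with errno set. */
int sketch_mkstemp(char *tmpl) {
    static const char alphabet[] =
        "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";
    assert(tmpl != NULL);

    size_t len = strlen(tmpl);
    if (len < 6 || strcmp(tmpl + len - 6, "XXXXXX") != 0) {
        errno = EINVAL;
        return -1;
    }
    char *suffix = tmpl + len - 6;

    for (int attempt = 0; attempt < 100; attempt++) {
        for (int i = 0; i < 6; i++) {
            /* rand() is NOT suitable for production use; assumption for brevity. */
            suffix[i] = alphabet[(size_t)rand() % (sizeof alphabet - 1)];
        }
        /* O_EXCL makes creation fail if the name already exists. */
        int fd = open(tmpl, O_RDWR | O_CREAT | O_EXCL, 0600);
        if (fd >= 0) {
            return fd;
        }
        if (errno != EEXIST) {
            break; /* a real error, e.g. the directory is not accessible */
        }
    }
    return -1;
}
```

The only operation that touches the filesystem is a plain open() on a caller-supplied path, so the helper needs the same access as any other file creation in that directory.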

RReverser avatar Jan 31 '21 01:01 RReverser

> Doing so would definitely have the effect of enabling rapid movement of code to wasi, it's true, but immediately that code would be assuming a full posix environment that brings tons of potential side effects.

For this I also don't see how mkstemp is any different or adds any new risk compared to any other filesystem APIs :/ We're already providing a POSIX-like libc, so in a limited way this expectation already exists.

Sure, it doesn't make sense to expose APIs that don't map to WASI concepts, but for those that do, it doesn't make sense to hide them either.

RReverser avatar Jan 31 '21 01:01 RReverser

It's not about the fopen; it's about the capability needed to make the main use case work.

sunfishcode avatar Feb 01 '21 15:02 sunfishcode

> it's about the capability needed to make the main use case work

That's why I said: "No new semantics attached to either the paths or the file descriptors." I don't understand what capability that is, as, again, it's just a generic file creation API.

If you're talking about /tmp, as @sbc100 pointed out above, there's already code out there that simply uses fopen("/tmp/...", ...). Special POSIX mount points are not something we either can or should emulate in WASI, nor something I'm asking for.

I'm only asking to expose a generic file creation API that is not tied to those folders and is simply operating at the same level as fopen, in wasi-libc.

RReverser avatar Feb 01 '21 17:02 RReverser

Having come back to this, I'm fairly sure you'll get precisely what you're asking for at some point. I think it's more that the handle work is just now arriving and the project is busy ensuring that those get used properly before they return to the issues in wasi-libc. I think it's just a timing thing.

squillace avatar Feb 05 '21 20:02 squillace