phobos Issue 17912: Add function for creating a temporary file.

This adds std.file.createTempFile, which creates a temporary file with a random name (optionally with a specified prefix and/or suffix). By default, the file is empty, but if data is passed to createTempFile, then the file is populated with that data just like it would have been with std.file.write. The name of the file that was created is returned. So, essentially, createTempFile is like write except that it generates the file name for you and returns it.

Previously, we added scratchFile to std.stdio, which was like createTempFile except that it gave you an open File object, but it was scrapped, because the dependencies that it added to std.stdio made "hello world" bigger. See: https://issues.dlang.org/show_bug.cgi?id=14599

Just like scratchFile did, createTempFile ensures that a filename is selected which did not exist previously so that the file can't be hijacked with different permissions by someone creating a file with the same name beforehand (the randomness of the name should make that effectively impossible, but the way that createTempFile opens the file guarantees it).

These changes also consolidate some of write's implementation across OSes, since the writing portion of writeImpl was esentially identical across OSes but needlessly duplicated. That consolidated functionality is then also used by createTempFile.

Unfortunately, I don't currently have a Windows setup that I can test this with, so depending on what happens with the autotester, I may need to make a revision or two to get it to pass on Windows. I might get lucky though.

Oct 17 '17 23:10 jmdavis

Thanks for your pull request, @jmdavis!

Bugzilla references

Auto-close	Bugzilla	Severity	Description
✗	17912	enhancement	Add function to std.file for creating a temporary file with a name

Oct 17 '17 23:10 dlang-bot

Great. By some miracle, Windows went perfectly on the first go, but then Mac OS X failed for some reason. :|

Oct 18 '17 01:10 jmdavis

It passed Mac OS X this time though, so I have no clue what happened. In theory, it should act the same on Mac OS X as the other POSIX platforms.

Oct 18 '17 02:10 jmdavis

Why not mkstemp on POSIX?

Why not GetTempFileName on Windows?

Oct 18 '17 13:10 CyberShadow

Why not mkstemp on POSIX?

Because the last time I looked into it, there were some POSIX systems that could only generate something like 26 file names with mkstemp. I don't think that the ones dmd supports at the moment have that problem, but others did, which then potentially becomes a problem for LDC or GDC, and I see no advantage to using mkstemp over just doing it ourselves when some systems do a terrible job with mkstemp. The code used here was in Phobos previously with scratchFile in std.stdio. It just got ripped out because it made "hello world" bigger.

Why not GetTempFileName on Windows?

It doesn't support suffixes (which IMHO is more important than a prefix, since it can be used for file extensions, which sometimes matter a great deal; prefixes matter a lot less frequently). It also doesn't let you write to the file when it's created. You have to reopen it and then write to it. Also, the docs themselves seem to think that you shouldn't use it, saying

"To avoid problems resulting when converting an ANSI string, an application should call the CreateFile function to create a temporary file."

So, GetTempFile can't be used unless we don't have a suffix or if we call rename on the function afterwards (which is likely a bad idea), and Microsoft's docs are basically saying to do what I did.

So, I'd just as soon avoid mkstemp regardless of what Windows is up to, but it looks like we need to implement this ourselves on Windows anyway, and this way, what it's doing should be consistent across systems.

Oct 18 '17 20:10 jmdavis

I don't think we should be using the global rndGen for this. Some part of the program may initialize it to some seed, which could result in a denial of service as this function retries the same file names over and over. Another source of randomness would be better.

Edit: Another reason not to use rndGen is that it potentially leaks information about the state of the global RNG, which might make some local attacks easier to perform.

Oct 18 '17 21:10 CyberShadow

Because the last time I looked into it, there were some POSIX systems that could only generate something like 26 file names with mkstemp.

What are they?

and I see no advantage to using mkstemp over just doing it ourselves when some systems do a terrible job with mkstemp.

There's a very good chance that we get things wrong too - definitely some things that might be worth worrying about in the initial implementation here. Generally I don't think D is expected to fix or work around bugs in ancient/unsupported libcs.

So, GetTempFile can't be used unless we don't have a suffix or if we call rename on the function afterwards (which is likely a bad idea), and Microsoft's docs are basically saying to do what I did.

Yeah, on closer look, GetTempFileName looks not that great. You have to specify the random value anyway, and it's limited to 65535 unique file names (with the same prefix).

Oct 18 '17 21:10 CyberShadow

Okay, per the review comments, it now is pickier about which errors it retries for, and it's not using rndGen, so it doesn't run the risk of being screwed up by rndGen being badly reseeded by the program or the random numbers being reused within the program due to the issues caused by the random number generators being value types instead of reference types.

Because the last time I looked into it, there were some POSIX systems that could only generate something like 26 file names with mkstemp.

What are they?

I think that it was Solaris that was the problem, but it's been quite a while since I implemented scratchFile. Digging around now, SmartOS' man page (which is a "distro" of Illumos and thus the version of Solaris that matters), and all it says on the matter is that "It is possible to run out of letters." It does make it sound like it only supports 6 X's though, whereas FreeBSD's seems to support an arbitrary number. It might be that Illumos' implementation would be fine, but that man page does exactly give me warm fuzzies about it (whereas FreeBSD is quite explicit about how many possible values it has).

But given that we need to implement this ourselves on Windows anyway, I'd just as soon use our implementation on all systems, and then if there's a problem, it will be caught more easily. The key thing to getting it right is how the file is opened (it needs to make sure that the file is created when it's opened and can't do that by checking for existence first, or there's a race condition); the rest is mostly just a question of having a suitably random name, and unless we have serious problems with our random number generator, that shouldn't be a problem (and at the moment, this picks 15 random alphanumeric characters, so the number of possible values is huge).

Oct 19 '17 02:10 jmdavis

This is very nice, in that it frees us from having to rely on potentially poorly-implemented mkstemp functions in the OS / standard C library.

However, it warrants very close scrutiny, because when it comes to security, one can never be too paranoid. As far as POSIX is concerned, I think calling open() with O_EXCL | O_CREAT ought to rule out race conditions with file creation; and making the file read/write only upon creation by the user should rule out an outside attacker overwriting the file with potential attack vectors. It doesn't exclude the possibility of someone who has access to the current user account from manipulating the file in some way, but if they can do that, we're already screwed anyway, so presumably that should be OK as well. So it seems OK to me.

But I think reviewing it against various mkstemp implementations would be wise before we merge this. The last thing we want is for users to start using this and then discovering a security loophole, and D will get very bad rep for having insecure functions in its standard library.

Oct 19 '17 02:10 quickfur

Oh, one potential thing: this function ought to be thread-safe. I believe it is (the static RNG seed should be in TLS, right?), but someone should review it carefully just in case.

Oct 19 '17 02:10 quickfur

But I think reviewing it against various mkstemp implementations would be wise before we merge this. The last thing we want is for users to start using this and then discovering a security loophole, and D will get very bad rep for having insecure functions in its standard library.

If you can think of a potential security hole, then great, but all this does is generate a random file name and then ensure that when it creates the file, that file doesn't already exist. Ensuring that it doesn't overwrite any files solves the key security problem. I don't see how it would be possible to do anything else to protect against security problems, because there's so little involved here. Certainly, if there's a potential security problem with creating a file, closing it, and reopening it, then there could be a problem, but there's nothing that we can do about that. If that's your problem, then go use std.stdio.tmpfile, but if you need to create a file and then reopen it, that doesn't cut it, and in my experience, that's always what I've needed from a temp file; as a result, I've found tmpfile to be worse than useless.

Oh, one potential thing: this function ought to be thread-safe. I believe it is (the static RNG seed should be in TLS, right?), but someone should review it carefully just in case.

Of course, it's thread-safe. Nothing in it is shared, and std.random doesn't use shared anywhere. __gshared isn't used either. The only risk of thread-safety issues would be the C functions involved, and if open isn't thread-safe, well, we're screwed anyway. D's type system makes it wonderfully easy to see that everything involved is in TLS.

Oct 19 '17 02:10 jmdavis

@quickfur If it makes you feel any better, I just looked over FreeBSD's implementation of mkstemp, and it's doing basically the same thing that we're doing here. The main difference is that instead of randomly generating file names after the first one already exists, it increments each of the characters in turn. But the file is opened the same way.

Oct 19 '17 04:10 jmdavis

Instead of random number generator, could you use the approach used by other implementations: process' id and the current time (incrementing every loop)? That should be "random" enough for this purpose.

Oct 19 '17 08:10 nemanja-boric-sociomantic

Okay. The random failure on Mac OS X was because sometimes tempDir returns /tmp, which readLink claims is a symlink to private/tmp, but which doesn't exist (I think that it's supposed to be /private/tmp). I'm not sure what the deal with that is, but upon rereading the Linux and Mac OS X man pages for open, I think that I misunderstood the bit about O_EXCL not working with symlinks. I thought that it meant that directory it was in would fail if it was a symlink, but I think that it means that it fails if the target is a symlink (which makes sense, since it would be stupid to overwrite it), and Mac OS X does seem to pass even when tempDir is /tmp if I don't try to call readLink on it. So, I've removed the call to readLink, since it doesn't seem to be necessary. I don't know if the bit about private/tmp is a bug in readLink or what, but if readLink isn't called here, then it doesn't matter for this PR.

Instead of random number generator, could you use the approach used by other implementations: process' id and the current time (incrementing every loop)? That should be "random" enough for this purpose.

And what would be the benefit of that over using a random number generator? If anything, I would think that a random number generator would be better, because the value would be less predictable, so if there are any security concerns about a malicious program with the correct permissions guessing what the file is and mucking with it, then the risk would be lower (albeit not zero). I don't think that that's a huge concern (and using O_EXCL with open solves the problem of programs that wouldn't have permission being able to hijack the file), but I don't see any benefit in avoiding the random number generator either. FWIW, FreeBSD uses a random number generator with mkstemp just like we're doing here. It's just that it's using its own random number generator, and it increments the result if there's a conflict rather than generating another random name, whereas here, we just generate another random name. As it is, it should be extremely rare that the file would already exist given how much randomness we're using.

Oct 19 '17 10:10 jmdavis

And what would be the benefit of that over using a random number generator?

I've read other (now outdated comments), but I missed that you're not using the global RNG instance, but you create your own here. That looks fair to me.

Oct 19 '17 10:10 nemanja-boric-sociomantic

I think that it's supposed to be /private/tmp

Symlink destinations are relative to the directory the symlink itself is in, so that would make no difference, except when the filesystem is mounted elsewhere than on /, in which case private/tmp would be better. Operating systems / distributions which utilize symlinks in their standard filesystem layout will often use relative symlinks for this purpose.

Oct 19 '17 11:10 CyberShadow

Random is an alias to a 32-bit MersenneTwisterEngine which has 2.5 kilobytes of state. This is excessive for this task, which does not benefit from the properties the Mersenne Twister provides in exchange for taking up so much space. I recommend one of the following:

Use a much smaller PRNG, such as std.random.Xorshift, which has 16 bytes of state and is completely adequate for this task; or
Going against @CyberShadow's recommendation at https://github.com/dlang/phobos/pull/5788#issuecomment-337727725 and using a reference to the thread-local rndGen.

Oct 21 '17 16:10 n8sh

Okay. It's using rndGen again. That does have the downside that someone could reseed it, and cause createTempFile to have to make more attempts to find a valid file, but that's only an issue if someone is stupid enough to keep reseeding with the same seed, and I don't think that taking up extra memory for a separate random number generator to avoid that is worth it. Someone being that dumb can just shoot themselves in the foot. The fact that random numbers that were used prior to calling createTempFile could be reused if rndGen was used incorrectly is arguably problematic, but that's a problem with rndGen and std.random in general, and if we want to fix that, we really need to fix std.random.

Oct 29 '17 10:10 jmdavis

Why do that instead of use XorShift? That seemed less compromising.

Oct 29 '17 13:10 CyberShadow

Someone being that dumb can just shoot themselves in the foot.

This is a really unhelpful way of thinking when designing standard libraries.

Oct 29 '17 13:10 CyberShadow

Why do that instead of use XorShift? That seemed less compromising.

Because the more I think about it, the less reasonable it seems to create a static variable just for this function, and if rndGen isn't acceptable, we have far bigger problems. The default random number generator should be fine. The only downsides of which I am aware are

If the user keeps reseeding rndGen with the same seed.
If the user incorrectly uses rndGen before calling createTemporaryFile (e.g. by copying it when getting random numbers), and the random numbers that createTemporaryFile uses were some of the ones that were just used.

I don't see it as particularly realistic that someone is going to keep reseeding rndGen with the same value. Yes, it could happen, but it's a pretty stupid thing to do, and there's only so much that we can do to protect the user. And even if they do keep reseeding it with the same value, unless they're creating a lot of temporary files, it still won't matter, because worst case, all of the previously created temporary files still exist, and createTemporaryFile has to try each of them before getting to one that works, and there would have to be a lot of temporary files that were being created and not deleted for that to be a problem. If someone actually is foolish enough to keep reseeding rndGen with the same seed while using createTemporaryFile, and they use enough temporary files that it's a problem, then their program may be slow, but then they can debug it and fix the fact that they keep reseeding rndGen. It shouldn't be hard to figure out what's slow, even if it wouldn't be immediately obvious as to why.

The fact that rndGen is annoyingly easy to use incorrectly by accidentally copying it and reusing random numbers is definitely a design flaw of std.random, and it definitely does risk random numbers being reused, but the numbers that were used for a previous call to createTemporaryFile won't be reused, since it's not using it incorrectly. So, it's not going to result in a name conflict that will require trying a new name. Worst case, it makes the random names more predictable, but because of how the files are created and opened (specifically, the file must not exist previously), there isn't much of a security risk there. The only risk is that a program with the same permissions as the one using createTemporaryFile could access the file, and such a program could just watch /tmp for new files if that's what it's trying to do.

So, while I do agree that there are some problems using rndGen, I don't think that they're serious, and I don't like the idea of wasting memory for a static variable that shouldn't be necessary. I also don't think that it's good policy to be avoiding our own standard random number generator.

Someone being that dumb can just shoot themselves in the foot.

This is a really unhelpful way of thinking when designing standard libraries.

There's a limit to how much we can reasonably protect a user from themselves, and I don't see why reseeding rndGen is something that many programs would be doing, let alone reseeding it with the same value continuously. To do that would show a complete lack of understanding of how pseudo-random number generators work, and the programmer doing it would likely run into far worse problems than having their program be slow when they create a lot of temporary files. And the fact that their program would be slow might actually help clue them into what they're doing wrong.

Oct 29 '17 14:10 jmdavis

I think the implications of the possibility of security risks far outweigh the 4-byte cost of a counter variable.

Oct 29 '17 15:10 CyberShadow

I think the implications of the possibility of security risks far outweigh the 4-byte cost of a counter variable.

What risks? The fact that it would be easier to guess the file name? As I said, anything trying to do that could just watch /tmp. The fact that the file name is harder to guess just makes it harder. It doesn't actually stop anything. It's the file permissions that actually protect the file, and createTempFile deals with that problem correctly.

Oct 29 '17 15:10 jmdavis

Any time you open a file, you run the risk of another program with the appropriate permissions messing with it - and many of those files have known names and are in known locations. I don't see why temporary files would be special in that regard or why they would need would need more protections than a normal file. As long as we don't overwrite an existing file (which could therefore have different permissions and allow a program access which shouldn't have access), we should be fine. Or is there some other issue here that I'm not seeing?

Oct 29 '17 16:10 jmdavis

Sorry if I was too terse or ambiguous.

Here's the situation I'm thinking of:

A program is generating random tokens (nonces or such) as part of a security protocol. It uses the global rndGen to do so.

At the same time, another part of the program (perhaps, in a library three Dub dependencies away) needs to create a temporary file. The function which does this also uses the global rndGen.

Now, because temporary directories are often global per machine, and any user is allowed to enumerate their contents, this leaks the state of rndGen. With enough temporary files, it should be possible to guess the initial random seed, and predict what random tokens the program has generated or will generate.

Now, by itself the description of the above doesn't seem likely that it could lead to an exploitable vulnerability. However, most security breaches these days occur due to a sequence of security weaknesses, which when exploited together, allow the attackers to reach their goal.

This is why I think that with any matter where the choice is a possible security weakness and a slight performance cost, the more secure solution should be heavily preferred.

Oct 29 '17 17:10 CyberShadow

29 окт. 2017 г., в 20:07, Vladimir Panteleev [email protected] написал(а):

Sorry if I was too terse or ambiguous.

Here's the situation I'm thinking of:

A program is generating random tokens (nonces or such) as part of a security protocol. It uses the global rndGen to do so.

WAT? That is precisely what it shouldn’t do. There are strong PRNGs for cryptographically secure random numbers (CPRNGs).

Fir instance XorShift can be cracked by observing a handful of numbers. Same for MerseneTwister.

—

You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub, or mute the thread.

Oct 29 '17 18:10 DmitryOlshansky

Yes. The protocol might not require that the token be generated from a cryptographically secure random source. As I already said, it might very well be just part of the complete attack. For example, knowing the token might greatly reduce the amount of computation needed to brute-force the plaintext from the cyphertext.

Oct 29 '17 18:10 CyberShadow

29 окт. 2017 г., в 21:16, Vladimir Panteleev [email protected] написал(а):

Yes. The protocol might not require that the token be generated from a cryptographically secure random source. As I already said, it might very well be just part of the complete attack. For example, knowing the token might greatly reduce the amount of computation needed to brute-force the plaintext from the cyphertext.

Then said protocol is as secure as this attack is feasible. If you can break it knowing the token then the protocol is broken.

Protocol that allow easily guessable token are generally secured in some other way e.g. preshared secret.

I think the level of paranoia should be lowered a notch. People who do security for living are usually skeptical of std implementations of anything even remotely close to the sensitive data - nonces, keys, signatures, OTPs, temp files, caches, etc. Those who don’t ain’t worth their money and we can’t devise every mistake they might make by blindly using std library.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.

Oct 29 '17 18:10 DmitryOlshansky

Then said protocol is as secure as this attack is feasible. If you can break it knowing the token then the protocol is broken.

Computer security doesn't work that way. If you can design a flawless protocol or a write bug-free program, then none of this would be necessary, but the reality is the extreme opposite of that. Consider how many revisions of SSL and TLS it took to get to where we are now, and then how many bugs in their implementations have been found.

People who do security for living

This is also a very unhelpful way to think about software today. Practically everything is Internet-connected, and a lot of things can become an attack vector.

Rust is winning mind-share because they are serious and proactive about security, and I don't mean just memory safety. For example, their hashmaps use a random value as part of the hash, to prevent denial of services occurring from an attacker sending input that generates tons of hash (bucket) collisions. We need to do the same.

Oct 29 '17 21:10 CyberShadow

30 окт. 2017 г., в 0:06, Vladimir Panteleev [email protected] написал(а):

Then said protocol is as secure as this attack is feasible. If you can break it knowing the token then the protocol is broken.

Computer security doesn't work that way.

It does work exactly like that. Making something less insecure is how it doesn’t work.

If you can design a flawless protocol or a write bug-free program, then none of this would be necessary, but the reality is the extreme opposite of that. Consider how many revisions of SSL and TLS it took to get to where we are now, and then how many bugs in their implementations have been found.

Flawless protocol is strange remark. Protocol have to be flawless to the point that nobody can find a sufficient vulnerability (yet), implementations typically are the problem.

All in all, protocols are designed such that

Something is a secret and must be protected (including from guessing).
Something is not a secret and may be shared freely.

Anything in “gray” area is just asking for problems. You don’t secure things by making them less insecure.

People who do security for living

This is also a very unhelpful way to think about software today. Practically everything is Internet-connected, and a lot of things can become an attack vector.

Then whoever does that software will have to care about security as well. We have to hire people if don’t grasp the subject.

Rust is winning mind-share because they are serious and proactive about security, and I don't mean just memory safety. For example, their hashmaps use a random value as part of the hash, to prevent denial of services occurring from an attacker sending input that generates tons of hash (bucket) collisions.

100% to be precise. Which is importantly a proper well known attack vector that even Perl has patched.

We need to do the same.

But if you were to follow the lead fool proofing random generator you’d need CPRNG e.g. Fortuna as default rng. Picking XorShift is exactly what you don’t do.

I wouldn’t be opposed to introduce secure rngs by default but it’s none of this PR business. If somebody can get it right that is.

All in all fool proofing things with an eye to potential security problem is fine, but:

You don’t pick an arbitrary PR to do this, it is a full-scale project itself that may entail dozens of PRs.
Even then some caution is required to not go overboard as we should not impose “less insecure” primitives on people that just run simulations and mostly care about speed. An extreme example would be imposing constant time string compare on everybody because the might compare secure hashes that way.
False sense of security. So imagine we done wonders to fool-proof at library level trying to make sure a user can’t use it insecurely. It says nothing about security of actual applications, sadly.
A great deal of exploits are trivial buffer overruns still, so focus is important as well.
We need at least a couple of security audits. Rust has a trove of experience with Firefox, we have nothing.

Anyhow before this goes OT.

Please stop mutilating this PR motivating it by security. Your suggestion still doesn’t help it in any capacity.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub, or mute the thread.

Oct 30 '17 05:10 DmitryOlshansky