const-eval
Restricted file writes in `const fn`
Hi! :wave:
During Oxidizeconf's impl days @oli-obk and I talked about adding a very restricted version of disk writes to const fn contexts. This is an issue to continue that conversation.
Motivation
For the CLI WG we want to generate files during compilation, such as shell completions and man pages.
Currently the best way to achieve this is by creating a build.rs file and making sure the right structs are exported. This is not great: it's easy to mess up, the use of build.rs triggers a double compilation, and certain dependencies need to be listed under both [dependencies] and [dev-dependencies].
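For readers who haven't set this up before, here is a rough sketch of what the build.rs route tends to look like. The names `render_completions`, `write_completions`, and `mycli` are made up for illustration, and the generated line is a placeholder rather than the output of any real completion generator. The only Cargo-specific part is the `OUT_DIR` environment variable, which Cargo sets for build scripts.

```rust
// build.rs (sketch): write a generated file into Cargo's OUT_DIR.
use std::env;
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical generator. A real CLI would derive this from its argument
/// definitions, which is why those structs have to be visible to build.rs.
fn render_completions(bin_name: &str) -> String {
    format!("complete -c {bin_name} -l help -d 'Print help'\n")
}

/// Writes the completion script into `out_dir` and returns its path.
fn write_completions(out_dir: &Path, bin_name: &str) -> io::Result<PathBuf> {
    let path = out_dir.join(format!("{bin_name}.fish"));
    fs::write(&path, render_completions(bin_name))?;
    Ok(path)
}

fn main() -> io::Result<()> {
    // Cargo sets OUT_DIR for build scripts; fall back to the system temp
    // directory so the sketch also runs as a plain program.
    let out_dir = env::var_os("OUT_DIR")
        .map(PathBuf::from)
        .unwrap_or_else(env::temp_dir);
    let path = write_completions(&out_dir, "mycli")?;
    println!("wrote {}", path.display());
    Ok(())
}
```

Even in this small form, the generator has to live somewhere both the build script and the main crate can reach, which is the setup burden described above.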
Instead it would be a lot nicer if there was a way to generate output during compilation that wouldn't require any additional setup beyond the usual flow of using crates.
Current Proposal
The idea we discussed was to add a limited form of writing files to const fn. The reason for limited filesystem access, rather than full access, is to prevent people from writing output in one compile and reading it back in during the next, which would cause problems for reproducibility.
What we discussed was something akin to read_bytes! / const fs::read, but for writing output to a special folder somewhere in target/ that gets removed at the start of each build to ensure no data from a prior build persists.
Thanks! :sparkles:
One further idea for restricting this even more is
#[const_write = "foo.man"]
const _: &[u8] = b"foomp"; // or some actual const eval
which dumps the bytes of a constant with that attribute to the given file (but the file is created in e.g. target/debug/const_write_dump).
I'd be careful with additions like this. There are opportunities for unsafe code to exploit the fact that e.g. a const fn foo(i32) -> i32 will do the same thing no matter how often it is called (with the same argument), and might as well not be called at all without making an observable difference. That would certainly exclude any kind of reading of files in const fn, but even with just writes we'd lose the "no observable difference between 0 and 1 calls" guarantee.
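To make the "0 calls versus 1 call" point concrete, here is a minimal sketch. The body of `foo` is an arbitrary placeholder of my own; the only thing that matters is that the result depends on nothing but the argument.

```rust
/// A pure `const fn`: its result depends only on its argument.
/// The arithmetic is arbitrary and only here for illustration.
const fn foo(x: i32) -> i32 {
    x.wrapping_mul(3).wrapping_add(1)
}

// Evaluated by the compiler during const eval.
const AT_COMPILE_TIME: i32 = foo(14);

fn main() {
    // Calling `foo` again at run time, any number of times, cannot be told
    // apart from reusing the compile-time value.
    let first = foo(14);
    let second = foo(14);
    assert_eq!(first, AT_COMPILE_TIME);
    assert_eq!(second, AT_COMPILE_TIME);
    println!("foo(14) = {AT_COMPILE_TIME}");
}
```

If `foo` were allowed to write a file as a side effect, dropping or duplicating a call would change what ends up on disk, and the equivalence shown here would no longer hold.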
Pure writes could be permitted, but const fn still strikes me as the wrong solution for the stated problem.
Controlling the write declaratively instead of imperatively, as proposed by @oli-obk, would fix this concern.
During a discussion @Centril and I found a few points that I should write down here:
Why an attribute?
The attribute-on-a-constant scheme is better than a const fn which writes to the filesystem, because the latter has no runtime equivalent. If it were a const fn, then another const fn could call it and try to write to an output file, but if that const fn is called at runtime, it's unclear what would happen.
Permitted types
To start out with, we could just allow &[u8] and require the user to produce the corresponding data via const eval (e.g. by having a constant compute the result into an array and returning that).
While we could allow types other than [u8] it's unclear how they should be serialized, and that serialization would have to be const evaluable anyway. If it is const evaluable, you can always serialize to an array of u8 and put that into a constant.
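As a rough illustration of "serialize to an array of u8 and put that into a constant", here is a sketch that encodes a number as ASCII digits entirely in const eval. It assumes a current stable compiler, where const fn supports loops and branches. `Encoded`, `encode_decimal`, and `VERSION` are names invented for this example.

```rust
/// Fixed-size buffer plus the number of bytes actually used.
struct Encoded {
    buf: [u8; 10], // u32::MAX has 10 decimal digits
    len: usize,
}

/// Encodes `n` as ASCII decimal digits, usable in const contexts.
const fn encode_decimal(mut n: u32) -> Encoded {
    let mut tmp = [0u8; 10];
    let mut len = 0;
    // Collect digits least-significant first.
    loop {
        tmp[len] = b'0' + (n % 10) as u8;
        len += 1;
        n /= 10;
        if n == 0 {
            break;
        }
    }
    // Reverse them into the output buffer.
    let mut buf = [0u8; 10];
    let mut i = 0;
    while i < len {
        buf[i] = tmp[len - 1 - i];
        i += 1;
    }
    Encoded { buf, len }
}

// The serialized form lives in a constant, ready to be dumped as raw bytes.
const VERSION: Encoded = encode_decimal(2024);

fn main() {
    let bytes: &[u8] = &VERSION.buf[..VERSION.len];
    assert_eq!(bytes, &b"2024"[..]);
}
```

The fixed-size buffer plus length is one way to work around not knowing the output size up front; other encodings are of course possible.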
Filenames
For simplicity we'd only allow output filenames matching the following regex: [a-z0-9_]+[a-z0-9_.-]* to prevent any trouble that could come from other filenames (attempting to crawl up directories, trying to escape some path scheme...)
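The filename rule can be checked without a regex engine. The sketch below is a hand-written matcher for that pattern, assuming the whole name must match: the first byte must be a lowercase letter, digit, or underscore, and the remaining bytes may additionally be `.` or `-`. The function names are made up for illustration.

```rust
/// Bytes allowed at the start of a name: [a-z0-9_].
const fn is_head(b: u8) -> bool {
    matches!(b, b'a'..=b'z' | b'0'..=b'9' | b'_')
}

/// Bytes allowed after the first one: [a-z0-9_.-].
const fn is_tail(b: u8) -> bool {
    is_head(b) || b == b'-' || b == b'.'
}

/// Returns true if `name` matches the proposed filename pattern in full.
const fn is_valid_artifact_name(name: &str) -> bool {
    let bytes = name.as_bytes();
    if bytes.is_empty() || !is_head(bytes[0]) {
        return false;
    }
    let mut i = 1;
    while i < bytes.len() {
        if !is_tail(bytes[i]) {
            return false;
        }
        i += 1;
    }
    true
}

fn main() {
    assert!(is_valid_artifact_name("foo.man"));
    // Path separators are rejected, so "../" cannot climb out of the directory.
    assert!(!is_valid_artifact_name("../secret"));
    // Names may not start with '.' or '-'.
    assert!(!is_valid_artifact_name(".hidden"));
    assert!(!is_valid_artifact_name("-flag"));
}
```

Since `/` and `\` are never accepted, a name like `a..` is allowed but cannot form a parent-directory reference.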
When to do this
While such a feature is not that hard to implement, it is pretty useless right now, since const eval has no loops, if, or match.
"Precedent"
There's some precedent for this. On wasm, the #[link_section] attribute for static items will raw dump the bytes of the static into the chosen section. This also forbids the use of relocations in the static (so no pointers). While this doesn't dump to an extra user-choosable output file, that difference seems minor to me.
@eddyb threw artifact into the bikeshed hat. So maybe something like
#[emit_artifact = "foo.man"]
const _: &[u8] = b"foomp";
@Centril and I briefly discussed a similar concept. Our discussion largely centered around something like the PLACES snippet below being possible.
We noted that PLACES would only allow read access at run-time, not compile-time: this prevents any issue with folks observing different states at compile time depending on ordering and such.
@Centril was (rightfully) concerned about people depending on the implicit ordering, and I suggested a couple of mitigations. The first is intentionally shuffling the order, which is sort of obvious but not great due to non-determinism, even if we shuffle at run time rather than compile time. The second is to only allow access via a sort function that would be a const fn (run at compile time, so no run-time overhead). We'd make it an error for that function to return Ordering::Equal for non-bit-equivalent values, which would make it impossible to "observe" any problems; and since it's a const fn it is deterministic, so there should be no way to get out of implementing a good ordering.
#[collection]
const PLACES: &[Line];
fn foo() {
    const!(PLACES.add(line!())); // at line 3
}
fn bar() {
    const!(PLACES.add(line!())); // at line 10
}
fn main() {
    assert_eq!(PLACES, &[3, 10]);
}
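The sort-function mitigation can be approximated at run time to show the intent. This is only a sketch: the real proposal would run the comparator as a const fn at compile time and compare bit patterns, whereas here `PartialEq` stands in for bit-equivalence. `Place`, `canonicalize`, and the two comparators are hypothetical names.

```rust
use std::cmp::Ordering;

/// Hypothetical collected record: a source line plus the file it came from.
#[derive(Debug, Clone, PartialEq)]
struct Place {
    file: &'static str,
    line: u32,
}

/// Sorts `items` with `cmp` and rejects comparators that report
/// `Ordering::Equal` for two elements that are not actually equal.
fn canonicalize<T: PartialEq>(
    mut items: Vec<T>,
    cmp: fn(&T, &T) -> Ordering,
) -> Result<Vec<T>, &'static str> {
    items.sort_by(cmp);
    // After sorting, elements that compare Equal sit next to each other,
    // so checking adjacent pairs is enough.
    for pair in items.windows(2) {
        if cmp(&pair[0], &pair[1]) == Ordering::Equal && pair[0] != pair[1] {
            return Err("comparator returned Equal for distinct elements");
        }
    }
    Ok(items)
}

/// A total order: no two distinct places compare Equal.
fn by_file_and_line(a: &Place, b: &Place) -> Ordering {
    a.file.cmp(b.file).then(a.line.cmp(&b.line))
}

/// An under-specified order: ignores the file, so ties are possible.
fn by_line_only(a: &Place, b: &Place) -> Ordering {
    a.line.cmp(&b.line)
}

fn main() {
    let a = Place { file: "a.rs", line: 10 };
    let b = Place { file: "a.rs", line: 3 };
    // Registration order does not leak into the result.
    let one = canonicalize(vec![a.clone(), b.clone()], by_file_and_line).unwrap();
    let two = canonicalize(vec![b.clone(), a.clone()], by_file_and_line).unwrap();
    assert_eq!(one, two);
    // A comparator that cannot separate distinct elements is rejected.
    let c = Place { file: "b.rs", line: 3 };
    assert!(canonicalize(vec![b, c], by_line_only).is_err());
}
```

With ties ruled out, the sorted result is the same regardless of the order in which items were collected, which is the property the mitigation is after.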
@Mark-Simulacrum that sounds similar to the ideas that came up around custom test harnesses, where you would have something more like this:
// Syntax (extremely) subject to bikeshed.
const PLACES: &[Line] = &gather_from!(places);
fn foo() {
    #[gather_into(places)]
    const _: Line = line!(); // at line 3
}
fn bar() {
    #[gather_into(places)]
    const _: Line = line!(); // at line 10
}
fn main() {
    assert_eq!(PLACES, &[3, 10]);
}
One advantage is that the sorted list/set design problem can be (partially) sidestepped by having an AST-driven order, as opposed to CTFE being involved at all.
cc @petrochenkov @Manishearth (although it's somewhat offtopic IMO)