phobos
phobos copied to clipboard
Add 'std.format.read.formattedRead' overloads to return a Tuple with values read
This enhancement was mentioned in a Discord discussion as a potentially better alternative to the existing std.file.slurp. Although std.file.slurp does a similar thing, it is restricted to reading from files. This feature could be extended to any input range and replace the currently implemented in std.file since that one "combines multiple things together that would be more useful if they were separate and composable" (quoted from @pbackus).
Thanks for your pull request and interest in making D better, @iK4tsu! We are looking forward to reviewing it, and you should be hearing from a maintainer soon. Please verify that your PR follows this checklist:
- My PR is fully covered with tests (you can see the coverage diff by visiting the details link of the codecov check)
- My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
- I have provided a detailed rationale explaining my changes
- New or modified functions have Ddoc comments (with
Params:andReturns:)
Please see CONTRIBUTING.md for more information.
If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.
Bugzilla references
Your PR doesn't reference any Bugzilla issue.
If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.
Testing this PR locally
If you don't have a local development environment setup, you can use Digger to test this PR:
dub run digger -- build "master + phobos#8647"
It seems the documentation is not style-compliant. How do I document the template and the function correctly? I tried to look for existing examples but couldn't find any.
You need to document both the outer template and the inner function.
update: autosquashed commits since CI passed
Just leaving the tuple members default-initialized when there's nothing to read is not good enough, because it gives the user no way to distinguish between a default value read from the input and a default value returned due to incorrectly formatted input. For example:
auto result1 = formattedRead!int("0", "%d");
auto result2 = formattedRead!int("", "%d");
assert(result1 == result2); // no difference
We need some other way of signalling an incomplete read in situations like this.
We could extend the type to return the number of arguments read as the first tuple argument. But that would defeat the practicality of this, and kinda force the user to handle the value, even if it means mapping it out.
We also could flip the signature and add an optional out parameter for the number of arguments read. But for that, I think the user might as well just use the already existing overload that receives a tuple since they would have to handle the value separately anyway.
Tuple!(int, float) values;
if (!"1 2.125".formattedRead("%s %s", values) == 2) ...
...
Or perhaps adding a sentinel template argument for that: countReadArgs or something like that. But I think this other way to distinguish these cases should be something that does not take away the practicality of the implementation and should not get in the way of the user/be forced upon them.
Another option is to throw an exception—iirc this is what slurp does.
Not really what slurp does. It does throw but only when the format is malformed (a format exception) or if there are remaining characters in the current line being parsed.
This throws:
writeFile("deleteme", "123 ");
scope(exit) removeFile("deleteme");
"deleteme".slurp!(int)("%d");
// object.Exception@/dlang/dmd/linux/bin64/../../src/phobos/std/file.d(5191): Trailing characters at the end of line: ` '
This is runs fine:
writeFile("deleteme", "123");
scope(exit) removeFile("deleteme");
assert("deleteme".slurp!(int, string)("%d %s").equal([tuple(123, "")]));
Just leaving the tuple members default-initialized when there's nothing to read is not good enough, because it gives the user no way to distinguish between a default value read from the input and a default value returned due to incorrectly formatted input. For example:
auto result1 = formattedRead!int("0", "%d"); auto result2 = formattedRead!int("", "%d"); assert(result1 == result2); // no differenceWe need some other way of signalling an incomplete read in situations like this.
I think the way is to throw an exception, but I would say to also provide an overload to allow the user to specify a tuple to avoid throwing and allocating on these cases.
@ljmf00 The existing overloads of formattedRead already accept a tuple as an argument; see the example here: https://github.com/dlang/phobos/blob/v2.101.1/std/format/read.d#L355-L361
@ljmf00 The existing overloads of
formattedReadalready accept a tuple as an argument; see the example here: https://github.com/dlang/phobos/blob/v2.101.1/std/format/read.d#L355-L361
Ok cool. Then I think if the user wants to use this prettier version, my suggestion is to raise an exception. If the user wants to use it for control flow they have alternatives and exceptions are zero cost if not raised, so it's fine to do it.
There's also the possibility of making it an optional choice too. Instead of always raising an exception we can have a flag to disable it (this would match the current slurp behavior).
formattedRead(Flag!"exhaustive" exhaustive = Yes.exhaustive, ...)
{
...
auto argsRead = .formattedRead(...);
static if (exhaustive) enforce(argsRead == Args.length)
...
}
This would make the transition to deprecate slurp in favor of this easier I think.
Another bug I found while playing with formattedRead: when using Tuple and the templated format version, the check is not performed on the tuple arguments but on the tuple itself leading to a static assert:
Tuple!(int, string) tup;
"123".formattedRead!"%d %s"(tup);
// Error: static assert: "Orphan format specifier: %s"
The non-templated version executes just fine.
With only an exception being thrown there's no real way to handle failures. The best I could think of is:
only("123 num", "123")
.formattedRead!(int, string)("%d %s")
.map!nullable
.handle!(<ExceptionName>, RangePrimitive.front, (e, r) => typeof(r.front).init );
// [Tuple!(int, string)(123, "num"), Nullable.null]
But if the exception also stored info on how many arguments were read and the tuple result then the user could handle it better if they wanted to. There may be a high possibility the user would like to keep the read arguments.
Another bug I found while playing with
formattedRead: when usingTupleand the templated format version, the check is not performed on the tuple arguments but on the tuple itself leading to a static assert:Tuple!(int, string) tup; "123".formattedRead!"%d %s"(tup); // Error: static assert: "Orphan format specifier: %s"The non-templated version executes just fine.
File a bug report.
@atilaneves we need to decide an acceptable design for this feature. What's your thoughts about the ideas we discussed above?
Another bug I found while playing with
formattedRead: when usingTupleand the templated format version, the check is not performed on the tuple arguments but on the tuple itself leading to a static assert:Tuple!(int, string) tup; "123".formattedRead!"%d %s"(tup); // Error: static assert: "Orphan format specifier: %s"The non-templated version executes just fine.
File a bug report.
done: #8661
I agree with @pbackus and think that throwing an exception is the right way to deal with missing data.
I agree with @pbackus and think that throwing an exception is the right way to deal with missing data.
I think it makes sense too. So you are in favor of the overall change with the particularity of using an exception, on these cases?
@iK4tsu can you add a changelog entry so we can get this in?
Also, use an exception, as suggested by @atilaneves
Sorry for the delay on this. Now both functions optionally throw a FormatException if the format range ends early and not all arguments are filled. This optional behavior is activated by default but can be ignored by passing No.exhaustive as an argument. Is this a good compromise, or is it best if the exception is always enforced?
can you add a changelog entry so we can get this in?
I will add a changelog entry when the final design gets the green light. That way I don't have to update it every time I change the design here.
update: rebased with master
What's the use-case for No.exhaustive? If the user wants to handle partial reads, it seems to me that they should use the overload that actually reports how much was read.
What's the use-case for No.exhaustive? If the user wants to handle partial reads, it seems to me that they should use the overload that actually reports how much was read.
The main usage is to comply with the current slurp implementation. Currently slurp does nothing if the range ends early and arguments are not filled. By following slurp's implementation, it can be replaced by this version of formattedRead without breaking changes.
The other usage is when the user does not care about the number of args filled at all but wants to chain the call. With the overload, they would ignore the result but they would be forced to declare all default initialized variables beforehand. With the forced exception, they would be forced to use std.exception.handle.
By being optional it still follows the current design. The "error" value can be ignored, and all unfilled arguments remain unchanged, in this case, with their default initializers, while providing a compatible way to replace the alternative/s.
One issue remains IMO if the exception is indeed thrown and caught. The user knows something happened with the format, but not in specific. If it failed due to unfilled arguments, the user won't know it - by the FormatException - nor how many were filled. Should I change the Exception to a more specific one that also stores how many arguments were filled? It could potentially store the tuple with all arguments as well, but that would require it to be a templated exception. Just throwing an exception doesn't really match what can be done with the overload.
The main usage is to comply with the current
slurpimplementation.
IMO the way slurp handles partial reads is a mistake (and arguably a bug), not something we should copy.
The other usage is when the user does not care about the number of args filled at all but wants to chain the call.
Under what circumstances, exactly, would a user call formattedRead without caring whether the data they asked for is actually read?
Should I change the Exception to a more specific one that also stores how many arguments were filled? It could potentially store the tuple with all arguments as well, but that would require it to be a templated exception.
I think the plain FormatException is fine. The purpose of this new overload is to make things more convenient in the common case, not to cover every possible use case. Most of the time, the outcomes you care about are "all arguments read successfully" and "something went wrong." If a user wants more detailed information, they can use the existing overloads.
IMO the way slurp handles partial reads is a mistake (and arguably a bug), not something we should copy.
Well, but wouldn't that make it harder to deprecate slurp in favor of this by Phobos' standards?
Under what circumstances, exactly, would a user call formattedRead without caring whether the data they asked for is actually read?
The same situations where users write code like:
int a, b, c;
// don't care about the output
// don't care if the format ends early
formattedRead!someFormat(a, b, c);
// use a b c
or any slurp usage.
But once again, this was more to facilitate the transition from one to another and to keep the existing nonthrowing logic of the other overloads.
Well, but wouldn't that make it harder to deprecate
slurpin favor of this by Phobos' standards?
As far as I know there is no plan to deprecate or remove slurp.
If you mean, would it make it harder to refactor slurp to use the new formattedRead overloads internally—yes, it would. But I would rather have a good API for future users of the new formattedRead overloads and force slurp to use a workaround than have a good API for slurp and force future users of formattedRead to avoid a footgun (which is what No.exhaustive would be).
The same situations where users write code like:
int a, b, c; // don't care about the output // don't care if the format ends early formattedRead!someFormat(a, b, c); // use a b c
In other words: situations where users write buggy code because the API makes it easy to do the wrong thing. :)
Since we have the opportunity here to design a new API, I think we should try to avoid making it easy to do the wrong thing—which meas not including a "do the wrong thing" flag.
Since we have the opportunity here to design a new API, I think we should try to avoid making it easy to do the wrong thing—which meas not including a "do the wrong thing" flag.
Fair enough. I will change it and create a changelog entry when I have a bit of free time.