phobos Add 'std.format.read.formattedRead' overloads to return a Tuple with values read

This enhancement was mentioned in a Discord discussion as a potentially better alternative to the existing std.file.slurp. Although std.file.slurp does a similar thing, it is restricted to reading from files. This feature could be extended to any input range and replace the currently implemented in std.file since that one "combines multiple things together that would be more useful if they were separate and composable" (quoted from @pbackus).

Dec 10 '22 15:12 iK4tsu

Thanks for your pull request and interest in making D better, @iK4tsu! We are looking forward to reviewing it, and you should be hearing from a maintainer soon. Please verify that your PR follows this checklist:

My PR is fully covered with tests (you can see the coverage diff by visiting the details link of the codecov check)
My PR is as minimal as possible (smaller, focused PRs are easier to review than big ones)
I have provided a detailed rationale explaining my changes
New or modified functions have Ddoc comments (with Params: and Returns:)

Please see CONTRIBUTING.md for more information.

If you have addressed all reviews or aren't sure how to proceed, don't hesitate to ping us with a simple comment.

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

Testing this PR locally

If you don't have a local development environment setup, you can use Digger to test this PR:

dub run digger -- build "master + phobos#8647"

Dec 10 '22 15:12 dlang-bot

It seems the documentation is not style-compliant. How do I document the template and the function correctly? I tried to look for existing examples but couldn't find any.

Dec 10 '22 15:12 iK4tsu

You need to document both the outer template and the inner function.

Dec 10 '22 16:12 pbackus

update: autosquashed commits since CI passed

Dec 10 '22 19:12 iK4tsu

Just leaving the tuple members default-initialized when there's nothing to read is not good enough, because it gives the user no way to distinguish between a default value read from the input and a default value returned due to incorrectly formatted input. For example:

auto result1 = formattedRead!int("0", "%d");
auto result2 = formattedRead!int("", "%d");
assert(result1 == result2); // no difference

We need some other way of signalling an incomplete read in situations like this.

Dec 12 '22 15:12 pbackus

We could extend the type to return the number of arguments read as the first tuple argument. But that would defeat the practicality of this, and kinda force the user to handle the value, even if it means mapping it out.

We also could flip the signature and add an optional out parameter for the number of arguments read. But for that, I think the user might as well just use the already existing overload that receives a tuple since they would have to handle the value separately anyway.

Tuple!(int, float) values;
if (!"1 2.125".formattedRead("%s %s", values) == 2) ...
...

Or perhaps adding a sentinel template argument for that: countReadArgs or something like that. But I think this other way to distinguish these cases should be something that does not take away the practicality of the implementation and should not get in the way of the user/be forced upon them.

Dec 12 '22 16:12 iK4tsu

Another option is to throw an exception—iirc this is what slurp does.

Dec 12 '22 16:12 pbackus

Not really what slurp does. It does throw but only when the format is malformed (a format exception) or if there are remaining characters in the current line being parsed. This throws:

writeFile("deleteme", "123 ");
scope(exit) removeFile("deleteme");
"deleteme".slurp!(int)("%d");
// object.Exception@/dlang/dmd/linux/bin64/../../src/phobos/std/file.d(5191): Trailing characters at the end of line: ` '

This is runs fine:

writeFile("deleteme", "123");
scope(exit) removeFile("deleteme");
assert("deleteme".slurp!(int, string)("%d %s").equal([tuple(123, "")]));

Dec 12 '22 17:12 iK4tsu

Just leaving the tuple members default-initialized when there's nothing to read is not good enough, because it gives the user no way to distinguish between a default value read from the input and a default value returned due to incorrectly formatted input. For example:
auto result1 = formattedRead!int("0", "%d");
auto result2 = formattedRead!int("", "%d");
assert(result1 == result2); // no difference
We need some other way of signalling an incomplete read in situations like this.

I think the way is to throw an exception, but I would say to also provide an overload to allow the user to specify a tuple to avoid throwing and allocating on these cases.

Dec 16 '22 22:12 ljmf00

@ljmf00 The existing overloads of formattedRead already accept a tuple as an argument; see the example here: https://github.com/dlang/phobos/blob/v2.101.1/std/format/read.d#L355-L361

Dec 16 '22 22:12 pbackus

@ljmf00 The existing overloads of formattedRead already accept a tuple as an argument; see the example here: https://github.com/dlang/phobos/blob/v2.101.1/std/format/read.d#L355-L361

Ok cool. Then I think if the user wants to use this prettier version, my suggestion is to raise an exception. If the user wants to use it for control flow they have alternatives and exceptions are zero cost if not raised, so it's fine to do it.

Dec 17 '22 16:12 ljmf00

There's also the possibility of making it an optional choice too. Instead of always raising an exception we can have a flag to disable it (this would match the current slurp behavior).

formattedRead(Flag!"exhaustive" exhaustive = Yes.exhaustive, ...)
{
    ...
    auto argsRead = .formattedRead(...);
    static if (exhaustive) enforce(argsRead == Args.length)
    ...
}

This would make the transition to deprecate slurp in favor of this easier I think.

Dec 19 '22 18:12 iK4tsu

Another bug I found while playing with formattedRead: when using Tuple and the templated format version, the check is not performed on the tuple arguments but on the tuple itself leading to a static assert:

Tuple!(int, string) tup;
"123".formattedRead!"%d %s"(tup);
// Error: static assert:  "Orphan format specifier: %s"

The non-templated version executes just fine.

Dec 19 '22 18:12 iK4tsu

With only an exception being thrown there's no real way to handle failures. The best I could think of is:

only("123 num", "123")
    .formattedRead!(int, string)("%d %s")
    .map!nullable
    .handle!(<ExceptionName>, RangePrimitive.front, (e, r) => typeof(r.front).init );
    // [Tuple!(int, string)(123, "num"), Nullable.null]

But if the exception also stored info on how many arguments were read and the tuple result then the user could handle it better if they wanted to. There may be a high possibility the user would like to keep the read arguments.

Dec 20 '22 10:12 iK4tsu

Another bug I found while playing with formattedRead: when using Tuple and the templated format version, the check is not performed on the tuple arguments but on the tuple itself leading to a static assert:
Tuple!(int, string) tup;
"123".formattedRead!"%d %s"(tup);
// Error: static assert:  "Orphan format specifier: %s"
The non-templated version executes just fine.

File a bug report.

Dec 23 '22 22:12 ljmf00

@atilaneves we need to decide an acceptable design for this feature. What's your thoughts about the ideas we discussed above?

Dec 23 '22 23:12 ljmf00

Another bug I found while playing with formattedRead: when using Tuple and the templated format version, the check is not performed on the tuple arguments but on the tuple itself leading to a static assert:
Tuple!(int, string) tup;
"123".formattedRead!"%d %s"(tup);
// Error: static assert:  "Orphan format specifier: %s"
The non-templated version executes just fine.
File a bug report.

done: #8661

Jan 04 '23 21:01 iK4tsu

I agree with @pbackus and think that throwing an exception is the right way to deal with missing data.

Jan 22 '23 14:01 atilaneves

I agree with @pbackus and think that throwing an exception is the right way to deal with missing data.

I think it makes sense too. So you are in favor of the overall change with the particularity of using an exception, on these cases?

@iK4tsu can you add a changelog entry so we can get this in?

Jan 23 '23 11:01 ljmf00

Also, use an exception, as suggested by @atilaneves

Jan 23 '23 11:01 ljmf00

Sorry for the delay on this. Now both functions optionally throw a FormatException if the format range ends early and not all arguments are filled. This optional behavior is activated by default but can be ignored by passing No.exhaustive as an argument. Is this a good compromise, or is it best if the exception is always enforced?

Mar 22 '23 21:03 iK4tsu

can you add a changelog entry so we can get this in?

I will add a changelog entry when the final design gets the green light. That way I don't have to update it every time I change the design here.

Mar 22 '23 21:03 iK4tsu

update: rebased with master

Mar 22 '23 21:03 iK4tsu

What's the use-case for No.exhaustive? If the user wants to handle partial reads, it seems to me that they should use the overload that actually reports how much was read.

Mar 22 '23 22:03 pbackus

What's the use-case for No.exhaustive? If the user wants to handle partial reads, it seems to me that they should use the overload that actually reports how much was read.

The main usage is to comply with the current slurp implementation. Currently slurp does nothing if the range ends early and arguments are not filled. By following slurp's implementation, it can be replaced by this version of formattedRead without breaking changes.

The other usage is when the user does not care about the number of args filled at all but wants to chain the call. With the overload, they would ignore the result but they would be forced to declare all default initialized variables beforehand. With the forced exception, they would be forced to use std.exception.handle.

By being optional it still follows the current design. The "error" value can be ignored, and all unfilled arguments remain unchanged, in this case, with their default initializers, while providing a compatible way to replace the alternative/s.

One issue remains IMO if the exception is indeed thrown and caught. The user knows something happened with the format, but not in specific. If it failed due to unfilled arguments, the user won't know it - by the FormatException - nor how many were filled. Should I change the Exception to a more specific one that also stores how many arguments were filled? It could potentially store the tuple with all arguments as well, but that would require it to be a templated exception. Just throwing an exception doesn't really match what can be done with the overload.

Mar 23 '23 12:03 iK4tsu

The main usage is to comply with the current slurp implementation.

IMO the way slurp handles partial reads is a mistake (and arguably a bug), not something we should copy.

The other usage is when the user does not care about the number of args filled at all but wants to chain the call.

Under what circumstances, exactly, would a user call formattedRead without caring whether the data they asked for is actually read?

Should I change the Exception to a more specific one that also stores how many arguments were filled? It could potentially store the tuple with all arguments as well, but that would require it to be a templated exception.

I think the plain FormatException is fine. The purpose of this new overload is to make things more convenient in the common case, not to cover every possible use case. Most of the time, the outcomes you care about are "all arguments read successfully" and "something went wrong." If a user wants more detailed information, they can use the existing overloads.

Mar 23 '23 14:03 pbackus

IMO the way slurp handles partial reads is a mistake (and arguably a bug), not something we should copy.

Well, but wouldn't that make it harder to deprecate slurp in favor of this by Phobos' standards?

Under what circumstances, exactly, would a user call formattedRead without caring whether the data they asked for is actually read?

The same situations where users write code like:

int a, b, c;

// don't care about the output
// don't care if the format ends early
formattedRead!someFormat(a, b, c);

// use a b c

or any slurp usage.

But once again, this was more to facilitate the transition from one to another and to keep the existing nonthrowing logic of the other overloads.

Mar 27 '23 10:03 iK4tsu

Well, but wouldn't that make it harder to deprecate slurp in favor of this by Phobos' standards?

As far as I know there is no plan to deprecate or remove slurp.

If you mean, would it make it harder to refactor slurp to use the new formattedRead overloads internally—yes, it would. But I would rather have a good API for future users of the new formattedRead overloads and force slurp to use a workaround than have a good API for slurp and force future users of formattedRead to avoid a footgun (which is what No.exhaustive would be).

The same situations where users write code like:

int a, b, c;

// don't care about the output
// don't care if the format ends early
formattedRead!someFormat(a, b, c);

// use a b c

In other words: situations where users write buggy code because the API makes it easy to do the wrong thing. :)

Since we have the opportunity here to design a new API, I think we should try to avoid making it easy to do the wrong thing—which meas not including a "do the wrong thing" flag.

Mar 27 '23 17:03 pbackus

Since we have the opportunity here to design a new API, I think we should try to avoid making it easy to do the wrong thing—which meas not including a "do the wrong thing" flag.

Fair enough. I will change it and create a changelog entry when I have a bit of free time.

Mar 28 '23 10:03 iK4tsu

phobos phobos copied to clipboard

Add 'std.format.read.formattedRead' overloads to return a Tuple with values read

Bugzilla references

Testing this PR locally

phobos
phobos copied to clipboard