PerlIO-encoding: Detect no progress made on input
This fixes GH #10258
If retrying getting the next byte doesn't make progress, fail
I'm unsure what all should be done, or if this needs to worry about input from a slow source. @leont ? But it starts the discussion
Because of change in XS file, needs a version bump.
$ git diff
diff --git a/ext/PerlIO-encoding/encoding.pm b/ext/PerlIO-encoding/encoding.pm
index 886afbc488..1e8f0b379d 100644
--- a/ext/PerlIO-encoding/encoding.pm
+++ b/ext/PerlIO-encoding/encoding.pm
@@ -1,7 +1,7 @@
package PerlIO::encoding;
use strict;
-our $VERSION = '0.31';
+our $VERSION = '0.32';
our $DEBUG = 0;
$DEBUG and warn __PACKAGE__, " called by ", join(", ", caller), "\n";
I'm unsure what all should be done, or if this needs to worry about input from a slow source. @Leont ? But it starts the discussion
I have a feeling it's wrong, but I'm not sure yet. The code is hard to follow.
I'm wondering if
https://rt.cpan.org/Ticket/Display.html?id=129086&results=cd7972ee07e26803d6c87aca3dabb0cb
has any bearing
I think it's a reasonable solution for now.
I do think it could use tests.
As far as I can tell, it's entirely legal (but unlikely) that one needs more than two reads to complete a codepoint. This change has the potential to break working code. I would suggest to only enable this new behavior if e->flags & NEEDS_LINES
Fundamentally Encode has the wrong shape for what we're doing, and the real solution is probably to rewrite a lot of that logic. Not that I expect that to happen.
This change has the potential to break working code. I would suggest to only enable this new behavior if
e->flags & NEEDS_LINES
It should probably depend on seeing eof from the underlying layer.
I had thought there was an infinite loop issue with non-NEED_LINES encodings as well, but I wasn't able to get that infinite loop.
There is a problem with stop-at-partial handling though, any incomplete characters at end of file are discarded rather than decoded with the fallback:
$ ./perl -Ilib -CS -E 'my $x = "xx\xF1\x81\x80xx"; open my $fh, "<:encoding(UTF-8)", \$x or die; say <$fh>'
xx\xF1\x81\x80xx
$ ./perl -Ilib -CS -E 'my $x = "xx\xF1\x81\x80"; open my $fh, "<:encoding(UTF-8)", \$x or die; say <$fh>'
xx
There is a problem with stop-at-partial handling though, any incomplete characters at end of file are discarded rather than decoded with the fallback:
Yeah that's part of the "we need a better encode that actually understand the concept of streaming", the decoding should have a concept of EOF.
It could also be good for performance reasons, because currently two pieces of C code are talking through Perl method calls and SV*s and that's just silly.
I don't understand the global picture here, but looking at the detail of just the part I changed, I don't understand how that could change the incomplete chars at the end of file. It does seem to me that -a 1 return meaning both EOF and error is a problem.
The boolean I added could be changed to a count, so it quits only after retrying n times without progress. But I didn't see any advantage to that in my very limited understanding of how things work, unless there is a timeout below that can get triggered by, for example, a slow connection to the source of the stream, and a higher level protocol detects that and retries.
Reviewing the discussion in this ticket, my impression is that we collectively don't have enough understanding of the problem to agree on a path forward. I recommend that the p.r. be closed and that a new Issue be opened in which we can further clarify the problem.
Why not merge this as an experiment since it is early in the development cycle?
Why not merge this as an experiment since it is early in the development cycle?
I really think that's a bad idea, because I believe it's wrong in a non-obvious way. No current test is going to find it, and when the issue does hit the cause will entirely non-obvious.
Unless some committer explicitly says this ticket should remain open, I'm going to close it in no less than 7 days.
Unless some committer explicitly says this ticket should remain open, I'm going to close it in no less than 7 days.
No one spoke up; closing.