
Fix UTF-8 double encoding in #!vim perl blocks

Open Copilot opened this issue 4 months ago • 0 comments

UTF-8 characters in #!vim perl code blocks were displaying as Â« instead of « due to double encoding. =begin perl blocks rendered correctly.

Root Cause

In inc/WWW-AdventCalendar/lib/WWW/AdventCalendar/Article.pm:

$body = Encode::encode('utf-8', $body);              # Line 121: encode for Pod::Elemental
my $document = Pod::Elemental->read_string($body);
# ... transformers process content ...
$body = $document->as_pod_string;                    # Line 140: returns UTF-8 bytes
$parser->parse_string_document( Encode::encode('utf-8', $body) );  # Line 149: re-encodes!

as_pod_string returns UTF-8 bytes, which were then re-encoded, turning c2 ab (UTF-8 for «) into c3 82 c2 ab (displays as Â«).
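The byte sequences above can be reproduced with the core Encode module alone. The following is a minimal sketch, independent of the Article.pm code; `hex_bytes` and the variable names are illustrative and do not appear in the repository:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Render a byte string as space-separated lowercase hex pairs.
sub hex_bytes { join ' ', map { sprintf '%02x', ord } split //, shift }

my $chars = "\x{ab}";                  # one character: « (U+00AB)
my $once  = encode('UTF-8', $chars);   # correct: two bytes
my $twice = encode('UTF-8', $once);    # bug: each byte is treated as a Latin-1 character

print hex_bytes($once),  "\n";   # c2 ab
print hex_bytes($twice), "\n";   # c3 82 c2 ab
```

The second `encode` call sees two characters (U+00C2 and U+00AB) rather than two bytes, so it emits two bytes for each of them.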

Changes

Decode UTF-8 bytes back to characters after transformation, before passing to Pod::Simple::XHTML:

$body = $document->as_pod_string;
$body = Encode::decode('utf-8', $body);  # Decode bytes → characters
$parser->parse_string_document( $body ); # Pass characters (expects strings, not bytes)

This ensures content is encoded exactly once.
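The encode-once flow can be checked in isolation. The sketch below assumes a hypothetical `transform_bytes` helper standing in for the Pod::Elemental round trip (bytes in, bytes out); it is not the real Article.pm code, and the final `encode` only models whatever layer writes the rendered output:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical stand-in for the Pod::Elemental round trip:
# accepts UTF-8 bytes and returns UTF-8 bytes unchanged.
sub transform_bytes { my ($bytes) = @_; return $bytes }

my $source = qq{debug "\x{ab}\$firstLine\x{bb} is proper markdown";};

my $bytes = transform_bytes( encode('UTF-8', $source) );  # bytes in, bytes out
my $chars = decode('UTF-8', $bytes);                      # back to characters

# Encoding now happens once, at the output boundary.
my $output = encode('UTF-8', $chars);

print $chars eq $source     ? "round trip ok\n"  : "round trip FAILED\n";
print $output =~ /\xC3\x82/ ? "double encoded\n" : "encoded once\n";
```

A check for the byte pair c3 82 in the rendered output, as in the last line, could also serve as a simple regression test for this class of bug.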

Modified Files

  • inc/WWW-AdventCalendar submodule commit updated
    • lib/WWW/AdventCalendar/Article.pm: Added decode step, removed redundant encode

Fixes #541 (related to #505)

[!WARNING]

Firewall rules blocked me from connecting to one or more addresses

I tried to connect to the following addresses, but was blocked by firewall rules:

  • cpanmin.us
    • Triggering command: curl -L REDACTED (dns block)
  • esm.ubuntu.com
    • Triggering command: /usr/lib/apt/methods/https (dns block)


Original prompt

This section details the original issue you should resolve.

<issue_title>UTF-8 is double encoded in #!vim perl</issue_title>
<issue_description>As can be seen in 2024-12-15, UTF-8 encoded characters are displayed correctly in =begin perl, but they're double encoded in #!vim perl:

=begin perl

      ok( headerOK( $str ), "«$str» is OK" );

renders «$str» correctly, but

    #!vim perl

          debug "«$firstLine» is proper markdown ho, ho, ho";

shows as

debug "Â«$firstLineÂ» is proper markdown ho, ho, ho";

</issue_description>

Comments on the Issue (you are @copilot in this section)

@oalders Related to perladvent/Perl-Advent#505

@oalders Maybe it shouldn't have been closed. 😄 I guess there's an ongoing issue here. The `vim` syntax highlight is nice because you can choose an arbitrary language, but it sometimes does wacky things with Perl. This would all be easier if we just used Markdown, but I guess Pod is part of the charm.

If you feel inclined to send a PR with more fixes, I'm happy to merge!</comment_new>

  • Fixes perladvent/Perl-Advent#530


Copilot, Oct 29 '25 13:10