
Problems with exposing is_utf8()

Open sblondeel opened this issue 3 years ago • 11 comments

While migrating from an old system, I had to add utf8::upgrade($line) after each snippet of the form $line = <$fh>.

Module: utf8

Description

If a scalar $var is undef, utf8::upgrade($var) should not turn it into q{}.

Steps to Reproduce


use Data::Dumper;

my $var;

print Dumper($var);

utf8::upgrade($var);

print Dumper($var);

Output

$ perl /tmp/utf8.pl 
$VAR1 = undef;
Use of uninitialized value in subroutine entry at /tmp/utf8.pl line 7.
$VAR1 = '';

Expected output

$ perl /tmp/utf8.pl 
$VAR1 = undef;
Use of uninitialized value in subroutine entry at /tmp/utf8.pl line 7.
$VAR1 = undef;
Use of uninitialized value in subroutine entry at /tmp/utf8.pl line 9.

Expected behavior

An undef value should still be undef after being upgraded to the utf8 internal Perl representation of scalars.

undef is a special value of scalars, unrelated to any encoding scheme.

This breaks loops like

while (defined $line) {

  ...

  $line = <$fh>;
  utf8::upgrade($line);
}

Note: the utf8::upgrade call is itself a workaround for other Perl bugs/module bugs I have reported.
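One way to keep such a loop terminating, regardless of how utf8::upgrade treats undef, is to test definedness before the upgrade is applied. This is only a minimal sketch: the in-memory filehandle, the sample data and the variable names are assumptions standing in for the real file.

```perl
use strict;
use warnings;

# Illustrative input: an in-memory filehandle standing in for the real file.
my $data = "first\nsecond\n";
open(my $fh, '<', \$data) or die "cannot open in-memory handle: $!";

my @lines;
# The readline result is tested for definedness before utf8::upgrade ever
# sees it, so the undef returned at end-of-file is never passed to it.
while (defined(my $line = <$fh>)) {
    utf8::upgrade($line);
    chomp $line;
    push @lines, $line;
}
close $fh;

print scalar(@lines), " lines read\n";
```

The loop condition now owns the end-of-file check, so the body never has to care whether the upgrade preserves undef.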

Perl configuration

Summary of my perl5 (revision 5 version 28 subversion 1) configuration:
   
  Platform:
    osname=linux
    osvers=4.9.0
    archname=x86_64-linux-gnu-thread-multi
    uname='linux localhost 4.9.0 #1 smp debian 4.9.0 x86_64 gnulinux '
    config_args='-Dusethreads -Duselargefiles -Dcc=x86_64-linux-gnu-gcc -Dcpp=x86_64-linux-gnu-cpp -Dld=x86_64-linux-gnu-gcc -Dccflags=-DDEBIAN -Wdate-time -D_FORTIFY_SOURCE=2 -g -O2 -fdebug-prefix-map=/build/perl-voFw8F/perl-5.28.1=. -fstack-protector-strong -Wformat -Werror=format-security -Dldflags= -Wl,-z,relro -Dlddlflags=-shared -Wl,-z,relro -Dcccdlflags=-fPIC -Darchname=x86_64-linux-gnu -Dprefix=/usr -Dprivlib=/usr/share/perl/5.28 -Darchlib=/usr/lib/x86_64-linux-gnu/perl/5.28 -Dvendorprefix=/usr -Dvendorlib=/usr/share/perl5 -Dvendorarch=/usr/lib/x86_64-linux-gnu/perl5/5.28 -Dsiteprefix=/usr/local -Dsitelib=/usr/local/share/perl/5.28.1 -Dsitearch=/usr/local/lib/x86_64-linux-gnu/perl/5.28.1 -Dman1dir=/usr/share/man/man1 -Dman3dir=/usr/share/man/man3 -Dsiteman1dir=/usr/local/man/man1 -Dsiteman3dir=/usr/local/man/man3 -Duse64bitint -Dman1ext=1 -Dman3ext=3perl -Dpager=/usr/bin/sensible-pager -Uafs -Ud_csh -Ud_ualarm -Uusesfio -Uusenm -Ui_libutil -Ui_xlocale -Uversiononly -DDEBUGGING=-g -Doptimize=-O2 -dEs -Duseshrplib -Dlibperl=libperl.so.5.28.1'
    hint=recommended
    useposix=true
    d_sigaction=define
    useithreads=define
    usemultiplicity=define
    use64bitint=define
    use64bitall=define
    uselongdouble=undef
    usemymalloc=n
    default_inc_excludes_dot=define
    bincompat5005=undef
  Compiler:
    cc='x86_64-linux-gnu-gcc'
    ccflags ='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fwrapv -fno-strict-aliasing -pipe -I/usr/local/include -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64'
    optimize='-O2 -g'
    cppflags='-D_REENTRANT -D_GNU_SOURCE -DDEBIAN -fwrapv -fno-strict-aliasing -pipe -I/usr/local/include'
    ccversion=''
    gccversion='8.3.0'
    gccosandvers=''
    intsize=4
    longsize=8
    ptrsize=8
    doublesize=8
    byteorder=12345678
    doublekind=3
    d_longlong=define
    longlongsize=8
    d_longdbl=define
    longdblsize=16
    longdblkind=3
    ivtype='long'
    ivsize=8
    nvtype='double'
    nvsize=8
    Off_t='off_t'
    lseeksize=8
    alignbytes=8
    prototype=define
  Linker and Libraries:
    ld='x86_64-linux-gnu-gcc'
    ldflags =' -fstack-protector-strong -L/usr/local/lib'
    libpth=/usr/local/lib /usr/lib/gcc/x86_64-linux-gnu/8/include-fixed /usr/include/x86_64-linux-gnu /usr/lib /lib/x86_64-linux-gnu /lib/../lib /usr/lib/x86_64-linux-gnu /usr/lib/../lib /lib
    libs=-lgdbm -lgdbm_compat -ldb -ldl -lm -lpthread -lc -lcrypt
    perllibs=-ldl -lm -lpthread -lc -lcrypt
    libc=libc-2.28.so
    so=so
    useshrplib=true
    libperl=libperl.so.5.28
    gnulibc_version='2.28'
  Dynamic Linking:
    dlsrc=dl_dlopen.xs
    dlext=so
    d_dlsymun=undef
    ccdlflags='-Wl,-E'
    cccdlflags='-fPIC'
    lddlflags='-shared -L/usr/local/lib -fstack-protector-strong'


Characteristics of this binary (from libperl): 
  Compile-time options:
    HAS_TIMES
    MULTIPLICITY
    PERLIO_LAYERS
    PERL_COPY_ON_WRITE
    PERL_DONT_CREATE_GVSV
    PERL_IMPLICIT_CONTEXT
    PERL_MALLOC_WRAP
    PERL_OP_PARENT
    PERL_PRESERVE_IVUV
    USE_64_BIT_ALL
    USE_64_BIT_INT
    USE_ITHREADS
    USE_LARGE_FILES
    USE_LOCALE
    USE_LOCALE_COLLATE
    USE_LOCALE_CTYPE
    USE_LOCALE_NUMERIC
    USE_LOCALE_TIME
    USE_PERLIO
    USE_PERL_ATOF
    USE_REENTRANT_API
  Locally applied patches:
    DEBPKG:debian/cpan_definstalldirs - Provide a sensible INSTALLDIRS default for modules installed from CPAN.
    DEBPKG:debian/db_file_ver - https://bugs.debian.org/340047 Remove overly restrictive DB_File version check.
    DEBPKG:debian/doc_info - Replace generic man(1) instructions with Debian-specific information.
    DEBPKG:debian/enc2xs_inc - https://bugs.debian.org/290336 Tweak enc2xs to follow symlinks and ignore missing @INC directories.
    DEBPKG:debian/errno_ver - https://bugs.debian.org/343351 Remove Errno version check due to upgrade problems with long-running processes.
    DEBPKG:debian/libperl_embed_doc - https://bugs.debian.org/186778 Note that libperl-dev package is required for embedded linking
    DEBPKG:fixes/respect_umask - Respect umask during installation
    DEBPKG:debian/writable_site_dirs - Set umask approproately for site install directories
    DEBPKG:debian/extutils_set_libperl_path - EU:MM: set location of libperl.a under /usr/lib
    DEBPKG:debian/no_packlist_perllocal - Don't install .packlist or perllocal.pod for perl or vendor
    DEBPKG:debian/fakeroot - Postpone LD_LIBRARY_PATH evaluation to the binary targets.
    DEBPKG:debian/instmodsh_doc - Debian policy doesn't install .packlist files for core or vendor.
    DEBPKG:debian/ld_run_path - Remove standard libs from LD_RUN_PATH as per Debian policy.
    DEBPKG:debian/libnet_config_path - Set location of libnet.cfg to /etc/perl/Net as /usr may not be writable.
    DEBPKG:debian/perlivp - https://bugs.debian.org/510895 Make perlivp skip include directories in /usr/local
    DEBPKG:debian/squelch-locale-warnings - https://bugs.debian.org/508764 Squelch locale warnings in Debian package maintainer scripts
    DEBPKG:debian/patchlevel - https://bugs.debian.org/567489 List packaged patches for 5.28.1-6+deb10u1 in patchlevel.h
    DEBPKG:fixes/document_makemaker_ccflags - https://bugs.debian.org/628522 [rt.cpan.org #68613] Document that CCFLAGS should include $Config{ccflags}
    DEBPKG:debian/find_html2text - https://bugs.debian.org/640479 Configure CPAN::Distribution with correct name of html2text
    DEBPKG:debian/perl5db-x-terminal-emulator.patch - https://bugs.debian.org/668490 Invoke x-terminal-emulator rather than xterm in perl5db.pl
    DEBPKG:debian/cpan-missing-site-dirs - https://bugs.debian.org/688842 Fix CPAN::FirstTime defaults with nonexisting site dirs if a parent is writable
    DEBPKG:fixes/memoize_storable_nstore - [rt.cpan.org #77790] https://bugs.debian.org/587650 Memoize::Storable: respect 'nstore' option not respected
    DEBPKG:debian/makemaker-pasthru - https://bugs.debian.org/758471 Pass LD settings through to subdirectories
    DEBPKG:debian/makemaker-manext - https://bugs.debian.org/247370 Make EU::MakeMaker honour MANnEXT settings in generated manpage headers
    DEBPKG:debian/kfreebsd-softupdates - https://bugs.debian.org/796798 Work around Debian Bug#796798
    DEBPKG:fixes/autodie-scope - https://bugs.debian.org/798096 Fix a scoping issue with "no autodie" and the "system" sub
    DEBPKG:fixes/memoize-pod - [rt.cpan.org #89441] Fix POD errors in Memoize
    DEBPKG:debian/hurd-softupdates - https://bugs.debian.org/822735 Fix t/op/stat.t failures on hurd
    DEBPKG:fixes/math_complex_doc_great_circle - https://bugs.debian.org/697567 [rt.cpan.org #114104] Math::Trig: clarify definition of great_circle_midpoint
    DEBPKG:fixes/math_complex_doc_see_also - https://bugs.debian.org/697568 [rt.cpan.org #114105] Math::Trig: add missing SEE ALSO
    DEBPKG:fixes/math_complex_doc_angle_units - https://bugs.debian.org/731505 [rt.cpan.org #114106] Math::Trig: document angle units
    DEBPKG:fixes/cpan_web_link - https://bugs.debian.org/367291 CPAN: Add link to main CPAN web site
    DEBPKG:debian/hppa_op_optimize_workaround - https://bugs.debian.org/838613 Temporarily lower the optimization of op.c on hppa due to gcc-6 problems
    DEBPKG:debian/installman-utf8 - https://bugs.debian.org/840211 Generate man pages with UTF-8 characters
    DEBPKG:fixes/getopt-long-4 - https://bugs.debian.org/864544 [rt.cpan.org #122068] Fix issue #122068.
    DEBPKG:debian/hppa_opmini_optimize_workaround - https://bugs.debian.org/869122 Lower the optimization level of opmini.c on hppa
    DEBPKG:debian/sh4_op_optimize_workaround - https://bugs.debian.org/869373 Also lower the optimization level of op.c and opmini.c on sh4
    DEBPKG:debian/perldoc-pager - https://bugs.debian.org/870340 [rt.cpan.org #120229] Fix perldoc terminal escapes when sensible-pager is less
    DEBPKG:debian/prune_libs - https://bugs.debian.org/128355 Prune the list of libraries wanted to what we actually need.
    DEBPKG:debian/mod_paths - Tweak @INC ordering for Debian
    DEBPKG:debian/configure-regen - https://bugs.debian.org/762638 Regenerate Configure et al. after probe unit changes
    DEBPKG:debian/deprecate-with-apt - https://bugs.debian.org/747628 Point users to Debian packages of deprecated core modules
    DEBPKG:debian/disable-stack-check - https://bugs.debian.org/902779 [perl #133327] Disable debugperl stack extension checks for binary compatibility with perl
    DEBPKG:debian/gdbm-fatal - [perl #133295] https://bugs.debian.org/904005 Temporarily skip GDBM_File fatal.t for gdbm >= 1.15 compatibility
    DEBPKG:fixes/storable-recursion - https://bugs.debian.org/912900 [perl #133326] [120060c] (perl #133326) fix and clarify handling of recurs_sv.
    DEBPKG:fixes/caretx-fallback - https://bugs.debian.org/913347 [perl #133573] [03b94aa] RT#133573: $^X fallback when platform-specific technique fails
    DEBPKG:fixes/eumm-usrmerge - https://bugs.debian.org/913637 Avoid mangling /bin non-perl shebangs on merged-/usr systems
    DEBPKG:fixes/errno-include-path - [6c5080f] [perl #133662] https://bugs.debian.org/875921 Make Errno_pm.PL compatible with /usr/include/<ARCH>/errno.h
    DEBPKG:fixes/kfreebsd-renameat - [a3c63a9] https://bugs.debian.org/912521 [perl #133668] Also work around renameat() kernel bug on GNU/kFreeBSD
    DEBPKG:fixes/time-local-2020 - https://bugs.debian.org/915209 [rt.cpan.org #124787] Fix Time::Local tests
    DEBPKG:fixes/inplace-editing-bugfix/part1 - https://bugs.debian.org/914651 (perl #133659) move argvout cleanup to a new function
    DEBPKG:fixes/inplace-editing-bugfix/part2 - https://bugs.debian.org/914651 (perl #133659) tests for global destruction handling of inplace editing
    DEBPKG:fixes/inplace-editing-bugfix/part3 - https://bugs.debian.org/914651 (perl #133659) make an in-place edit successful if the exit status is zero
    DEBPKG:fixes/fix-manifest-failures - https://bugs.debian.org/914962 Fix t/porting/manifest.t failures when run in a foreign git checkout
    DEBPKG:fixes/pipe-open-bugfix/part1 - [perl #133726] https://bugs.debian.org/916313 Always mark pipe in pipe-open as inherit-on-exec
    DEBPKG:fixes/pipe-open-bugfix/part2 - [perl #133726] https://bugs.debian.org/916313 Always mark pipe in list pipe-open as inherit-on-exec
    DEBPKG:fixes/storable-probing/prereq1 - [3f4cad1] Storable: fix for strawberry build failures:
    DEBPKG:fixes/storable-probing/prereq2 - [perl #133411] [edf639f] (perl #133411) don't try to load Storable with -Dusecrosscompile
    DEBPKG:fixes/storable-probing/disable-probing - https://bugs.debian.org/914133 [perl #133708] [2a0bbd3] (perl #133708) remove build-time probing for stack limits for Storable
    DEBPKG:debian/perlbug-editor - https://bugs.debian.org/922609 Use "editor" as the default perlbug editor, as per Debian policy
    DEBPKG:fixes/posix-mbrlen - [25d7b7a] https://bugs.debian.org/924517 [perl #133928] Fix POSIX::mblen mbstate_t initialization on threaded perls with glibc
    DEBPKG:fixes/CVE-2020-10543 - https://bugs.debian.org/962005 regcomp.c: Prevent integer overflow from nested regex quantifiers.
    DEBPKG:fixes/CVE-2020-10878 - https://bugs.debian.org/962005 study_chunk: extract rck_elide_nothing
    DEBPKG:fixes/CVE-2020-12723 - https://bugs.debian.org/962005 study_chunk: avoid mutating regexp program within GOSUB
    DEBPKG:fixes/io-socket-ip-nov4 - https://bugs.debian.org/962019 Fix test failures in IO::Socket::IP with an IPv6-only host
  Built under linux
  Compiled at Jul 21 2020 19:27:00
  @INC:
    /etc/perl
    /usr/local/lib/x86_64-linux-gnu/perl/5.28.1
    /usr/local/share/perl/5.28.1
    /usr/lib/x86_64-linux-gnu/perl5/5.28
    /usr/share/perl5
    /usr/lib/x86_64-linux-gnu/perl/5.28
    /usr/share/perl/5.28
    /usr/local/lib/site_perl
    /usr/lib/x86_64-linux-gnu/perl-base

sblondeel avatar Oct 20 '22 10:10 sblondeel

I disagree. utf8::upgrade is a string function, so it's appropriate to coerce its argument into a string in order to operate on it, much like $foo++ would define an undef variable.

Grinnz avatar Oct 20 '22 10:10 Grinnz

If $foo is undef, $foo++ adds 1 to something unknown, which can be taken to be the neutral element of addition, in other words 0. This may not fly so well with multiplication...

In any case the runtime warning should alert the developer, the logs, the monitoring, ...

If $foo is an undef value marking the end of a while loop that reads from a file handle, having utf8::upgrade() break that loop violates the principle of least surprise (POLS), IMHO.

Reasoning: utf8::upgrade() acts on the internal representation of scalars in Perl (binary or UTF-8). An undef value should not turn into an empty string; the two are different concepts. Nothing internal to Perl should ever have external consequences (I have already reported a few bugs in modules regarding this, and I am starting to suspect the problem is more deeply rooted).

If a string stored in the binary representation is undef, why should its UTF-8 equivalent be defined? If the open() had been done with an I/O layer such as

              open(my $fh, "<:encoding(UTF-8)", $filename)

then $line would have been undef at end of file.

sblondeel avatar Oct 20 '22 10:10 sblondeel

Regarding this, I tried an experiment to see whether utf8::upgrade and utf8::downgrade are inverses of each other, each acting as the identity on its own target representation (see the diagram below), but Perl does not fold combining accents into precomposed characters:

use utf8;
use Data::Dumper;
my $vanilla_name   = "Sébastien";
my $combining_name = "Se\x{0301}bastien"; # U+0301 COMBINING ACUTE ACCENT

print "  vanilla_name: " . Dumper($vanilla_name);
print "combining_name: " . Dumper($combining_name);

utf8::downgrade($vanilla_name);
utf8::downgrade($combining_name);

does not work as I had hoped:

$ perl /tmp/toto.pl 
  vanilla_name: $VAR1 = "S\x{e9}bastien";
combining_name: $VAR1 = "Se\x{301}bastien";
Wide character in subroutine entry at /tmp/toto.pl line 12.

So in any case I expect utf8::upgrade and utf8::downgrade to work like this:


      [ BINARY INTERNAL ]  ====== utf8::upgrade =====> [ UTF-8 INTERNAL ]
      [ REPRESENTATION  ]  <===== utf8::downgrade ==== [ REPRESENTATION ]

          ||      ^^                                       ||       ^^
          ||      ||                                       ||       ||
         utf8::downgrade                                 utf8::upgrade
         is Identity function                            is Identity function

and this breaks for undef.
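For defined strings, the diagram can be checked directly. The sketch below is only an illustration with made-up variable names: it upgrades and then downgrades a Latin-1 string, compares it with the original at each step, and shows that downgrade is not defined for every string (U+0301 has no single-byte form).

```perl
use strict;
use warnings;

# "\xE9" is e-acute in Latin-1; without "use utf8" and with an escape below
# 0x100, this literal starts out in the byte (non-UTF-8) representation.
my $original = "S\xE9bastien";
my $copy     = $original;

utf8::upgrade($copy);
my $upgraded_flag = utf8::is_utf8($copy) ? 1 : 0;   # internal flag is now set
my $same_after_up = ($copy eq $original) ? 1 : 0;   # logical string unchanged

utf8::downgrade($copy);
my $downgraded_flag = utf8::is_utf8($copy) ? 1 : 0; # flag cleared again
my $same_after_down = ($copy eq $original) ? 1 : 0;

# U+0301 cannot be stored in one byte, so downgrade cannot succeed; a true
# second argument makes it return false instead of dying.
my $combining = "Se\x{301}bastien";
my $ok = utf8::downgrade($combining, 1) ? 1 : 0;

print "upgraded:   flag=$upgraded_flag same=$same_after_up\n";
print "downgraded: flag=$downgraded_flag same=$same_after_down\n";
print "combining downgrade ok=$ok\n";
```

The comparison with eq works on logical characters, which is why it holds in both representations even though the internal flag changes.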

sblondeel avatar Oct 20 '22 10:10 sblondeel

On Thu, 20 Oct 2022 at 12:20, Dan Book @.***> wrote:

> I disagree. utf8::upgrade is a string function, so it's appropriate to coerce its argument into a string in order to operate on it, much like $foo++ would define an undef variable.

length is a string function, and it does not coerce undef to the empty string, it returns undef.

Personally I am surprised by this; I would not have guessed that utf8::upgrade() would convert its argument to a string, including refs. I guess it makes sense from an internals point of view. But it strikes me as quite odd from the Perl level; I would expect the XS glue to check that the var is defined and not a ref, at the least.

The docs do not explicitly state this happens, they refer to strings only:

    (Since Perl v5.8.0) Converts in-place the internal representation of
    the string from an octet sequence in the native encoding (Latin-1 or
    EBCDIC) to UTF-8. The logical character sequence itself is
    unchanged. If *$string* is already upgraded, then this is a no-op.
    Returns the number of octets necessary to represent the string as
    UTF-8.

I would consider it pretty reasonable to see "if $string is a reference or undefined then this is a no-op".

Seems like a bug to me.
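At the Perl level, the "no-op for undef or references" behaviour can be approximated with a small wrapper. This is a sketch only: upgrade_if_string is a hypothetical helper name, not part of the utf8 module, and it simply skips anything for which ref() is true, which includes objects with string overloading.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the utf8 module): upgrade only plain,
# defined strings and leave undef and references untouched.
sub upgrade_if_string {
    return unless defined $_[0];   # undef: no-op
    return if ref $_[0];           # reference: no-op
    # $_[0] aliases the caller's variable, so the upgrade happens in place.
    return utf8::upgrade($_[0]);
}

my $undef;
my $ref = [1, 2, 3];
my $str = "caf\xE9";

upgrade_if_string($undef);
upgrade_if_string($ref);
upgrade_if_string($str);

printf "undef still undef: %s\n", defined($undef)        ? "no"  : "yes";
printf "ref still a ref:   %s\n", ref($ref) eq 'ARRAY'   ? "yes" : "no";
printf "string upgraded:   %s\n", utf8::is_utf8($str)    ? "yes" : "no";
```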

cheers, Yves

-- perl -Mre=debug -e "/just|another|perl|hacker/"

demerphq avatar Oct 22 '22 09:10 demerphq

> On Thu, 20 Oct 2022 at 12:20, Dan Book @.***> wrote:
>
> > I disagree. utf8::upgrade is a string function, so it's appropriate to coerce its argument into a string in order to operate on it, much like $foo++ would define an undef variable.
>
> length is a string function, and it does not coerce undef to the empty string, it returns undef.

length is an exception, and also does not operate in place. All other string functions treat their argument as empty string if operating on undef (and warn about it).

Grinnz avatar Oct 22 '22 09:10 Grinnz

On Sat, 22 Oct 2022 at 11:53, Dan Book @.***> wrote:

> > On Thu, 20 Oct 2022 at 12:20, Dan Book @.***> wrote:
> >
> > > I disagree. utf8::upgrade is a string function, so it's appropriate to coerce its argument into a string in order to operate on it, much like $foo++ would define an undef variable.
> >
> > length is a string function, and it does not coerce undef to the empty string, it returns undef.
>
> length is an exception, and also does not operate in place. All other string functions treat their argument as empty string if operating on undef (and warn about it)

Well, as I said /personally/ I would consider this a bug, but if we are not going to consider it a bug we should document it as I think many people would find the behavior strange.

Yves

perl -Mre=debug -e "/just|another|perl|hacker/"

demerphq avatar Oct 22 '22 10:10 demerphq

On Sat, Oct 22, 2022 at 03:02:18AM -0700, Yves Orton wrote:

> Well, as I said /personally/ I would consider this a bug, but if we are not going to consider it a bug we should document it as I think many people would find the behavior strange.

Its behaviour is certainly inconsistent with other common "modify in place" functions and length():

sub d { printf "%9s %s\n",  defined $_[0] ? "defined" : "undefined", $_[1] }

$y1 = length    $x1; d($x1, "length");
chomp           $x2; d($x2, "chomp");
chop            $x3; d($x3, "chop");
utf8::upgrade   $x4; d($x4, "utf8::upgrade");
utf8::downgrade $x5; d($x5, "utf8::downgrade");

outputs:

undefined length
undefined chomp
undefined chop
  defined utf8::upgrade
undefined utf8::downgrade

-- All wight. I will give you one more chance. This time, I want to hear no Wubens. No Weginalds. No Wudolf the wed-nosed weindeers. -- Life of Brian

iabyn avatar Oct 28 '22 11:10 iabyn

I submitted a PR to fix this for comments. Note I don't really know what I'm doing with this area of the code.

https://github.com/Perl/perl5/pull/20451

khwilliamson avatar Oct 28 '22 15:10 khwilliamson

Not that this shouldn’t change, but @sblondeel, did you report the other bugs/issues you’ve found?

In my own experience, utf8::upgrade is useful in testing, but generally some encode/decode combination leads to a happier place. FWIW.
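As a rough illustration of the encode/decode approach, assuming the input is UTF-8 and using only the core Encode module (the sample bytes and variable names are made up for the example): decode once where bytes enter the program, work on characters, and encode once where bytes leave.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Bytes as they might arrive from a file or socket: "Sébastien" in UTF-8.
my $input_bytes = "S\xC3\xA9bastien";

# Decode once at the input boundary: from here on $text is character data.
my $text = decode('UTF-8', $input_bytes);
my $char_count = length $text;              # counts characters, not bytes

# Encode once at the output boundary, stating the encoding explicitly.
my $output_bytes = encode('UTF-8', $text);
my $byte_count = length $output_bytes;

print "characters: $char_count, bytes: $byte_count\n";
```

With explicit boundaries like these, the program never has to ask how Perl happens to store a given string internally.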

FGasper avatar Oct 31 '22 14:10 FGasper

Hi,

I already reported https://github.com/shlomif/perl-XML-LibXML/issues/72 with no reaction so far.

Motivated by your interest, I tried to remember/reproduce other such problems I had met.

Either Excel::Writer::XLSX has since been fixed, or the bug occurred with XLS instead.

Anyway the following still looks buggy to me:

#! /usr/bin/perl
use warnings;
use strict;
use utf8;
use feature 'say';
use URI;

my $u = URI->new("http://www.perl.com");

my $label = "Sébastien";
utf8::upgrade($label); # useless, to make it clearer

$u->query_keywords($label);
say " UTF-8: " . $u->as_string;

utf8::downgrade($label);
$u->query_keywords($label);
say "binary: " . $u->as_string;

Output:

 UTF-8: http://www.perl.com?S%C3%A9bastien
binary: http://www.perl.com?S%E9bastien

Apparently the URI module infers the encoding to use to URL-encode non-ASCII characters from the internal representation in Perl, which should never have external consequences.

Reading https://en.wikipedia.org/wiki/Percent-encoding leads me to believe the chosen encoding should always be UTF-8.

But what is worrisome is that something internal to Perl can be observed from outside it, which leads me to think something is broken in the core of Perl.

Oh no, the author was aware of this; look at /usr/share/perl5/URI/Escape.pm:

# XXX FIXME escape_char is buggy as it assigns meaning to the string's storage format.
sub escape_char {
    # Old versions of utf8::is_utf8() didn't properly handle magical vars (e.g. $1).
    # The following forces a fetch to occur beforehand.
    my $dummy = substr($_[0], 0, 0);

    if (utf8::is_utf8($_[0])) {
        my $s = shift;
        utf8::encode($s);
        unshift(@_, $s);
    }

    return join '', @URI::Escape::escapes{split //, $_[0]};
}

...which leads me to believe the utf8::is_utf8 function should never be allowed in Perl modules (with the exception of Data::Dumper maybe).
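For comparison, here is a sketch of an escaping routine that never consults utf8::is_utf8(). It assumes the caller always passes character data and that UTF-8 is the wanted encoding. percent_encode_utf8 is a hypothetical helper written for this example, not code from the URI distribution, which offers uri_escape_utf8 in URI::Escape for a similar purpose.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: percent-encode a *character* string as UTF-8 octets,
# without ever looking at the string's internal representation.
sub percent_encode_utf8 {
    my ($text) = @_;
    my $octets = encode('UTF-8', $text);
    # Escape every octet outside the RFC 3986 "unreserved" set.
    $octets =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $octets;
}

# The same logical string in both internal representations.
my $upgraded   = "S\xE9bastien";
my $downgraded = "S\xE9bastien";
utf8::upgrade($upgraded);
utf8::downgrade($downgraded);

print percent_encode_utf8($upgraded), "\n";
print percent_encode_utf8($downgraded), "\n";
```

Both calls produce the same escaped text, because Encode::encode works on the logical characters rather than on the storage format.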

Regards,

sblondeel avatar Dec 07 '22 11:12 sblondeel

@sblondeel Indeed, lots of XS modules—and even Perl built-ins—expose Perl strings’ internal representation to the outside world. The problem is that fixing these cases may break existing applications.

This was actually the topic of my presentation at the last Perl/Raku conference: https://www.youtube.com/watch?v=yH5IyYyvWHU

The solution I’ve wondered about is to repurpose 2 bits from the refcount in order to store string state:

  • 0: unknown
  • 1: byte string
  • 2: text/Unicode string

… but I’ve not gotten much further than that.

FGasper avatar Dec 07 '22 14:12 FGasper