perl5 icon indicating copy to clipboard operation
perl5 copied to clipboard

SvUV() macro 100% of time calls Perl_sv_2uv_flags() because SvUOK_nog/Perl_sv_setuv/SVf_IVisUV

Open bulk88 opened this issue 1 year ago • 4 comments

Description I was stepping debugging some XS code, and noticed, if you create a SvUV, with sv_setuv(), then later use SvUV() macro or eqv inline function, %99.999 of the time, Perl_sv_2uv_flags() will execute.

This "SvUV" created with sv_setuv(), or newSVuv()

SV = IV(0x6d2de0) at 0x6d2df0
  REFCNT = 2
  FLAGS = (IOK,pIOK)
  IV = 5369402920

the IV is really a C pointer/opaque pointer from a C library. 100% of the time, SvUV() macro, will execute Perl_sv_2uv_flags(). Because of

SV *
Perl_newSVuv(pTHX_ const UV u)
{
    SV *sv;

    /* Inlining ONLY the small relevant subset of sv_setuv here
     * for performance. Makes a significant difference. */

    /* Using ivs is more efficient than using uvs - see sv_setuv */
    if (u <= (UV)IV_MAX) {
        return newSViv((IV)u);
    }

    new_SV(sv);

    /* We're starting from SVt_FIRST, so provided that's
     * actual 0, we don't have to unset any SV type flags
     * to promote to SVt_IV. */
    STATIC_ASSERT_STMT(SVt_FIRST == 0);

    SET_SVANY_FOR_BODYLESS_IV(sv);
    SvFLAGS(sv) |= SVt_IV;
    (void)SvIOK_on(sv);
    (void)SvIsUV_on(sv);

    SvUV_set(sv, u);
    SvTAINT(sv);

    return sv;
}

that

    if (u <= (UV)IV_MAX) {

turns off SVf_IVisUV in final SvUV, then the SvUOK_nog() in SvUV(), says "NOT AN INTEGER" and calls the HEAVY getter Perl_sv_2uv_flags() each time. Since these are pointers, I think on 99% OSes, highest bit in an address, or negative addresses, are kernel space, and prohibited for user mode. Win32 32b and maybe some Solaris'es use and alloc high bit/negative pointer addresses, but I dont think P5 is compatible with such CPUs/C compilers/build configs.

I traced

+#define SvUOK_nog(sv)		((SvFLAGS(sv) & (SVf_IOK|SVf_IVisUV|SVs_GMG)) == (SVf_IOK|SVf_IVisUV))
@@ -1568,9 +1587,9 @@ Like sv_utf8_upgrade, but doesn't do magic on C<sv>.
 */
 
 /* Let us hope that bitmaps for UV and IV are the same */
-#define SvIV(sv) (SvIOK(sv) ? SvIVX(sv) : sv_2iv(sv))
-#define SvUV(sv) (SvIOK(sv) ? SvUVX(sv) : sv_2uv(sv))
-#define SvNV(sv) (SvNOK(sv) ? SvNVX(sv) : sv_2nv(sv))
+#define SvIV(sv) (SvIOK_nog(sv) ? SvIVX(sv) : sv_2iv(sv))
+#define SvUV(sv) (SvUOK_nog(sv) ? SvUVX(sv) : sv_2uv(sv))
+#define SvNV(sv) (SvNOK_nog(sv) ? SvNVX(sv) : sv_2nv(sv))

to

Revision: 4bac9ae47b5ad7845a24e26b0e95609805de688a Author: Chip Salzenberg [email protected] Date: 6/22/2012 6:18:18 PM Message: Magic flags harmonization.

    /* Using ivs is more efficient than using uvs - see sv_setuv */
    if (u <= (UV)IV_MAX) {
        return newSViv((IV)u);
    }

traced to

Revision: 4d55d9ac67bf8bc58f3fc9563b082459c6a3c22b Author: Steffen Müller [email protected] Date: 11/28/2014 11:34:00 AM Message: Repeat newSViv optimization for newSVuv

and that quotes

Revision: bd30fe8921c88e4677c2279b442a56a11ae037b4 Author: Eric Herman [email protected] Date: 11/28/2014 9:08:04 AM Message: Speed up newSViv()

see also

Revision: 013abb9bfbbb8091fa79597e81e7208ef1fe2dde Author: Nicholas Clark [email protected] Date: 2/1/2012 4:58:14 PM Message: Update, correct and clarify the comment in Perl_sv_setuv().

See the correspondence on ticket #36459 for more details.

AKA https://github.com/Perl/perl5/issues/8002

So questions, why is #define SvUOK_nog(sv) ((SvFLAGS(sv) & (SVf_IOK|SVf_IVisUV|SVs_GMG)) == (SVf_IOK|SVf_IVisUV)) used? isnt the byte size and bitfields for IV and UV 100% identical all CPUs OSes?

Doesnt SVf_IVisUV only matter for NV FP double conversion and stringify?

Does SV MAGIC somehow pass "want UV" vs "want IV" flags to the CPAN XS vtable or CPAN PP tied SCALARs?

Do we throw truncate/overflow console warnings for SvIV ptrs (SVf_IOK yes SVf_IVisUV no), passed to SvUV() getter macro, with the SVIV being "-1"? I can't find such code (XSUB level, not PP ops) in sv.c, but I will ask anyways.

This is bad enough a fix should go into maint releases.

PS, the problem in found in blead Perl 5 core XS and PP and runtime code. Because SvUV() is being used in some places to fetch pointer type integers.

Dirty fix, CPAN author side, always use setiv() SvIV() SvIVX() or ONLY use SvUVX() in XS code to store/get/read/set pointers and pointer like data, hell, just NEVER use SvUV() macro, since ONLY "(UV)-1" and "(UV)-123456" ever use the "fast path" in SvUV().

There needs to be a P5P fix for this. Vs manual hunting and patching of CPAN XS and core.

Steps to Reproduce

sv_setuv(); sv_dump(); SvUV(); and use a C step debugger inside SvUV(); statement

Expected behavior

Do not do a func call inside SvUV(); for SVIV flagged SV heads.

Perl configuration NA, blead perl 5.41 and many other versions

bulk88 avatar Oct 10 '24 07:10 bulk88

How much does:

    /* Using ivs is more efficient than using uvs - see sv_setuv */
    if (u <= (UV)IV_MAX) {
        return newSViv((IV)u);
    }

actually save?

Even though 4d55d9ac67bf8bc58f3fc9563b082459c6a3c22b states that this is "Pretty much the same change as bd30fe8921c88e4677c2279b442a56a11ae037b4", but that particular short-cut is in addition to that.

I wonder if it would be better to simply remove that.

ericherman avatar Oct 10 '24 12:10 ericherman

I strongly suspect @tsee tested this with a micro-benchmark, but I do not know what benchmark was used.

ericherman avatar Oct 10 '24 12:10 ericherman

I strongly suspect @tsee tested this with a micro-benchmark, but I do not know what benchmark was used.

Without researching, (correct this if I am wrong) I think pp_add() the pure perl op codes, always do IV math first, then upgrade to UV math on overflow, then upgrade to NV math on overflow, plus negative numbers are popular jkjk. If pp_math() ops are IV priority first, UV math will definitely be slower if that is the 2nd or 3rd branch/test jump in the C code.

Perhaps some CPUs, gaming/assembly/big data back then, signed ints were some nanoseconds faster than unsigned ints, or zero extend vs sign extend was slightly different speeds on CPUs back then.

Or Steffen??? has a biz algo, where negative numbers -2billion to 0 to +2billion, kept his code in IV integer mode, while UV's constantly overflowed and degraded to slower NV/FPUs. But that is very problem specific to one user.

I want to just change the SvUV() macro, so SVIV no UV flag, take fast path (no fnc call), through SvUV() macro, but the git history is so complicated, and changed in the 2010s, it needs the original authors or chip in, or someone who remembers from back then.

I vaguely remember Father C 10 years ago, being very active in poking and writing fixes for the SV XS magic system, since he found and claimed and probably did fix many cases of "data loss" or data degradation (IV/UV/NV/PV casting IDK where, that destroys round trip ability) through the perlapi setters/getters. Including some kind of flaws with private IOK NOK POK vs public IOK NOK POK.

https://github.com/Perl/perl5/commit/4bac9ae47b5ad7845a24e26b0e95609805de688a

Revolves around data loss inside Perl Magic API, and that is the specific commit that caused the damage (broke the fast path in SvUV).

A trolling argument I have is, is its "required" to call sv_2uv() fn call on an IV, incase the IV is negative, and that is a programmer error, or security exploit, and P5 must process abort or throw large console messages?

I'd really like to see this fixed, since there is tons of CPAN code storing pointers in sv_setuv()/SvUV(), that is suffering.

Perhaps, if nobody has technical input, it is just best to be adventurous and change SvUV() and see how much of core breaks, and see if CPAN complaints roll in, or its crickets from CPAN, which means the SvUV change was a success.

I'd like a 2nd opinion, since git blame commits, point to repairing "data loss" 10 years ago, and a slower thought process is needed.

bulk88 avatar Oct 13 '24 12:10 bulk88

On Sun, Oct 13, 2024 at 05:48:12AM -0700, bulk88 wrote:

Without researching, (correct this if I am wrong) I think pp_add() the pure perl op codes, always do IV math first, then upgrade to UV math on overflow, then upgrade to NV math on overflow,

Yes, it was effectively a way to keep an SV as an integer (now unsigned) for values between IV_MAX and UV_MAX, rather than upgrading to NV.

The number-handling code in the core is likely to be optimised around this assumption - i.e. it wasn't envisioned that the UV flag would typically be set for values below IV_MAX.

-- Technology is dominated by two types of people: those who understand what they do not manage, and those who manage what they do not understand.

iabyn avatar Oct 14 '24 13:10 iabyn

Win32 32b and maybe some Solaris'es use and alloc high bit/negative pointer addresses, but I dont think P5 is compatible with such CPUs/C compilers/build configs.

We had at least one bug #12993 report where a 32-bit Linux system was allocating memory above the 2G mark in a child thread.

tonycoz avatar Oct 21 '24 00:10 tonycoz

We had at least one bug #12993 report where a 32-bit Linux system was allocating memory above the 2G mark in a child thread.

So are negative pointers or 1.3 GB long scalars on 32b perl officially supported by P5P or not? I remember sisyphus and rurban some years ago both claim /3GB mode and Win32 32b Perl had instant SEGVs on start up and it was unfixable. I asked did they try to fix the SEGVs, they said yes, but abandoned it after the 3 or 5th SEGV fix they did and then deleted their private branch. I've never compiled /3GB mode myself.

bulk88 avatar Oct 22 '24 08:10 bulk88

So are negative pointers or 1.3 GB long scalars on 32b perl officially supported by P5P or not?

As far as I know they are supported. I'd certainly try to fix such bugs if I saw them and had the attention left to do so, as I fixed #12993.

tonycoz avatar Oct 28 '24 04:10 tonycoz

I'd really like to see this fixed, since there is tons of CPAN code storing pointers in sv_setuv()/SvUV(), that is suffering.

Perhaps, if nobody has technical input, it is just best to be adventurous and change SvUV() and see how much of core breaks, and see if CPAN complaints roll in, or its crickets from CPAN, which means the SvUV change was a success.

It's been 3 weeks; I don't think we're likely to see more technical input, and this strikes me as a pragmatic, if "adventurous", position.

ericherman avatar Oct 31 '24 10:10 ericherman