squirrel icon indicating copy to clipboard operation
squirrel copied to clipboard

Floatingpoint numeric parsing is broken if the locale is not english

Open bluehazzard opened this issue 8 years ago • 13 comments

Hi, on Linux if I use a locale with ',' as decimal point separator the parsing of the floating points in non string expressions is broken, because _fvalue = (SQFloat)scstrtod(&_longstr[0],&sTemp); returns a wrong value.

For example: The squirrel code printf("%f",12.34) will output: 12 scstrtod returns 12 because it expects a ',' as decimal separator...

A solution would be to manually parse the floating point number if it is not in a string literal...

bluehazzard avatar May 03 '16 10:05 bluehazzard

Looks like what lua might do is:

#include <locale.h>
case TFLOAT:
    for(int i=0;i<_longstr.size();i++)
        if(_longstr[i] == '.')
            _longstr[i] = localeconv()->decimal_point[0];
    _fvalue = (SQFloat)scstrtod(&_longstr[0],&sTemp);

Another idea would be to wrap scstrtod with a function that is capable of handling this (in this same way generally), only on platforms where it's needed. Then on some other platforms you could pass the invariant locale into a proprietary variant function, if it exists.

is this a problem in sqbaselib's str2num() as well?

zeromus avatar May 03 '16 18:05 zeromus

_longstr[i] = localeconv()->decimal_point[0];

i don't think this is a good idea, because for example in arabic the decimal point is (or can be, depending on the code page) a multi byte symbol [1]. But i am not sure if this is a problem...

some other thoughts from the luajit author:

[...] ** Rationale for the builtin string to number conversion library: ** ** It removes a dependency on libc's strtod(), which is a true portability ** nightmare. Mainly due to the plethora of supported OS and toolchain ** combinations. Sadly, the various implementations ** a) are often buggy, incomplete (no hex floats) and/or imprecise, ** b) sometimes crash or hang on certain inputs, ** c) return non-standard NaNs that need to be filtered out, and ** d) fail if the locale-specific decimal separator is not a dot, ** which can only be fixed with atrocious workarounds. ** ** Also, most of the strtod() implementations are hopelessly bloated, ** which is not just an I-cache hog, but a problem for static linkage ** on embedded systems, too. [,,,]

This is quoted from his own implementation of strtod found in the lj_strscan.c from his git repo (http://luajit.org/download.html)

is this a problem in sqbaselib's str2num() as well?

to make things more clear: this line _fvalue = (SQFloat)scstrtod(&_longstr[0],&sTemp); is from sqlexer.cpp line 460 so it is a compiler/lexer problem not a implementation problem of functions for example print(format("%f",12.34)) returns 12.0000000

print("12.34".tofloat()) returns 12

interestingly i can't reproduce this errors with the binary provided with squirrel. But it should not work, as also the lua implementation has a workaround. There seems to be a mess up with the locale settings (one point for the luajit implementation)...

i will investigate future

[1] https://en.wikipedia.org/wiki/Decimal_mark

greetings

bluehazzard avatar May 03 '16 21:05 bluehazzard

I know now why the provided binary of squirrel ("sq") works. The internal locale is "C". I don't know why it is using this locale, because LANG is set to "de_DE.UTF-8"

anyway it is a bad bug in the squirrel lexer...

bluehazzard avatar May 07 '16 14:05 bluehazzard

I think it isn't proper to have a lexer depending on someone else's flaky numerical lexer. If luajit has a better function, we should use that. I propose this rule: whatever a prominent and well-tested lua does for parsing numbers is the best choice for squirrel. Can you think of any exceptions to this rule? I think I will try to integrate luajit's strtod replacement within a few days if I don't hear anything else.

zeromus avatar May 07 '16 14:05 zeromus

I agree with you .

mingodad avatar May 11 '16 08:05 mingodad

I think luajit strtod is unnecessarily complicated, I'd rather find something simpler or write one.

albertodemichelis avatar May 11 '16 10:05 albertodemichelis

Check this out

https://github.com/mono/mono/blob/master/mcs/class/referencesource/mscorlib/system/number.cs ParseNumber().

https://github.com/mono/mono/blob/master/mono/metadata/number-ms.c mono_double_from_number() and number_to_double()

(all MIT licensed)

You can strip some options out of the c# code and it ends up being pretty compact, IMO. Then if theres any bugs in a new implementation for squirrel, we can check it against .net as a reference.

zeromus avatar Jun 19 '16 23:06 zeromus

Was this ever fixed? I don't seem to have this issue right now but as a person living in a decimal-comma-country I've suffered the pain before.

Here is a commit in another open source project where they fixed the same issue: https://github.com/imageworks/OpenShadingLanguage/commit/0f339174f24f4766b869dc30b24bfb83fa6a5dfd https://groups.google.com/forum/#!topic/osl-dev/OB4N4Rpdobs

breakin avatar Nov 16 '17 21:11 breakin

This was never fixed. I still suggest we use one of the two I suggested. I'll volunteer to do the work if @albertodemichelis will okay the concept

zeromus avatar Nov 16 '17 21:11 zeromus

I written a float parser for this, I just need to find it :) and test it.

albertodemichelis avatar Nov 17 '17 05:11 albertodemichelis

I've put together this a while ago.

#include <math.h>
#include <assert.h>
double parsefloat(const char *s)
{
	double sign = 1;
    double res = 0;
    double scale = 0.0;
    const char *x = s;
    char c;
	if (*x == '-')
	{
		sign = -1;
		x++;
	}
    while (c = *x) { //until exponent
        if (c >= '0' && c <= '9')
        {
            res = res * 10 + (c - '0');
            scale *= 10;
        }
        else if(c == '.') {
            scale = 1;
        }
        else {
            break;
        }
        x++;
    }
    bool hasexp = false;
    double esign = 1;
    double exp = 0.0;
    int nexp = 0; //number of digits in exp
    if (c == 'e' || c == 'E')
    {
        x++;
        c = *x;
        hasexp = true;

        if (c == '+')
        {
            esign = 1;
            x++;
        }
        else if (c == '-')
        {
            esign = -1;
            x++;
        }

        while (c = *x) { //until end
            if (c >= '0' && c <= '9')
            {
                exp = exp * 10 + (c - '0');
                nexp++;
            }
            else {
                break;
            }
            x++;
        }
    }
    
    if (scale != 0) {
        res *= 1.0 / scale;
    }

    if (hasexp)
    {
        if (nexp == 0) {
            //error;
        }
        exp *= esign;
        res = res * pow(10, exp);
    }
	res *= sign;
    return res;
}

#define TEST_PARSE_DOUBLE(val) {assert(parsefloat(#val) == val);}
int main(int argc, char* argv[])
{
	TEST_PARSE_DOUBLE(0.5);
	TEST_PARSE_DOUBLE(9.99);
	TEST_PARSE_DOUBLE(12);
	TEST_PARSE_DOUBLE(12.0);
	TEST_PARSE_DOUBLE(12.0e+10);
	TEST_PARSE_DOUBLE(-122131242342.0e-10);
	TEST_PARSE_DOUBLE(.5);
	TEST_PARSE_DOUBLE(122138760e-2);
	return 0;
}

albertodemichelis avatar Nov 17 '17 23:11 albertodemichelis

Hi, I also encountered this problem and this solved it.

#include <locale>
#include <clocale>

struct DecimalFixer {
	DecimalFixer() {
		std::locale::global(std::locale(std::locale(), new fixed_decimal_point));
		setlocale(LC_NUMERIC, "C");
	}
private:
	struct fixed_decimal_point : public std::numpunct<char> {
		char_type do_decimal_point() const { return '.'; }
	};
};

static DecimalFixer  fixer; // Call this smth in lib code

MrBrrown avatar Aug 27 '24 13:08 MrBrrown

No, we need to integrate the handmade parser or find another. Depending on libc/c++ locale is not portable.

zeromus avatar Aug 27 '24 20:08 zeromus