
Integer array index improvement

Open jlp765 opened this issue 2 years ago • 3 comments

Currently gawk (v5.0.1) is about twice as fast as mawk (v1.3.4) for string indexes: mawk 'BEGIN{ while(i<1000000){ x[""i]=""i++;} print(x["0"])}'

When forced to use integer indexes, mawk is faster than (or comparable to) gawk: mawk 'BEGIN{i=0; while(i<1000000){ x[i]=i++;} print(x[0])}'

The first example does two "wrong" things:

  • variable i is not initialised as an integer, so it initially defaults to an uninitialized type (internally calls the slower find_by_sval() function)
  • indexes of variable x are forced to be strings, which makes it slow (internally calls the slower find_by_sval() function)

The code can be made consistently faster by modifying the default case of array_find() to identify "string" indexes that are actually integer values, and to call find_by_ival() rather than find_by_sval() for them.

e.g. (array.c)

    case C_NOINIT:
        ap = find_by_sval(A, &null_str, create_flag, &redid);
        break;
    default:
        {
            double d = strtod(string(cp)->str, (char **) 0);
            Int ival = d_to_I(d);

            if ((double) ival == d) {
                if (A->type == AY_SPLIT) {
                    if (ival >= 1 && ival <= (int) A->size)
                        return (CELL *) A->ptr + (ival - 1);
                    if (!create_flag)
                        return (CELL *) 0;
                    convert_split_array_to_table(A);
                } else if (A->type == AY_NULL)
                    make_empty_table(A, AY_INT);
                ap = find_by_ival(A, ival, create_flag, &redid);
            } else
                ap = find_by_sval(A, string(cp), create_flag, &redid);
        }
        break;

In minimal testing it works OK, but there may be a problem case in

            double d = strtod(string(cp)->str, (char **) 0);
            Int ival = d_to_I(d);

If the string value of cell cp converts to a zero value d (strtod() returns 0 when it cannot parse a number), the code would incorrectly treat a string index as an integer index.

Anyway, hope this helps.

jlp765 avatar Dec 28 '23 04:12 jlp765

thanks (I'll check on this when I get back to mawk - currently on xterm...)

ThomasDickey avatar Dec 28 '23 22:12 ThomasDickey

In a quick check, the suggested change results in some test-failures. I'll come back to this when I can spend a day or two (at the moment am just working on simple changes for a maintenance release).

ThomasDickey avatar Jan 23 '24 23:01 ThomasDickey