nodemcu-firmware icon indicating copy to clipboard operation
nodemcu-firmware copied to clipboard

Review all uses of lua numbers to determine if they ought to be integers

Open pjsg opened this issue 4 years ago • 13 comments

As Philip has correctly pointed out, we have a soft break introduced in Lua 5.3 in the we've moved our default build to sint32 / float32 sub types so that a TValue can fit into 8 rather than 12 bytes. The standard Lua C API has a set of calls relating to number and integer handling that are largely preserved.

A lot of our modules adopt rather sloppy use of the API for example int val = lua_tonumber(L,1). In Lua 5.1 the API call returns a double which is then converted inline to a type int or unsigned int. This is lossless if the underlying Lua value contained this type.

  • This is no longer the case with Lua 5.3 which returns a type float that can only store 23 bit integers without loss of precision.
  • Given that the VM manipulates integer quantities using 32 integers, it makes sense to avoid the unnecessary int -> float-> int double conversion.

This issue largely subsumes the sub issues: #3213, #3214, #3215, #3216, #3217 and #3218, I suggest that we handle general number type issues here. See my further analysis below.


[Philip's original lede] It turns out that it isn't just lua_pushnumber, but the reading functions can have problems as well.

rtcmem.write32 has this problem (from code inspection). I think that the json decoder will have problems....

pjsg avatar Jul 17 '20 02:07 pjsg

Is this behavior because of same reasons?

Example code:

data = {
mixed_data = node.chipid()..'/data',
mixed_status = node.chipid()..'/status',
integer = 500,
integer2 = 1000,
boolean_f = false,
boolean_t = true,
}

print('Integer: '..tostring(data.integer)..' boolean: '..tostring(data.boolean_f))

sjson.encode(data) 
ok, json = pcall(sjson.encode, data)
json_obj = sjson.decode(json)
print(type(json_obj))
for k, v in pairs(json_obj) do     print(k..' = '..tostring(v)..',') end

Result:


Integer: 500 boolean: false

table

boolean_t = true,
integer2 = 1000.0,
boolean_f = false,
mixed_data = 1111111/data,
integer = 500.0,
mixed_status = 1111111/status,

I'm using tostring because of mixed table, as on example I've provided. It could be seen, that before conversion tostring behaves as expected, but after sjson I'm getting .0. Also, result came in different order, possibly using sjson wrongly.

Edit: forgot to write that it's on dev 5.3

KT819GM avatar Jul 18 '20 11:07 KT819GM

sjson is definitely a module that needs some work to get the integer/float handling correct. I'm going to be looking at it over the weekend.

pjsg avatar Jul 18 '20 13:07 pjsg

@KT819GM Take a look at https://github.com/nodemcu/nodemcu-firmware/pull/3222 -- this should fix your problem.

pjsg avatar Jul 18 '20 21:07 pjsg

@KT819GM Take a look at #3222 -- this should fix your problem.

Yes, now numbers are represented correctly. Not sure why string order is changing, not a problem for me (for config file) but possibly could be issue for someone who will be parsing json string with fixed position (like parsing object in Node-Red and adding values to array renaming them). Thank you

KT819GM avatar Jul 19 '20 13:07 KT819GM

The order of elements in a Json object are not defined. Some systems have ways of forcing alphabetical ordering, but not nodemcu.

On Sun, Jul 19, 2020, 09:51 Modestas Bunokas [email protected] wrote:

@KT819GM https://github.com/KT819GM Take a look at #3222 https://github.com/nodemcu/nodemcu-firmware/pull/3222 -- this should fix your problem.

Yes, now numbers are represented correctly. Not sure why string order is changing, not a problem for me (for config file) but possibly could be issue for someone who will be parsing json string with fixed position (like parsing object in Node-Red and adding values to array renaming them). Thank you

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/nodemcu/nodemcu-firmware/issues/3219#issuecomment-660646555, or unsubscribe https://github.com/notifications/unsubscribe-auth/AALQLTMFUWDN7QV4I2T5DLLR4L25PANCNFSM4O5KW77A .

pjsg avatar Jul 19 '20 14:07 pjsg

Couldn't be more clear. Thank you

KT819GM avatar Jul 19 '20 15:07 KT819GM

The C API defines the following number and integer functions. The former use type LUA_NUMBER and the latter LUA_INT for numeric arguments and returns. These are double and int on 5.1 and float and int on 5.3, respectively. The declarative types are lua_Number and lua_Integer.

Number form Integer Form Comments
lua_isnumber(L,n) lua_isinteger(L,n)
lua_pushnumber(L,i) lua_pushinteger(L,i)
lua_tonumber(L,n) lua_tointeger(L,n)
luaL_checknumber(L,n) luaL_checkinteger(L,n)
luaL_optnumber(L,n,def) luaL_optinteger(L,n,def)
lua_stringtonumber(L,n,s) converts to number/int

Note that:

  • All except the last are common to both APIs; lua_stringtonumber is currently only supported on Lua 5.3, but I will add this to the Lua 5.1 API, so that we can use this in our modules without needed #if code.
  • The latest GCC compiler will generate precision warnings if the source coerces types which could incur loss of precision. The APIs also include a limited number of coercion macros, but these are different for the two API header sets, so it is simpler and easier to do explicit coercion when needed.

I recommend that:

  • Except for a couple of cases, the rest of our code uses various int types so we should stick to the integer forms except where the values are truly floating point.
  • We should use the above 5 integer forms and add explicit type case coercion to all uses where the return is not type int (e.g. size_t. We also coerce integer inputs where there is a potential loss of precision (e.g. a size_t input to lua_pushinteger().
  • We review all uses of double or float. IMO, we should stick to using lua_Number as the float type so that this facilitates @pjsg's optional 64 bit PR. A couple of modules (e.g. sjson and struct may need special handling) but I will leave this to Philip as their active maintainer.

I will post a fully code analysis next, but I have to be sociable to visitors for the next few hours.

TerryE avatar Aug 24 '20 14:08 TerryE

~~Having gone through the code, I don't want to break the legacy 5.1 integer mode builds and the issue we have for these is that lua_Number is type int and using lua_Number for a default float type would break these for 5.1 integer builds, so on reflection it is often just simpler leaving the internal float temporary variable as double for both builds. I have also added a lua_Float which is a double on 5.1 builds and a synonym for lua_Number on 5.3; when we just want FP, but only need 6 d.p. we can use lua_Float and this will also work fine on both Lua variants.~~

  • ~~ads1115, bme280, bme680, dht, si7021, struct all use double variables.~~
  • ~~I've changed ads1115, dth and si7021.~~
  • ~~bme280, bme680 do some weird stuff which I have left using double -- one for the maintainer.~~
  • I will leave struct to @pjsg

~~Other reviews changes.~~

  • ~~app/pm/pmSleep.c used lua_tonumber() etc. and I've tidied this coding up. @dnc40085 will need to check.~~

Slight change of subject

When I 've been going through this code, I find myself slipping into #1028, in that the way that we've coded up much of the parameter validation is just so sloppy; this can be done faster, generated less code, and also tighter. A typical coding pattern is something along the lines of

    lua_getfield(L, 1, "start_ch");
    if (!lua_isnil(L, -1)){
      if(lua_isnumber(L, -1)){
        start_ch = (uint8)luaL_checkinteger(L, -1);
        luaL_argcheck(L, (start_ch >= 1 && start_ch <= 14), 1, "start_ch: Range:1-14");
        cfg.schan = start_ch;
      }
      else
        luaL_argerror(L, 1, "start_ch: must be a number");
    }
    else
      cfg.schan = 1;

"Some parameter is an integer in the range x to y; either with no default or with a specific default" is a really common coding pattern in our modules, and this is coded "long hand" in about 4 different ways. So in this case omitting start_ch defaults to 1 but what happens with start_ch = 1.0 or start_ch = 2.1 even? In some coding cases the latter fails to an invalid parameter error); some cases you get a "number has no integer representation_" error; others 0; others the "start_ch: must be a number" error. It's basically a mess and I feel that we should have a consistent coding pattern that we should adopt. A common mistake is to do an API call such as luaL_checkinteger() which throws an error if the stack entry isn't an integer but then do extra validation where the error paths will never be taken. There are also nice macro extensions such as luaL_optinteger(L, ndx, default) which greatly simplify default handing.

#define luaL_opt(L,f,n,d)	(lua_isnoneornil(L,(n)) ? (d) : f(L,(n)))

Lua itself give some good templates, for example in ltablib.c for the argument to table.remove:

  lua_Integer pos = luaL_optinteger(L, 2, size);
  if (pos != size) 
    luaL_argcheck(L, 1 <= pos && pos <= size + 1, 1, "position out of bounds");

Our coded equivalents might be a dozen lines instead of 3 and not be as robust. Pull my hair out time. In my bones I would like to sort out this, but being realistic with every edit there is a small chance of making a breaking change, and so this sort of code change really needs testing. So on reflection and however tempting, this isn't something that we should realistically as part of this lua numbers review.

  • If the current code pattern works then leave any optimisations for now.
  • Focus on converting number -> integer calls is the existing call results in a potential loss of precision, and / or where a simple 1 for 1 switch number -> integer will address the issue.

Defer the API cleanup to a second pass, starting with node, file, net and wifi as separate issues / PRs , Perhaps we start with one of these and get a couple of the other committers (@nwf, @HHHartmann, ... or whoever wants to volunteer) to go through the changes in details so we can establish an agreed best-practice pattern, and then we can divide the rest out between those committers than would like to help execute the cleanup.

Thoughts? N + G + @jmattsson @pjsg @galjonsfigur ... ?

TerryE avatar Aug 25 '20 22:08 TerryE

Definitely defer this, I'd say. While I absolutely agree it's something that should get done, let's get all the big LVM stuff landed and settled. I'd like to get to the point where we can resume merging the 8266 and esp32 branches, but it'd be madness to have two major tickets running concurrently :D

jmattsson avatar Aug 26 '20 00:08 jmattsson

I agree with @jmattsson - It's true that many of C modules are interacting with Lua C API in sloppy way, but they are proven to work. For example, there were no big changes of Lua interface for ta least 2 years in wifi C module (aside from some code replacements with luaL_ and lua_ helper functions). Deferring this to be done after merging esp8266 and esp32 branches would give us another benefit of being able to cleanup C modules one by one, but for both variants, additionally verifying API compatibility between them.

galjonsfigur avatar Aug 26 '20 09:08 galjonsfigur

  • node.deepsleep() takes a sleep time in uSec. The SDK API calls take a uint64 argument. Reading the reference in the node documentation, this is around 225 mins which is longer than the 71 mins corresponding to MAX_UINT. In version 5.1 this just casts a double to a uint64 but for 5.3 I've used the following which is lossless up this 71 mins, but this being said the timer is accurate to far less than 6 d.p. so there is no practical loss of precision because of this float to 64-bit int conversion. I also converted the unsigned < 0 check which is pointless to a < 10 hrs which will at least pick up integer overflows.
#if LUA_VERSION_NUM == 501
    us = luaL_checknumber(L, 1);
#else /* 503 */
    us = lua_isinteger(L, 1) ? lua_tounsigned(L, 1) : (uint64) lua_tonumber(L, 1);
#endif
  • struct.c Format specifier d will only convert a float value on 5.3. getinteger() evaluates an int32 on the ESP but assigns this to a lua_Number for return. It would be better to stay as a 32-bit integer at least on 5.3. However I haven't updated this and have left this one to Philip.

See latest push to #3221.

TerryE avatar Aug 26 '20 22:08 TerryE

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Aug 28 '21 17:08 stale[bot]

Not safe to close this one.

TerryE avatar Sep 26 '21 19:09 TerryE