nodemcu-firmware Rework of the Lua source hierarchy to support a unified approach to ESP8266 and ESP32

Background

The Lua source hierarchy is currently based on the following fork tree:

Standard Lua 5.1.5, which is designed to compile on any POSIX toolchain and was released in Feb 2012.
eLua is based on 5.1.5 but has a number of enhancements and changes to optimise it for embedded use, including use of the newlib toolchain and a set of template hardware drivers. The main enhancements to the Lua core were the inclusion of an Emergency Garbage Collector (EGC), an implementation of lightweight C functions, and ROM-based tables.
NodeMCU was again a fork of eLua by Zeroday, but because the non-OS SDK structure and execution models were quite different from the eLua assumptions, NodeMCU really only used the Lua core components. Also the SDK interface is quite different to the newlib one, so this fork also introduced a lot of code changes (e.g. the string library calls were replaced by the equivalent SDK cstring ones).

Issues

Since the initial 1.x versions of NodeMCU, we have addressed many of the source-level conflicts between newlib and the SDK libraries, which in turn means that many of the original NodeMCU changes are no longer needed. Moreover:
The ESP32 IDF is newlib-based and layered on RTOS, and is therefore incompatible with these initial NodeMCU changes.
The subsequent Lua versions have incorporated many useful components: for example, 5.2 incorporates the eLua EGC and lightweight C functions, and therefore provides most of what we use from eLua. The only additional bit that we use is ROTables, and we are currently investigating reimplementing this because of the performance hit on the ESP flash architecture. 5.3 supports integers as a native type alongside floating point. The eLua project itself is now pretty moribund (only 11 new threads on the eLua DL during 2016).

Whilst we have arrived at our current NodeMCU Lua code base in a set of logical steps, the reality is that this could be viewed as a dead-end from a maintenance perspective. I believe that we should now take stock and decide together how we approach a scenario where we wish to maintain a common Lua code base for both NodeMCU ESP variants.

Do we believe that it is worth maintaining a common Lua code base for the two projects?
Do we go back towards the newlib eLua reverting no longer needed changes?
Do we move forward onto the 5.2 or 5.3 code?

More thought and discussion is needed, but my view is that this should be around options to maintain a common code base.

Dec 10 '16 18:12 TerryE

I really prefer to maintain as much common code as possible, because (I believe) sooner or later Espressif may release an IDF for ESP8266-RTOS.
I vote for updating the Lua codebase to the latest one (maybe 5.3), as this has very good enhancements. I would also prefer to keep it as clean as possible, so that we can update it when new patches or versions are released.

Dec 11 '16 10:12 djphoenix

Hummm, things get a little more complicated with 5.3.3. For a start it has proper 32-bit support as an option. Lua 5.1.5 uses a unified number type which is a 64-bit floating point and which can accurately represent integers up to 2^53, IIRC. A 32-bit build of Lua 5.3.3 instead uses separate types for int32 and 32-bit FP. OK, we would lose the exact large integer support for integers > 2^31, but this is a lot faster on the 8266, and a lot faster again on the ESP32 since this has H/W support for 32-bit FP.
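To make the precision trade-off concrete, here is a minimal C sketch (not taken from the Lua sources; the function names are made up for illustration) that checks whether an integer survives a round trip through a 64-bit double or a 32-bit float. A double keeps every integer exact up to 2^53, whereas a 32-bit float only does so up to 2^24, which is why a 32-bit build relies on a separate integer type for exact whole-number arithmetic.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if n survives a round trip through a 64-bit double.
 * Assumes |n| is well below 2^63 so the cast back is defined. */
bool exact_in_double(int64_t n) {
    volatile double d = (double)n;   /* volatile: force storage as a real double */
    return (int64_t)d == n;
}

/* True if n survives a round trip through a 32-bit float.
 * The comparison is done in 64 bits so that a float rounded up
 * to 2^31 does not overflow the cast. */
bool exact_in_float(int32_t n) {
    volatile float f = (float)n;
    return (int64_t)f == (int64_t)n;
}
```

For example, `exact_in_double` holds for 2^53 but not for 2^53 + 1, and `exact_in_float` holds for 2^24 but not for 2^24 + 1.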

Dec 12 '16 22:12 TerryE

That does sound tempting considering the hardware we're running on.

Dec 13 '16 00:12 jmattsson

The only issue is that there seem to be bleats about increased memory usage in 5.3. Still researching :(

PS. I can't see any obvious changes other than strings are now only interned if 40 bytes or shorter. In fact there are now two string types: short strings which are unique and long strings which aren't. I guess this is to avoid the cost of hashing long strings, since it is assumed that these are rarely duplicated. But note that they are still copied by reference and hence can be reused, it's just that long strings aren't put into the string hash table, so

local s = '012345678901234567890123456789'
local a = s .. s
local b = a
local c = s .. s

will now create two identical 60 character strings '012...789', one used by a and b, and a second by c.

The reasons for this are pragmatic. The hash function is O(N) and the chances of two long strings being accidentally identical are small, so there are strong arguments for only interning strings under a length threshold.
An LRU cache has also been added for accessing C strings, to avoid hashing short C strings in the case of a cache hit, though the implementation is a bit bizarre.
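The short/long split can be sketched in C as below. This is only an illustration of the idea, not the Lua 5.3 code: the `Str` type, `new_str`, the bucket count and the hash function are all assumptions made for the sketch, and nothing is ever freed. Strings of 40 bytes or fewer are looked up in a hash table and shared; longer ones are simply allocated afresh each time.

```c
#include <stdlib.h>
#include <string.h>

#define MAX_SHORT_LEN 40   /* interning threshold quoted above */
#define TABLE_SIZE    64   /* hypothetical bucket count for this sketch */

typedef struct Str {
    struct Str *next;      /* bucket chain, used by short strings only */
    size_t      len;
    char        data[];    /* string bytes follow the header */
} Str;

static Str *buckets[TABLE_SIZE];

/* Simple O(len) hash; the real Lua hash differs. */
static unsigned hash_bytes(const char *s, size_t len) {
    unsigned h = 5381u;
    for (size_t i = 0; i < len; i++)
        h = h * 33u + (unsigned char)s[i];
    return h;
}

static Str *alloc_str(const char *s, size_t len) {
    Str *p = malloc(sizeof(Str) + len + 1);
    if (p == NULL) return NULL;
    p->next = NULL;
    p->len = len;
    memcpy(p->data, s, len);
    p->data[len] = '\0';
    return p;
}

/* Return a string object: shared if short, always fresh if long. */
Str *new_str(const char *s, size_t len) {
    if (len > MAX_SHORT_LEN)
        return alloc_str(s, len);            /* long: no hashing, no sharing */
    unsigned b = hash_bytes(s, len) % TABLE_SIZE;
    for (Str *p = buckets[b]; p != NULL; p = p->next)
        if (p->len == len && memcmp(p->data, s, len) == 0)
            return p;                        /* short: reuse the interned copy */
    Str *p = alloc_str(s, len);
    if (p != NULL) { p->next = buckets[b]; buckets[b] = p; }
    return p;
}
```

With this sketch, two calls with the same 30-character string return the same pointer, while two calls with the same 60-character string return distinct objects with equal contents, which mirrors the `a` and `c` case in the Lua snippet.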

Dec 13 '16 01:12 TerryE

Having done a bit of the work, and thought about this some more overnight, I feel that the next step is for me to add a rotables mod into the 5.3.3 build, so that it can be built both as a POSIX standalone (under x86) and under the xtensa toolchain with a tweaked node library so that it can also run as a firmware image on the ESP8266 (or ESP32) for evaluation and to inform discussion as to whether we should proceed with a proper integration into an SDK 2.x build. I'll set up a clean repository so that others can track this.

Dec 13 '16 10:12 TerryE

A quick update.

I am getting to grips with Lua version 5.3.3 and the differences from 5.1.5. Perhaps the main one is that a lot of optimisations and enhancements to support embedded implementations are now in the core -- such as the EGC. The one big omission is support for rotables. What I have done so far is to add support for RO TString constants, with some extra TS variants of get and set methods so that you can use them in the C API. So now LUA_TS(__index) in the code is a TString * to the extern in ROM which is the TString for "__index". The ROstrt is in the search list for creating new strings, so references to __index in any compiled Lua code will pick up the same TString instead of creating a new one in RAM. Hence string comparisons use a pointer comparison, occasionally dropping down to a memcmp (since the lengths of both strings are now known) rather than strcmp(), so there are fewer entries in the RAM G[L]->strt and in principle a lot fewer unaligned ROM access exceptions.

The ROstrt is generated by a preprocess of the code which detects these LUA_TS() usages and generates a C header file and an assembler file which contains the static initialisers. It will take me another couple of days to get this working fully on the x86 platform. At this stage, I'll push a copy to my github repo for anyone interested to review.
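The comparison strategy described above (pointer check first, then length, then memcmp) can be sketched as follows. The `TStr` struct and the `TS()` macro here are deliberately simplified, hypothetical stand-ins for Lua's TString and for LUA_TS(); the real generated header and assembler output will look different.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, much simpler than Lua's TString. */
typedef struct {
    size_t      len;
    const char *bytes;
} TStr;

/* A constant that could be placed in ROM and shared by every
 * reference to "__index" (7 bytes). */
static const TStr ts___index = { 7, "__index" };

/* Hypothetical stand-in for LUA_TS(): name -> address of the constant. */
#define TS(name) (&ts_##name)

/* Equality without strcmp(): identical pointers are equal at once;
 * otherwise lengths are compared before touching the bytes. */
bool tstr_equal(const TStr *a, const TStr *b) {
    if (a == b) return true;
    if (a->len != b->len) return false;
    return memcmp(a->bytes, b->bytes, a->len) == 0;
}
```

When every reference to a name resolves to the same constant, the first test succeeds and the string bytes are never read, which is the case that avoids ROM byte accesses.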

Dec 17 '16 11:12 TerryE

Another quick update. I've got Lua 5.3 working with ROstrt initialised in the .text segment with the 303 common TStrings used in the compiler and core libraries, with the extra API methods to use TString parameters directly when appropriate. I need to add the rest of the ROtable patch next and extend the test suite. The next build will be on a 32-bit Ubuntu VM with 32-bit ints and floats. Upwards and onwards.

Dec 27 '16 01:12 TerryE

I've done a rough cut of my working paper documenting progress in this gist: LROR (Lua Read-Only Resources) in Lua 5.3. That's on top of fitting my new kitchen and becoming a grandfather (though the labour on this one was down to my daughter!) If you have any general comments, then feed them back here :smile:

I am currently hammering the table implementation on an x86 build.

Jan 06 '17 01:01 TerryE

Two comments (other than "good job!") so far:

Pretty please with sugar on the top, don't #include C_HEADER_STDIO in the final stuff; let's do the cleanup on the ESP8266 side rather than pollute the POSIX and ESP32 sides. It's largely the custom-modded eLua stuff which needed the c_stdio.h stuff in the first place, and if we're replacing it all, we really shouldn't need it.
Is there a version of LROR_WORD(x) that can handle strings with non-C-identifier symbols in them? LROR_WORD(x and y, with some z!) clearly wouldn't give a valid extern declaration. Or are LROR strings limited in this regard?

Jan 06 '17 01:01 jmattsson

@jmattsson, picking up these:

I went through some of this with the luac.cross port. The issue that we have with GCC is that the search list for system includes is configured in the compiler build, and it got very convoluted. The xtensa toolchain is based on newlib libraries, which are different to the standard Linux libraries, and the non-OS SDK ones are different again. I agree that we might be able to back-port most of these SDK changes into the ESP8266 toolchain newlib libraries, but there will still be some incompatibilities between this and the Linux set. At the end of the day, IMO it is more important that we have a single code-base for all three targets (ESP8266, ESP32 and Linux) and this preprocessor define trick works fine. This comes into the category of polish rather than show-stopper.
Yup: LROR_STRING(does_not_compute, "Does not compute"). The same string can be declared twice with different symbolic names and this will be handled, but the same symbolic name can't be used with two strings.

Jan 06 '17 14:01 TerryE

It shouldn't be too bad. As the first stage, simply renaming our c_whatever.h files to whatever.h and using -nostdinc will preference our "standard C" headers. Adding a -isystem "$(xtensa-lx106-elf-gcc -print-sysroot)/usr/include" will get the compiler to find the HAL etc (and any non-overridden headers). And yes, while in some ways it's polish, it's also about reducing the number and extent of Lua source files that have to be modified. Feel free to ping me when you're at a stage you're wanting to start integrating and I'll try to get some time to lend a hand (which I probably should for the ESP32 side anyway).
Nice!

Jan 07 '17 00:01 jmattsson

@jmattsson Johny, I am a little less concerned about the issue of minimising the changes. The first step is to evaluate viability and benefits. If we then decide to have a go, we can review some of the implementation strategy. I did this with the LCD patch and did a master diff before the PR. I went through this diff and in a few areas reworked the modification to minimise the code change.

Once I have a clean version which passes my extended test suite, then I think that we're at the point where I can push the branch for review and evaluation.

Jan 07 '17 01:01 TerryE

Oh yes, you might've noticed I said "in the final stuff" in regard to point 1.

It's all sounding very promising so far though. This could be as big a RAM improvement as when we originally managed to get the C-string constants moved into flash. No pressure... ;)

Jan 07 '17 01:01 jmattsson

Just another quick update. I am still plugging away at this making steady progress. It's just that I am time-slicing it with little jobs like fitting the new kitchen in my new house and doing the plumbing. There are all sorts of subtleties involved here, so I am not rushing and doing lots of brood time so the changes don't compromise some of the nice features of 5.3 and also don't cause unacceptable performance hits.

A little example: one of the things that you want to do is to set the metatable for a ROtable using the normal API. You do this if you want to extend string or table, for example. But the metatable in standard Lua is held in a field in the Table structure. You can't do this in the case of ROtables since you need this one field to be RW, so you need to store it in the registry. However, you don't want to do this in general, since the ability to override almost every data type in 5.3 means that this is checked a lot at a low level, so for RW tables you need to keep the field in the Table record. Hey-Ho.
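A rough C sketch of that split follows. It is an assumption-laden illustration, not the actual patch: the `Table` struct is reduced to two fields, and a small static side array stands in for the Lua registry. RW tables keep the metatable in their own record (the fast path that is checked often), while RO tables, which cannot be written to, have theirs held elsewhere.

```c
#include <stddef.h>

typedef struct Table {
    unsigned char  readonly;   /* 1 if the table lives in ROM */
    struct Table  *metatable;  /* only meaningful for RW tables */
} Table;

/* Hypothetical stand-in for the registry: a tiny side table in RAM. */
#define MAX_RO_META 8
static struct { const Table *table; Table *meta; } ro_meta[MAX_RO_META];
static int ro_meta_count;

Table *get_metatable(const Table *t) {
    if (!t->readonly)
        return t->metatable;              /* fast path: field in the record */
    for (int i = 0; i < ro_meta_count; i++)
        if (ro_meta[i].table == t)
            return ro_meta[i].meta;       /* slow path: side table lookup */
    return NULL;
}

/* Returns 0 on success, -1 if the side table is full. */
int set_metatable(Table *t, Table *meta) {
    if (!t->readonly) { t->metatable = meta; return 0; }
    for (int i = 0; i < ro_meta_count; i++)
        if (ro_meta[i].table == t) { ro_meta[i].meta = meta; return 0; }
    if (ro_meta_count == MAX_RO_META) return -1;
    ro_meta[ro_meta_count].table = t;     /* the RO record itself is never written */
    ro_meta[ro_meta_count].meta = meta;
    ro_meta_count++;
    return 0;
}
```

The point of the design is that the common RW case costs one field read, and only the rarer RO case pays for the indirection.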

Anyway, I've updated the White Paper in my Gist with current progress. As soon as I've got a standalone working compliant to the paper, I'll upload it to my repo.

Feb 07 '17 09:02 TerryE

@flip111 I am multiplexing a lot of issues at the moment. A straight port of Lua 5.3 would be reasonably easy but would not perform as well, nor have anywhere near as small a RAM footprint, as our current esp8266-optimised 5.1 version. So I have to merge in these optimisation features and add a few more because the extra Lua 5.3 feature set comes with an extra RAM footprint. So slow progress.

Read the working paper linked above :)

Mar 26 '17 20:03 TerryE

Very interesting how this is evolving, good work @TerryE

Mar 26 '17 20:03 flip111

@jmattsson @pjsg, I am losing the plot!! I've discussed backporting these enhancements into the current NodeMCU Lua 5.1.5 implementation but I can't for the life of me remember where. Anyway, I've done the first cut to compile clean and am in the middle of testing. But on-chip testing is a bitch. Have either of you got a good primer on using the remote gdb stub? Do you use it?

Anyway, I've found the easiest way is to work on luac.cross which I can debug using regular gdb on the laptop. This has been a very useful exercise. If one of you guys can point me to the correct issue / PR then I'll give a fuller update there.

Apr 26 '17 00:04 TerryE

The gdbstub is definitely not perfect, but there is some documentation in the gdbstub module doc. It is really good for catching crashes and poking around..... If you

#define GDBSTUB_BREAK_ON_INIT 1

then it will enter gdb fairly early on (as it is initializing the Lua modules). This may be too late, in which case you probably want to move the gdbstub_init() call to somewhere rather earlier in the boot process.

Also apologies for not being engaged for the last month or so, I've been rather under the weather...

Apr 26 '17 00:04 pjsg

@TerryE Are you referring to https://gist.github.com/TerryE/8afa5022042291b8add1ff3886f6c014 that you linked above?

Apr 26 '17 03:04 jmattsson

I've used my luac.cross technique to build a stripped-down NodeMCU Lua core VM running on the host so I can debug the core code on my laptop. Bizarre, but a lot quicker and easier than using gdbstub. Onwards!!

Apr 28 '17 01:04 TerryE

I am pretty close to having an evaluation version of NodeMCU Lua 5.1.5 ready. As I alluded to earlier, this is based on a backport of the RO TString technology that I developed for my Lua 5.3 port, but leaving in the current ROTable implementation rather than trying to implement true RO Lua Table types.

This has been a very useful exercise because it has allowed me to appreciate some issues in concrete form.

Having a common code base

We should have a single Lua core code base that will compile to the host platform (gcc compatible *nix), esp8266 and esp32.

As I mentioned above, I already have a version of the NodeMCU Lua interpreter running on my dev VM on my laptop. OK, the libraries are different (e.g. io and os work, but not node or file) but we can converge on this later. Having a host-runnable version means that the full gdb features are available for testing, as well as the Lua test suite, etc. And minimising code variants means that we can have increased confidence that bugs that have been exercised in host testing are also removed from the esp8266 and esp32 variants.

Just to be clear, this host version has ROTables and RO TStrings; it's just that they lie between _etext and _edata rather than in some defined Flash memory address space. Code compiled with this host luac will run on the esp chips, as will code dumped from this host lua.

What I find really irritating about the NodeMCU code is that it is full of crap, for example chunks of source #if 0 commented out and conditional on defines that we no longer support (for example NodeMCU must be built with full ROtable support). It's probably not worth clearing this out, though to be honest when I am debugging the core code and find this crap is confusing things then I have started to remove it.

However, what is very clear to me is that when we come to the Lua 5.3 port we should have a very clear set of objectives which then give rise to a set of change templates or patterns, and that we should only make changes above and beyond this if they are clearly documented.

Getting the peephole optimisation right

There are some aspects of our Lua implementation which cause a noticeable performance hit for no good reason and I really think that we should address these. In particular, access to ROM-based read-only tables and their elements is currently a lot slower than access to RAM-based ones.
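Elsewhere in this thread the slowness is attributed to linear strcmp scans of flash, and a look-aside cache is proposed as the remedy. A minimal C sketch of that combination is below. Everything in it is an assumption for illustration: the `ROEntry` layout, the 16-slot direct-mapped cache, and the use of the key's address as the cache key (which only works if keys are interned, so that equal strings share one address). `ro_scan_count` exists purely to show how many slow comparisons were made.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One entry of a read-only table; the list ends with a NULL key. */
typedef struct { const char *key; int value; } ROEntry;

/* Hypothetical direct-mapped look-aside cache held in RAM. */
#define CACHE_SLOTS 16
static struct {
    const ROEntry *table;
    const char    *key;
    const ROEntry *entry;
} cache[CACHE_SLOTS];

unsigned long ro_scan_count;   /* slow-path comparisons, for illustration */

const ROEntry *ro_lookup(const ROEntry *table, const char *key) {
    size_t slot = (((uintptr_t)table ^ (uintptr_t)key) >> 2) % CACHE_SLOTS;

    /* Hit: two pointer compares, no string bytes are read. */
    if (cache[slot].table == table && cache[slot].key == key)
        return cache[slot].entry;

    /* Miss: fall back to the linear scan, then remember the result. */
    for (const ROEntry *e = table; e->key != NULL; e++) {
        ro_scan_count++;
        if (strcmp(e->key, key) == 0) {
            cache[slot].table = table;
            cache[slot].key   = key;
            cache[slot].entry = e;
            return e;
        }
    }
    return NULL;               /* misses are not cached in this sketch */
}
```

A repeated lookup of the same key in the same table leaves `ro_scan_count` unchanged on the second call, since it is served from the cache.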

Interim 5.1 update or straight to 5.3?

To be honest this 5.1 version was a pretty important step for me. It will run faster than the current version of the core code; I could extend it so that the ESP32 version could just use this rather than its own variant. It is certainly worth considering a proper NodeMCU branch for its evaluation, but is it worth promoting to dev (at some point after enough testing) or should we just plan to go direct to 5.3?

The point about 5.3 from a developer PoV is that it is not 100% code-compatible with Lua 5.1. It supports separate float and int number types, and on the ESP chips it makes sense to make these both 32-bit, especially on the ESP32 where these are both supported in H/W. But this will lose the ability to do 53-bit integer arithmetic, so we might need to add an int64 library.

Apr 29 '17 10:04 TerryE

Better to prove first that ESP8266+ESP32 can be combined into 1 firmware before adding Lua 5.3 support

Apr 29 '17 13:04 flip111

@flip111, you misread this goal. We will never have one firmware because the two builds target different generation processors and have different underlying platform APIs. The goal here is to have a single Lua core source base and at the moment, the two branches are different. If I can get the same Lua source running on the esp8266 core and on a 64bit Intel core then I can certainly get the same source base running on esp32.

Apr 29 '17 13:04 TerryE

Seems you are confident that the 5.1 update will work, maybe go for 5.3 then.

Apr 29 '17 13:04 flip111

Yup, but the Lua 5.1 version has performance and inter-platform advantages, and is fully source code compatible, so as long as it works, we could promote it to dev. The move to 5.3 really merits a wider discussion and buy-in from the committers.

Apr 29 '17 14:04 TerryE

I'm on the fence regarding 5.1-upgrade vs straight-to-5.3. On the one hand I don't like the extra work required to get the interim 5.1 in, but on the other hand it would likely be able to go in sooner, which would benefit everyone. As long as there's a reasoned argument, I could support either approach. I would very much like to see the Lua core being cleaned up though - as you've noticed, we've got a metric truck ton of cruft in there. Hopping straight to a 5.3 might be the least painful option.

Regarding integer types, we could perhaps do a userdata int64 with __add etc meta-methods? Would it be possible to auto-promote to such a userdata type when needed? Checking add & mul on the int type should allow for catching overflows and trigger promotion, but I don't know how feasible it would be to augment the 5.3 codebase.
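The overflow check for add and mul could look something like the C sketch below. It is a sketch only: the `Num` struct and the `num_add` / `num_mul` names are hypothetical, and a plain int64_t stands in for the proposed userdata type. Overflow is detected by doing the arithmetic in 64 bits, where the sum or product of two 32-bit values always fits, and then checking whether the result still fits in an int32.

```c
#include <stdbool.h>
#include <stdint.h>

/* Result of an integer operation: either still a native int32,
 * or promoted to the wide representation. */
typedef struct {
    bool    wide;   /* true if the value no longer fits in int32 */
    int64_t v;
} Num;

static Num make_num(int64_t v) {
    Num n = { v < INT32_MIN || v > INT32_MAX, v };
    return n;
}

/* int32 + int32 always fits in int64, so this cannot overflow. */
Num num_add(int32_t a, int32_t b) {
    return make_num((int64_t)a + b);
}

/* int32 * int32 always fits in int64 as well. */
Num num_mul(int32_t a, int32_t b) {
    return make_num((int64_t)a * b);
}
```

For instance, `num_add(INT32_MAX, 1)` and `num_mul(65536, 65536)` both come back flagged as wide, while `num_add(1, 2)` stays narrow. Whether the 5.3 VM can be augmented to act on such a flag cheaply is exactly the open question raised above.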

Apr 30 '17 01:04 jmattsson

I am still testing the 5.1 backport on a dev build to put it through the Lua test suite clean. I've also done a full code review using meld against a vanilla Lua 5.1 and have made the following observations which swing my view of the one step vs two step path (straight to 5.3 vs 5.1 upgrade then 5.3).

The eLua variant adds support for the EGC, lightweight C functions and rotables, as well as other cross-support options such as endian and packed vs unpacked support.
IMO, the rotables implementation is a bit of a hack as it is very run-time inefficient (lots of linear strcmp scans of flash), and all rotables args are void * types so there is no compile-time validation of a lot of stuff that should be checked at compile-time, which makes testing changes fraught. Also the way of implementing lightweight functions doesn't follow standard Lua coding patterns.
NodeMCU uses the eLua Lua core variant for the EGC, lightweight C functions and rotables functionality, but the xtensa architecture doesn't need the endian and packed stuff. (It also discards all of the eLua platform and driver implementations.)
Standard Lua 5.3 has added EGC and lightweight function support as well as other changes to support embedded processors not included in eLua (separate integer and float sub-types and the ability to force the number size to 32-bits, for example). However, this is a re-implementation using standard coding patterns.
One of the architectural changes that 5.3 implements to do this is to split the ttt type field in Lua Values and Garbage Collectables into a 4-bit type and a 4-bit subtype. The following types are sub-typed:
- Numbers have separate integer and float subtypes
- Strings have separate interned and non-interned subtypes. Strings of 40 bytes or less are interned. Long ones aren't. This is for performance: the cost of hashing is a function of length, and in the majority of implementations (especially embedded) the chance of duplicate instances of a long string is extremely small.
- Functions have separate full and lightweight subtypes.
Lua 5.3 also adds a new separate GC category "don't collect" which I will use to tag all RO resources.
In eLua, rotables are a separate type that is handled completely differently to normal tables, and hence this implementation leaks out into a lot of code changes in ltable.c, lvm.c, etc.
My proposed RO table implementation makes Tables a fourth sub-typed type by splitting them into separate RW and RO table subtypes; the difference is only in the low-level access routines, and so is well encapsulated. The RO table implementation internally is still a vector list as with the current eLua rotables implementation, but access is through a look-aside cache so 95% of the RO table accesses are direct.
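The type/subtype split described in the list above can be sketched in C as follows. The names are made up for this sketch and are not the identifiers used in the Lua sources; the base type numbers follow the public Lua type constants (number 3, string 4, table 5, function 6), and the RO table variant is the hypothetical addition being proposed here.

```c
/* Base types in the low 4 bits. */
enum { T_NUMBER = 3, T_STRING = 4, T_TABLE = 5, T_FUNCTION = 6 };

/* Pack and unpack a tag: low nibble = type, high nibble = subtype. */
#define MAKE_TAG(base, variant) ((unsigned char)((base) | ((variant) << 4)))
#define BASE_TYPE(tag)          ((tag) & 0x0F)
#define VARIANT(tag)            (((tag) >> 4) & 0x0F)

#define TAG_NUM_FLOAT  MAKE_TAG(T_NUMBER, 0)
#define TAG_NUM_INT    MAKE_TAG(T_NUMBER, 1)
#define TAG_STR_SHORT  MAKE_TAG(T_STRING, 0)   /* interned */
#define TAG_STR_LONG   MAKE_TAG(T_STRING, 1)   /* not interned */
#define TAG_TABLE_RW   MAKE_TAG(T_TABLE, 0)
#define TAG_TABLE_RO   MAKE_TAG(T_TABLE, 1)    /* hypothetical new subtype */

/* Most code only asks "is this a table?" and ignores the subtype. */
int is_table(unsigned char tag)    { return BASE_TYPE(tag) == T_TABLE; }

/* Only the low-level access routines need to distinguish RO from RW. */
int is_ro_table(unsigned char tag) { return tag == TAG_TABLE_RO; }
```

Because `is_table` masks off the subtype, code that only cares about the base type would not need to change when an RO variant is added, which is the encapsulation argument made above.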

So my conclusion is that the changes to get from standard Lua 5.3 to a NodeMCU 5.3 are a lot less than cleaning up my current 5.1 WIP version. And in my experience the smaller the change, the fewer bugs get introduced, and so the more stable the implementation. So I now propose to release a 5.3 evaluation version next.

Even so, doing the 5.1 exercise has been valuable in that it has forced me to think through the API issues in minimising the impact on the rest of the NodeMCU ecosystem, and has also cleaned up some aspects of my implementation.

May 09 '17 11:05 TerryE

:+1: Nice summary, thanks!

May 09 '17 11:05 jmattsson

I think that makes sense -- it is nearly always the right decision to move forward when making large changes....

May 09 '17 12:05 pjsg

Getting your head around all of this isn't easy. I kept coming across fine details of the implementation and wondered why on earth it was coded that way. So I decided to go back to basics and did a

/work/esp8266/lua-5.1.5/src$ for f in *.[hc]; do test -f /tmp/n/app/lua/$f && meld {.,/tmp/n/app/lua}/$f; done

(/tmp/n is just a symlink onto my dev VM's nodemcu directory) when I realised that Bogdan's coding style and approach for his eLua changes is not very aligned to Roberto's approach to the core Lua code.

In fact 5.3 includes a cherry-pick of the best bits of eLua but in Lua style, and this really makes eLua obsolescent, IMO, given that we don't use any of the rest of the eLua ecosystem. I see no advantage in attempting to retain any of this extra eLua stuff for the 5.3 version.

Throw in some of the original hacks that we made in getting our early versions up and we've currently got a bit of a mess. So what I want to do is to swap out the Lua core entirely, while adopting the pragmatic approach of keeping the NodeMCU module layering and things that work well, such as our (or Johny's, to give credit where it is due) linker magic, with the same API or with the absolute minimum of changes.

I want to keep the triple platform support: dev host, ESP8266 and ESP32, but the dev host is a limited version. I am currently not supporting a standard Lua RTS; instead the Lua compiler luac has a new -X option which allows you not only to compile but also to execute a script (on the host), though this can only use the modules in the core lua and lua/lua_cross directories. This still allows the limited use of NodeMCU Lua on the dev host for bootstrapping and gdb testing, which is all I need. (We can always revisit this later if there is a demand.)

I am getting rid of all of the c_str... in the code. Unfortunately we still need to compile against some of the clib c_*.h includes on the ESP8266, so I do need a minimal abstraction layer here to avoid compile-time errors.

This is all slow going since I have to re-engineer all of the Lua internals from code inspection and its implementation is very dense so this can be hard going. Still, pondering this exercises the mind when I am rebating 72 hinge sides for my 12 doors.

//Terry

May 09 '17 12:05 TerryE
