Reduce memory writes and short-lived allocations.
I noticed that hot parts of the program (allocation and evaluation) were doing more work than is strictly necessary, so I've removed some of it.
- Removed the memset from tcl_append_string, since strncpy immediately overwrote all but one byte of the zeroed region. Only the terminating byte is zeroed now.
- Loosened the return type of tcl_string to char* (not const), since many uses of it mutate the returned buffer.
- Replaced the initializing copy in tcl_append_string with memcpy, which is slightly faster than strncpy in cases like this where the length is known in advance. (This is about a 5% improvement in my tests.)
- Altered tcl_append to not free the right-hand argument, after noticing that almost every caller created a copy solely so that tcl_append could free it.
- In TCMD handling, rely on tcl_list_at returning NULL to detect an empty command. This avoids a second tokenization pass over the string in tcl_list_length.
- Avoid making a short-lived copy of the value in tcl_var; it now takes ownership of the pointer instead.
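For anyone skimming, the tcl_append_string change has roughly this shape. This is a standalone sketch, not the code in this PR: append_bytes and its signature are made up for illustration, and it assumes src has at least n readable bytes (strncpy would stop at an earlier NUL and pad with zeros, memcpy will not).

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not the real tcl_append_string. Grows buf (current
 * string length len, may be NULL when len is 0) by n bytes copied from src.
 * Returns the new buffer, or NULL on allocation failure, in which case the
 * caller still owns buf. Assumes src has at least n readable bytes. */
static char *append_bytes(char *buf, size_t len, const char *src, size_t n) {
  char *out = realloc(buf, len + n + 1);
  if (out == NULL) {
    return NULL;
  }
  /* Old shape: memset(out + len, 0, n + 1); strncpy(out + len, src, n);
   * which writes almost every byte of the new region twice. */
  memcpy(out + len, src, n); /* length is known, so no scan for NUL */
  out[len + n] = '\0';       /* the one byte the copy does not write */
  return out;
}
```

Calling append_bytes(NULL, 0, "foo", 3) and then appending "bar" to the result yields the string "foobar" with a single terminator write per call.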
On a small collection of microbenchmarks, these changes improved performance by about 25-30% on my machine.
(Noticed your suggestion to run clang-format on PRs and updated the commit accordingly -- it's done something odd to the TCMD case block, but now it's consistent.)
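To make the ownership change in tcl_var concrete, here is a small sketch contrasting a copying setter with one that takes ownership. The struct and both function names are hypothetical and the real tcl_var code may be organised differently; the point is only that the second form skips one malloc, one copy and one free per assignment.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical variable slot; the real variable struct may look different. */
struct var_slot {
  char *value; /* heap-allocated, owned by the slot, or NULL */
};

/* Copying setter: duplicates value, so the caller keeps (and later frees)
 * its own buffer. Returns 0 on success, -1 on allocation failure. */
static int slot_set_copy(struct var_slot *slot, const char *value) {
  size_t n = strlen(value) + 1;
  char *dup = malloc(n);
  if (dup == NULL) {
    return -1;
  }
  memcpy(dup, value, n);
  free(slot->value);
  slot->value = dup;
  return 0;
}

/* Owning setter: stores the pointer itself. After this call the caller must
 * not free or modify value; the slot frees it when it is replaced. */
static void slot_set_owned(struct var_slot *slot, char *value) {
  if (slot->value != value) {
    free(slot->value);
  }
  slot->value = value;
}
```

The cost of the owning form is a stricter contract: every caller has to hand over a heap pointer it no longer needs, which is why it only pays off when callers were making a throwaway copy anyway.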