umka-lang
umka-lang copied to clipboard
The performance is not particularly great compared to Lua
Take something like this
fn main() {
x := 0.0
for i := 0.0; i <= 10000000.0; i += 1 {
x += i
x /= 2.0
}
printf("%f\n", x);
}
versus this
x = 0
for i=0,10000000 do
x = x + i
x = x / 2
end
print(x)
First is umka, second is lua. On my computer at least, umka is about 3x slower. Which is a shame because it has potential to be much faster than lua. I usually don't pay attention to the speed of scripting languages, since it's almost always not the issue, however since I'm writing almost the entire game in umka, the speed matters, and it shows: Most of the "self" execution time is spent in vmRun. The above isn't a perfect benchmark, so here's few more (these benchmarks mostly tackle umka's "raw" speed, e.g. crunching speed):
fn fib(x: int): int {
if x < 2 {
return x
}
return fib(x-1) + fib(x-2)
}
fn main() {
printf("%d", fib(40))
}
20 seconds vs
x = 0
for i=0,10000000 do
x = x + i
x = x / 2
end
print(x)
10 seconds (lua about 2x faster)
type (
StringMod = interface {
f(s: str): str
}
TypeA = struct {}
TypeB = struct {}
)
fn (t: ^TypeA) f(s: str): str {
return s + "a"
}
fn (t: ^TypeB) f(s: str): str {
return "b"
}
fn apply(iface: StringMod, n: int): str {
s := ""
for i := 0; i <= n; i++ {
s = iface.f(s)
}
return s
}
fn main() {
a := TypeA{}
b := TypeB{}
apply(a, 100000)
apply(b, 100000)
}
(interface version) 600ms
fn apply(f: fn (s: str): str, n: int): str {
s := ""
for i := 0; i <= n; i++ {
s = f(s)
}
return s
}
fn main() {
apply(fn (s: str): str { return s + "a"; }, 100000)
apply(fn (s: str): str { return "b" }, 100000)
}
(function pointer version) 900ms
function apply(interface, n)
s = ''
for i=0,n do
s = interface.f(s)
end
return s
end
a = { f = function(s) return s .. 'a' end }
b = { f = function(s) return 'b' end }
apply(a, 100000)
apply(b, 100000)
("interface" version) 400ms.
function apply(f, n)
s = ''
for i=0,n do
s = f(s)
end
return s
end
apply(function(s) return s .. 'a' end, 100000)
apply(function(s) return 'b' end, 100000)
(function pointer version) 566ms
There's more I can show. But I just think there's room for improvement when it comes to performance. I've tried to re-implement umka with focus of VM performance, but the project is nowhere near as complete as umka. While it's faster than Lua in general, I don't think it's fair to compare it with umka yet. (https://github.com/skejeton/elka), It's pretty much abandoned//
Either way, good luck. And if something, just ask me. I'll be glad to respond and/or help.
Originally posted by @skejeton in https://github.com/marekmaskarinec/tophat/issues/30#issuecomment-1158684630
Related issue: #140
@skejeton Your Lua example:
x = 0
for i=0,10000000 do
x = x + i
x = x / 2
end
print(x)
can be even 10x faster than Umka if you declare x as local x = 0, so that it can be treated as a conventional variable rather than a table item stored in an upvalue.
Sad for Umka.
Update W.R.T. performance.
Here I'm using Lua 5.4 downloaded from official website vs Umka from the current main branch compiled with build_windows_msvc.bat on VS 2022.
Please note that the performance differences may be a bit weird because last time when I ran it, I ran it on a Linux machine, not Windows. Nonetheless:
Tests
fn main() {
x := 0.0
for i := 0.0; i <= 10000000.0; i += 1 {
x += i
x /= 2.0
}
printf("%f\n", x);
}
= 1.121s
x = 0
for i=0,10000000 do
x = x + i
x = x / 2
end
print(x)
= 0.423s
fn fib(x: int): int {
if x < 2 {
return x
}
return fib(x-1) + fib(x-2)
}
fn main() {
printf("%d\n", fib(40))
}
= 21.338s
function fib(x)
if x < 2 then
return x
else
return fib(x-1) + fib(x-2)
end
end
print(fib(40))
= 11.116s
type (
StringMod = interface {
f(s: str): str
}
TypeA = struct {}
TypeB = struct {}
)
fn (t: ^TypeA) f(s: str): str {
return s + "a"
}
fn (t: ^TypeB) f(s: str): str {
return "b"
}
fn apply(iface: StringMod, n: int): str {
s := ""
for i := 0; i <= n; i++ {
s = iface.f(s)
}
return s
}
fn main() {
a := TypeA{}
b := TypeB{}
apply(a, 100000)
apply(b, 100000)
}
= 2.611s
function apply(interface, n)
s = ''
for i=0,n do
s = interface.f(s)
end
return s
end
a = { f = function(s) return s .. 'a' end }
b = { f = function(s) return 'b' end }
apply(a, 100000)
apply(b, 100000)
= 0.487s
fn apply(f: fn (s: str): str, n: int): str {
s := ""
for i := 0; i <= n; i++ {
s = f(s)
}
return s
}
fn main() {
apply(fn (s: str): str { return s + "a"; }, 100000)
apply(fn (s: str): str { return "b" }, 100000)
}
= 2.532s
function apply(f, n)
s = ''
for i=0,n do
s = f(s)
end
return s
end
apply(function(s) return s .. 'a' end, 100000)
apply(function(s) return 'b' end, 100000)
= 0.462s
Remarks
For some reason interfaces function significantly slowly, than before. I'm not sure whether it's because of an OS change or Umka change, or some other factor. Best way to verify is to test on Jun 17 version.
For this script
fn main() {
var x: real = 0.0;
var y: real = 0.0;
for y <= 100000000.0 {
x += y;
x /= 2.0;
y += 1.0;
}
printf("%f\n", x);
}
umka -asm -check script.um generates the following instructions for the while loop
000000012 4 PUSH_LOCAL real -24
000000013 4 PUSH real 1e+08
000000014 4 BINARY <= real 8
000000015 4 GOTO_IF 18
000000016 8 GOTO 37
000000017 4 GOTO 12
000000018 5 PUSH_LOCAL_PTR -16
000000019 5 DUP
000000020 5 DEREF real
000000021 5 PUSH_LOCAL real -24
000000022 5 BINARY + real 8
000000023 5 ASSIGN real 8
000000024 6 PUSH_LOCAL_PTR -16
000000025 6 DUP
000000026 6 DEREF real
000000027 6 PUSH real 2
000000028 6 BINARY / real 8
000000029 6 ASSIGN real 8
000000030 7 PUSH_LOCAL_PTR -24
000000031 7 DUP
000000032 7 DEREF real
000000033 7 PUSH real 1
000000034 7 BINARY + real 8
000000035 7 ASSIGN real 8
000000036 8 GOTO 17
The body of the while loop has 25 instructions.
For this script
function main()
local x = 0.0
local y = 0.0
while y <= 100000000.0 do
x = x + y
x = x / 2.0
y = y + 1.0
end
print(x)
end
main()
luac -p -l script.lua > script.lua.txt generates the following instructions for the while loop
3 [4] LOADK 2 0 ; 100000000.0
4 [4] LE 1 2 0
5 [4] JMP 7 ; to 13
6 [5] ADD 0 0 1
7 [5] MMBIN 0 1 6 ; __add
8 [6] DIVK 0 0 1 ; 2.0
9 [6] MMBINK 0 1 11 0 ; __div 2.0
10 [7] ADDK 1 1 2 ; 1.0
11 [7] MMBINK 1 2 6 0 ; __add 1.0
12 [7] JMP -10 ; to 3
The body of the while loop has 10 instructions.
In fact the JMP instruction that exits the while loop is only executed on the last iteration (snippet from lvm.c):
case OP_LT: case OP_LE:
case OP_LTI: case OP_LEI:
case OP_GTI: case OP_GEI:
case OP_EQ: { /* note that 'OP_EQI'/'OP_EQK' cannot yield */
int res = !l_isfalse(s2v(L->top - 1));
L->top--;
#if defined(LUA_COMPAT_LT_LE)
if (ci->callstatus & CIST_LEQ) { /* "<=" using "<" instead? */
ci->callstatus ^= CIST_LEQ; /* clear mark */
res = !res; /* negate result */
}
#endif
lua_assert(GET_OPCODE(*ci->u.l.savedpc) == OP_JMP);
if (res != GETARG_k(inst)) /* condition failed? */
ci->u.l.savedpc++; /* skip jump instruction */ <=====
break;
}
Even if Umka could execute it's instructions way faster than Lua, that wouldn't help much because it still has to execute ~3x more instructions.
I think this is a problem in all implementations that execute bytecode (Umka, Puc/Lua, CPython, Perl, Quickjs, etc). One simply can not afford to write while loops in them, but instead must rely on built-in functions to the the heavy lifting/looping.
Perhaps we could benefit from adding more complex instructions? For example a variant of binary, which works on the top two elements of the stack. Would that be worth it?
@luauser32167 You're right, the Umka virtual machine is not optimized enough. Please notice, however, that a one-to-one comparison between Umka and Lua 5 bytecode is hardly possible, since Umka is stack-based and Lua 5 is register-based.
@marekmaskarinec I sometimes think of having a single complex instruction for a for loop header in its most widely used form: for i := 0; i < expr; i++. AFAIK, even Lua 4 featured it.