Take something like this

fn main() {
    x := 0.0
    for i := 0.0; i <= 10000000.0; i += 1 {
        x += i
        x /= 2.0
    }
    printf("%f\n", x);
}

versus this

x = 0
for i=0,10000000 do
    x = x + i
    x = x / 2
end

print(x)

First is umka, second is lua. On my computer at least, umka is about 3x slower. Which is a shame because it has potential to be much faster than lua. I usually don't pay attention to the speed of scripting languages, since it's almost always not the issue, however since I'm writing almost the entire game in umka, the speed matters, and it shows: Most of the "self" execution time is spent in vmRun. The above isn't a perfect benchmark, so here's few more (these benchmarks mostly tackle umka's "raw" speed, e.g. crunching speed):

fn fib(x: int): int {
    if x < 2 {
        return x
    }
    return fib(x-1) + fib(x-2)
}

fn main() {
    printf("%d", fib(40))
}

20 seconds vs

x = 0
for i=0,10000000 do
    x = x + i
    x = x / 2
end

print(x)

10 seconds (lua about 2x faster)

type (
    StringMod = interface {
        f(s: str): str
    }

    TypeA = struct {}
    TypeB = struct {}
)

fn (t: ^TypeA) f(s: str): str {
    return s + "a"
}

fn (t: ^TypeB) f(s: str): str {
    return "b"
}

fn apply(iface: StringMod, n: int): str {
    s := ""
    for i := 0; i <= n; i++ {
        s = iface.f(s)
    }
    return s
}

fn main() {
    a := TypeA{}
    b := TypeB{}
    apply(a, 100000)
    apply(b, 100000)
}

(interface version) 600ms

fn apply(f: fn (s: str): str, n: int): str {
    s := ""
    for i := 0; i <= n; i++ {
        s = f(s)
    }
    return s
}

fn main() {
    apply(fn (s: str): str { return s + "a"; }, 100000)
    apply(fn (s: str): str { return "b" }, 100000)
}

(function pointer version) 900ms

function apply(interface, n)
    s = ''
    for i=0,n do 
        s = interface.f(s)
    end
    return s
end

a = { f = function(s) return s .. 'a' end }
b = { f = function(s) return 'b' end }

apply(a, 100000)
apply(b, 100000)

("interface" version) 400ms.

function apply(f, n)
    s = ''
    for i=0,n do 
        s = f(s)
    end
    return s
end

apply(function(s) return s .. 'a' end, 100000)
apply(function(s) return 'b' end, 100000)

(function pointer version) 566ms

There's more I can show. But I just think there's room for improvement when it comes to performance. I've tried to re-implement umka with focus of VM performance, but the project is nowhere near as complete as umka. While it's faster than Lua in general, I don't think it's fair to compare it with umka yet. (https://github.com/skejeton/elka), It's pretty much abandoned//

Either way, good luck. And if something, just ask me. I'll be glad to respond and/or help.

Originally posted by @skejeton in https://github.com/marekmaskarinec/tophat/issues/30#issuecomment-1158684630

Jun 17 '22 10:06 vtereshkov

Related issue: #140

Jun 17 '22 10:06 vtereshkov

@skejeton Your Lua example:

x = 0
for i=0,10000000 do
    x = x + i
    x = x / 2
end

print(x)

can be even 10x faster than Umka if you declare x as local x = 0, so that it can be treated as a conventional variable rather than a table item stored in an upvalue.

Sad for Umka.

Jun 17 '23 20:06 vtereshkov

Update W.R.T. performance.

Here I'm using Lua 5.4 downloaded from official website vs Umka from the current main branch compiled with build_windows_msvc.bat on VS 2022.

Please note that the performance differences may be a bit weird because last time when I ran it, I ran it on a Linux machine, not Windows. Nonetheless:

Tests

fn main() {
    x := 0.0
    for i := 0.0; i <= 10000000.0; i += 1 {
        x += i
        x /= 2.0
    }
    printf("%f\n", x);
}

= 1.121s

x = 0
for i=0,10000000 do
    x = x + i
    x = x / 2
end

print(x)

= 0.423s

fn fib(x: int): int {
    if x < 2 {
        return x
    }
    return fib(x-1) + fib(x-2)
}

fn main() {
    printf("%d\n", fib(40))
}

= 21.338s

function fib(x)
    if x < 2 then 
        return x
    else
        return fib(x-1) + fib(x-2)
    end
end

print(fib(40))

= 11.116s

type (
    StringMod = interface {
        f(s: str): str
    }

    TypeA = struct {}
    TypeB = struct {}
)

fn (t: ^TypeA) f(s: str): str {
    return s + "a"
}

fn (t: ^TypeB) f(s: str): str {
    return "b"
}

fn apply(iface: StringMod, n: int): str {
    s := ""
    for i := 0; i <= n; i++ {
        s = iface.f(s)
    }
    return s
}

fn main() {
    a := TypeA{}
    b := TypeB{}
    apply(a, 100000)
    apply(b, 100000)
}

= 2.611s

function apply(interface, n)
    s = ''
    for i=0,n do 
        s = interface.f(s)
    end
    return s
end

a = { f = function(s) return s .. 'a' end }
b = { f = function(s) return 'b' end }

apply(a, 100000)
apply(b, 100000)

= 0.487s

fn apply(f: fn (s: str): str, n: int): str {
    s := ""
    for i := 0; i <= n; i++ {
        s = f(s)
    }
    return s
}

fn main() {
    apply(fn (s: str): str { return s + "a"; }, 100000)
    apply(fn (s: str): str { return "b" }, 100000)
}

= 2.532s

function apply(f, n)
    s = ''
    for i=0,n do 
        s = f(s)
    end
    return s
end

apply(function(s) return s .. 'a' end, 100000)
apply(function(s) return 'b' end, 100000)

= 0.462s

Remarks

For some reason interfaces function significantly slowly, than before. I'm not sure whether it's because of an OS change or Umka change, or some other factor. Best way to verify is to test on Jun 17 version.

Jun 22 '23 13:06 ske2004

For this script

fn main() {
  var x: real = 0.0;
  var y: real = 0.0;
  for y <= 100000000.0 {
      x += y;
      x /= 2.0;
      y += 1.0;
  }
  printf("%f\n", x);
}

umka -asm -check script.um generates the following instructions for the while loop

000000012      4                   PUSH_LOCAL real -24
000000013      4                         PUSH real 1e+08
000000014      4                       BINARY <= real 8
000000015      4                      GOTO_IF 18

000000016      8                         GOTO 37

000000017      4                         GOTO 12

000000018      5               PUSH_LOCAL_PTR -16
000000019      5                          DUP
000000020      5                        DEREF real
000000021      5                   PUSH_LOCAL real -24
000000022      5                       BINARY + real 8
000000023      5                       ASSIGN real 8
000000024      6               PUSH_LOCAL_PTR -16
000000025      6                          DUP
000000026      6                        DEREF real
000000027      6                         PUSH real 2
000000028      6                       BINARY / real 8
000000029      6                       ASSIGN real 8
000000030      7               PUSH_LOCAL_PTR -24
000000031      7                          DUP
000000032      7                        DEREF real
000000033      7                         PUSH real 1
000000034      7                       BINARY + real 8
000000035      7                       ASSIGN real 8
000000036      8                         GOTO 17

The body of the while loop has 25 instructions.

For this script

function main()
  local x = 0.0
  local y = 0.0
  while y <= 100000000.0 do
      x = x + y
      x = x / 2.0
      y = y + 1.0
  end
  print(x)
end
main()

luac -p -l script.lua > script.lua.txt generates the following instructions for the while loop

  3 [4] LOADK     2 0 ; 100000000.0
  4 [4] LE        1 2 0
  5 [4] JMP       7 ; to 13
  6 [5] ADD       0 0 1
  7 [5] MMBIN     0 1 6 ; __add
  8 [6] DIVK      0 0 1 ; 2.0
  9 [6] MMBINK    0 1 11 0  ; __div 2.0
  10  [7] ADDK      1 1 2 ; 1.0
  11  [7] MMBINK    1 2 6 0 ; __add 1.0
  12  [7] JMP       -10 ; to 3

The body of the while loop has 10 instructions. In fact the JMP instruction that exits the while loop is only executed on the last iteration (snippet from lvm.c):

    case OP_LT: case OP_LE:
    case OP_LTI: case OP_LEI:
    case OP_GTI: case OP_GEI:
    case OP_EQ: {  /* note that 'OP_EQI'/'OP_EQK' cannot yield */
      int res = !l_isfalse(s2v(L->top - 1));
      L->top--;
#if defined(LUA_COMPAT_LT_LE)
      if (ci->callstatus & CIST_LEQ) {  /* "<=" using "<" instead? */
        ci->callstatus ^= CIST_LEQ;  /* clear mark */
        res = !res;  /* negate result */
      }
#endif
      lua_assert(GET_OPCODE(*ci->u.l.savedpc) == OP_JMP);
      if (res != GETARG_k(inst))  /* condition failed? */
        ci->u.l.savedpc++;  /* skip jump instruction */   <=====
      break;
    }

Even if Umka could execute it's instructions way faster than Lua, that wouldn't help much because it still has to execute ~3x more instructions.

I think this is a problem in all implementations that execute bytecode (Umka, Puc/Lua, CPython, Perl, Quickjs, etc). One simply can not afford to write while loops in them, but instead must rely on built-in functions to the the heavy lifting/looping.

Jan 31 '24 15:01 luauser32167

Perhaps we could benefit from adding more complex instructions? For example a variant of binary, which works on the top two elements of the stack. Would that be worth it?

Jan 31 '24 15:01 marekmaskarinec

@luauser32167 You're right, the Umka virtual machine is not optimized enough. Please notice, however, that a one-to-one comparison between Umka and Lua 5 bytecode is hardly possible, since Umka is stack-based and Lua 5 is register-based.

@marekmaskarinec I sometimes think of having a single complex instruction for a for loop header in its most widely used form: for i := 0; i < expr; i++. AFAIK, even Lua 4 featured it.

Jan 31 '24 17:01 vtereshkov

umka-lang
umka-lang copied to clipboard

The performance is not particularly great compared to Lua

Tests

Remarks

umka-lang umka-lang copied to clipboard

The performance is not particularly great compared to Lua

Tests

Remarks

umka-lang
umka-lang copied to clipboard