numbat icon indicating copy to clipboard operation
numbat copied to clipboard

Deduplicate VM constants

Open triallax opened this issue 8 months ago • 10 comments

(except when defining derived unit constants, as initially identical but temporary values)

Don't see any real performance changes from this (if anything it's probably a minor slowdown due to the added dupe check in add_constant). However, it does reduce the number of constants generated from the prelude from ~730 to ~320, a reduction of 56%. This in theory allows for larger Numbat programs as it's now harder to hit the ~65k constants ceiling. Worth it? I don't know :)

triallax avatar Apr 07 '25 19:04 triallax

This looks great. I’m wondering if using an IndexSet would be possible? If it is I think it could simplify the code a bit and possibly address the (potential) performance impact you mentioned. I’m not familiar enough with it to know its limitations, so this is mostly just my musings of curiosity.

Goju-Ryu avatar Apr 08 '25 16:04 Goju-Ryu

i've already thought of that, unfortunately sets would not work here due to the dummy temporary constants added for derived base units, which are identical until processed at vm runtime; with a set these dummies would be incorrectly duplicated, at least as the code is now

triallax avatar Apr 08 '25 18:04 triallax

the performance impact is nonexistent with my simple testing but i suppose algorithmically speaking the performance wil worsen much faster the more constants you have, so perhaps we can consider somehow eliminating the need for dummies here, it would also solve the pre-existing todo

triallax avatar Apr 08 '25 18:04 triallax

Aah, that's too bad. Thank you for the explanation though. I’m glad the impact is negligible, at currently realistic scale. With a linear time complexity, I think the trade off is worth it even at larger scales just due to the current constraint on the number of constants. It would be interesting to test the effects at scales large enough to approach the maximum number of constants without the duplication check.

Goju-Ryu avatar Apr 08 '25 21:04 Goju-Ryu

to be clear i meant my previous testing (basically only the prelude)

however now i'm curious to get some results on more pathological cases so i did these rudimentary benchmarks:

$ seq 1 10000 | awk '{ print "let _" $0 " = " $0 }' > f
$ hyperfine " ./src/numbat/numbat-old  f"
Benchmark 1:  ./src/numbat/numbat-old  f
  Time (mean ± σ):      3.149 s ±  0.029 s    [User: 3.128 s, System: 0.018 s]
  Range (min … max):    3.100 s …  3.209 s    10 runs
 
$ hyperfine " ./src/numbat/numbat-better  f"
Benchmark 1:  ./src/numbat/numbat-better  f
  Time (mean ± σ):      3.219 s ±  0.029 s    [User: 3.200 s, System: 0.015 s]
  Range (min … max):    3.182 s …  3.267 s    10 runs
$ # same test but strings
$ seq 1 10000 | awk '{ print "let _" $0 " = \"foo" $0 "\"" }' > f
$ hyperfine " ./src/numbat/numbat-old  f" "./src/numbat/numbat-better f"
Benchmark 1:  ./src/numbat/numbat-old  f
  Time (mean ± σ):      2.008 s ±  0.027 s    [User: 1.991 s, System: 0.015 s]
  Range (min … max):    1.976 s …  2.066 s    10 runs
 
Benchmark 2: ./src/numbat/numbat-better f
  Time (mean ± σ):      2.332 s ±  0.033 s    [User: 2.314 s, System: 0.017 s]
  Range (min … max):    2.296 s …  2.401 s    10 runs
 
Summary
   ./src/numbat/numbat-old  f ran
    1.16 ± 0.02 times faster than ./src/numbat/numbat-better f

so there's a slowdown for sure (more noticeable on the strings version unsurprisingly), but it's not very massive and on a pathological case, but we might be able to do some work here still. still, i'm not sure if this pull request is worth it in the first place, i'm not sure what practical benefits it actually brings

also, i'm surprised the strings version is faster in the first place! i wonder why that is

triallax avatar Apr 08 '25 21:04 triallax

also, tangential side note: piping the awk output directly into numbat is slower by an order of magnitude than reading it from a file:

$ time seq 1 10000 | awk '{ print "let _" $0 " = " $0 }' | numbat

________________________________________________________
Executed in   21.75 secs    fish           external
   usr time   21.66 secs    1.48 millis   21.66 secs
   sys time    0.09 secs    0.00 millis    0.09 secs

definitely worth a look

triallax avatar Apr 08 '25 21:04 triallax

That is a pretty minor impact at that scale, especially considering that the tests specifically avoid anything but the worst case scenario. In a real life scenario, if we assume that the rate of occurrence of individual constant values are similar to the prelude (~40% being numbers 0-9), we would see even lower impact due to finding these values in the beginning of the list instead of having to traverse the whole thing on every constant.

also, i'm surprised the strings version is faster in the first place! i wonder why that is

That one surprised me too. I wonder if choosing representation and handed of units (or the lack there of) is having an impact here?

Goju-Ryu avatar Apr 09 '25 09:04 Goju-Ryu

That one surprised me too. I wonder if choosing representation and handed of units (or the lack there of) is having an impact here?

my guess is possibly relating to number parsing, but i'm not sure if that can account for a 1 second difference i'll take a closer look once i have the time

triallax avatar Apr 09 '25 21:04 triallax

i've already thought of that, unfortunately sets would not work here due to the dummy temporary constants added for derived base units, which are identical until processed at vm runtime; with a set these dummies would be incorrectly duplicated, at least as the code is now

I feel like this should be rectifiable. The dummies inserted (you can search the repo for <dummy>) will be immediately overwritten via the SetUnitConstant op that occurs right after their creation (self.vm.add_op2(Op::SetUnitConstant, ...) in the same branch) — see the Op::SetUnitConstant branch of Vm::run_without_cleanup. And I think this op just uses all the information already available in the UnitMetadata. So I think dummies don't have to be created at all, and instead the correct values can just be inserted immediately. But I don't understand the inner workings well enough to be confident that this is correct.

Edit: ah, I see the difficulty. Defining a derived unit requires having the corresponding quantity on hand, which requires compiling the expression, which requires having the unit defined (even if only as a dummy). A bit of a chicken and egg problem. I can see that this is worked around by inserting Op::SetUnitConstant so that before the current statement is executed, but after the expression is compiled, the units in question (which are dummies that have been accepted by the compiler) actually end up with definitions. I feel like it should be possible to go the other way around: in compile_expression, when we find a derived unit with no definition, create it. However, plumbing this correctly sounds annoying. In the Op::SetUnitConstant branch, there is a quantity to pop from the stack, but that won't exist during compilation.

rben01 avatar May 13 '25 21:05 rben01

This looks great. I’m wondering if using an IndexSet would be possible? If it is I think it could simplify the code a bit and possibly address the (potential) performance impact you mentioned. I’m not familiar enough with it to know its limitations, so this is mostly just my musings of curiosity.

Another reason that an IndexSet would not be possible (or at least would be difficult) is that a number of types used in Constant don't implement Hash. Most could probably be made to be, but f64 presents an annoying (albeit solvable) challenge.

However, it should also be noted that in the current design, an infinite number of NaNs could be added to the constants registry, as they are never considered duplicates of each other.

rben01 avatar May 15 '25 16:05 rben01