Deduplicate VM constants
(except when defining derived unit constants, as those are initially identical but temporary values)
I don't see any real performance change from this (if anything, it's probably a minor slowdown due to the added dupe check in add_constant). However, it does reduce the number of constants generated from the prelude from ~730 to ~320, a reduction of about 56%. In theory this allows for larger Numbat programs, as it's now harder to hit the ~65k constants ceiling. Worth it? I don't know :)
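Not the PR's actual code, just to make the shape of the change concrete: a minimal sketch of a linear duplicate check in an add_constant-style method, assuming a Vec-backed pool indexed by u16 (which is what the ~65k ceiling suggests). The real Constant type and signatures differ.

```rust
// Hypothetical sketch only: illustrates the linear duplicate check and a
// u16-indexed constant pool; numbat's real types and signatures differ.
#[derive(Debug, PartialEq)]
enum Constant {
    Scalar(f64),
    String(String),
}

#[derive(Default)]
struct ConstantPool {
    constants: Vec<Constant>,
}

impl ConstantPool {
    /// Reuse an existing identical constant if there is one, otherwise append.
    fn add_constant(&mut self, constant: Constant) -> u16 {
        if let Some(idx) = self.constants.iter().position(|c| *c == constant) {
            return idx as u16; // already in the pool, reuse its slot
        }
        let idx = self.constants.len();
        self.constants.push(constant);
        u16::try_from(idx).expect("constant pool overflow (~65k entries)")
    }
}

fn main() {
    let mut pool = ConstantPool::default();
    let a = pool.add_constant(Constant::Scalar(1.0));
    let b = pool.add_constant(Constant::Scalar(1.0)); // deduplicated
    let c = pool.add_constant(Constant::String("foo".into()));
    assert_eq!(a, b);
    assert_ne!(a, c);
    assert_eq!(pool.constants.len(), 2);
}
```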
This looks great. I’m wondering if using an IndexSet would be possible? If it is I think it could simplify the code a bit and possibly address the (potential) performance impact you mentioned. I’m not familiar enough with it to know its limitations, so this is mostly just my musings of curiosity.
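For context (not something the PR uses): indexmap's IndexSet keeps insertion order, and insert_full returns the index of the value whether it was already present or just inserted, which is why it looks like it could replace the manual scan. A rough sketch, assuming Constant were Eq + Hash, with a trimmed stand-in that omits the f64 variant for exactly the reason discussed further down:

```rust
use indexmap::IndexSet;

// Rough sketch only: relies on Constant being Eq + Hash, which the real type
// is not. Trimmed stand-in without the f64 variant.
#[derive(PartialEq, Eq, Hash)]
enum Constant {
    String(String),
}

fn add_constant(constants: &mut IndexSet<Constant>, constant: Constant) -> u16 {
    // insert_full returns the index of the value, whether it was already
    // present or has just been inserted, so no manual scan is needed.
    let (idx, _newly_inserted) = constants.insert_full(constant);
    idx as u16
}

fn main() {
    let mut constants = IndexSet::new();
    let a = add_constant(&mut constants, Constant::String("foo".into()));
    let b = add_constant(&mut constants, Constant::String("foo".into()));
    assert_eq!(a, b); // duplicates resolve to the same index
}
```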
i've already thought of that; unfortunately, sets would not work here due to the dummy temporary constants added for derived base units, which are identical until processed at vm runtime. with a set, these dummies would be incorrectly treated as duplicates, at least as the code is now
the performance impact is nonexistent with my simple testing, but i suppose that, algorithmically speaking, the performance will worsen much faster the more constants you have. so perhaps we can consider somehow eliminating the need for dummies here; it would also solve the pre-existing todo
Aah, that's too bad. Thank you for the explanation though. I'm glad the impact is negligible at currently realistic scales. With a linear time complexity, I think the trade-off is worth it even at larger scales, just due to the current constraint on the number of constants. It would be interesting to test the effects at scales large enough to approach the maximum number of constants without the duplication check.
to be clear, i meant my previous testing (basically only the prelude)
however, now i'm curious to get some results on more pathological cases, so i did these rudimentary benchmarks:
$ seq 1 10000 | awk '{ print "let _" $0 " = " $0 }' > f
$ hyperfine " ./src/numbat/numbat-old f"
Benchmark 1: ./src/numbat/numbat-old f
Time (mean ± σ): 3.149 s ± 0.029 s [User: 3.128 s, System: 0.018 s]
Range (min … max): 3.100 s … 3.209 s 10 runs
$ hyperfine " ./src/numbat/numbat-better f"
Benchmark 1: ./src/numbat/numbat-better f
Time (mean ± σ): 3.219 s ± 0.029 s [User: 3.200 s, System: 0.015 s]
Range (min … max): 3.182 s … 3.267 s 10 runs
$ # same test but strings
$ seq 1 10000 | awk '{ print "let _" $0 " = \"foo" $0 "\"" }' > f
$ hyperfine " ./src/numbat/numbat-old f" "./src/numbat/numbat-better f"
Benchmark 1: ./src/numbat/numbat-old f
Time (mean ± σ): 2.008 s ± 0.027 s [User: 1.991 s, System: 0.015 s]
Range (min … max): 1.976 s … 2.066 s 10 runs
Benchmark 2: ./src/numbat/numbat-better f
Time (mean ± σ): 2.332 s ± 0.033 s [User: 2.314 s, System: 0.017 s]
Range (min … max): 2.296 s … 2.401 s 10 runs
Summary
./src/numbat/numbat-old f ran
1.16 ± 0.02 times faster than ./src/numbat/numbat-better f
so there's a slowdown for sure (more noticeable on the strings version, unsurprisingly), but it's not massive and it only shows up in a pathological case, and we might still be able to do some work here. still, i'm not sure if this pull request is worth it in the first place; i'm not sure what practical benefits it actually brings
also, i'm surprised the strings version is faster in the first place! i wonder why that is
also, tangential side note: piping the awk output directly into numbat is slower by an order of magnitude than reading it from a file:
$ time seq 1 10000 | awk '{ print "let _" $0 " = " $0 }' | numbat
________________________________________________________
Executed in 21.75 secs fish external
usr time 21.66 secs 1.48 millis 21.66 secs
sys time 0.09 secs 0.00 millis 0.09 secs
definitely worth a look
That is a pretty minor impact at that scale, especially considering that the tests deliberately exercise nothing but the worst-case scenario. In a real-life scenario, if we assume that the rate of occurrence of individual constant values is similar to the prelude's (~40% being the numbers 0-9), we would see an even lower impact, since those values would be found near the beginning of the list instead of having to traverse the whole thing for every constant.
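A rough, unmeasured way to put numbers on that intuition: if a fraction p of insertions match one of the first k entries and the rest scan essentially all n constants, the expected cost per insertion is about p·k + (1 - p)·n comparisons. Plugging in p ≈ 0.4, k ≈ 10 and n ≈ 320 (the deduplicated prelude) gives roughly 4 + 192 ≈ 196 comparisons instead of up to 320, so the common small-integer hits cut the average cost noticeably even though the worst case stays linear.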
also, i'm surprised the strings version is faster in the first place! i wonder why that is
That one surprised me too. I wonder if the choice of representation and handling of units (or the lack thereof) is having an impact here?
That one surprised me too. I wonder if the choice of representation and handling of units (or the lack thereof) is having an impact here?
my guess is that it's possibly related to number parsing, but i'm not sure if that can account for a 1 second difference. i'll take a closer look once i have the time
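Not from the thread, but a crude starting point for checking that guess would be to time raw f64 parsing against string building at the same scale; numbat's own literal handling does more work than std's parser, so this only gives a lower bound on the effect.

```rust
use std::time::Instant;

// Crude micro-check, illustrative only: a real comparison should use
// criterion or hyperfine rather than a single Instant measurement.
fn main() {
    let n = 10_000;

    let start = Instant::now();
    let mut acc = 0.0_f64;
    for i in 0..n {
        acc += i.to_string().parse::<f64>().unwrap();
    }
    println!("parse {n} numbers: {:?} (acc = {acc})", start.elapsed());

    let start = Instant::now();
    let mut total_len = 0usize;
    for i in 0..n {
        total_len += format!("foo{i}").len();
    }
    println!("build {n} strings: {:?} (len = {total_len})", start.elapsed());
}
```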
i've already thought of that; unfortunately, sets would not work here due to the dummy temporary constants added for derived base units, which are identical until processed at vm runtime. with a set, these dummies would be incorrectly treated as duplicates, at least as the code is now
I feel like this should be rectifiable. The dummies inserted (you can search the repo for <dummy>) will be immediately overwritten via the SetUnitConstant op that occurs right after their creation (self.vm.add_op2(Op::SetUnitConstant, ...) in the same branch) — see the Op::SetUnitConstant branch of Vm::run_without_cleanup. And I think this op just uses all the information already available in the UnitMetadata. So I think dummies don't have to be created at all, and instead the correct values can just be inserted immediately. But I don't understand the inner workings well enough to be confident that this is correct.
Edit: ah, I see the difficulty. Defining a derived unit requires having the corresponding quantity on hand, which requires compiling the expression, which requires having the unit defined (even if only as a dummy). A bit of a chicken and egg problem. I can see that this is worked around by inserting Op::SetUnitConstant so that before the current statement is executed, but after the expression is compiled, the units in question (which are dummies that have been accepted by the compiler) actually end up with definitions. I feel like it should be possible to go the other way around: in compile_expression, when we find a derived unit with no definition, create it. However, plumbing this correctly sounds annoying. In the Op::SetUnitConstant branch, there is a quantity to pop from the stack, but that won't exist during compilation.
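To make the dummy-then-patch flow concrete, here is a toy model (not numbat's actual types or opcodes, just the shape described above): the compiler reserves a constant slot with a placeholder, and a SetUnitConstant-style operation overwrites it once the VM has evaluated the defining expression. It also shows why the placeholders all look identical until run time.

```rust
// Toy model only: stand-ins for numbat's "<dummy>" placeholder constants and
// the SetUnitConstant patching step, not the real implementation.
#[derive(Clone, Debug, PartialEq)]
enum Constant {
    Dummy,             // placeholder reserved at compile time
    Unit(String, f64), // stand-in for the real derived-unit constant
}

struct Vm {
    constants: Vec<Constant>,
}

impl Vm {
    // Compile time: reserve a slot. With a naive duplicate check, every Dummy
    // would collapse into one slot, which is exactly the problem discussed
    // above, so dummies have to be exempt from deduplication.
    fn add_dummy(&mut self) -> usize {
        self.constants.push(Constant::Dummy);
        self.constants.len() - 1
    }

    // Run time: the SetUnitConstant-style op patches the reserved slot with
    // the value computed from the compiled expression.
    fn set_unit_constant(&mut self, idx: usize, name: &str, factor: f64) {
        self.constants[idx] = Constant::Unit(name.to_string(), factor);
    }
}

fn main() {
    let mut vm = Vm { constants: Vec::new() };
    let km = vm.add_dummy();
    let mi = vm.add_dummy(); // identical to `km` until patched
    vm.set_unit_constant(km, "kilometre", 1000.0);
    vm.set_unit_constant(mi, "mile", 1609.344);
    println!("{:?}", vm.constants);
}
```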
This looks great. I’m wondering if using an IndexSet would be possible? If it is I think it could simplify the code a bit and possibly address the (potential) performance impact you mentioned. I’m not familiar enough with it to know its limitations, so this is mostly just my musings of curiosity.
Another reason that an IndexSet would not be possible (or at least would be difficult) is that a number of types used in Constant don't implement Hash. Most could probably be made to, but f64 presents an annoying (albeit solvable) challenge.
However, it should also be noted that in the current design, an unbounded number of NaN constants could be added to the constants registry, since NaN never compares equal to itself and so NaNs are never considered duplicates of each other.
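To illustrate both points with plain Rust (none of this is numbat code): NaN never compares equal to itself, so a PartialEq-based duplicate check can never match it, and the usual workaround for hashing f64 is to hash its bit pattern, which incidentally also makes bit-identical NaNs count as duplicates.

```rust
use std::hash::{Hash, Hasher};

fn main() {
    // 1. NaN != NaN, so a PartialEq-based duplicate check never matches it and
    //    every NaN literal gets its own constant slot.
    assert!(f64::NAN != f64::NAN);

    // 2. f64 implements neither Eq nor Hash, which is what blocks a plain
    //    #[derive(Hash)] on Constant. A common workaround is to compare and
    //    hash the bit pattern instead.
    struct HashableF64(f64);

    impl PartialEq for HashableF64 {
        fn eq(&self, other: &Self) -> bool {
            self.0.to_bits() == other.0.to_bits()
        }
    }
    impl Eq for HashableF64 {}
    impl Hash for HashableF64 {
        fn hash<H: Hasher>(&self, state: &mut H) {
            self.0.to_bits().hash(state);
        }
    }

    let a = HashableF64(f64::NAN);
    let b = HashableF64(f64::NAN);
    assert!(a == b); // bit-identical NaNs now count as duplicates
}
```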