
Extend factorial to rising and falling factorials

gwhitney opened this issue 2 months ago • 10 comments

In pursuit of a better gamma function to improve #3532, in addition to the Bernoulli numbers #3551, we will need rising factorials $x^{(n)} = x\cdot(x+1)\cdot \cdots \cdot (x + n -2)(x + n -1)$, where $x$ can be any number. (And as long as we are doing those, by symmetry we should have falling factorials $(x)_n = x\cdot(x-1)\cdot \cdots \cdot (x - n + 2)(x - n + 1)$, which is trivial because they are related by $(x)_n = (x-n+1)^{(n)}$.)
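
(As a quick sanity check of that last identity with $x = 5$ and $n = 3$: $(5)_3 = 5\cdot 4\cdot 3 = 60$, and $(5-3+1)^{(3)} = 3^{(3)} = 3\cdot 4\cdot 5 = 60$.)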

So some questions:

  • Are you comfortable extending factorial(n) with a new signature factorial(x, n), where basically x is any numerical value and n is any kind of integer, and when n is positive you get the rising factorial and when it is negative you get the falling factorial? Or do you prefer to have new functions -- either two of them, risingFactorial and fallingFactorial, or maybe squashed into one name like shiftingFactorial that works with positive or negative n? [For the record, I slightly prefer to have just the one name, factorial, that can be used for all three.]

  • Once rising factorials are in place, a very convenient way (in terms of code simplicity) to implement factorial(n) is just as the rising factorial $1^{(n)}$. The naive implementation of rising factorials is iterated multiplication. The first improvement over that is "binary splitting": when n is over some small threshold, pick h to be about half of n and compute $x^{(n)} = x^{(h)}\cdot(x+h)^{(n-h)}$ (see the sketch below). That is already a significant improvement over the naive approach, because you tend to do many more "easy" multiplications and fewer "hard" ones. However, we currently have a special trick for factorial of BigNumber that cuts the total number of multiplications in half by exploiting the fact that triangular numbers are proportional to $n(n+1)$. (There are even more sophisticated methods, but I think they are overkill for mathjs.)
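
To make the splitting concrete, here is a minimal plain-number sketch of the binary-splitting rising factorial (the names and the threshold are mine, for illustration only; a real mathjs version would also dispatch on BigNumber):

    // Binary-splitting rising factorial x^(n) (sketch, plain numbers only)
    function risingFactorial (x, n) {
      if (n <= 0) return 1 // empty product
      if (n <= 8) {
        // small n: naive iterated multiplication
        let p = x
        for (let i = 1; i < n; i++) p *= x + i
        return p
      }
      const h = n >> 1 // h is about half of n
      // x^(n) = x^(h) * (x+h)^(n-h)
      return risingFactorial(x, h) * risingFactorial(x + h, n - h)
    }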

However, if we switched to this convenient definition of factorial in terms of rising factorial, we would end up with the following benchmarks:

bigFactorial for 8            1.26 µs   ±5.95%
new bigFactorial for 8        0.80 µs   ±1.18%
split factorial for 8         1.10 µs   ±8.00%
bigFactorial for 20           3.06 µs   ±3.25%
new bigFactorial for 20       2.10 µs  ±23.63%
split factorial for 20        2.83 µs   ±4.25%
bigFactorial for 600        220.03 µs   ±0.85%
new bigFactorial for 600    110.35 µs   ±0.66%
split factorial for 600     126.93 µs   ±0.73%
bigFactorial for 1.5K       545.71 µs   ±0.53%
new bigFactorial for 1.5K   273.70 µs   ±0.63%
split factorial for 1.5K    312.90 µs   ±1.59%
bigFactorial for 10K       3633.01 µs   ±0.38%
new bigFactorial for 10K   2454.87 µs   ±0.49%
split factorial for 10K    2223.74 µs   ±0.63%
bigFactorial for 30K       11066.99 µs   ±0.41%
new bigFactorial for 30K   8634.30 µs   ±2.67%
split factorial for 30K    8307.91 µs   ±0.23%

So you can see that the split factorial (i.e. based on the binary-splitting rising factorial) actually outperforms the current clever trick for large factorials, even though it's doing twice as many multiplies -- it's just mostly doing much easier ones. (I didn't search for the break-even point; maybe it's something like 5000!.) But for intermediate factorials in the 600! to 1500! range, the split factorial is about 15% slower. Would that sort of slowdown be acceptable for the sake of code simplicity, or do you want code that checks the number of terms, uses the current trick in that middle range, and switches to the split factorial at some threshold? [My personal take is that JavaScript is never going to actually be a high-performance engine, so it's not worth convoluting our code for a modest speedup in intermediate cases, but I am open to counterpoint.]

Note that in this refactor, if we go ahead and base factorial on the rising factorial, then it will make sense to move the factorial code into the factorial function, rather than leaving it in the gamma function where it is now. I actually think that's an organizational improvement: at the moment, if you call factorial on a BigNumber, it delegates to gamma, which right now only works for integers and internally just calls a function bigFactorial, which seems needlessly roundabout. The eventual idea is that gamma will work for BigNumbers in general, not just integers, and depend on the rising factorials (whatever they end up being called), so there won't be any point in the factorial implementation living in the gamma.js source file.

I will hold off on implementing rising factorials and the more complete gamma function until I get some feedback on the rising factorial syntax and factorial implementation questions above. Thanks.

gwhitney · Oct 12 '25 18:10

For the record:

  • Sympy has separate factorial, RisingFactorial, and FallingFactorial functions.
  • Mathematica unifies the rising factorial and the reciprocal of the falling factorial (with an index shift of 1, to keep the formula relating it to the gamma function consistent) into the Pochhammer function, which is separate from the Factorial function. So another strategy for us would be to have a factorial function and a pochhammer(x, n) function which is the rising factorial for positive n and the reciprocal of the falling factorial for negative n.
  • Maple has factorial and pochhammer.
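
For reference, if I have the conventions right, the gamma-function formula makes that index shift visible: $\mathrm{pochhammer}(x, n) = \Gamma(x+n)/\Gamma(x)$, so taking $n = -m$ gives $\Gamma(x-m)/\Gamma(x) = 1/\big((x-1)(x-2)\cdots(x-m)\big)$, the reciprocal of the falling factorial $(x-1)_m$.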

I did not find an existing package that had a two-argument form of the factorial function for computing rising and falling factorials, but to me it is a natural extension, since they are all called "factorials" (just different kinds of factorial). On the flip side, if we go this way, I think someone not familiar with mathjs who sees factorial(x, 5) wouldn't likely jump to the intuition that this must mean the rising factorial; I think they would say "What could a second argument to factorial mean?", have to look up the documentation, and then say "Oh, rising factorial, that's reasonable."

However, it's also worth keeping in mind that there is also the double factorial 7!! = 7·5·3·1 (and 8!! = 8·6·4·2), which is currently not implemented in mathjs. This is another kind of factorial, so it could potentially create confusion about what a second argument to factorial means, and I see no nice way to also fold double factorials into a factorial function except possibly via an options argument instead of a numeric second argument: e.g. factorial(7.5, {rising: 3}) would be a rising factorial, factorial(bignumber(10.2), {falling: 6}) would be a falling factorial, and factorial(8, {step: 2}) would be the double factorial. This would allow for exotic hybrids like factorial(7.5, {falling: 4, step: 3}) for 7.5·4.5·1.5·-1.5 -- not that I personally have ever seen e.g. "rising double factorials" used.
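
To illustrate how such an options argument could compose, here is a hypothetical plain-number sketch (the option names {rising, falling, step} are just the proposal above, not an existing mathjs API):

    // Hypothetical options-based factorial (sketch, plain numbers only)
    function factorialOpts (x, opts = {}) {
      const step = opts.step ?? 1
      let n, dir
      if ('rising' in opts) { n = opts.rising; dir = +1 }
      else if ('falling' in opts) { n = opts.falling; dir = -1 }
      else { n = Math.ceil(x / step); dir = -1 } // plain/double: descend while positive
      let p = 1
      for (let i = 0; i < n; i++) p *= x + dir * i * step
      return p
    }

    factorialOpts(7.5, { rising: 3 })           // 7.5 * 8.5 * 9.5
    factorialOpts(8, { step: 2 })               // 8!! = 8 * 6 * 4 * 2
    factorialOpts(7.5, { falling: 4, step: 3 }) // 7.5 * 4.5 * 1.5 * -1.5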

But at this point, all these things are starting to look just like prod(7.5:-3:-1.5) which already works and computes the exact same thing. In other words, one can already get the rising factorial $x^{(n)}$ via prod(x:x+n-1) (and we could define factorial by prod(1:n)). The problem is it's super slow:

bigFactorial for 8            1.27 µs   ±7.26%
new bigFactorial for 8        0.85 µs   ±2.13%
split factorial for 8         1.19 µs  ±10.89%
prod range for 8             19.88 µs   ±7.64%

bigFactorial for 20           3.35 µs   ±3.96%
new bigFactorial for 20       2.07 µs   ±5.67%
split factorial for 20        3.25 µs   ±7.81%
prod range for 20            42.02 µs   ±1.89%

bigFactorial for 600        229.99 µs   ±0.92%
new bigFactorial for 600    115.48 µs   ±0.96%
split factorial for 600     134.65 µs   ±1.18%
prod range for 600         1265.41 µs   ±1.41%

bigFactorial for 1.5K       578.55 µs   ±1.30%
new bigFactorial for 1.5K   289.71 µs   ±1.06%
split factorial for 1.5K    316.15 µs   ±1.20%
prod range for 1.5K        3185.78 µs   ±1.25%

bigFactorial for 10K       3819.71 µs   ±0.49%
new bigFactorial for 10K   2546.94 µs   ±0.94%
split factorial for 10K    2358.69 µs   ±1.07%
prod range for 10K         23548.06 µs   ±9.22%

bigFactorial for 30K       12894.51 µs   ±0.36%
new bigFactorial for 30K   9807.16 µs   ±1.52%
split factorial for 30K    9770.07 µs   ±0.44%
prod range for 30K         79568.70 µs   ±1.64%

I believe the difficulties are that prod range uses the naive algorithm and it allocates and deallocates a temporary matrix that gets big, which is pretty heavyweight. Nevertheless, these observations mean there is another possible direction to go, because mathjs does already have a little-utilized Range object which is a "lazy" range that never produces a matrix unless you ask it to:

  1. Adjust the semantics so that new Range(bignumber(1.5), bignumber(4)) lazily represents bignumber(1.5), bignumber(2.5), bignumber(3.5) rather than 1.5, 2.5, 3.5 as it currently does. This is a breaking change, but likely not a big deal because Ranges are rare in practice.
  2. Either adjust the range function to return a Range object rather than a solid Matrix/Array, or provide another easy-to-access function with essentially the same arguments that produces a Range object (although it's a breaking change, I would recommend the former). Note in this step we might want to deal with the inconsistency that the string notation for Range(4, 10, 2) is "4:2:8" -- both the argument order and the end boundary condition differ, which seems really unfortunate to me... but that would indeed be a significant breaking change. I just kind of hate the double inconsistency, not quite sure how it arose or if maybe there is a rationale for it I am not aware of.
  3. Specialize various functions to work on Range objects directly, e.g. sum and especially prod, where we would use a binary-splitting algorithm (if the Range is longer than some threshold, construct two Ranges that exhaust it, recurse on them, and take the product of the results); see the sketch after this list.
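
For step 3, the binary-splitting prod could look roughly like this (a sketch assuming accessor names .size() and .get(i), which may not match the real Range API; multiply stands in for mathjs's typed multiply):

    // Binary-splitting product over a lazy range r (sketch, hypothetical API)
    function prodRange (r, multiply, lo = 0, hi = r.size()) {
      const len = hi - lo
      if (len <= 0) return 1 // empty product (type handling elided)
      if (len <= 8) {
        // short stretch: plain left-to-right product
        let p = r.get(lo)
        for (let i = lo + 1; i < hi; i++) p = multiply(p, r.get(i))
        return p
      }
      const mid = lo + (len >> 1) // halve the index range and recurse
      return multiply(prodRange(r, multiply, lo, mid),
                      prodRange(r, multiply, mid, hi))
    }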

I don't see any reason why, down this path, we couldn't in principle get "prod range" to be essentially as fast as the "split factorial" prototype. So in this scenario, there wouldn't be any reason to add additional functions or signatures; everything could be done with prod(range(...)), and in particular factorial(n) could either just be implemented as prod(range(1, n+1)) [modulo whatever we decide on the end boundary convention], or the current implementation specifically for factorial could be left alone, foregoing the mild improvement for very large factorials in favor of the mildly better current performance on intermediate factorials.

Going even a little further down this path, it should be possible for Range to actually just be an additional implementation of the Matrix interface, so that it should be able to drop-in "just work" lazily for essentially any Matrix operation. The Range implementation would be limited to 1d unless we decided to allow something like range(range(1,4), range(4, 7)) for [[1, 2, 3], [2, 3, 4], [3, 4, 5]]. (The hyper-extended version of this would have even a map on a Range remain a lazy entity, that generates transformed entries on demand, but I can't imagine going that far down the road of functional programming at the moment.)

Wow. Really sorry to flood the zone here. I didn't realize there would be such a range of possibilities (pun intended) for how we might handle something as seemingly innocuous as rising factorials for the sake of extending gamma for the sake of fixing zeta. So more than ever, I need some thought/feedback as to which direction seems best for mathjs to go. I suspect you can notice that at the moment, I am enthusiastic about a bigger role for lazy ranges in mathjs -- I suspect they would have at least a few positive performance impacts.

Thanks in advance for your insight and guidance.

gwhitney · Oct 12 '25 21:10

As an aside, all the referenced factorial functions are available in stdlib. While they don't support big numbers, one could probably write small wrappers.

kgryte · Oct 13 '25 09:10

Thanks! That's an impressive library of functions! It seems certain that mathjs's handling of numerous functions on IEEE double-precision floats could be improved/extended by using, wrapping, or emulating stdlib. (@josdejong perhaps we should discuss that sometime?) For example, the current effort is motivated by the zeta function, which in turn depends on gamma, both of which are in stdlib.

Now, mathjs already has implementations of zeta and gamma on doubles. I wouldn't be surprised if stdlib's are faster and/or more accurate on doubles. But the current effort is to extend them to be able to achieve 320 decimal digits of accuracy, with the decimal.js library. (Again, @josdejong, do you know why we are putting that limit of 320? I don't know where it's documented, and it should be...) I don't see that the very refined/sophisticated methods in stdlib extend to that type of computation, since they seem very optimized/tailored for the specific precision of doubles (e.g. the computation of ordinary factorial is precisely a lookup table).

Happy for your thoughts/assistance, thanks for your interest.

gwhitney · Oct 13 '25 14:10

Thanks for picking this up Glen!

  1. Sounds good to keep 1 function factorial and extend it with a new argument. However, maybe it is better to make this an enum instead of an arbitrary number, so that it is explicit what's going on. For example:

    factorial(x)
    factorial(x, options: { mode: 'rising' | 'falling' | 'shifting', step: number } )
    

    Do I understand it correctly that with this numeric step value, you would let the factorial take other steps than one? Like factorial(5, {step: 2 }) is 5*3*1? If step is not a thing, we could make the second argument an enum like:

    factorial(x, mode: 'rising' | 'falling' | 'shifting' )
    

    Maybe in the documentation we can mention alternative names, like "These function modes are sometimes called risingFactorial, fallingFactorial or pochhammer". That way, when people search the documentation, they will easily find the right function.

  2. I think in general we should go for the simplest code to read and maintain, and only add extra code for performance optimizations when we have benchmarks showing that it actually yields significant performance improvements and hence is worth the added complexity.

    I think a performance difference of 15% in just some ranges is so small that it is not worth it, thanks for benchmarking this. I would prefer code simplicity in that case. The performance improvement of your new implementation is great!

  3. Interesting idea to see if we can improve on the performance of prod to use it to calculate a factorial (I guess we would indeed have to have lazy evaluation instead of creating intermediate arrays). But maybe we should park that as a separate discussion/idea, and focus here on extending factorial itself. It feels to me like with relatively little extra code we can make factorial much more flexible and powerful, I think we should just do that.

  4. I'm totally fine with using functionality from for example stdlib where applicable instead of reinventing the wheel.

  5. I have no clue about this 320 digits limit that we're running into in #3532. I do know that the decimal.js library contains a few hardcoded values like pi with 1025 digits, see code: https://github.com/MikeMcl/decimal.js/blob/master/decimal.js#L29-L32. First step would be to figure out if this is some limit introduced in mathjs or in decimal.js.

josdejong · Oct 15 '25 09:10

I guess the main question I ended up with is which of the following two is better:

  • Improve prod(range(x, y, z)) to be as fast as binary-splitting factorial. There is no reason it can't be. Then don't bother implementing any fancy variants of factorial at all, because rising factorials just become things like prod(range(5.5, 8.5)), falling factorials become things like prod(range(5.5, 2.5, -1)), and double factorials become prod(range(8, 1, -2)), so you can easily and clearly get any of the variants with no new functions, as well as other similar variants that are not quite the usual named ones (see the usage sketch after this list). (And to boot, factorial(n) just becomes an alias for prod(range(2, n+1)) -- what could be simpler than that?)

  • Provide additional special-purpose functions, or options to factorial, for falling, rising, and double factorials, and leave range and prod as they are.
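
For concreteness, assuming range keeps its exclusive-end convention, the variants in the first bullet would read (expected values, not benchmarked output):

    prod(range(5.5, 8.5))      // rising:  5.5 * 6.5 * 7.5
    prod(range(5.5, 2.5, -1))  // falling: 5.5 * 4.5 * 3.5
    prod(range(8, 1, -2))      // double:  8!! = 8 * 6 * 4 * 2
    prod(range(2, n + 1))      // n!  for an integer n >= 2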

I will go whichever way you prefer, but my heart is definitely more in the first alternative, because I feel like it leaves mathjs in a state that is both more powerful (the range and prod improvements will benefit other computations as well, and more variants of factorial are available in a more systematic way that leverages existing mathjs notations) and less cluttered (no new functions nor complicated options to an existing function). Just let me know.

gwhitney · Oct 16 '25 07:10

4. I'm totally fine with using functionality from for example stdlib where applicable instead of reinventing the wheel.

I think this is a completely different discussion, to be had elsewhere. As a whole, stdlib is huge (ten times the size of mathjs), but we can pick and choose pieces. It is also very focused on optimizing performance/accuracy on doubles (i.e. ordinary JavaScript numbers) and floats, rather than on a whole range of types like mathjs. But I could imagine a world in which many of mathjs's implementations on numbers forward to stdlib. That would greatly increase the bundle size of mathjs, unless stdlib were (for example) loaded on the fly from a CDN rather than included in the bundle... But as I said, this whole discussion should be moved into its own appropriate place.

gwhitney · Oct 16 '25 07:10

Agreed that we should move potential stdlib integration to a separate discussion. There are ways to reduce the bundle impact; however, in general, stdlib's implementations, such as betainc, can be quite involved in order to preserve accuracy across the entire range of acceptable input values. For situations where accuracy is not as critical, we have a "fast" math namespace where the implementations are much lighter weight.

kgryte · Oct 16 '25 10:10

Nicest would indeed be if prod(range(x, y, z)) were so fast that there is no need for specific optimized factorial variants. So I think it makes sense then to first investigate whether we see room for improvement there. It would also be interesting to see how this would work out in the pocomath POC, which has a faster/improved type dispatcher that may already be better at this out of the box.

josdejong · Oct 22 '25 12:10

So I think it makes sense then to first investigate if we see room for improvement there.

Yes, that is exactly what I have been working on the past week...

gwhitney · Oct 22 '25 15:10

OK, I just submitted a WIP pull request #3568 so you can take a look at where I have gotten. It turned out to be a fairly major reworking of ranges and indexing; you will need to take a look to see what you think. I also updated the factorial benchmark to use prod(range(bignumber(1), bignumber(n+1))) as an alternative computation method. Here are the results:

bigFactorial for 8            1.31 µs   ±6.32%
new bigFactorial for 8        0.90 µs   ±0.05%
split factorial for 8         1.79 µs  ±11.19%
prod range for 8              8.48 µs   ±6.09%
bigFactorial for 20           3.44 µs   ±4.45%
new bigFactorial for 20       2.18 µs   ±5.12%
split factorial for 20        4.42 µs   ±5.38%
prod range for 20            13.10 µs   ±4.08%
bigFactorial for 600        234.39 µs   ±0.74%
new bigFactorial for 600    118.00 µs   ±1.01%
split factorial for 600     172.77 µs   ±0.97%
prod range for 600          265.63 µs   ±1.16%
bigFactorial for 1.5K       589.38 µs   ±0.76%
new bigFactorial for 1.5K   293.20 µs   ±1.16%
split factorial for 1.5K    406.55 µs   ±0.99%
prod range for 1.5K         741.86 µs   ±1.10%
bigFactorial for 10K       3886.65 µs   ±0.53%
new bigFactorial for 10K   2560.16 µs   ±1.18%
split factorial for 10K    2960.56 µs   ±0.69%
prod range for 10K         4973.57 µs   ±0.64%
bigFactorial for 30K       11677.61 µs   ±0.25%
new bigFactorial for 30K   8729.51 µs   ±1.15%
split factorial for 30K    10646.44 µs   ±0.29%
prod range for 30K         16312.09 µs   ±0.23%

Bottom line is that I achieved a factor-of-5 speedup over the previous implementation of prod range, but unfortunately that means it's still only half the speed of the current factorial algorithm ("new bigFactorial"). The reason that splitFactorial no longer looks as good as "new bigFactorial" is that I changed it to allocate a new bignumber on every iteration, rather than just multiplying one bignumber by a bunch of ordinary numbers. As you can see, that turns out to be part of the slowdown in "prod range", but not all of it. I am not 100% sure what the remaining slowdown in "prod range" is, as it's essentially the same algorithm as splitFactorial, and I tried to make sure it does no typed dispatch in the inner loop. Without investing in some heavy-duty profiling work (which I have never really done in JavaScript), I don't really know how I could squeeze more out of "prod range", unless you have other ideas/suggestions.

So I guess it's decision time. Should we:

  • Go down the path of PR #3568 (once it's polished up and adjusted to your liking) and change factorial just to be prod range for simplicity, eating the factor-of-2 slowdown?
  • Go with PR #3568, perhaps in part because it has some other advantages, and figure prod ranges are good enough for the less-used rising, falling, and double factorials (so they don't need their own functions), but leave the current factorial implementation itself alone?
  • Go with PR #3568 solely for its other advantages, and implement specialized factorial variant functions as well?
  • Abandon PR #3568 altogether and implement specialized factorial variants?

I look forward to your thoughts, feedback, and guidance.

gwhitney · Oct 25 '25 17:10