[BREAKING] Extend functionality of Range (to cover factorial variants) and make subclass of Matrix
This is a work-in-progress, proof-of-concept PR that allows one to calculate e.g. the 10th rising factorial of bignumber 8.5 by math.prod(math.range(math.bignumber(8.5), math.bignumber(19.5)). The major highlights include:
-
Making Range an implementation of Matrix. (This change breaks important ground to pave the way for alternate Matrix implementations besides DenseMatrix and SparseMatrix, given that there have been discussions of flat matrices and ones based on JavaScript's type arrays.)
-
Allowing Range to encode arbitrary arithmetic progressions of arbitrary mathjs entities.
-
Complete reimplementation of prod.
-
Several indexing extensions, and addition of new scalarDivide(V, W), one(x), and zero(x) methods.
In addition, there is a significant reduction in the number of failing doc example tests.
This PR is being submitted for feedback on the approach and scope. It needs at least extensive addition of tests and documentation before being merge-ready.
The most "controversial" change in this PR is that I couldn't wrap my mind around the following: previously, if you took the exact same string, say '2:5', and passed it through either math.parse() or Range.parse(), you got different results -- the former included 5 and the latter did not. That was just so confusing, since both functions were called "parse", that I changed it, so that they both include 5. That change does make this a breaking PR.
On the other hand, I left math.range('2:5') to not include the 5 because (a) it doesn't say it is "parsing" the string, it's just a function argument, (b) that way it matches math.range(2, 5), and (c) there is already a final boolean argument includeEnd to math.range that gives you explicit control over whether the 5 ends up in there.
Also I just noticed that the extensions to Range in this PR include all of the functionality of linspace in #1207 and so this PR would essentially supersede that one except for logspace (which can of course be done with new Range({from: log(start), to: log(end), for:10}).map(exp)) so maybe we could finally close that PR.
Thanks for working out this POC Glen!
The most "controversial" change in this PR is that I couldn't wrap my mind around the following: previously, if you took the exact same string, say
'2:5', and passed it through eithermath.parse()orRange.parse(), you got different results -- the former included 5 and the latter did not.
You're hitting the confusing differences between the JS API and the expression parser API, see docs. In the expression parser, the matrix indexes are one-based and the upper-bound of ranges is included. These differences (requiring the transform functions) are definitely confusing. We're trying to serve two different audiences which is problematic: mathematicians used to one based indexes and included range end versus programmers which are used to zero-based indexes and excluded range ends.
Some first thoughts on this POC:
- I love your idea to implement
Rangeas aMatrix, it makes sense. - This experiment arises from the question on whether the product of a range can be as fast as a specialized factorial function. I see you've created a benchmark
factorial.js. What are your thoughts so far? I see in the benchmark you're testing with bignumbers. It would be good to also benchmark numbers: I expect a bigger difference between an optimizedfactorialfunction there compared to a genericprod(range(...))in that case. - It's not yet entirely clear to me how to judge if this experiment is a success or not (taking into account base improvements, performance improvements, code complexity, breaking changes). Maybe we can come up with a list of pros and cons and give all of them a weight.
- In order to prevent breaking changes in
Range, maybe we can implement the new version alongside it with a different name likeRangeMatrix, and mark the other deprecated.
Thanks for working out this POC Glen!
The most "controversial" change in this PR is that I couldn't wrap my mind around the following: previously, if you took the exact same string, say
'2:5', and passed it through eithermath.parse()orRange.parse(), you got different results -- the former included 5 and the latter did not.You're hitting the confusing differences between the JS API and the expression parser API, see docs.
Yes, absolutely, I understand that the two are distinct with distinct rules of interpretation, and that is exactly why I felt the behavior of Range.parse() needed to change, because it is called parse() and the otherwise clear principle is that when we parse expressions they are 1-based and ranges include their ends. If the method were Range.fromString() or Range.create() or etc. I would agree that the prior behavior was the correct one. I just felt that calling it Range.parse() very much muddied the otherwise well-delineated distinction between JavaScript APIs and parsing expressions, making what was otherwise a pretty clear handling of this possibly unexpected distinction between the two APIs potentially much more confusing.
So another possible route is to deprecate Range.parse() (but it would still execute with its old behavior, just issuing a console.warn() when called) and introduce a synonymous Range.fromString() or some other name you may like. If that would not be considered a breaking change, it would make this whole PR non-breaking to the best of my knowledge.
In the expression parser, the matrix indexes are one-based and the upper-bound of ranges is included. These differences (requiring the transform functions) are definitely confusing. We're trying to serve two different audiences which is problematic: mathematicians used to one based indexes and included range end versus programmers which are used to zero-based indexes and excluded range ends.
Yup, but actually other than this example I think the distinction is handled clearly and cleanly.
Some first thoughts on this POC:
- I love your idea to implement
Rangeas aMatrix, it makes sense.
Thanks!
- This experiment arises from the question on whether the product of a range can be as fast as a specialized factorial function. I see you've created a benchmark
factorial.js. What are your thoughts so far? I see in the benchmark you're testing with bignumbers. It would be good to also benchmark numbers: I expect a bigger difference between an optimizedfactorialfunction there compared to a genericprod(range(...))in that case.
Actually benchmarking factorial(number) won't be informative: factorial overflows number at such a low value for the argument that the time it takes is essentially always trivial for any useful cases. In fact, if you look at stdlib.js, the implementation of factorial is solely a table lookup, since there are so few valid values.
My personal thoughts so far are that the fact this PR gets prod(range()) to half the speed of hand-coded factorial() makes it plenty good for the less-used variants of factorial, so we should just continue to use hand-coded factorial() and in its documentation suggest prod(range()) for rising, falling, and double factorials, etc. The gamma function, which is the immediate use case for both rising and double factorials, is not rate-limited by those variant factorials. Of course, every bit would help, but the factorials are not creating the bulk of the work of gamma by any means.
- It's not yet entirely clear to me how to judge if this experiment is a success or not (taking into account base improvements, performance improvements, code complexity, breaking changes). Maybe we can come up with a list of pros and cons and give all of them a weight.
Not sure about the weights, but here's my list of evaluation points for this PR (leaving aside whether it is a breaking change, since hopefully that can be adjusted either way you like):
Cons:
- Any functions on which the new extended Range depends (which is a fairly large list because it handles a lot of calculations in dealing with arbitrary arithmetic sequences) cannot depend on the convenience constructor
matrix()since the latter clearly must depend on all implementations of the Matrix interface, now including Range, and so would create a circular dependency. (Note I think pocomath-style approaches would negate this point, since I don't think there is any single implementation that has a circular dependency. Moreover, the resolution of this concern in the submitted PR was not terrible, it just involved making those functions directly depend on DenseMatrix and occasionally SparseMatrix rather than the generalmatrix()method. In a very few cases, likeconcat, breaking the circularity was tricky, requiring careful use of the ability for one instance of a Matrix implementation to create other like instances, without importing the genericmatrix()constructor.) - It involves changing the typed-function resolution of the Matrix type to come after the DenseMatrix and SparseMatrix type. Conceptually, that's where it should have been all along, since in general it's best to test for more general types after more specific types, otherwise the specific type test can never fire. So I think this change is actually positive, and it did not invalidate any unit tests, but it is a subtle and potentially unexpected side effect seemingly unrelated to the main motivation, although so far as I can see completely necessary for the PR to reproduce prior functionality. Basically, with three Matrix types instead of two, robust type resolution becomes significantly more important.
- Touches very many source files, primarily because of the first bullet point, and secondarily because Range does crop up in a number of places.
Pros:
- Speeds up prod(range()) by a factor of five, on cases which prod(range()) previously handled.
- Proves that the Matrix infrastructure is robust enough to handle a third implementation (after DenseMatrix and SparseMatrix), easing the way for one or more of the proposed other new implementations (e.g., based on TypedArrays) in addition.
- Greatly extends the collection of sequences that can be represented by Range.
- Makes things like
far more performant and less memory intensive (no million-entry array is ever allocated). In general, the lazy approach to a commonly-used sort of matrix, in which entries can be quickly generated on the fly rather than each explicitly stored in memory, seems like an improvement in power and efficiency. Note for example, currentlymath.range(1, 1000000).forEach(num => console.log(`${num} squared is ${math.square(num)}`)math.zeros()has a very efficient alternative, to return a SparseMatrix. This PR would provide a similarly efficient representation formath.ones()as a Range:math.ones([x])could benew Range({from: 1, for: x, by: 0})andmath.ones([x, y])could benew Range({from: math.ones([y]), for: x, by: math.zeros([y]))(and so on for any dimension). - Makes
math.range(...)more intuitive in that it returns a Range rather than a DenseMatrix. - Adds the new useful
scalarDivide()function, and thezero()andone()functions that are handy for type-independent computation (note they are in pocomath from the very ground up, for this very reason). Of course, all three could be done completely independently from overhauling Range. - Covers most of #1207
Difference (that I don't see as either pro or con):
- Changes the "preferred" options for specifying a Range to
{from:..., for: ..., by: ..., to: ..., til:...}withto:an inclusive limit andtil:an exclusive limit, with appropriate defaults and inference of options that are determined by the others. (They were formerlystart:, end:, step:withend:an exclusive limit, and these are still supported for backwards compatibility.) I think the new options are readable and intuitive, and they add the ability to directly specify the number of terms if you like, rather than always having to compute the ending value, and with to: vs. til: you can be explicit about including the limit or not. There was nothing really wrong with the old options, but to extend the options in the two ways mentioned it was easiest to implement a whole new set, avoiding name conflict, and then just re-do the old parameters in terms of the new ones.)
- In order to prevent breaking changes in
Range, maybe we can implement the new version alongside it with a different name likeRangeMatrix, and mark the other deprecated.
See above; I think this PR is only breaking because of the one change to Range.parse(), so if it is important it be non-breaking, hopefully we can finesse that. Otherwise, the new Range is intended to simply be a conservative extension of the behavior/capabilities of the old Range, so I don't see enough motivation to deprecate and replace.
I mean, there are cases in which Range threw before because it couldn't handle Fractions, say, but I don't think extensions to capabilities are generally considered breaking.
Oh, hmmm, I guess maybe there are also some cases in which Range formerly did some automatic conversion internally, like from bigint to number, that it doesn't do anymore. I handled any such cases in the indexing code, coercing the result of fetching an entry of a Range into number when necessary, rather than making all ranges be a range of numbers. That's on a fine line to me of whether it is a breaking change or not. I guess there might be some cases that did not arise in the unit tests in which before you could construct a Range with bigints (or maybe even BigNumbers) but then its entries would come out as regular JavaScript numbers -- now the entries are the same type as the Range was constructed with, for sure. Not sure if this change is significant, and in fact I am not 100% certain it did change; as I said, I don't see anywhere it affected the unit tests.
Looking forward to your call on this.
I felt the behavior of Range.parse() needed to change, because it is called
parse()and the otherwise clear principle is that when we parse expressions they are 1-based and ranges include their ends.
Ahh, yes I understand. Ok this makes sense. How about we just deprecate Range.parse? I'm not sure if it actually serves a use case.
I think though that it will be best to mark this PR as a breaking change anyway, since function range will now return a Range instance instead of DenseMatrix. Even when Range implements the Matrix interface, and you also mention subtle changes in the ordering of the Matrix type for example.
Actually benchmarking factorial(number) won't be informative: factorial overflows number at such a low value for the argument that the time it takes is essentially always trivial for any useful cases.
Ok that makes sense.
My personal thoughts so far are that the fact this PR gets
prod(range())to half the speed of hand-codedfactorial()makes it plenty good for the less-used variants of factorial, so we should just continue to use hand-codedfactorial()and in its documentation suggestprod(range())for rising, falling, and double factorials, etc.
That sounds good to me. It's very flexible to use prod and range. Half-speed is indeed decent.
Thanks for writing out the pros/cons. You've convinced me that the pros outweigh the cons, so I it would be awesome if you can work out this WIP further!
I still have to get used a bit to the new property names {from:..., for: ..., by: ..., to: ..., til:...}. They are indeed readable and it's a good idea to make the properties an object. But the naming isn't (yet) consistent with other places in mathjs where we're using ranges, so that can be confusing. And as far as I know, most languages use names start, step, and end for ranges, so it may be good to align with this common naming. How about something like: { start, step, count }, { start, step, endIncl }, { start, step, endExcl }?
Ahh, yes I understand. Ok this makes sense. How about we just deprecate
Range.parse? I'm not sure if it actually serves a use case.
OK, I will modify this PR to leave the behavior of Range.parse alone but mark it as deprecated (meaning in part that the first call to it will issue a console.warn of the deprecation, correct?) Do we need a "deprecation schedule" anywhere that shows all currently deprecated features, when they were deprecated, and when they are slated for discontinuation?
I think though that it will be best to mark this PR as a breaking change anyway, since function
rangewill now return aRangeinstance instead ofDenseMatrix. Even whenRangeimplements theMatrixinterface, and you also mention subtle changes in the ordering of theMatrixtype for example.
OK, as you like. I will change this to be a merge into the v16 branch (which is no real change at the moment, since I just created that branch), and add it to the tracking list.
My personal thoughts so far are that the fact this PR gets
prod(range())to half the speed of hand-codedfactorial()makes it plenty good for the less-used variants of factorial, so we should just continue to use hand-codedfactorial()and in its documentation suggestprod(range())for rising, falling, and double factorials, etc.That sounds good to me. It's very flexible to use
prodandrange. Half-speed is indeed decent.
OK, I will adjust the documentation of factorial accordingly, and consider this PR to resolve the need for variant factorials in #3583.
I still have to get used a bit to the new property names
{from:..., for: ..., by: ..., to: ..., til:...}. They are indeed readable and it's a good idea to make the properties an object. But the naming isn't (yet) consistent with other places in mathjs where we're using ranges, so that can be confusing. And as far as I know, most languages use namesstart,step, andendfor ranges, so it may be good to align with this common naming. How about something like:{ start, step, count },{ start, step, endIncl },{ start, step, endExcl }?
Another case of "programmers vs. mathematicians" -- this sort of thing comes up for mathematicians mostly in stuff like $\sum_{i = 2}^{10} \ldots$ which would be read as "the sum of i from 2 to 10 of ...".
So how about a series of synonyms? The code already supports reading the first value of a rng either as rng.from or rng.start, for example. We could just extend all synonyms to work both for constructing and reading, and have the following table:
- 'start', 'from': the initial value of the range
- 'step', 'by': the increment of the range
- 'length', 'for': the number of entries of the range
- 'last', 'to': the inclusive end of the range (not defined on length-0 ranges, since they have no last element)
- 'end', 'til': the exclusive end of the range, normalized for reading to be be
rng.last + rng.stepwhenever those are both defined.
In addition, I would propose to make Ranges immutable in that all of the above properties would be read-only once constructed.
How does that sound? In the meantime til you have a chance to weigh in, as I have time available, I will proceed with the the items listed above, together with the necessary tests and documentation to be ready for merging -- but if you want to adjust any of the above, please just let me know and I will amend what I've done to conform. Thanks!
A related question: Right now, the new attribute-based specification of a Range (e.g. ({start: 3, step: -2, length: 5}) is allowed in the Range class constructor but not in the math.range() function. Should it be? I expect probably so, so I will add that to the list of items to get this ready for merging, unless/until I hear otherwise. Thanks.
Thanks Glen!
About the deprecation of Range.parse: I think it is enough to implement a console warning and put a comment on top of it like // deprecated since 2025-11-19 or something like that. When releasing a new major version, we can search the code base for all occurrences of "deprecated" and decide whether there are cases that we what to clean up. That has worked just fine so far :smiley:.
Thanks for creating a v16 branch.
I prefer having one set of property names and not multiple ways to do the same thing (which can give confusion). If you prefer {from:..., for: ..., by: ..., to: ..., til:...} I'm OK with that. What we can do is offer a helping hand, so if a user enters Range({ start, end }), throw an error like "Error: unknown property start, did you mean from?").
Good idea to make Ranges immutable :+1: .
About the deprecation of
Range.parse: I think it is enough to implement a console warning and put a comment on top of it like// deprecated since 2025-11-19or something like that. When releasing a new major version, we can search the code base for all occurrences of "deprecated" and decide whether there are cases that we what to clean up. That has worked just fine so far 😃.
Oops, I already made a deprecation status doc page, you will see it when I push the next commit and let me know what you think.
I prefer having one set of property names and not multiple ways to do the same thing (which can give confusion).
OK, we shall have just one set of properties.
If you prefer
{from:..., for: ..., by: ..., to: ..., til:...}I'm OK with that.
No, I only changed to be sure I had gotten all of the instances. Now that I have, it's safe to switch to whatever combination is preferred. I think you like the "CS-ish" set {start:..., length:..., step:..., last:..., end:...} and that makes the least change from v15, so I will revert to that.
What we can do is offer a helping hand, so if a user enters
Range({ start, end }), throw an error like "Error: unknown propertystart, did you meanfrom?").
If we are being more conservative with property names, I think there is no need for this. No one will spontaneously start expecting from to work. It's only if we were to introduce new property names that we would need to provide some guidance to folks who were using the old.
There was an idea (from the post-closure discussion in #3581) that we could perhaps broaden the applicability of ranges with empty end to the argument of index(), e.g. in math.evaluate('subset(M, index(2:))') and this PR could certainly be a place to address it.
However, looking at it a bit more, I see that it's not really just the missing last index that's supported in e.g. M[2:]. You can also write M[2:floor(end/2)] and other crazy stuff like that. In other words, there's a new symbol end added to the context that can be used anywhere inside the brackets. and its interpretation depends on which dimension within the brackets it appears in.
How could we support that notation for e.g., subset(M, index(end-2, end-1)) ? I see two ways:
A) Make both subset and index "raw functions" in the parser: subset would evaluate M, get the extent in each dimension, and then evaluate index in a context in which it could grab those extents, and add the correct value for end into the context for evaluating each one of its arguments.
B) Parse index(blah, blee) into an IndexNode with arguments blah and blee, rather than a FunctionNode with operation index. Similarly, parse subset(M, stuff) into an AccessorNode with M as the object parameter and the result of parsing stuff as the index parameter of the Accessor node. With this change, the end notation would work inside of index immediately, by virtue of it being implemented inside of AccessorNodes with IndexNodes in them. I did check, and subset is the only place that the result of index can be used, so this method won't harm any other usage. It's a bit unorthodox, but right now there's no other uses of IndexNode and the semantics of subset and index are identical to that of AccessorNode and IndexNode, so potentially they ought to parse into the same thing. Note that (B) is noticeably easier than (A) to implement.
There is also a more lightweight compromise option that doesn't support end, but does allow the "missing end" subset(M, index(2:)):
C) Outside of square-bracket indexing, 2: represents the infinite arithmetic sequence 2, 3, 4, ... forever (basically, a missing end gets filled in as Infinity). Although trying to index a 3-element array by [2, 3, 4, 5] would still be an error, we would allow an exception in which indexing any (dimension of) an array by an infinite sequence of indices would just go until that sequence goes out of bounds (for the first time), and ignore the remainder of the infinite sequence.
Of course, there is also
(D) Do nothing about the inconsistency betwee M[stuff] and subset(M, index(stuff)).
Option (D) is what you originally suggested in #3581.
I am game to do any one of these options in this PR, just let me know. I am not going to do any of them until I hear back. If you choose (D), I will move (A)-(C) to a new design ideas Discussion so that maybe one, or something else one of them inspires, could be implemented in the future to remove the inconsistency.
Basically, the problem is that we have two very different syntaxes for exactly the same semantics: M[stuff] and subset(M, index[stuff])). That duality creates even worse problems than the issues you raised about not wanting both synonyms from and start to work in the options argument to new Range(...). So ultimately I think it would be good if these expressions were unified as much as possible -- ideally at the abstract syntax tree (i.e. Node) level, in which case the end notation would simply be compelled to work in both cases, because they would literally have the same parse. That's the motivation behind option (B), but this perspective of analyzing the situations makes it clear that there's actually another alternative:
E) Simplify Node by eliminating IndexNode and AccessorNode altogether, using a FunctionNode with function index for the former and a FunctionNode with function subset for the latter. Implement the end notation as in (A) -- both subset and Index would (I think) have to be raw functions, so they could evaluate their arguments in an augmented context.
To be frank, I personally like (E) the best -- reducing the number of Node subtypes is a big win, in my book (fewer switch options to go through when traversing ASTs, etc. etc.)
Anyhow, as I said, let me know how you'd like to proceed and I won't do anything until I hear.
Going with start:..., length:..., step:..., last:..., end:...} sounds good :+1:
Your idea (E) is interesting, I had never thought about such an approach. You're right that making the code base simpler and more unified is always good. In this case I'm in doubt though whether it would be helpful or confusing for users. It is technically possible to make subset(M, index(end-2, end-1)) work in the expression parser. But why would you want to write subset(M, index(end-2, end-1)) over M[end-2, end-1] in the expression parser? Also, this syntax then looks valid to use in plain JavaScript, but there it will not work since end is then undefined, so that can be confusing: this special context with 2: and end only works within the context of the expression parser.
I think this is a choice between (1) keeping this notation a special syntax handled by the expression parser or (2) having a special context introduced by certain functions like subset, which is then only usable in the expression parser and not in JS. I think having function subset introducing special context can be confusing, especially when you want to implement a similar function or want to overwrite/extend subset with your own variant handling more data types or so. I think it is a powerful feature when functions are pure functions without side effects.
All in all I don't see a practical need for it. I think I prefer to keep it as it is right now: only support this syntax in the expression parser inside the index brackets [...]. So that is your option (D).
Hmm, another quandary:
There is a unit test in range.test.js (line 79) that says that the Array value of range(3, 1n, -1n) should be [3n, 2n]. On the other hand, trying to systematize Ranges as arithmetic sequences, this should be the sequence of values 3 + 0*-1n, 3 + 1*-1n, 3 + 2*-1n,... until that sequence gets to or past 1n. I have made sure that (at least internally to Range), the integer scalar multiples of a bigint are bigints (i.e., 2*-1n is -2n) as we definitely want them to be for handling a bigint step size. However, currently math.add(3, -1n), for example, is 2, not 2n. In other words, currently math.add takes a number and a bigint to a number, not a bigint.
So, for this new systematization of Range, the previous unit test of math.range(3, 1n, -1n) and the previous behavior of math.add are contradictory. I would like to maintain the systematic definition of Range, I think that's what will make it the most useful and predictable. So as this is a breaking change PR anyway, it seems to me either the behavior of math.range(3, 1n, -1n) or math.add(number, bignumber) should change so that they conform with each other.
The problem is that it's not necessarily clear which should change. addScalar.js defines only that number plus number is number, and bigint plus bigint is bigint. So the outcome of number plus bigint must be derived from the typed-function automatic conversion rules. So I conclude that the conversion bigint -> number is preferred over number -> bigint.
And that choice seems fairly arbitrary: many bigints lose precision when converted to number, and many numbers lose precision when converted to bigint. So one could make the case that there should be no automatic conversion, but adopting that principle would break a bunch of things that are expecting that you can operate on mixed numbers and bigints. So I guess looking at the sort of small values that humans "typically" use, you're "more likely" to lose precision converting number -> bigint than bigint -> number. Hence the current preference.
So frankly I don't think v16 should be second-guessing the v14 choice to prefer bigint -> number conversion. So my recommendation would be to change the defined "correct" behavior for math.range(3, 1n, -1n) to produce [3, 2] as an Array. Let me know if that is OK. In the meantime I will presume so.
OK, just push a progress update, but note I need to revert to just the {start, step, length, last, end} set of attributes, and stop supporting the {from, by, ...} synonyms. That will happen in the next commit, as soon as I can get to it.
But why would you want to write
subset(M, index(end-2, end-1))overM[end-2, end-1]in the expression parser? Well, I look at it the other way: Why would we break the symmetry betweensubset(M, index(stuff))andM[stuff]? That is to say, other than this pointM[stuff]is just syntactic sugar forsubset(M, index(stuff))so why not keep it that way?
That said, option (D) is certainly the path of least resistance and since this PR is pretty massive already, I will go ahead with the leave-it-be strategy and just open a discussion.
The broken browser test has uncovered subtle bugs in typed-function. So PR 171 there is now a blocker for this PR, and if/when that change to typed-function is adopted (perhaps with changes per review), this PR will have to include updating to the new version of typed-function. Thanks.
So my recommendation would be to change the defined "correct" behavior for math.range(3, 1n, -1n) to produce [3, 2] as an Array. Let me know if that is OK. In the meantime I will presume so.
Yes, totally agree! Good point. That will make the behavior consistent with other functions like math.add(3, -1n), and aligns with the type conversion bigint -> number which is indeed preferred over number -> bigint. It will be a breaking change.
OK, upgrading to typescript 4.2.2, the browser tests now pass. However, the node tests now fail with the tree-shaking bundle size at 135123 bytes, or 123 bytes over the limit in the test. I am very unsure how to address this. @josdejong are you able to provide any guidance on how to get the bundle size back down? The test file has this advice:
// Note that if this tree-shaking test fails, there is probably
// new functionality which forces Webpack to turn off tree shaking.
//
// Typical solutions to get tree-shaking working again are:
//
// - move code into a separate file to isolate it
// - add /* #__PURE__ */ when creating a variable
I would be very happy to apply this advice, and I apologize if I am being dense, but I am at a loss as to how to figure out which code needs to be moved into a separate file and/or what variables might need the PURE annotation. Thanks for any further help you can provide.
If the increase in bundle size is in line with the amount added code, then there is nothing wrong and we just have to increment this limit a bit. Only if the size "suddenly" jumps to a much larger value then something with tree shaking may be broken.
I think you are saying that in this case it is ok just to change the limit on the test to say 137000 bytes and move on. I will do that when I get a chance. Assuming there were no other problems, I believe all that remains to mark this ready for review is to add (lots of) new tests for the new functionality and to remove the from, for, by,... synonyms.
Yes indeed, you can increase the limit a bit. Thanks, this sounds like everything is falling into place.
OK, eliminated from, for, by, to, til... As far as I know, all that is needed is a bunch of tests for all of the new features. I will get those in as soon as I can.
:muscle: sounds good
Actually I just realized that I could fold the "multiply in pairs" trick used in the hand-coded factorial into the general prod(range(...)). So now it is only 25%-35% slower than the hand-coded factorial. That's like a factor of eight speedup from the initial implementation. I'll still leave in the hand-coded factorial, since that is the most common use case and why not get the somewhat faster version for that, but now I have absolutely zero qualms about just using prod(range(...)) for all factorial variants.
Just to keep you posted: I ran into a problem adding tests (that's why we add tests of course). I added a facility whereby in math.subset, you don't have to bother calling math.index, you can just pass the arguments to index() directly as a list (i.e., Array) in the second argument to math.subset.
But this now doesn't work in the expression parser because by default, array expressions are interpreted as matrices, and while math.subset([[1, 2, 3], [4, 5, 6]], [1, [2, 0]]) works and produces [6, 4], math.evaluate('subset([1,2,3; 4,5,6], [2, [3, 1]])') throws an error because it tries to interpret [2, [3, 1]] as a Matrix but it is not a rectangle. Not really sure how to deal with this; does the expression language maybe need some convenient way to denote a regular Array even when config.matrix is set to Matrix? Will continue to ponder.
OK, to deal with this situation where sometimes you want a plain array even with the default config, and you want a ragged one so you can't make a matrix and convert it to Array, I went ahead and implemented the list notation (1, 2, 3) in the expression parser to return an Array (regardless of config). This change was remarkably easy and didn't appear to disrupt anything, and provides another nice response to inquiries like #3550. Trailing commas are allowed (and in fact are the only way to get Arrays with just one entry) and () produces the empty Array. I then used this notation to test the new subset transform (shaking out a bug in the process). I worry you may find this change unwarranted, but think it over, it actually seems quite nice to me -- it's natural to think of (x, y, z) as a tuple or list and plain JavaScript arrays play that role well.
[As an irresistable but not-particularly-serious aside into territory we've skated around multiple times now, with this change, if in addition we were to define the implicit multiplication of a function f and an array A to be f.apply(this, A), we would then no longer have any need to parse function-call syntax, it would just fall out as a consequence of all the other rules.]
OK, with the addition of a raft of new tests, I think this comprehensive overhaul to the Range class, the math.range() function, and to the math.prod() function is ready for full review. I think it creates a solid base for the high-precision gamma function #3583, which I will rebase onto v16 once this is merged. Looking forward to your feedback.
This is an impressive piece of work Glen, thanks!
Actually I just realized that I could fold the "multiply in pairs" trick used in the hand-coded factorial into the general prod(range(...)). So now it is only 25%-35% slower than the hand-coded factorial. That's like a factor of eight speedup from the initial implementation. I'll still leave in the hand-coded factorial, since that is the most common use case and why not get the somewhat faster version for that, but now I have absolutely zero qualms about just using prod(range(...)) for all factorial variants.
That is huge :sunglasses:
I haven't gone thought the PR in full detail, but here some first thoughts/questions:
- Thanks for improving the documentation under
reference/classes, much more clear this way! - I tried out
m3 = math.ones(3, 4, 'range')from the docs, and then I tried to turn the data into a JavaScript array usingm3.toArray(), but that returns an array containing ranges. Should that not return a nested array containing numbers? Also, I tried print the contents usingm3.toString()but that throws an error. - I would love to see a few basic usage examples of
Rangein the docs about matrices, like the examples in the comments ofsrc/type/matrix/Range.js. - Is there a use case for multi dimensional ranges like
math.range([1, 11, 21], [4, 14, 24], [1, 1, 1])? Do we actually need this? - What is the reasoning for changing the behavior of
matrixFromto return an array instead of a Matrix in case of mixed input? - Can the JSON
reviverstill load Ranges that are saved in the old format? - About
subsetnot supporting Matrix input, triggering you to implement support for a new array syntax in the expression parser: isn't it possible to use add support forMatrixinput in functionsubset? In that case there is no need for a new notation like(1, 2, 3), and functionsubsetbecomes more flexible then. What use-case does the notation(1, 2, 3)address besides the direct cause ofsubset? I need some time to think through the implications of introducing such a new syntax for arrays (will it be confusing to the user to have two ways to create an array? will it result in more conflicts in the parser?)
- Glad you like it.
- Yes, toArray should apply recursively, my oversight, will fix.
- Will add.
- Sure.
math.ones(1000000,1000000)andmath.range({start: math.range(1,11), step: 10, length: 10})giving a 10x10 matrix of the numbers 1 to 100 for starters. Also dividing the segment between two points in 3d into n equal segments withmath.range({start: pt1, end: pt2, length: n}). (Where pt1 and pt2 are vectors with three entries.) Lots of potential uses. - Did I do that? I am pretty sure I changed it the other way, from array to Matrix in case of mixed input, to conform with all other mathjs matrix-handling functions. A breaking change is an ideal opportunity to change to conformity here.
- Great question on the reviver. I will check and add backwards-compatibility code if not. Should the backwards compatibility if needed be deprecated or permanent?
- The basic need that the new notation fills is there is currently no way in the default configuration to notate a ragged array. The square brackets dictate rectangularity, so creating a Matrix and converting to Array can't do it. And that's appropriate for a Matrix notation, which is primarily what square brackets are in mathjs. Also when calling outside functions (such as the Chroma library in our case) you really need arrays, but you may very much want to keep using matrices by default for mathjs calculations. So I think that round brackets for arrays (conceived as lists) and square for Matrices is very natural. All instances were syntax errors before, and all unit tests passed with no modifications, so there is no syntactic conflict.
OK:
-
Excellent catch, fixing toArray() and toString() caught a decent handful of little bugs which have been fixed and now have their own unit tests.
-
Added some more range examples to docs/datatypes/matrices.md. Was there anywhere else you wanted them?
-
I verified that indeed this PR changes mixed inputs to matrixFromRows or matrixFromColumns to produce a Matrix, which seems to conform to all other matrix-handling functions; formerly they produced an Array with mixed inputs, which seemed nonconformant.
-
Yes, old-format JSON strings still produce the proper Ranges. I checked the JSON string in the example in the old comments on Range.prototype.fromJSON().
Looking forward to further review. I would actually suggest you review and merge #3606 first into develop, as then I will rebase this and add per-function HISTORY for all of the many functions touched by this PR, which will get us to a pretty reasonably high coverage of functions with their own HISTORY.