reimplement Decimal and Whole in Ceylon
The current implementations are plain ugly, and Java-only. So I think we need to do this. But there are some challenges. For example, the Java impl of BigInteger uses an int[] array, which we don't really have a high-performance platform-neutral abstraction of yet.
I will make a start on this by taking a stab at reimplementing Whole in Ceylon based on an Array<Long>, since that doesn't look terribly daunting.
Hi. Just noticed ceylon.math is for Java only, which seems like a "had-no-time-to-do-it" limitation? This is really necessary for any serious development to happen in JS! If you hadn't had the time to look into it, perhaps I can give this a go (I actually really need to use Whole right now)...
Go for it!
I started on an implementation of Whole, and although it still needs a ton of work, most of the functions mostly work, so I thought I'd share what I have - https://github.com/jvasileff/ceylon-sdk/commit/095f8d24cfbcb9241ccc5417278b1a28a03659c7
The completely unimplemented pieces are dividing by one-word Wholes, a rare condition when dividing larger Wholes, string parsing, hash, and I'm sure a few others.
The word size is configurable (by editing a constant in the source), and will be 16 bits on JS and 32 bits on Java.
Awesome!
Sent from my iPhone
On 04 Dec 2014, at 05:34, John Vasileff [email protected] wrote:
I started on an implementation of Whole, and although it still needs a ton of work, most of the functions mostly work, so I thought I'd share what I have - jvasileff@095f8d2
The completely unimplemented pieces are dividing by one-word Wholes, a rare condition when dividing larger Wholes, string parsing, hash, and I'm sure a few others.
The word size is configurable (by editing a constant in the source), and will be 16 bits on JS and 32 bits on Java.
— Reply to this email directly or view it on GitHub.
Is there a good reason to keep Whole (and Decimal) as interfaces, assuming private constructors are coming? Currently I'm using ugly casts in WholeImpl, and final classes would provide better immutability guarantees. I guess sealed interfaces would be another option.
Separately, once constructors are available, are there any opinions on using constructors vs. factory functions? parseWhole and wholeNumber would be easier to find as constructors to a Whole class. The same issue is currently open for Uris, where the nondescript parse(String) is used, along with parseParameter(String) for Parameters, and potentially parsePath(String) and others. (My point is not to debate Uris here, just using it as an example.)
Is there a good reason to keep Whole (and Decimal) as interfaces
No, not AFAIK.
Separately, once constructors are available, are there any opinions on using constructors vs. factory functions?
I'm in general not a fan of factory functions, except for the case of parseXxxx() methods which accept String. I'm happy to stick with the pattern that functions are used for parsing strings.
Is there a good reason to keep Whole (and Decimal) as interfaces, assuming private constructors are coming? Currently I'm using ugly casts in WholeImpl, and final classes would provide better immutability guarantees. I guess sealed interfaces would be another option.
Basically no. The only reason those are interfaces is because we needed to have a shared type, but we didn't want to expose a class parameter list with BigInteger/BigDecimal in it.
BigInteger uses an int[] array, which we don't really have a high-performance platform-neutral abstraction of yet.
I have some very interesting preliminary results. My initial no-tricks implementation is actually beating BigInteger in many cases, with BigInteger's only clear advantage being division between two large numbers. I say preliminary, because these results are actually hard to believe. But, whatever the Array<Long> penalty is, it doesn't look too bad for this use case.
The following is for one million operations each on pre-computed random numbers, with warmup before each testing round:
Edit - removed incorrect benchmark.
Wow I did not expect that. Sounds to good to be true, but if it is true, that's excellent news!
Sent from my iPhone
On 08 Dec 2014, at 00:13, John Vasileff [email protected] wrote:
BigInteger uses an int[] array, which we don't really have a high-performance platform-neutral abstraction of yet.
I have some very interesting preliminary results. My initial no-tricks implementation is actually beating BigInteger in many cases, with BigInteger's only clear advantage being division between two large numbers. I say preliminary, because these results are actually hard to believe. But, whatever the Array<Long> penalty is, it doesn't look too bad for this use case.
The following is for one million operations each on pre-computed random numbers, with warmup before each testing round:
Millis Class Operation LHS Bits RHS Bits 584 Whole + 32 32 596 BigInteger + 32 32 452 Whole + 32 128 849 BigInteger + 32 128 709 Whole - 32 32 710 BigInteger - 32 32 638 Whole - 32 128 778 BigInteger - 32 128 440 Whole * 32 32 692 BigInteger * 32 32 532 Whole * 32 128 837 BigInteger * 32 128 583 Whole * 128 32 782 BigInteger * 128 32 926 Whole * 128 128 974 BigInteger * 128 128 705 Whole / 32 32 654 BigInteger / 32 32 222 Whole / 32 128 731 BigInteger / 32 128 850 Whole / 128 32 866 BigInteger / 128 32 1442 Whole / 128 128 971 BigInteger / 128 128 — Reply to this email directly or view it on GitHub.
Sounds to good to be true
Well, it was fun while it lasted. The error turned out to be the use of wholes.map(...) instead of wholes.collect(...) to coerce the test data to BigIntegers. Not a minor difference.
Looking at the fundamental performance issue, this:
Integer arrayBenchCeylon() {
value array = arrayOfSize(64, 1);
variable value sum = 0;
for (i in 0:10000000) {
for (j in 0:array.size) {
assert(exists x = array[j]);
sum += x;
}
}
return sum;
}
runs 25 times slower than the equivalent in Java using a long[64]. And, ArrayList<Long> only takes twice as long as the primitive array.
public static final long arrayBenchJavaArrayList() {
ArrayList<Long> array = new ArrayList<Long>(64);
for (int i = 0; i < 64; i++) {
array.add(1L);
}
long sum = 0;
for (int i = 0; i < 10000000; i++) {
for (int j = 0; j < array.size(); j++) {
sum += array.get(j);
}
}
return sum;
}
So I guess there are significant optimizations for boxing/unboxing java.lang.* types that aren't available to other classes. Unless... I'm missing something here, too!
In practice, though, the penalty is quite a bit less than 25x, even for fairly large values:
| Millis | Class | Operation | LHS Bits | RHS Bits |
|---|---|---|---|---|
| 74 | Whole | + | 256 | 256 |
| 12 | BigInteger | + | 256 | 256 |
| 80 | Whole | - | 256 | 256 |
| 13 | BigInteger | - | 256 | 256 |
| 207 | Whole | * | 256 | 256 |
| 18 | BigInteger | * | 256 | 256 |
| 84 | Whole | / | 256 | 256 |
| 15 | BigInteger | / | 256 | 256 |
| 151 | Whole | + | 1024 | 1024 |
| 18 | BigInteger | + | 1024 | 1024 |
| 168 | Whole | - | 1024 | 1024 |
| 16 | BigInteger | - | 1024 | 1024 |
| 2298 | Whole | * | 1024 | 1024 |
| 128 | BigInteger | * | 1024 | 1024 |
| 175 | Whole | / | 1024 | 1024 |
| 22 | BigInteger | / | 1024 | 1024 |
You're using ArrayList, not Array?
No, I just took a quick look at ArrayList for a performance comparison.
On Dec 8, 2014, at 6:07 PM, Gavin King <[email protected]mailto:[email protected]> wrote:
You're using ArrayList, not Array?
Reply to this email directly or view it on GitHubhttps://github.com/ceylon/ceylon-sdk/issues/306#issuecomment-66205529.
I wonder if we used something like this as its internal state:
Integer i0;
Integer i1;
Integer[] is;
Integer get(Integer loc) => switch (loc) case (0) i0 case (1) i1 else is[loc];
Just an idea.
That looks like a great idea. But then, why not a tree of these containers? And that works too. So, why not an array of them :). But that doesn't work.
A lot of the Array slowdown looks like it has nothing to do with Integers or type conversions, and really doesn't make sense by looking at the source. Other Ceylon types seem affected too. Is there something else in the mix? It almost seems like there is a proxy, but the stack traces look clean. java.util.ArrayList performs about 12x better, as do any attempts of mine to duplicate c.l.Array using either Java or Ceylon.
Here's an example using Integers:
import java.util {
ArrayList
}
import ceylon.collection {
LinkedList
}
class IntegerHolder() {
shared variable Integer i1 = 0;
shared variable Integer i2 = 0;
}
shared void runIHTest() {
bench(calculate);
//bench(containerBench);
}
void calculate() {
value array = arrayOfSize(1, IntegerHolder()); // 4555
//value array = ArrayList<IntegerHolder>(); array.add(IntegerHolder()); // 320
for (i in 0:10^9) {
assert(exists ih = array.get(0));
ih.i1 = ih.i1 + 1;
}
print ((array.get(0) else nothing).i1);
}
void bench<T>(T f()) {
for (i in 0:3) { f(); }
Integer start = system.nanoseconds;
f();
print ("``(system.nanoseconds - start)/10^6``");
}
And this with just Objects:
class Container<T>(T content) {
shared variable Integer counter = 0;
shared T? get(Integer i) {
counter++;
return content;
}
}
void containerBench() {
value container = Array<Object> { object {} }; // 3753
//value container = LinkedList<Object> { object {} }; // 718
//value container = ArrayList<Object>(); container.add( object {} ); // 325
//value container = Container<Object>(object {}); // 314
for (i in 0:10^9) {
value _ = container.get(0);
}
//print(container.counter);
}
@jvasileff Be careful: retrieving an element from an Array<Integer> boxes the primitive int.
java.util.ArrayList performs about 12x better
Note that the VM itself has hacked-in optimizations for java.util.ArrayList. We can't really reproduce that with our own classes.
@jvasileff Oh wait, you're using List.get() not List.getFromFirst(). In microbenchmarks getFromFirst() is much faster because it avoids boxing the index!
But yeah, even after making that change, I get much better performance in this microbenchmark with LinkedList than with Array or ArrayList.
OK, @jvasileff, I just committed some performance optimizations to Array that now make our ArrayList very competitive with Java's. I can't measure a significant difference in performance for getFromFirst()s and set()s. The get() method is still much slower due to boxing of its index, so I think we should change the compiler to use getFromFirst() underneath the square-bracket syntax.
Note also, however, that getFromFirst() on an Array<Integer> is still much slower than getFromFirst() on an Array<Object> or ArrayList<Integer>, because Array optimizes for storage efficiency.
getFromFirst() is much faster because it avoids boxing the index!
Ah, I can't believe I missed that one! Thanks for looking a this, I look forward to trying out the changes.
So it looks like if there is an answer, it will be a cons list. For 500k multiplications of 2048 bit numbers:
| ms | Implementation |
|---|---|
| 1803 | BigInteger |
| 2515 | Whole with LongArray, using int instead of long for loop variables/array indexes |
| 3114 | Whole with LongArray, using (int) typecast instead of toInt(...) |
| 3539 | Whole with LongArray |
| 3841 | Whole with cons list |
| 14957 | Whole with JArrayList<Integer> |
| 15200 | Whole with Array<Object> |
| 15461 | Whole with Array<Integer> |
Edit - added extra LongArray timings, using Java code slightly altered from ceylon compile ... --verbose-code output.
So it looks like if there is an answer, it will be a cons list.
Well, that sounds good to me!
For 500k multiplications of 2048 bit numbers:
What are the equivalent measurements for 32 and 64 bit numbers, since they're going to be much more common. (Just a comparison of BigInteger with cons list is enough.)
Great work!
For 5 million 64bit * 32bit calculations:
| ms | implementation |
|---|---|
| 397 | BigInteger |
| 594 | Whole with LongArray |
| 773 | Whole with Array<Object> |
| 801 | Whole with cons list* |
The cons list is punished here since the internal representation for that case is Array<Object> with the translations to/from cons list happening inside the multiply method. This was just noise in the 2048bit case. And, none of these are calling an optimized manyWords * singleWord method, so they might be running slightly slower than they should.
Also - note the additions to my previous numbers. These directly address the long vs int indexing issue. I'm not sure why BigInteger is still faster in all cases. It must be some algorithmic difference, or maybe their use of int[] instead of long[] for storage.
The cons list is punished here since the internal representation for that case is
Array<Object>with the translations to/from cons list happening inside the multiply method.
I don't understand this comment.
I'm not sure why
BigIntegeris still faster in all cases.
Note that it's quite possible that the JVM itself contains some optimizations for BigInteger arithmetic.
I don't understand this comment.
The timings are for Whole * Whole, and within Whole, the bits are still stored in a property Array<Object> words;. So the first thing that Array<Object> multiply(x, y) { ... } has to do is build a a couple lists from the two arrays. And the last thing it does is create an array from the results list. This is pretty wasteful for 64bits * 32bits.
Wait, what are you varying then? I thought this was for varying the internal representation used by Whole.
Well, yes. For everything but the list, it is a swap out of a type alias and a few minimal helper functions. But for the list case, the actual multiplication code has to change since the iteration happens by accessing tail, not by index. And nothing but the multiplication method knows anything about lists right now.
Take a look at the list multiply method if you like: https://github.com/jvasileff/ceylon-sdk/blob/ceylon-whole-cons/source/ceylon/math/whole/Whole.ceylon#L407
And how the internal representation is usually changed: https://github.com/jvasileff/ceylon-sdk/blob/ceylon-whole-cons/source/ceylon/math/whole/Words.ceylon
A doubly linked list?
A doubly linked list?
Singly, used procedurally. You tell me if its an abuse of terminology!
class WordCons(word, tail) {
shared variable Integer word;
shared WordCons? tail;
}
I'm suggesting you try a doubly linked list, to avoid the cost of repackaging it.