ceylon-sdk reimplement Decimal and Whole in Ceylon

The current implementations are plain ugly, and Java-only. So I think we need to do this. But there are some challenges. For example, the Java impl of BigInteger uses an int[] array, which we don't really have a high-performance platform-neutral abstraction of yet.

Sep 22 '14 21:09 gavinking

I will make a start on this by taking a stab at reimplementing Whole in Ceylon based on an Array<Long>, since that doesn't look terribly daunting.

Sep 22 '14 22:09 gavinking

Hi. Just noticed ceylon.math is for Java only, which seems like a "had-no-time-to-do-it" limitation? This is really necessary for any serious development to happen in JS! If you hadn't had the time to look into it, perhaps I can give this a go (I actually really need to use Whole right now)...

Oct 18 '14 20:10 renatoathaydes

Go for it!

Oct 18 '14 20:10 gavinking

I started on an implementation of Whole, and although it still needs a ton of work, most of the functions mostly work, so I thought I'd share what I have - https://github.com/jvasileff/ceylon-sdk/commit/095f8d24cfbcb9241ccc5417278b1a28a03659c7

The completely unimplemented pieces are dividing by one-word Wholes, a rare condition when dividing larger Wholes, string parsing, hash, and I'm sure a few others.

The word size is configurable (by editing a constant in the source), and will be 16 bits on JS and 32 bits on Java.

Dec 04 '14 04:12 jvasileff

Awesome!

Sent from my iPhone

On 04 Dec 2014, at 05:34, John Vasileff [email protected] wrote:

I started on an implementation of Whole, and although it still needs a ton of work, most of the functions mostly work, so I thought I'd share what I have - jvasileff@095f8d2

The completely unimplemented pieces are dividing by one-word Wholes, a rare condition when dividing larger Wholes, string parsing, hash, and I'm sure a few others.

The word size is configurable (by editing a constant in the source), and will be 16 bits on JS and 32 bits on Java.

— Reply to this email directly or view it on GitHub.

Dec 04 '14 11:12 gavinking

Is there a good reason to keep Whole (and Decimal) as interfaces, assuming private constructors are coming? Currently I'm using ugly casts in WholeImpl, and final classes would provide better immutability guarantees. I guess sealed interfaces would be another option.

Separately, once constructors are available, are there any opinions on using constructors vs. factory functions? parseWhole and wholeNumber would be easier to find as constructors to a Whole class. The same issue is currently open for Uris, where the nondescript parse(String) is used, along with parseParameter(String) for Parameters, and potentially parsePath(String) and others. (My point is not to debate Uris here, just using it as an example.)

Dec 04 '14 15:12 jvasileff

Is there a good reason to keep Whole (and Decimal) as interfaces

No, not AFAIK.

Separately, once constructors are available, are there any opinions on using constructors vs. factory functions?

I'm in general not a fan of factory functions, except for the case of parseXxxx() methods which accept String. I'm happy to stick with the pattern that functions are used for parsing strings.

Dec 04 '14 15:12 gavinking

Is there a good reason to keep Whole (and Decimal) as interfaces, assuming private constructors are coming? Currently I'm using ugly casts in WholeImpl, and final classes would provide better immutability guarantees. I guess sealed interfaces would be another option.

Basically no. The only reason those are interfaces is because we needed to have a shared type, but we didn't want to expose a class parameter list with BigInteger/BigDecimal in it.

Dec 04 '14 15:12 tombentley

BigInteger uses an int[] array, which we don't really have a high-performance platform-neutral abstraction of yet.

I have some very interesting preliminary results. My initial no-tricks implementation is actually beating BigInteger in many cases, with BigInteger's only clear advantage being division between two large numbers. I say preliminary, because these results are actually hard to believe. But, whatever the Array<Long> penalty is, it doesn't look too bad for this use case.

The following is for one million operations each on pre-computed random numbers, with warmup before each testing round:

Edit - removed incorrect benchmark.

Dec 07 '14 23:12 jvasileff

Wow I did not expect that. Sounds to good to be true, but if it is true, that's excellent news!

Sent from my iPhone

On 08 Dec 2014, at 00:13, John Vasileff [email protected] wrote:

BigInteger uses an int[] array, which we don't really have a high-performance platform-neutral abstraction of yet.

I have some very interesting preliminary results. My initial no-tricks implementation is actually beating BigInteger in many cases, with BigInteger's only clear advantage being division between two large numbers. I say preliminary, because these results are actually hard to believe. But, whatever the Array<Long> penalty is, it doesn't look too bad for this use case.

The following is for one million operations each on pre-computed random numbers, with warmup before each testing round:

Millis Class Operation LHS Bits RHS Bits 584 Whole + 32 32 596 BigInteger + 32 32 452 Whole + 32 128 849 BigInteger + 32 128 709 Whole - 32 32 710 BigInteger - 32 32 638 Whole - 32 128 778 BigInteger - 32 128 440 Whole * 32 32 692 BigInteger * 32 32 532 Whole * 32 128 837 BigInteger * 32 128 583 Whole * 128 32 782 BigInteger * 128 32 926 Whole * 128 128 974 BigInteger * 128 128 705 Whole / 32 32 654 BigInteger / 32 32 222 Whole / 32 128 731 BigInteger / 32 128 850 Whole / 128 32 866 BigInteger / 128 32 1442 Whole / 128 128 971 BigInteger / 128 128 — Reply to this email directly or view it on GitHub.

Dec 08 '14 11:12 gavinking

Sounds to good to be true

Well, it was fun while it lasted. The error turned out to be the use of wholes.map(...) instead of wholes.collect(...) to coerce the test data to BigIntegers. Not a minor difference.

Looking at the fundamental performance issue, this:

Integer arrayBenchCeylon() {
    value array = arrayOfSize(64, 1);
    variable value sum = 0;
    for (i in 0:10000000) {
        for (j in 0:array.size) {
            assert(exists x = array[j]);
            sum += x;
        }
    }
    return sum;
}

runs 25 times slower than the equivalent in Java using a long[64]. And, ArrayList<Long> only takes twice as long as the primitive array.

    public static final long arrayBenchJavaArrayList() {
        ArrayList<Long> array = new ArrayList<Long>(64);
        for (int i = 0; i < 64; i++) {
            array.add(1L);
        }
        long sum = 0;
        for (int i = 0; i < 10000000; i++) {
            for (int j = 0; j < array.size(); j++) {
                sum += array.get(j);
            }
        }
        return sum;
    }

So I guess there are significant optimizations for boxing/unboxing java.lang.* types that aren't available to other classes. Unless... I'm missing something here, too!

In practice, though, the penalty is quite a bit less than 25x, even for fairly large values:

Millis	Class	Operation	LHS Bits	RHS Bits
74	Whole	+	256	256
12	BigInteger	+	256	256
80	Whole	-	256	256
13	BigInteger	-	256	256
207	Whole	*	256	256
18	BigInteger	*	256	256
84	Whole	/	256	256
15	BigInteger	/	256	256
151	Whole	+	1024	1024
18	BigInteger	+	1024	1024
168	Whole	-	1024	1024
16	BigInteger	-	1024	1024
2298	Whole	*	1024	1024
128	BigInteger	*	1024	1024
175	Whole	/	1024	1024
22	BigInteger	/	1024	1024

Dec 08 '14 21:12 jvasileff

You're using ArrayList, not Array?

Dec 08 '14 23:12 gavinking

No, I just took a quick look at ArrayList for a performance comparison.

On Dec 8, 2014, at 6:07 PM, Gavin King <[email protected]mailto:[email protected]> wrote:

You're using ArrayList, not Array?

Reply to this email directly or view it on GitHubhttps://github.com/ceylon/ceylon-sdk/issues/306#issuecomment-66205529.

Dec 08 '14 23:12 jvasileff

I wonder if we used something like this as its internal state:

Integer i0;
Integer i1;
Integer[] is;

Integer get(Integer loc) => switch (loc) case (0) i0 case (1) i1 else is[loc];

Just an idea.

Dec 08 '14 23:12 gavinking

That looks like a great idea. But then, why not a tree of these containers? And that works too. So, why not an array of them :). But that doesn't work.

A lot of the Array slowdown looks like it has nothing to do with Integers or type conversions, and really doesn't make sense by looking at the source. Other Ceylon types seem affected too. Is there something else in the mix? It almost seems like there is a proxy, but the stack traces look clean. java.util.ArrayList performs about 12x better, as do any attempts of mine to duplicate c.l.Array using either Java or Ceylon.

Here's an example using Integers:

import java.util {
    ArrayList
}
import ceylon.collection {
    LinkedList
}

class IntegerHolder() {
    shared variable Integer i1 = 0;
    shared variable Integer i2 = 0;
}

shared void runIHTest() {
    bench(calculate);
    //bench(containerBench);
}

void calculate() {
    value array = arrayOfSize(1, IntegerHolder()); // 4555
    //value array = ArrayList<IntegerHolder>(); array.add(IntegerHolder()); // 320 
    for (i in 0:10^9) {
        assert(exists ih = array.get(0));
        ih.i1 = ih.i1 + 1;
    }
    print ((array.get(0) else nothing).i1);
}

void bench<T>(T f()) {
    for (i in 0:3) { f(); }
    Integer start = system.nanoseconds;
    f();
    print ("``(system.nanoseconds - start)/10^6``");
}

And this with just Objects:

class Container<T>(T content) {
    shared variable Integer counter = 0;
    shared T? get(Integer i) {
        counter++;
        return content;
    }
}

void containerBench() {
    value container = Array<Object> { object {} }; // 3753
    //value container = LinkedList<Object> { object {} }; // 718 
    //value container = ArrayList<Object>(); container.add( object {} ); // 325
    //value container = Container<Object>(object {}); // 314

    for (i in 0:10^9) {
        value _ = container.get(0);
    }
    //print(container.counter);        
}

Dec 09 '14 03:12 jvasileff

@jvasileff Be careful: retrieving an element from an Array<Integer> boxes the primitive int.

java.util.ArrayList performs about 12x better

Note that the VM itself has hacked-in optimizations for java.util.ArrayList. We can't really reproduce that with our own classes.

Dec 10 '14 09:12 gavinking

@jvasileff Oh wait, you're using List.get() not List.getFromFirst(). In microbenchmarks getFromFirst() is much faster because it avoids boxing the index!

Dec 10 '14 09:12 gavinking

But yeah, even after making that change, I get much better performance in this microbenchmark with LinkedList than with Array or ArrayList.

Dec 10 '14 09:12 gavinking

OK, @jvasileff, I just committed some performance optimizations to Array that now make our ArrayList very competitive with Java's. I can't measure a significant difference in performance for getFromFirst()s and set()s. The get() method is still much slower due to boxing of its index, so I think we should change the compiler to use getFromFirst() underneath the square-bracket syntax.

Note also, however, that getFromFirst() on an Array<Integer> is still much slower than getFromFirst() on an Array<Object> or ArrayList<Integer>, because Array optimizes for storage efficiency.

Dec 10 '14 13:12 gavinking

getFromFirst() is much faster because it avoids boxing the index!

Ah, I can't believe I missed that one! Thanks for looking a this, I look forward to trying out the changes.

Dec 10 '14 15:12 jvasileff

So it looks like if there is an answer, it will be a cons list. For 500k multiplications of 2048 bit numbers:

ms	Implementation
1803	`BigInteger`
2515	`Whole` with `LongArray`, using `int` instead of `long` for loop variables/array indexes
3114	`Whole` with `LongArray`, using `(int)` typecast instead of `toInt(...)`
3539	`Whole` with `LongArray`
3841	`Whole` with cons list
14957	`Whole` with `JArrayList<Integer>`
15200	`Whole` with `Array<Object>`
15461	`Whole` with `Array<Integer>`

Edit - added extra LongArray timings, using Java code slightly altered from ceylon compile ... --verbose-code output.

Dec 12 '14 06:12 jvasileff

So it looks like if there is an answer, it will be a cons list.

Well, that sounds good to me!

For 500k multiplications of 2048 bit numbers:

What are the equivalent measurements for 32 and 64 bit numbers, since they're going to be much more common. (Just a comparison of BigInteger with cons list is enough.)

Great work!

Dec 12 '14 11:12 gavinking

For 5 million 64bit * 32bit calculations:

ms	implementation
397	`BigInteger`
594	`Whole` with `LongArray`
773	`Whole` with `Array<Object>`
801	`Whole` with cons list*

The cons list is punished here since the internal representation for that case is Array<Object> with the translations to/from cons list happening inside the multiply method. This was just noise in the 2048bit case. And, none of these are calling an optimized manyWords * singleWord method, so they might be running slightly slower than they should.

Also - note the additions to my previous numbers. These directly address the long vs int indexing issue. I'm not sure why BigInteger is still faster in all cases. It must be some algorithmic difference, or maybe their use of int[] instead of long[] for storage.

Dec 12 '14 17:12 jvasileff

The cons list is punished here since the internal representation for that case is Array<Object> with the translations to/from cons list happening inside the multiply method.

I don't understand this comment.

I'm not sure why BigInteger is still faster in all cases.

Note that it's quite possible that the JVM itself contains some optimizations for BigInteger arithmetic.

Dec 12 '14 17:12 gavinking

I don't understand this comment.

The timings are for Whole * Whole, and within Whole, the bits are still stored in a property Array<Object> words;. So the first thing that Array<Object> multiply(x, y) { ... } has to do is build a a couple lists from the two arrays. And the last thing it does is create an array from the results list. This is pretty wasteful for 64bits * 32bits.

Dec 12 '14 17:12 jvasileff

Wait, what are you varying then? I thought this was for varying the internal representation used by Whole.

Dec 12 '14 18:12 gavinking

Well, yes. For everything but the list, it is a swap out of a type alias and a few minimal helper functions. But for the list case, the actual multiplication code has to change since the iteration happens by accessing tail, not by index. And nothing but the multiplication method knows anything about lists right now.

Take a look at the list multiply method if you like: https://github.com/jvasileff/ceylon-sdk/blob/ceylon-whole-cons/source/ceylon/math/whole/Whole.ceylon#L407

And how the internal representation is usually changed: https://github.com/jvasileff/ceylon-sdk/blob/ceylon-whole-cons/source/ceylon/math/whole/Words.ceylon

Dec 12 '14 18:12 jvasileff

A doubly linked list?

Dec 12 '14 18:12 gavinking

A doubly linked list?

Singly, used procedurally. You tell me if its an abuse of terminology!

class WordCons(word, tail) {
    shared variable Integer word;
    shared WordCons? tail;
}

Dec 12 '14 18:12 jvasileff

I'm suggesting you try a doubly linked list, to avoid the cost of repackaging it.

Dec 12 '14 18:12 gavinking