ceylon-sdk icon indicating copy to clipboard operation
ceylon-sdk copied to clipboard

reimplement Decimal and Whole in Ceylon

Open gavinking opened this issue 11 years ago • 38 comments

The current implementations are plain ugly, and Java-only. So I think we need to do this. But there are some challenges. For example, the Java impl of BigInteger uses an int[] array, which we don't really have a high-performance platform-neutral abstraction of yet.

gavinking avatar Sep 22 '14 21:09 gavinking

I will make a start on this by taking a stab at reimplementing Whole in Ceylon based on an Array<Long>, since that doesn't look terribly daunting.

gavinking avatar Sep 22 '14 22:09 gavinking

Hi. Just noticed ceylon.math is for Java only, which seems like a "had-no-time-to-do-it" limitation? This is really necessary for any serious development to happen in JS! If you hadn't had the time to look into it, perhaps I can give this a go (I actually really need to use Whole right now)...

renatoathaydes avatar Oct 18 '14 20:10 renatoathaydes

Go for it!

gavinking avatar Oct 18 '14 20:10 gavinking

I started on an implementation of Whole, and although it still needs a ton of work, most of the functions mostly work, so I thought I'd share what I have - https://github.com/jvasileff/ceylon-sdk/commit/095f8d24cfbcb9241ccc5417278b1a28a03659c7

The completely unimplemented pieces are dividing by one-word Wholes, a rare condition when dividing larger Wholes, string parsing, hash, and I'm sure a few others.

The word size is configurable (by editing a constant in the source), and will be 16 bits on JS and 32 bits on Java.

jvasileff avatar Dec 04 '14 04:12 jvasileff

Awesome!

Sent from my iPhone

On 04 Dec 2014, at 05:34, John Vasileff [email protected] wrote:

I started on an implementation of Whole, and although it still needs a ton of work, most of the functions mostly work, so I thought I'd share what I have - jvasileff@095f8d2

The completely unimplemented pieces are dividing by one-word Wholes, a rare condition when dividing larger Wholes, string parsing, hash, and I'm sure a few others.

The word size is configurable (by editing a constant in the source), and will be 16 bits on JS and 32 bits on Java.

— Reply to this email directly or view it on GitHub.

gavinking avatar Dec 04 '14 11:12 gavinking

Is there a good reason to keep Whole (and Decimal) as interfaces, assuming private constructors are coming? Currently I'm using ugly casts in WholeImpl, and final classes would provide better immutability guarantees. I guess sealed interfaces would be another option.

Separately, once constructors are available, are there any opinions on using constructors vs. factory functions? parseWhole and wholeNumber would be easier to find as constructors to a Whole class. The same issue is currently open for Uris, where the nondescript parse(String) is used, along with parseParameter(String) for Parameters, and potentially parsePath(String) and others. (My point is not to debate Uris here, just using it as an example.)

jvasileff avatar Dec 04 '14 15:12 jvasileff

Is there a good reason to keep Whole (and Decimal) as interfaces

No, not AFAIK.

Separately, once constructors are available, are there any opinions on using constructors vs. factory functions?

I'm in general not a fan of factory functions, except for the case of parseXxxx() methods which accept String. I'm happy to stick with the pattern that functions are used for parsing strings.

gavinking avatar Dec 04 '14 15:12 gavinking

Is there a good reason to keep Whole (and Decimal) as interfaces, assuming private constructors are coming? Currently I'm using ugly casts in WholeImpl, and final classes would provide better immutability guarantees. I guess sealed interfaces would be another option.

Basically no. The only reason those are interfaces is because we needed to have a shared type, but we didn't want to expose a class parameter list with BigInteger/BigDecimal in it.

tombentley avatar Dec 04 '14 15:12 tombentley

BigInteger uses an int[] array, which we don't really have a high-performance platform-neutral abstraction of yet.

I have some very interesting preliminary results. My initial no-tricks implementation is actually beating BigInteger in many cases, with BigInteger's only clear advantage being division between two large numbers. I say preliminary, because these results are actually hard to believe. But, whatever the Array<Long> penalty is, it doesn't look too bad for this use case.

The following is for one million operations each on pre-computed random numbers, with warmup before each testing round:

Edit - removed incorrect benchmark.

jvasileff avatar Dec 07 '14 23:12 jvasileff

Wow I did not expect that. Sounds to good to be true, but if it is true, that's excellent news!

Sent from my iPhone

On 08 Dec 2014, at 00:13, John Vasileff [email protected] wrote:

BigInteger uses an int[] array, which we don't really have a high-performance platform-neutral abstraction of yet.

I have some very interesting preliminary results. My initial no-tricks implementation is actually beating BigInteger in many cases, with BigInteger's only clear advantage being division between two large numbers. I say preliminary, because these results are actually hard to believe. But, whatever the Array<Long> penalty is, it doesn't look too bad for this use case.

The following is for one million operations each on pre-computed random numbers, with warmup before each testing round:

Millis Class Operation LHS Bits RHS Bits 584 Whole + 32 32 596 BigInteger + 32 32 452 Whole + 32 128 849 BigInteger + 32 128 709 Whole - 32 32 710 BigInteger - 32 32 638 Whole - 32 128 778 BigInteger - 32 128 440 Whole * 32 32 692 BigInteger * 32 32 532 Whole * 32 128 837 BigInteger * 32 128 583 Whole * 128 32 782 BigInteger * 128 32 926 Whole * 128 128 974 BigInteger * 128 128 705 Whole / 32 32 654 BigInteger / 32 32 222 Whole / 32 128 731 BigInteger / 32 128 850 Whole / 128 32 866 BigInteger / 128 32 1442 Whole / 128 128 971 BigInteger / 128 128 — Reply to this email directly or view it on GitHub.

gavinking avatar Dec 08 '14 11:12 gavinking

Sounds to good to be true

Well, it was fun while it lasted. The error turned out to be the use of wholes.map(...) instead of wholes.collect(...) to coerce the test data to BigIntegers. Not a minor difference.

Looking at the fundamental performance issue, this:

Integer arrayBenchCeylon() {
    value array = arrayOfSize(64, 1);
    variable value sum = 0;
    for (i in 0:10000000) {
        for (j in 0:array.size) {
            assert(exists x = array[j]);
            sum += x;
        }
    }
    return sum;
}

runs 25 times slower than the equivalent in Java using a long[64]. And, ArrayList<Long> only takes twice as long as the primitive array.

    public static final long arrayBenchJavaArrayList() {
        ArrayList<Long> array = new ArrayList<Long>(64);
        for (int i = 0; i < 64; i++) {
            array.add(1L);
        }
        long sum = 0;
        for (int i = 0; i < 10000000; i++) {
            for (int j = 0; j < array.size(); j++) {
                sum += array.get(j);
            }
        }
        return sum;
    }

So I guess there are significant optimizations for boxing/unboxing java.lang.* types that aren't available to other classes. Unless... I'm missing something here, too!

In practice, though, the penalty is quite a bit less than 25x, even for fairly large values:

Millis Class Operation LHS Bits RHS Bits
74 Whole + 256 256
12 BigInteger + 256 256
80 Whole - 256 256
13 BigInteger - 256 256
207 Whole * 256 256
18 BigInteger * 256 256
84 Whole / 256 256
15 BigInteger / 256 256
151 Whole + 1024 1024
18 BigInteger + 1024 1024
168 Whole - 1024 1024
16 BigInteger - 1024 1024
2298 Whole * 1024 1024
128 BigInteger * 1024 1024
175 Whole / 1024 1024
22 BigInteger / 1024 1024

jvasileff avatar Dec 08 '14 21:12 jvasileff

You're using ArrayList, not Array?

gavinking avatar Dec 08 '14 23:12 gavinking

No, I just took a quick look at ArrayList for a performance comparison.

On Dec 8, 2014, at 6:07 PM, Gavin King <[email protected]mailto:[email protected]> wrote:

You're using ArrayList, not Array?

Reply to this email directly or view it on GitHubhttps://github.com/ceylon/ceylon-sdk/issues/306#issuecomment-66205529.

jvasileff avatar Dec 08 '14 23:12 jvasileff

I wonder if we used something like this as its internal state:

Integer i0;
Integer i1;
Integer[] is;

Integer get(Integer loc) => switch (loc) case (0) i0 case (1) i1 else is[loc];

Just an idea.

gavinking avatar Dec 08 '14 23:12 gavinking

That looks like a great idea. But then, why not a tree of these containers? And that works too. So, why not an array of them :). But that doesn't work.

A lot of the Array slowdown looks like it has nothing to do with Integers or type conversions, and really doesn't make sense by looking at the source. Other Ceylon types seem affected too. Is there something else in the mix? It almost seems like there is a proxy, but the stack traces look clean. java.util.ArrayList performs about 12x better, as do any attempts of mine to duplicate c.l.Array using either Java or Ceylon.

Here's an example using Integers:

import java.util {
    ArrayList
}
import ceylon.collection {
    LinkedList
}

class IntegerHolder() {
    shared variable Integer i1 = 0;
    shared variable Integer i2 = 0;
}

shared void runIHTest() {
    bench(calculate);
    //bench(containerBench);
}

void calculate() {
    value array = arrayOfSize(1, IntegerHolder()); // 4555
    //value array = ArrayList<IntegerHolder>(); array.add(IntegerHolder()); // 320 
    for (i in 0:10^9) {
        assert(exists ih = array.get(0));
        ih.i1 = ih.i1 + 1;
    }
    print ((array.get(0) else nothing).i1);
}

void bench<T>(T f()) {
    for (i in 0:3) { f(); }
    Integer start = system.nanoseconds;
    f();
    print ("``(system.nanoseconds - start)/10^6``");
}

And this with just Objects:

class Container<T>(T content) {
    shared variable Integer counter = 0;
    shared T? get(Integer i) {
        counter++;
        return content;
    }
}

void containerBench() {
    value container = Array<Object> { object {} }; // 3753
    //value container = LinkedList<Object> { object {} }; // 718 
    //value container = ArrayList<Object>(); container.add( object {} ); // 325
    //value container = Container<Object>(object {}); // 314

    for (i in 0:10^9) {
        value _ = container.get(0);
    }
    //print(container.counter);        
}

jvasileff avatar Dec 09 '14 03:12 jvasileff

@jvasileff Be careful: retrieving an element from an Array<Integer> boxes the primitive int.

java.util.ArrayList performs about 12x better

Note that the VM itself has hacked-in optimizations for java.util.ArrayList. We can't really reproduce that with our own classes.

gavinking avatar Dec 10 '14 09:12 gavinking

@jvasileff Oh wait, you're using List.get() not List.getFromFirst(). In microbenchmarks getFromFirst() is much faster because it avoids boxing the index!

gavinking avatar Dec 10 '14 09:12 gavinking

But yeah, even after making that change, I get much better performance in this microbenchmark with LinkedList than with Array or ArrayList.

gavinking avatar Dec 10 '14 09:12 gavinking

OK, @jvasileff, I just committed some performance optimizations to Array that now make our ArrayList very competitive with Java's. I can't measure a significant difference in performance for getFromFirst()s and set()s. The get() method is still much slower due to boxing of its index, so I think we should change the compiler to use getFromFirst() underneath the square-bracket syntax.

Note also, however, that getFromFirst() on an Array<Integer> is still much slower than getFromFirst() on an Array<Object> or ArrayList<Integer>, because Array optimizes for storage efficiency.

gavinking avatar Dec 10 '14 13:12 gavinking

getFromFirst() is much faster because it avoids boxing the index!

Ah, I can't believe I missed that one! Thanks for looking a this, I look forward to trying out the changes.

jvasileff avatar Dec 10 '14 15:12 jvasileff

So it looks like if there is an answer, it will be a cons list. For 500k multiplications of 2048 bit numbers:

ms Implementation
1803 BigInteger
2515 Whole with LongArray, using int instead of long for loop variables/array indexes
3114 Whole with LongArray, using (int) typecast instead of toInt(...)
3539 Whole with LongArray
3841 Whole with cons list
14957 Whole with JArrayList<Integer>
15200 Whole with Array<Object>
15461 Whole with Array<Integer>

Edit - added extra LongArray timings, using Java code slightly altered from ceylon compile ... --verbose-code output.

jvasileff avatar Dec 12 '14 06:12 jvasileff

So it looks like if there is an answer, it will be a cons list.

Well, that sounds good to me!

For 500k multiplications of 2048 bit numbers:

What are the equivalent measurements for 32 and 64 bit numbers, since they're going to be much more common. (Just a comparison of BigInteger with cons list is enough.)

Great work!

gavinking avatar Dec 12 '14 11:12 gavinking

For 5 million 64bit * 32bit calculations:

ms implementation
397 BigInteger
594 Whole with LongArray
773 Whole with Array<Object>
801 Whole with cons list*

The cons list is punished here since the internal representation for that case is Array<Object> with the translations to/from cons list happening inside the multiply method. This was just noise in the 2048bit case. And, none of these are calling an optimized manyWords * singleWord method, so they might be running slightly slower than they should.

Also - note the additions to my previous numbers. These directly address the long vs int indexing issue. I'm not sure why BigInteger is still faster in all cases. It must be some algorithmic difference, or maybe their use of int[] instead of long[] for storage.

jvasileff avatar Dec 12 '14 17:12 jvasileff

The cons list is punished here since the internal representation for that case is Array<Object> with the translations to/from cons list happening inside the multiply method.

I don't understand this comment.

I'm not sure why BigInteger is still faster in all cases.

Note that it's quite possible that the JVM itself contains some optimizations for BigInteger arithmetic.

gavinking avatar Dec 12 '14 17:12 gavinking

I don't understand this comment.

The timings are for Whole * Whole, and within Whole, the bits are still stored in a property Array<Object> words;. So the first thing that Array<Object> multiply(x, y) { ... } has to do is build a a couple lists from the two arrays. And the last thing it does is create an array from the results list. This is pretty wasteful for 64bits * 32bits.

jvasileff avatar Dec 12 '14 17:12 jvasileff

Wait, what are you varying then? I thought this was for varying the internal representation used by Whole.

gavinking avatar Dec 12 '14 18:12 gavinking

Well, yes. For everything but the list, it is a swap out of a type alias and a few minimal helper functions. But for the list case, the actual multiplication code has to change since the iteration happens by accessing tail, not by index. And nothing but the multiplication method knows anything about lists right now.

Take a look at the list multiply method if you like: https://github.com/jvasileff/ceylon-sdk/blob/ceylon-whole-cons/source/ceylon/math/whole/Whole.ceylon#L407

And how the internal representation is usually changed: https://github.com/jvasileff/ceylon-sdk/blob/ceylon-whole-cons/source/ceylon/math/whole/Words.ceylon

jvasileff avatar Dec 12 '14 18:12 jvasileff

A doubly linked list?

gavinking avatar Dec 12 '14 18:12 gavinking

A doubly linked list?

Singly, used procedurally. You tell me if its an abuse of terminology!

class WordCons(word, tail) {
    shared variable Integer word;
    shared WordCons? tail;
}

jvasileff avatar Dec 12 '14 18:12 jvasileff

I'm suggesting you try a doubly linked list, to avoid the cost of repackaging it.

gavinking avatar Dec 12 '14 18:12 gavinking