ceylon-sdk icon indicating copy to clipboard operation
ceylon-sdk copied to clipboard

optimize LinkedList.repeat(), LinkedList.patch(), LinkedList.rest

Open jvasileff opened this issue 9 years ago • 25 comments

List refines repeat() with the assumption that the repeated List is random-access. The helper class Repeat satisfies List<Element>, and List's default members assume getFromFirst() is fast.

    void time(Anything() f) {
        value start = system.milliseconds;
        f();
        print("``system.milliseconds - start``ms");
    }

    value list1 = LinkedList { *(1:10k) }.repeat(1);
    value list2 = LinkedList { *list1 };

    time(() => list1.findLast(100.equals)); // 123ms
    time(() => list2.findLast(100.equals)); // 1ms

    time(() => list1.findLast(100.equals)); // 119ms
    time(() => list2.findLast(100.equals)); // 1ms

similar for patch():

    value list1 = LinkedList { *(1:10k) }.patch([1]);
    value list2 = LinkedList { *list1 };

    time(() => list1.findLast(100.equals)); // 124ms
    time(() => list2.findLast(100.equals)); // 1ms

and rest:

    value list1 = LinkedList { *(1:10k) }.rest;
    value list2 = LinkedList { *list1 };

    time(() => list1.findLast(100.equals)); // 106ms
    time(() => list2.findLast(100.equals)); // 1ms

jvasileff avatar Apr 27 '16 17:04 jvasileff

and sublist(), sublistFrom(), sublistTo():

    value list1 = LinkedList { *(1:10k) }.sublist(0,10k); 
    value list2 = LinkedList { *list1 };

    time(() => list1.findLast(100.equals)); // 128ms
    time(() => list2.findLast(100.equals)); // 1ms

jvasileff avatar Apr 27 '16 20:04 jvasileff

It would be nice for List<Element> to have refinements that look more like Iterable's members, and a RandomAccessList<Element> sub-interface. But I know, that would be a huge change.

jvasileff avatar Apr 27 '16 20:04 jvasileff

indexesWhere() blows up even without repeat/patch/etc:

    value list1 = LinkedList { *(1:10k) };
    value list2 = list1.sequence();

    time(() => list1.indexesWhere((c) => true).size); // 138ms
    time(() => list2.indexesWhere((c) => true).size); // 3ms

jvasileff avatar Jul 08 '16 01:07 jvasileff

It would be nice for List<Element> to have refinements that look more like Iterable's members, and a RandomAccessList<Element> sub-interface. But I know, that would be a huge change.

Instead of that, here's my proposal:

  • Strengthen the List contract to require O(1) performance for element access
  • No longer have LinkedList satisfy List. Reevaluate what methods LinkedList should have.
  • Introduce an ArrayQueue that satisfies List and can be used when efficient adding or removing from both the front and back of the list is needed. ArrayQueue would be Array based and grow on demand.

ArrayQueue would be far superior to LinkedList for nearly all use cases (adding/removing to/from the middle of the list would be a possible exception, except the current LinkedList doesn't even support that well, since it's inefficient to locate the middle of the list).

This would be a significant compatibility change to LinkedList, but I think it's warranted because

  1. the current implementation has so many performance bugs that are hard to fix
  2. existing code that widens LinkedList to List may have performance bugs of its own
  3. LinkedList isn't generally useful given a good Array based alternative
  4. having List provide performance guarantees is very valuable; you'd never have to make a copy "just in case" when your algorithm requires array-like performance. Further, Lists default methods would now be justified by it's contract.

Now, it's quite possible that instead of having a new ArrayQueue, we can instead enhance the current ArrayList to have efficient add/remove operations on the front of the list.

WDYT @gavinking?

jvasileff avatar Sep 24 '16 15:09 jvasileff

IMO not having LinkedList implement List would go completely against the principle of least surprise. What's wrong with the RandomAccessList idea you mentioned before?

quintesse avatar Sep 24 '16 15:09 quintesse

@quintesse I don't think that's true, because the bigger surprise is LinkedList's performance characteristics, since array-like performance is almost always assumed by programers when working with Lists. The "oh, why isn't this a List?" surprise isn't dangerous, since you find out at compile time.

I moved away from the RandomAccessList idea because:

  1. It isn't as effective at solving the surprising performance of some Lists (once you learn the api, it would certainly help, but also, do you want parameters of type RandomAccessList<String>?)
  2. It's a much bigger and more fundamental api change
  3. I've since discovered that LinkedLists are a bit of an anti-pattern anyway (and especially so for our implementation). See https://plus.google.com/+JustinFagnani/posts/8qWcQyTP1Wp and https://www.reddit.com/r/programming/comments/25xpre/bjarne_stroustrup_why_you_should_avoid_linked/

jvasileff avatar Sep 24 '16 16:09 jvasileff

since array-like performance is almost always assumed by programers when working with Lists

Well, this is where I disagree, I've never seen people assume array-like performance characteristics for a List. If they do I'm not sure they should call themselves programmers ;)

quintesse avatar Sep 24 '16 16:09 quintesse

@quintesse that is wrong. The exact bugs this issue was created for were caused by programmers assuming array-like performance for lists. So we don't even have to scour github for examples.

But even for programers that wouldn't make that assumption, not having LinkedList satisfy List is not surprising in a dangerous way afaict.

jvasileff avatar Sep 24 '16 16:09 jvasileff

@quintesse that is wrong

No it isn't. I have never seen it ;)

quintesse avatar Sep 24 '16 16:09 quintesse

I think most people pick their data structures knowing what sort of perf they get: fast insertion you get a linked list, compact memory you get an array list. Same with sets, sorted sets. It's expected that different lists have different memory/computation efficiencies. It's a lot less expected IMO that certains lists are not List.

FroMage avatar Sep 26 '16 07:09 FroMage

I agree with @FroMage.

gavinking avatar Sep 26 '16 08:09 gavinking

I think most people pick their data structures knowing what sort of perf they get

That's exactly my point: people do care about performance characteristics.

If you're only going to directly use a collection you create, the supertypes don't matter, so that scenario isn't relevant. But performance also matters for general purpose functions:

Anything processData(List<> list);

You wouldn't want to have to try to enumerate all of the array-like lists you can think of:

Anything processData([String*] | Array<String> | ArrayList<String> | CustomList<String>);

and that's were both supertypes and performance matters. (A whole bunch of these functions exist in List today!)

It's expected that different lists have different memory/computation efficiencies

There's an opportunity here, because almost all Lists perform like arrays for lookups. Excluding one or two oddballs from the club makes List much more useful and easier to implement.

It's a lot less expected IMO that certains lists are not List.

Especially so for Java devs I'm sure. But no one has said why this would be a problem, and in fact surprises would often be healthy, challenging the decision to use LinkedList in the first place, especially if we offered a superior ArrayQueue.

jvasileff avatar Sep 26 '16 12:09 jvasileff

Hopefully I’ve changed some minds above.

If not, how do we fix the performance problems for LinkedList and others like it (NativeLinkedList, LinkedListMultimap, etc)?

It’s a lot of work to override all of Lists overrides, and it’s also very easy to make mistakes or omissions (hence, the bugs), which makes this look like a design problem.

jvasileff avatar Sep 26 '16 12:09 jvasileff

You haven't changed mine, I can tell you that. Going against many years of "tradition", if you like, is definitely a no-no in my book and a good way to surprise and possibly alienate users.

If having a sub-interface or marker interface is a more work, so be it. That it is possible to write code using List that performs bad, sure, we'll just have to educate people better by explaining the possible problems (and their solutions) in the docs.

To me having a processData(List) that doesn't take LinkedList would be extremely weird. If its developer is very sure that using the wrong List would cause grave performance issues than they could go for processData(IndexableList), but even then I can image as a user of that API that I would be miffed that I couldn't pass a LinkedList without a good reason: "It's my responsibility, I will decide what acceptable performance is for me". Sure, you can just pass ArrayList(myLinkedList), but that's just exchanging one problem (performance) for another (memory).

And if you are able to detect the difference between the two classes of Lists (the ones that work well indexed and the ones that should just be iterated over) you can choose between different implementations internally if you want. That way at least any performance issues people find can be fixed transparently.

quintesse avatar Sep 26 '16 14:09 quintesse

To me having a processData(List) that doesn't take LinkedList would be extremely weird

Why? It would take an Iterable if it doesn't need a fast getFromFirst, and of course it would be a big mistake to use a LinkedList in this function if it does rely on getFromFirst.

And, why do you want to use LinkedList anyway? The conventional wisdom that LinkedLists are useful seems to be wrong.

The ability to quickly lookup by index is far and away the most important performance characteristic for lists, so it follows that there should be a type that can be used when this type of performance is required for a parameter.

But, anyway, there are lots of posts above saying we should just stick with what Java does. So fine. How do we resolve this issue? Have an AbstractSequentialList like Java does?

jvasileff avatar Sep 26 '16 14:09 jvasileff

so it follows that there should be a type

Personally I'm not really sure type safety is something that should to be used to try to guarantee something about runtime performance.

The fact is that although using a large linked list as an argument to some of these functions might be very inefficient, it is not wrong (those algorithms will terminate eventually). If you try to use type safety to protect people from slow algorithms many people with short linked lists won't thank you.

And where does this end?

For reference Java has RandomAccess, which I've never seen used in any API (I assume it's only used internally when choosing between different possible algorithms for, e.g. sorting).

tombentley avatar Sep 26 '16 15:09 tombentley

The ability to quickly lookup by index is far and away the most important performance characteristic for lists,

I just don't agree with that, that's what arrays are for, or specific List implementations. To me lists are only ever guaranteed to be accessible with acceptable performance by using Iterables. The Java docs for example specifically state:

The List interface provides four methods for positional (indexed) access to list elements. Lists (like Java arrays) are zero based. Note that these operations may execute in time proportional to the index value for some implementations (the LinkedList class, for example). Thus, iterating over the elements in a list is typically preferable to indexing through it if the caller does not know the implementation.

How do we resolve this issue?

Well, what's wrong with the marker interface solution? Like RandomAccess as @tombentley mentions. It's of course the responsibility of the implementers to actually make use of it, but it seems a perfectly reasonable solution.

quintesse avatar Sep 26 '16 15:09 quintesse

And where does this end?

It ends with this one change, being able to describe lists with array-like performance for reads.

But anyway, this was just one proposed solution to the problem of List assuming array-like performance in its implementation. And it seems quite unpopular. So, if this idea is no good, how do we resolve this issue?

I really don't like the idea that when you satisfy an interface, you should have to inspect all inherited default implementations to see if they apply to your subtype. As we've seen, this is error prone, and really, borders on violating the substitution principle.

jvasileff avatar Sep 26 '16 15:09 jvasileff

It's of course the responsibility of the implementers to actually make use of it

well, if added for the purpose of resolving this issue, that wouldn't be a problem since List would lose all of its default implementations that don't work well for linked lists.

jvasileff avatar Sep 26 '16 15:09 jvasileff

since List would lose all of it's default implementations that don't work well for linked lists.

Unless the default implementations implement this behaviour of choosing between different implementations already, right?

quintesse avatar Sep 26 '16 15:09 quintesse

A list is fundamentally defined by its non-commutativity and non-idempotence properties, by opposition with bag, set and ordered-set. The fact that it's backed with array or node should be transparent.

If I need random access for performance reason, I will use an array or a specific list implementation that exposes random access methods.

jean-morissette avatar Sep 26 '16 16:09 jean-morissette

I was defending my proposal. But I get it.

  • People want LinkedList to satisfy List, and
  • It's seen as "perfectly ok" that if you have void processData(List) that requires requires the list be random-access, api documentation stating that requirement will suffice.

Like I said, this was just one proposal to fix the primary issue, which is that LinkedList has performance bugs, and that List has default members that make incorrect assumptions about subtypes. (Please note the distinction from Iterable's default members, which are unoptimized for some subclasses, but not actually wrong.)

We can certainly fix LinkedList by just overriding more of Lists default members (perhaps copy and past from Iterable). But I think it would be nice to do better for the reason I stated above.

jvasileff avatar Sep 26 '16 17:09 jvasileff

perhaps copy and past from Iterable

Shouldn't it be possible to just call super when RandomAccess is not implemented?

quintesse avatar Sep 26 '16 17:09 quintesse

Would that not be the same as moving the array-optimized default members to RandomAccess, and leaving behind =>super stubs in List (for binary compatibility)?

But either way, we'd be de-optimizing existing array based Lists, which I'm not sure is ok (ceylon.collection::ArrayList/1.3.0 would be slower).

jvasileff avatar Sep 26 '16 17:09 jvasileff

But either way, we'd be de-optimizing existing array based Lists

I don't understand, how we'd be doing that?

quintesse avatar Sep 26 '16 18:09 quintesse