optimize LinkedList.repeat(), LinkedList.patch(), LinkedList.rest
List refines repeat() with the assumption that the repeated List is random-access. The helper class Repeat satisfies List<Element>, and List's default members assume getFromFirst() is fast.
void time(Anything() f) {
value start = system.milliseconds;
f();
print("``system.milliseconds - start``ms");
}
value list1 = LinkedList { *(1:10k) }.repeat(1);
value list2 = LinkedList { *list1 };
time(() => list1.findLast(100.equals)); // 123ms
time(() => list2.findLast(100.equals)); // 1ms
time(() => list1.findLast(100.equals)); // 119ms
time(() => list2.findLast(100.equals)); // 1ms
similar for patch():
value list1 = LinkedList { *(1:10k) }.patch([1]);
value list2 = LinkedList { *list1 };
time(() => list1.findLast(100.equals)); // 124ms
time(() => list2.findLast(100.equals)); // 1ms
and rest:
value list1 = LinkedList { *(1:10k) }.rest;
value list2 = LinkedList { *list1 };
time(() => list1.findLast(100.equals)); // 106ms
time(() => list2.findLast(100.equals)); // 1ms
and sublist(), sublistFrom(), sublistTo():
value list1 = LinkedList { *(1:10k) }.sublist(0,10k);
value list2 = LinkedList { *list1 };
time(() => list1.findLast(100.equals)); // 128ms
time(() => list2.findLast(100.equals)); // 1ms
It would be nice for List<Element> to have refinements that look more like Iterable's members, and a RandomAccessList<Element> sub-interface. But I know, that would be a huge change.
indexesWhere() blows up even without repeat/patch/etc:
value list1 = LinkedList { *(1:10k) };
value list2 = list1.sequence();
time(() => list1.indexesWhere((c) => true).size); // 138ms
time(() => list2.indexesWhere((c) => true).size); // 3ms
It would be nice for List<Element> to have refinements that look more like Iterable's members, and a RandomAccessList<Element> sub-interface. But I know, that would be a huge change.
Instead of that, here's my proposal:
- Strengthen the
Listcontract to require O(1) performance for element access - No longer have
LinkedListsatisfyList. Reevaluate what methodsLinkedListshould have. - Introduce an
ArrayQueuethat satisfiesListand can be used when efficient adding or removing from both the front and back of the list is needed.ArrayQueuewould beArraybased and grow on demand.
ArrayQueue would be far superior to LinkedList for nearly all use cases (adding/removing to/from the middle of the list would be a possible exception, except the current LinkedList doesn't even support that well, since it's inefficient to locate the middle of the list).
This would be a significant compatibility change to LinkedList, but I think it's warranted because
- the current implementation has so many performance bugs that are hard to fix
- existing code that widens
LinkedListtoListmay have performance bugs of its own -
LinkedListisn't generally useful given a goodArraybased alternative - having
Listprovide performance guarantees is very valuable; you'd never have to make a copy "just in case" when your algorithm requires array-like performance. Further,Lists default methods would now be justified by it's contract.
Now, it's quite possible that instead of having a new ArrayQueue, we can instead enhance the current ArrayList to have efficient add/remove operations on the front of the list.
WDYT @gavinking?
IMO not having LinkedList implement List would go completely against the principle of least surprise. What's wrong with the RandomAccessList idea you mentioned before?
@quintesse I don't think that's true, because the bigger surprise is LinkedList's performance characteristics, since array-like performance is almost always assumed by programers when working with Lists. The "oh, why isn't this a List?" surprise isn't dangerous, since you find out at compile time.
I moved away from the RandomAccessList idea because:
- It isn't as effective at solving the surprising performance of some
Lists (once you learn the api, it would certainly help, but also, do you want parameters of typeRandomAccessList<String>?) - It's a much bigger and more fundamental api change
- I've since discovered that
LinkedLists are a bit of an anti-pattern anyway (and especially so for our implementation). See https://plus.google.com/+JustinFagnani/posts/8qWcQyTP1Wp and https://www.reddit.com/r/programming/comments/25xpre/bjarne_stroustrup_why_you_should_avoid_linked/
since array-like performance is almost always assumed by programers when working with Lists
Well, this is where I disagree, I've never seen people assume array-like performance characteristics for a List. If they do I'm not sure they should call themselves programmers ;)
@quintesse that is wrong. The exact bugs this issue was created for were caused by programmers assuming array-like performance for lists. So we don't even have to scour github for examples.
But even for programers that wouldn't make that assumption, not having LinkedList satisfy List is not surprising in a dangerous way afaict.
@quintesse that is wrong
No it isn't. I have never seen it ;)
I think most people pick their data structures knowing what sort of perf they get: fast insertion you get a linked list, compact memory you get an array list. Same with sets, sorted sets. It's expected that different lists have different memory/computation efficiencies. It's a lot less expected IMO that certains lists are not List.
I agree with @FroMage.
I think most people pick their data structures knowing what sort of perf they get
That's exactly my point: people do care about performance characteristics.
If you're only going to directly use a collection you create, the supertypes don't matter, so that scenario isn't relevant. But performance also matters for general purpose functions:
Anything processData(List<> list);
You wouldn't want to have to try to enumerate all of the array-like lists you can think of:
Anything processData([String*] | Array<String> | ArrayList<String> | CustomList<String>);
and that's were both supertypes and performance matters. (A whole bunch of these functions exist in List today!)
It's expected that different lists have different memory/computation efficiencies
There's an opportunity here, because almost all Lists perform like arrays for lookups. Excluding one or two oddballs from the club makes List much more useful and easier to implement.
It's a lot less expected IMO that certains lists are not List.
Especially so for Java devs I'm sure. But no one has said why this would be a problem, and in fact surprises would often be healthy, challenging the decision to use LinkedList in the first place, especially if we offered a superior ArrayQueue.
Hopefully I’ve changed some minds above.
If not, how do we fix the performance problems for LinkedList and others like it (NativeLinkedList, LinkedListMultimap, etc)?
It’s a lot of work to override all of Lists overrides, and it’s also very easy to make mistakes or omissions (hence, the bugs), which makes this look like a design problem.
You haven't changed mine, I can tell you that. Going against many years of "tradition", if you like, is definitely a no-no in my book and a good way to surprise and possibly alienate users.
If having a sub-interface or marker interface is a more work, so be it. That it is possible to write code using List that performs bad, sure, we'll just have to educate people better by explaining the possible problems (and their solutions) in the docs.
To me having a processData(List) that doesn't take LinkedList would be extremely weird. If its developer is very sure that using the wrong List would cause grave performance issues than they could go for processData(IndexableList), but even then I can image as a user of that API that I would be miffed that I couldn't pass a LinkedList without a good reason: "It's my responsibility, I will decide what acceptable performance is for me". Sure, you can just pass ArrayList(myLinkedList), but that's just exchanging one problem (performance) for another (memory).
And if you are able to detect the difference between the two classes of Lists (the ones that work well indexed and the ones that should just be iterated over) you can choose between different implementations internally if you want. That way at least any performance issues people find can be fixed transparently.
To me having a processData(List) that doesn't take LinkedList would be extremely weird
Why? It would take an Iterable if it doesn't need a fast getFromFirst, and of course it would be a big mistake to use a LinkedList in this function if it does rely on getFromFirst.
And, why do you want to use LinkedList anyway? The conventional wisdom that LinkedLists are useful seems to be wrong.
The ability to quickly lookup by index is far and away the most important performance characteristic for lists, so it follows that there should be a type that can be used when this type of performance is required for a parameter.
But, anyway, there are lots of posts above saying we should just stick with what Java does. So fine. How do we resolve this issue? Have an AbstractSequentialList like Java does?
so it follows that there should be a type
Personally I'm not really sure type safety is something that should to be used to try to guarantee something about runtime performance.
The fact is that although using a large linked list as an argument to some of these functions might be very inefficient, it is not wrong (those algorithms will terminate eventually). If you try to use type safety to protect people from slow algorithms many people with short linked lists won't thank you.
And where does this end?
For reference Java has RandomAccess, which I've never seen used in any API (I assume it's only used internally when choosing between different possible algorithms for, e.g. sorting).
The ability to quickly lookup by index is far and away the most important performance characteristic for lists,
I just don't agree with that, that's what arrays are for, or specific List implementations. To me lists are only ever guaranteed to be accessible with acceptable performance by using Iterables. The Java docs for example specifically state:
The List interface provides four methods for positional (indexed) access to list elements. Lists (like Java arrays) are zero based. Note that these operations may execute in time proportional to the index value for some implementations (the LinkedList class, for example). Thus, iterating over the elements in a list is typically preferable to indexing through it if the caller does not know the implementation.
How do we resolve this issue?
Well, what's wrong with the marker interface solution? Like RandomAccess as @tombentley mentions.
It's of course the responsibility of the implementers to actually make use of it, but it seems a perfectly reasonable solution.
And where does this end?
It ends with this one change, being able to describe lists with array-like performance for reads.
But anyway, this was just one proposed solution to the problem of List assuming array-like performance in its implementation. And it seems quite unpopular. So, if this idea is no good, how do we resolve this issue?
I really don't like the idea that when you satisfy an interface, you should have to inspect all inherited default implementations to see if they apply to your subtype. As we've seen, this is error prone, and really, borders on violating the substitution principle.
It's of course the responsibility of the implementers to actually make use of it
well, if added for the purpose of resolving this issue, that wouldn't be a problem since List would lose all of its default implementations that don't work well for linked lists.
since List would lose all of it's default implementations that don't work well for linked lists.
Unless the default implementations implement this behaviour of choosing between different implementations already, right?
A list is fundamentally defined by its non-commutativity and non-idempotence properties, by opposition with bag, set and ordered-set. The fact that it's backed with array or node should be transparent.
If I need random access for performance reason, I will use an array or a specific list implementation that exposes random access methods.
I was defending my proposal. But I get it.
- People want
LinkedListto satisfyList, and - It's seen as "perfectly ok" that if you have
void processData(List)that requires requires the list be random-access, api documentation stating that requirement will suffice.
Like I said, this was just one proposal to fix the primary issue, which is that LinkedList has performance bugs, and that List has default members that make incorrect assumptions about subtypes. (Please note the distinction from Iterable's default members, which are unoptimized for some subclasses, but not actually wrong.)
We can certainly fix LinkedList by just overriding more of Lists default members (perhaps copy and past from Iterable). But I think it would be nice to do better for the reason I stated above.
perhaps copy and past from Iterable
Shouldn't it be possible to just call super when RandomAccess is not implemented?
Would that not be the same as moving the array-optimized default members to RandomAccess, and leaving behind =>super stubs in List (for binary compatibility)?
But either way, we'd be de-optimizing existing array based Lists, which I'm not sure is ok (ceylon.collection::ArrayList/1.3.0 would be slower).
But either way, we'd be de-optimizing existing array based Lists
I don't understand, how we'd be doing that?