motoko-base New naming convention to avoid performance pitfalls.

Problem

Despite having a deprecated status, Array.append continues to be used in new Motoko code by IC developers. In nearly all cases, these uses are inappropriate, and the array should in fact be a Buffer.

Why? Because usually, there is a need to add elements to the array based on control from a Motoko canister's public point. Almost always, the array growth in these ad-hoc uses is not efficient. Rather, it happens more often than necessary, and by only one element at a time. By contrast, the Buffer class does this array growth efficiently (doubling the array, not growing by a single element), and it encapsulates its efficient growth behavior for reuse elsewhere. How can we encourage using Buffer in place of Array in all of these cases?

Currently, knowing whether to use Array or Buffer is not as clear from the documentation as it could be.

Proposal

To help, we could adopt a naming convention for the base library that puts the contrast into the names, directly.

If a function is worst-case O(1) or O(log n), than that function or method is named in the current way.
If a function is worst-case O(n) or worse (or simply not known to be O(1) or O(log n)), then the function name ends with the special suffix that we pick, such as "_expensive" (or _highcost, or _slow, or _heavy, or _steep etc)

Under this convention, many operations in base library would not change their names, but in all cases, we would confront the time and space complexity of every function or method when we name it, and we'd be exposing this analysis to the public API, which is lacking today but seems very important going forward.

The function Array.append would become Array.append_slow (or Array.append_expensive, etc) and that would help avoid code that overuses it, and runs into inevitable latent performance issues later.

Jun 21 '22 15:06 matthewhammer

They should be getting a warning every time they use Array.append now describing the complexity. I don't know why they don't heed it.

Jun 21 '22 17:06 crusso

I don't know why they don't head it.

Well, sometimes I do need to glue exactly two arrays coming from different sources, like e.g., when I compute AccountId for the ICP ledger. https://github.com/dfinity/ledger-ref/blob/3ed0206afd231b9586c9e7b9a793948f10118297/src/Account.mo#L37

Jun 21 '22 17:06 roman-kashitsyn

sometimes I do need to glue exactly two arrays coming from different sources

Precisely! It is an operation that "closes" a need that naturally arises. We need it and should not remove it.

Unfortunately, it's also easily misused to insert a single element, or a small number of them, into a gigantic array that is stored in an actor variable, not some temporary, as in @roman-kashitsyn's legit use case.

They should be getting a warning every time they use Array.append now describing the complexity. I don't know why they don't head it.

It's actually unfortunate, in my opinion, that this warning is even necessary.

A nice side benefit to this proposal is that we could remove the warning!

Jun 21 '22 18:06 matthewhammer

They should be getting a warning every time they use Array.append now describing the complexity. I don't know why they don't head it.

In my case the code was provided as a template for an EXT-compliant NFT canister. I did see the warning during the build and proceeded to ignore it because I was on a time crunch. I had assumed that leaving the code unchanged was the safest choice given that I was new to the language and there were already dozens of other NFT canisters running this code today.

I made a bad call 😅

Jun 21 '22 20:06 lightninglad91

I made a bad call 😅

For what it's worth, following the path of least resistance is often the right call, and often feels most natural. In any case, it's more than understandable.

It's really too bad that this existing path was one with arrays and/or Array.append, and it seems like that choice had already been made for you.

In my case the code was provided as a template for an EXT-compliant NFT canister.

Where was that? Can we add a note there?

I know there are some pathological uses of arrays still in the official Motoko examples. Our team is working on fixing those asap.

If you saw other places, I want to help spread the word about Array.append, and similar performance pitfalls and help people get out of them or avoid them altogether. Any tips for example code to revise are appreciated!

Jun 22 '22 23:06 matthewhammer

Where was that? Can we add a note there?

Toniq Labs provides the template upon request. It's not available via public repo (yet). It might be worthwhile to caution them against further proliferation of Array.append. I keep Bob Bodily updated on the status of our canister, so if I see a significant improvement after moving to a Buffer I will be sure the let him know.

If you saw other places, I want to help spread the word about Array.append, and similar performance pitfalls and help people get out of them or avoid them altogether.

I will do my best to help!

Jun 23 '22 13:06 lightninglad91

@matthewhammer I agree the issue with append is the function name, but instead that it needs a more appropriate name (and not an addon).

If append had been called merge or concat from the beginning, I doubt people would be using it the same way.

Oct 24 '22 07:10 ByronBecker