fuzion AoC: Repeated list concatenation `l ++ [x]` has performance O(n²)

I ran into this during part 1 of the dec 9 Advent of Code.

This is a well known problem in Haskell Avoiding ++ in Haskell or Why are difference lists more efficient than regular concatenation in Haskell?.

Here is a benchmark that compares the performance of appending single elements to the beginning vs. to the end of a list

_ := for sz := u64 1024, sz |> *2 do

       test(op (Sequence i32)->(Sequence i32)) =>
         start := time.nano.read
         build(s Sequence i32, n u64) Sequence i32 => n = 0 ? s : op (build s n-1)
         l := build [] sz
         cnt := l.count
         delta := time.nano.read-start
         (cnt, delta)

       cnt1, delta1 := test ([1]++)
       cnt2, delta2 := test (++[1])

       check
         sz.as_i32 = cnt1 = cnt2

       out(d) => "{d} {time.duration.ns d.nanos/sz}/it {time.duration.ns d.nanos/sz/sz}/it²"
       say "$sz:\t [1]++l: {out delta1} | l++[1]: {out delta2} factor: {delta2.nanos/delta1.nanos}"
     until (max delta1 delta2 >= time.duration.seconds 5)

The performance of the former is linear, that of the latter quadratic:

 > ~/fuzion/work/build/bin/fz test_perf.fz 
1024:	 [1]++l:   24ms   24µs/it   23ns/it² | l++[1]:   46ms   45µs/it   44ns/it² factor: 1
2048:	 [1]++l: 2081µs 1016ns/it    0ns/it² | l++[1]:   77ms   38µs/it   18ns/it² factor: 37
4096:	 [1]++l: 4297µs 1049ns/it    0ns/it² | l++[1]:  184ms   45µs/it   11ns/it² factor: 42
8192:	 [1]++l: 3284µs  400ns/it    0ns/it² | l++[1]:  821ms  100µs/it   12ns/it² factor: 250
16384:	 [1]++l: 1781µs  108ns/it    0ns/it² | l++[1]: 2246ms  137µs/it    8ns/it² factor: 1260
32768:	 [1]++l: 3058µs   93ns/it    0ns/it² | l++[1]: 8150ms  248µs/it    7ns/it² factor: 2664

There is probably no easy solution to this, but it should be ensured that this problem is well documented and that alternatives are explained. What are the standard alternatives suggested in other languages like Haskell? One solution would be to replace appending at the end by appending at the beginning followed by a call to Sequence.reverse.

Dec 10 '24 10:12 fridis

This problem exists in Java as well: building a string as follows

var s := ""
for(i=0, i<1000, i++)
  s := s + i + " "

also has performance in O(n²). The Java solution is StringBuilder and, for lists, ArrayList. So one solution for Fuzion would be to offer some string_builder and sequence_builder features. The tricky part is that this will require either a local mutate instance or some dedicated global builder mutate that is used only in builders. It would be nice to implement it in a way that handling of this mutate instance is transparent, e.g. (with implicit sequence as result)

l := (Sequence i32).build b->
    build(n u64) => if n > 0 then b.yield 1; build s n-1
    build sz

or (with explicit result created via b.finish

l := (Sequence i32).build b->
    build(n u64) => n=0 ? b.finish : {b.yield 1; build s n-1}
    build sz

It is important that these builders can be nested without interfering with one another as in

(even,odd) := 
    (Sequence u64).build e->
      (Sequence u64).build o->
        build(n u64) => n=0 ? (e.finish,o.finish) 
                            : {(n%%2 ? e : o).yield n; build s n-1}
        build sz

Dec 20 '24 08:12 fridis

Personally I would prefer to not just document this. I have a hard time imagining that people will not stumble over this again and again.

Random thoughts:

would be cool to give a guarantee regarding run-time complexity of base.fum to never be O(n²) or worse.
or to require some effect to enable features with worse runtime complexity

Jan 01 '25 09:01 michaellilltokiwa

Just came across this post mentioning immutable dequeue. This might provide a solution if a ++ b would, for finite sequences, make sure to first convert the longer sequence into into an immutable dequeue and then push the elements of the shorter sequence to that one. Tricky part might be how to save the elements efficiently, looks as if my ps_array could help here...

Jan 04 '25 21:01 fridis

This is not fixed yet. #4651 provides some infrastructure that maybe provides a solution, but I would like the example to run fast or the user to be guided such that this performance issue is best avoided.

Jan 28 '25 16:01 fridis

I get a stackoverflow when running this.

make: Nothing to be done for 'no-java'.
1024:    [1]++l:  118ms  115µs/it  112ns/it² | l++[1]:   40ms   39µs/it   38ns/it² factor: 0
2048:    [1]++l: 3963µs 1935ns/it    0ns/it² | l++[1]:   13ms 6495ns/it    3ns/it² factor: 3
4096:    [1]++l: 5114µs 1248ns/it    0ns/it² | l++[1]:   17ms 4254ns/it    1ns/it² factor: 3
8192:    [1]++l:   10ms 1288ns/it    0ns/it² | l++[1]:   32ms 4026ns/it    0ns/it² factor: 3
16384:   [1]++l:   59ms 3655ns/it    0ns/it² | l++[1]:   48ms 2946ns/it    0ns/it² factor: 0

error 1: java.lang.StackOverflowError
        at fzC_1INTERN_loop0__1test__2build.fzRoutine(/home/sam/playground/test.fz:73)
        at fzC_1INTERN_loop0__1test__2build.fzRoutine(/home/sam/playground/test.fz:73)
...

works when using loop instead of recursion in the build feature:

_ := for sz := u64 1024, sz |> *2 do

       test(op (Sequence i32)->(Sequence i32)) =>
         start := time.nano.read
         build(s Sequence i32, n u64) Sequence i32 =>
          for cn := n, cn-1
              res := s, op res
          while cn != 0
          else
            res
         l := build [] sz
         cnt := l.count
         delta := time.nano.read-start
         (cnt, delta)

       cnt1, delta1 := test ([1]++)
       cnt2, delta2 := test (++[1])

       check
         sz.as_i32 = cnt1 = cnt2

       out(d) => "{d} {time.duration.ns d.nanos/sz}/it {time.duration.ns d.nanos/sz/sz}/it²"
       say "$sz:\t [1]++l: {out delta1} | l++[1]: {out delta2} factor: {delta2.nanos/delta1.nanos}"
     until (max delta1 delta2 >= time.duration.seconds 5)
``

Oct 16 '25 14:10 michaellilltokiwa