coreutils tsort refactoring proposal

trafficstars

This pull request is an attempt to rework the current tsort implementation in order to allow for better performance. I finally had some time to give it a shot after I mentionned it in #4931 with @tertsdiepraam

It also tries to mimic gnu tsort output.

The improvement is achieved by using dfs instead of khan's algorithm. We use a stack to detect a cycle and efficiently return a cycle.

This removes the need for the btree set as the nodes are only sorted once. The ordering is a little bit different from gnu tsort, however, it is correct.

I had to write a small utility method split_off_as_vec to make a more efficient use of memory while printing and iterating over the cycle. It drops the number of memory copies from 4 to 2 in the worst case (checked in assembly), which I believe is a good trade-off.

This is still in draft as I have not adjusted tests.

Please share any suggestions.

Feb 11 '24 07:02 anastygnome

I think we should keep the gnu ordering. I value compatibility over speed

Feb 11 '24 08:02 sylvestre

I think we should keep the gnu ordering. I value compatibility over speed

The thing is topological ordering isn't unique when it exists in most cases. If you rely on the order of a specific implementation, even an update to gnu can break your workflow.

Keeping the same order forces us to implement a solution that's 4x slower. The order in the provided solution is still mathematically correct.

If you switch to rust coreutils and you rely on implementation related details, you're in trouble anyway So I don't believe it's a compatibility issue in and of itself

Feb 11 '24 09:02 anastygnome

The thing is topological ordering isn't unique when it exists in most cases. If you rely on the order of a specific implementation, even an update to gnu can break your workflow.

This depends on what that order is. For example, if it attempts to retain the order in which the nodes appear in the source, that would make sense. Could you give some examples of which cases differ from GNU?

Also, I'd love to see some benchmarks on this using hyperfine. Counting memory copies is a nice approximation, but actual measurements are even more helpful :)

Edit: I've done a measurement:

Benchmark 1: ./coreutils-main tsort in.txt
  Time (mean ± σ):      2.428 s ±  0.100 s    [User: 2.355 s, System: 0.066 s]
  Range (min … max):    2.334 s …  2.684 s    10 runs
 
Benchmark 2: ./coreutils-new tsort in.txt
  Time (mean ± σ):      2.658 s ±  0.092 s    [User: 2.589 s, System: 0.063 s]
  Range (min … max):    2.562 s …  2.824 s    10 runs
 
Benchmark 3: tsort in.txt
  Time (mean ± σ):     517.4 ms ±   5.6 ms    [User: 501.2 ms, System: 12.6 ms]
  Range (min … max):   511.5 ms … 528.9 ms    10 runs
 
Summary
  tsort in.txt ran
    4.69 ± 0.20 times faster than ./coreutils-main tsort in.txt
    5.14 ± 0.19 times faster than ./coreutils-new tsort in.txt

Looks like this is not yet yielding the results you were hoping for and you actually made it a bit slower.

Feb 11 '24 09:02 tertsdiepraam

The thing is topological ordering isn't unique when it exists in most cases. If you rely on the order of a specific implementation, even an update to gnu can break your workflow.

This depends on what that order is. For example, if it attempts to retain the order in which the nodes appear in the source, that would make sense. Could you give some examples of which cases differ from GNU?

well, the test case with the dependencies is such an example. My implementation sorts the nodes alphabetically before running tsort.

 Benchmark 1: ./coreutils-main tsort in.txt
   Time (mean ± σ):      2.428 s ±  0.100 s    [User: 2.355 s, System: 0.066 s]
 Range (min … max):    2.334 s …  2.684 s    10 runs 
 Benchmark 2: ./coreutils-new tsort in.txt
  Time (mean ± σ):      2.658 s ±  0.092 s    [User: 2.589 s, System: 0.063 s]
   Range (min … max):    2.562 s …  2.824 s    10 runs

 Benchmark 3: tsort in.txt
   Time (mean ± σ):     517.4 ms ±   5.6 ms    [User: 501.2 ms, System: 12.6 ms]
   Range (min … max):   511.5 ms … 528.9 ms    10 runs
  
 Summary
   tsort in.txt ran
     4.69 ± 0.20 times faster than ./coreutils-main tsort in.txt
     5.14 ± 0.19 times faster than ./coreutils-new tsort in.txt

Looks like this is not yet yielding the results you were hoping for and you actually made it a bit slower.

this is still a WIP. I'm trying to get better performance indeed. Can you share the in.txt file so that we have a common bench?

Feb 11 '24 09:02 anastygnome

Sure, I've generated the input with this:

import random

N = 10000
for i in range(100*N):
    a = random.randint(0,N)
    b = random.randint(0,N)
    if a != b:
        print(f"{min(a, b)} {max(a, b)}")

Feb 11 '24 09:02 tertsdiepraam

@tertsdiepraam I hope you're both doing well. I've been quite busy lately but have managed to make significant progress on the refactor I've been working on.

The code still passes all tests and shows a substantial performance improvement, being up to 3 times faster than the current implementation in almost all cases I have tested. I didn't have the time to document it sadly, put simply, I've used another algorithm that leverages the tree nature of the graph in a better way, ensuring less copies and sorting.

I believe we can potentially achieve even greater speed improvements by switching from a BTree to a Red-Black tree.

I am currently unable to continue this PR, so I hope someone can reuse my work :)

Jun 23 '24 09:06 anastygnome

Remind me on Tuesday! I'd love to take a look then

Jun 23 '24 09:06 tertsdiepraam

GNU testsuite comparison:

Skip an intermittent issue tests/tail/inotify-dir-recreate (fails in this run but passes in the 'main' branch)

Jun 24 '24 21:06 github-actions[bot]

Remind me on Tuesday! I'd love to take a look then @tertsdiepraam There you go.

For reference, the proposed code in the first commit is approximately 1.5 times slower than GNU tsort. (still a 2x improvement)

Benchmark 1: target/release/coreutils tsort terts_10000_seed0
  Time (mean ± σ):     763.7 ms ±  12.9 ms    [User: 736.1 ms, System: 22.8 ms]
  Range (min … max):   742.4 ms … 781.6 ms    10 runs
 
Benchmark 2: tsort terts_10000_seed0
  Time (mean ± σ):     509.4 ms ±   4.2 ms    [User: 485.3 ms, System: 21.1 ms]
  Range (min … max):   503.0 ms … 514.9 ms    10 runs
 
Summary
  tsort terts_10000_seed0 ran
    1.50 ± 0.03 times faster than target/release/coreutils tsort terts_10000_seed0

The last commit is a work in progress and serves as a reference to illustrate that the performance cost is mainly driven by the data structure used for nodes. Switching to a hashmap and some manual sort results in a topological order very close to GNU tsort (though it still fails in some cases).

Benchmark 1: target/release/coreutils tsort terts_10000_seed0
  Time (mean ± σ):     222.5 ms ±   4.5 ms    [User: 202.3 ms, System: 19.0 ms]
  Range (min … max):   215.8 ms … 229.7 ms    13 runs
 
Benchmark 2: tsort terts_10000_seed0
  Time (mean ± σ):     511.4 ms ±   6.5 ms    [User: 490.1 ms, System: 19.3 ms]
  Range (min … max):   500.5 ms … 519.5 ms    10 runs
 
Summary
  target/release/coreutils tsort terts_10000_seed0 ran
    2.30 ± 0.05 times faster than tsort terts_10000_seed0

The proposed implementation uses Knuth's Algorithm T (TAOCP vol. 1) for topological sorting.

This algorithm works by first listing all nodes and counting how many nodes each one depends on (its predecessors). nodes with no predecessors are added to a queue. We then take the node at the front of the queue, remove it from the graph and append it to the result. Any nodes depending on it have their predecessor count reduced.

If any of those nodes now have no predecessors, they are added at the back of the queue. This continues until the whole queue is processed. If any nodes remain with predecessors at the end, it means there's a cycle, which we find with a DFS on the remaining subgraph.

Jun 25 '24 03:06 anastygnome

@tertsdiepraam Thanks for the review, I've addressed your comments and clarified the commit. message.

I also preallocated result with the nmbers of nodes in the graph in order to make it cleaner.

Jun 29 '24 12:06 anastygnome

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)

Jun 29 '24 12:06 github-actions[bot]

@sylvestre What do you think about this PR?

Jun 29 '24 13:06 anastygnome

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)

Jun 29 '24 13:06 github-actions[bot]

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)

Jun 29 '24 17:06 github-actions[bot]

GNU testsuite comparison:

Skip an intermittent issue tests/tail/inotify-dir-recreate (fails in this run but passes in the 'main' branch)

Jun 30 '24 14:06 github-actions[bot]

GNU testsuite comparison:

GNU test failed: tests/timeout/timeout. tests/timeout/timeout is passing on 'main'. Maybe you have to rebase?
Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)

Jul 05 '24 07:07 github-actions[bot]

@tertsdiepraam ding dong!

Jul 23 '24 11:07 anastygnome

Sorry I don't have access to a computer this week. Ill try to look at it soonish

Jul 23 '24 11:07 tertsdiepraam

coreutils coreutils copied to clipboard

tsort refactoring proposal

coreutils
coreutils copied to clipboard