coreutils
coreutils copied to clipboard
tsort refactoring proposal
This pull request is an attempt to rework the current tsort implementation in order to allow for better performance. I finally had some time to give it a shot after I mentionned it in #4931 with @tertsdiepraam
It also tries to mimic gnu tsort output.
The improvement is achieved by using dfs instead of khan's algorithm. We use a stack to detect a cycle and efficiently return a cycle.
This removes the need for the btree set as the nodes are only sorted once. The ordering is a little bit different from gnu tsort, however, it is correct.
I had to write a small utility method split_off_as_vec to make a more efficient use of memory while printing and iterating over the cycle. It drops the number of memory copies from 4 to 2 in the worst case (checked in assembly), which I believe is a good trade-off.
This is still in draft as I have not adjusted tests.
Please share any suggestions.
I think we should keep the gnu ordering. I value compatibility over speed
I think we should keep the gnu ordering. I value compatibility over speed
The thing is topological ordering isn't unique when it exists in most cases. If you rely on the order of a specific implementation, even an update to gnu can break your workflow.
Keeping the same order forces us to implement a solution that's 4x slower. The order in the provided solution is still mathematically correct.
If you switch to rust coreutils and you rely on implementation related details, you're in trouble anyway So I don't believe it's a compatibility issue in and of itself
The thing is topological ordering isn't unique when it exists in most cases. If you rely on the order of a specific implementation, even an update to gnu can break your workflow.
This depends on what that order is. For example, if it attempts to retain the order in which the nodes appear in the source, that would make sense. Could you give some examples of which cases differ from GNU?
Also, I'd love to see some benchmarks on this using hyperfine. Counting memory copies is a nice approximation, but actual measurements are even more helpful :)
Edit: I've done a measurement:
Benchmark 1: ./coreutils-main tsort in.txt
Time (mean ± σ): 2.428 s ± 0.100 s [User: 2.355 s, System: 0.066 s]
Range (min … max): 2.334 s … 2.684 s 10 runs
Benchmark 2: ./coreutils-new tsort in.txt
Time (mean ± σ): 2.658 s ± 0.092 s [User: 2.589 s, System: 0.063 s]
Range (min … max): 2.562 s … 2.824 s 10 runs
Benchmark 3: tsort in.txt
Time (mean ± σ): 517.4 ms ± 5.6 ms [User: 501.2 ms, System: 12.6 ms]
Range (min … max): 511.5 ms … 528.9 ms 10 runs
Summary
tsort in.txt ran
4.69 ± 0.20 times faster than ./coreutils-main tsort in.txt
5.14 ± 0.19 times faster than ./coreutils-new tsort in.txt
Looks like this is not yet yielding the results you were hoping for and you actually made it a bit slower.
The thing is topological ordering isn't unique when it exists in most cases. If you rely on the order of a specific implementation, even an update to gnu can break your workflow.
This depends on what that order is. For example, if it attempts to retain the order in which the nodes appear in the source, that would make sense. Could you give some examples of which cases differ from GNU?
well, the test case with the dependencies is such an example. My implementation sorts the nodes alphabetically before running tsort.
Benchmark 1: ./coreutils-main tsort in.txt
Time (mean ± σ): 2.428 s ± 0.100 s [User: 2.355 s, System: 0.066 s]
Range (min … max): 2.334 s … 2.684 s 10 runs
Benchmark 2: ./coreutils-new tsort in.txt
Time (mean ± σ): 2.658 s ± 0.092 s [User: 2.589 s, System: 0.063 s]
Range (min … max): 2.562 s … 2.824 s 10 runs
Benchmark 3: tsort in.txt
Time (mean ± σ): 517.4 ms ± 5.6 ms [User: 501.2 ms, System: 12.6 ms]
Range (min … max): 511.5 ms … 528.9 ms 10 runs
Summary
tsort in.txt ran
4.69 ± 0.20 times faster than ./coreutils-main tsort in.txt
5.14 ± 0.19 times faster than ./coreutils-new tsort in.txt
Looks like this is not yet yielding the results you were hoping for and you actually made it a bit slower.
this is still a WIP. I'm trying to get better performance indeed. Can you share the in.txt file so that we have a common bench?
Sure, I've generated the input with this:
import random
N = 10000
for i in range(100*N):
a = random.randint(0,N)
b = random.randint(0,N)
if a != b:
print(f"{min(a, b)} {max(a, b)}")
@tertsdiepraam I hope you're both doing well. I've been quite busy lately but have managed to make significant progress on the refactor I've been working on.
The code still passes all tests and shows a substantial performance improvement, being up to 3 times faster than the current implementation in almost all cases I have tested. I didn't have the time to document it sadly, put simply, I've used another algorithm that leverages the tree nature of the graph in a better way, ensuring less copies and sorting.
I believe we can potentially achieve even greater speed improvements by switching from a BTree to a Red-Black tree.
I am currently unable to continue this PR, so I hope someone can reuse my work :)
Remind me on Tuesday! I'd love to take a look then
GNU testsuite comparison:
Skip an intermittent issue tests/tail/inotify-dir-recreate (fails in this run but passes in the 'main' branch)
Remind me on Tuesday! I'd love to take a look then @tertsdiepraam There you go.
For reference, the proposed code in the first commit is approximately 1.5 times slower than GNU tsort. (still a 2x improvement)
Benchmark 1: target/release/coreutils tsort terts_10000_seed0
Time (mean ± σ): 763.7 ms ± 12.9 ms [User: 736.1 ms, System: 22.8 ms]
Range (min … max): 742.4 ms … 781.6 ms 10 runs
Benchmark 2: tsort terts_10000_seed0
Time (mean ± σ): 509.4 ms ± 4.2 ms [User: 485.3 ms, System: 21.1 ms]
Range (min … max): 503.0 ms … 514.9 ms 10 runs
Summary
tsort terts_10000_seed0 ran
1.50 ± 0.03 times faster than target/release/coreutils tsort terts_10000_seed0
The last commit is a work in progress and serves as a reference to illustrate that the performance cost is mainly driven by the data structure used for nodes. Switching to a hashmap and some manual sort results in a topological order very close to GNU tsort (though it still fails in some cases).
Benchmark 1: target/release/coreutils tsort terts_10000_seed0
Time (mean ± σ): 222.5 ms ± 4.5 ms [User: 202.3 ms, System: 19.0 ms]
Range (min … max): 215.8 ms … 229.7 ms 13 runs
Benchmark 2: tsort terts_10000_seed0
Time (mean ± σ): 511.4 ms ± 6.5 ms [User: 490.1 ms, System: 19.3 ms]
Range (min … max): 500.5 ms … 519.5 ms 10 runs
Summary
target/release/coreutils tsort terts_10000_seed0 ran
2.30 ± 0.05 times faster than tsort terts_10000_seed0
The proposed implementation uses Knuth's Algorithm T (TAOCP vol. 1) for topological sorting.
This algorithm works by first listing all nodes and counting how many nodes each one depends on (its predecessors). nodes with no predecessors are added to a queue. We then take the node at the front of the queue, remove it from the graph and append it to the result. Any nodes depending on it have their predecessor count reduced.
If any of those nodes now have no predecessors, they are added at the back of the queue. This continues until the whole queue is processed. If any nodes remain with predecessors at the end, it means there's a cycle, which we find with a DFS on the remaining subgraph.
@tertsdiepraam Thanks for the review, I've addressed your comments and clarified the commit. message.
I also preallocated result with the nmbers of nodes in the graph in order to make it cleaner.
GNU testsuite comparison:
Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)
@sylvestre What do you think about this PR?
GNU testsuite comparison:
Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)
GNU testsuite comparison:
Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)
GNU testsuite comparison:
Skip an intermittent issue tests/tail/inotify-dir-recreate (fails in this run but passes in the 'main' branch)
GNU testsuite comparison:
GNU test failed: tests/timeout/timeout. tests/timeout/timeout is passing on 'main'. Maybe you have to rebase?
Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)
@tertsdiepraam ding dong!
Sorry I don't have access to a computer this week. Ill try to look at it soonish