btdu
btdu copied to clipboard
Accidental O(N^2) behaviour?
This is somewhat academic since you clearly get very good resolution with a low number of samples, but maybe there are non-academic issues lurking you might be interested in.
I observe that if I double the samples, it takes approx 4x as long to generate an export. Is there some accidental N^2 complexity somewhere? I don't think this is necessary a big deal (so feel free to close), but thought you might like the data:
Note I'm using handwavy order-of-magnitude math but I might have expected the latter to take (5.3s x 4 = ) ~20s rather than ~40s.
Collected 237700 samples (achieving a resolution of ~9.999 MiB) in 5 secs, 143 ms, 833 μs, and 4 hnsecs.
real 0m5.338s / user 0m5.083s / sys 0m0.198s
Collected 475616 samples (achieving a resolution of ~4.997 MiB) in 16 secs, 432 ms, and 132 μs.
real 0m16.710s / user 0m16.259s / sys 0m0.420s
Collected 950729 samples (achieving a resolution of ~2.500 MiB) in 45 secs, 119 ms, 284 μs, and 8 hnsecs.
real 0m45.520s / user 0m44.655s / sys 0m0.854s
X axis: # samples Y axis: Time per sample in microseconds (i.e, it's getting more costly per sample)
Is it just export? If you leave out -o, is it more linear?
Behaviour is the same without export.
If I double the samples, I see 3.2x the number of cycles spent on a single 'test' instruction in _D4btdu5paths7SubPath8__mixin2__T10appendNameVbi0ZQrMFIAaZPSQCgQCeQCb.
On a filesystem with a very small number of files, I expect performance to be linear.
On a filesystem very many files (e.g. one hypothetical extreme being that every sample will be in a never-seen-before file), then performance should be very close to O(n^2). This is because siblings are stored as a linked list.
At some point siblings were stored as a hash table, however I swapped that out because keeping a hash table on every node had a significant overhead, and memory use seems to be the dominating side of performance pressure right now for typical filesystems.
Maybe we can try something with less overhead but still better than linear, such as a red-black tree.
Happy to provide test results if you want to experiment, but I don't consider this too important.
@CyberShadow Do you want to close this one also or are you considering a tweak?
I think it's actionable.