Revise showing ETA and rates for progress bars
The progress bar comment mentions an issue with the `indicatif` estimation algorithm for ETA and throughput. The PR linked in that comment, which fixes the issue, has since been merged. It might be worth revisiting whether ETA and rates can be re-enabled.
Yes, good idea! It would be more user-friendly to give meaningful ETAs and show current scan rate.
Longer-term, I've been thinking about switching away from `indicatif` entirely and using something lighter weight, like maybe `status_line`. I noticed early in the development of Nosey Parker that in parallel scan jobs there was noticeable overhead just from updating the `indicatif` progress bar, so Nosey Parker has resorted to some dirty hacks to minimize that.
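To make the overhead point concrete, here is a minimal sketch of one common way to cut per-update cost in parallel jobs: each worker tallies progress locally and only touches the shared counter once per batch. This is not Nosey Parker's actual workaround; `BatchedProgress` and `run_workers` are hypothetical names, and a real progress display would read the shared counter on its own schedule.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

/// Hypothetical helper: accumulates progress locally and publishes it to a
/// shared counter only once `batch` units have built up.
pub struct BatchedProgress {
    shared: Arc<AtomicU64>,
    pending: u64,
    batch: u64,
}

impl BatchedProgress {
    pub fn new(shared: Arc<AtomicU64>, batch: u64) -> Self {
        Self { shared, pending: 0, batch }
    }

    /// Record `n` units of work; the shared counter is touched only when the
    /// local tally reaches the batch size.
    pub fn inc(&mut self, n: u64) {
        self.pending += n;
        if self.pending >= self.batch {
            self.flush();
        }
    }

    /// Publish whatever is pending to the shared counter.
    pub fn flush(&mut self) {
        if self.pending > 0 {
            self.shared.fetch_add(self.pending, Ordering::Relaxed);
            self.pending = 0;
        }
    }
}

impl Drop for BatchedProgress {
    // Make sure a partial batch is not lost when a worker finishes.
    fn drop(&mut self) {
        self.flush();
    }
}

/// Simulates `workers` threads each processing `items_per_worker` inputs and
/// returns the total published to the shared counter.
pub fn run_workers(workers: usize, items_per_worker: u64, batch: u64) -> u64 {
    let total = Arc::new(AtomicU64::new(0));
    let handles: Vec<_> = (0..workers)
        .map(|_| {
            let total = Arc::clone(&total);
            thread::spawn(move || {
                let mut progress = BatchedProgress::new(total, batch);
                for _ in 0..items_per_worker {
                    // ... scan one input here ...
                    progress.inc(1);
                }
                // Dropping `progress` flushes the remainder.
            })
        })
        .collect();
    for handle in handles {
        handle.join().expect("worker panicked");
    }
    total.load(Ordering::Relaxed)
}
```

With a batch size of 64, each worker performs roughly one shared atomic update per 64 inputs instead of one per input, at the cost of the displayed count lagging by up to one batch per worker.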
There is also an issue now with an interaction between the `log`, `tracing`, and `indicatif` crates: if a progress bar is active, any log messages will mess up how it is rendered. Probably the right way to fix that is by somehow modifying the global `noseyparker` log setup code. Log messages shouldn't interfere with any active progress bar, and this shouldn't require modifying every single call site of `log` or `tracing` functions.
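One way to picture a fix at the log-setup level is to route both the status line and log output through a single owner of the terminal, so a log message always clears the status line first and redraws it afterwards. The sketch below uses only the standard library; `StatusTerminal` is a hypothetical type, and it assumes a terminal that understands the ANSI "erase line" sequence. In a real setup, the global logger would presumably hold something like this behind a mutex, which is an assumption rather than how `noseyparker` is wired today.

```rust
use std::io::{self, Write};

/// Hypothetical single owner of the terminal: both the status line and log
/// messages are written through it, so they cannot interleave.
pub struct StatusTerminal<W: Write> {
    out: W,
    status: String,
}

impl<W: Write> StatusTerminal<W> {
    pub fn new(out: W) -> Self {
        Self { out, status: String::new() }
    }

    /// Replace the status line (the "progress bar") in place.
    pub fn set_status(&mut self, status: &str) -> io::Result<()> {
        self.status = status.to_string();
        self.redraw()
    }

    /// Print a log message where the status line was, then restore the
    /// status line on the following row.
    pub fn log(&mut self, message: &str) -> io::Result<()> {
        // "\r" returns to column 0; ESC[2K erases the current line.
        writeln!(self.out, "\r\x1b[2K{}", message)?;
        self.redraw()
    }

    /// Erase the current line and draw the stored status text.
    fn redraw(&mut self) -> io::Result<()> {
        write!(self.out, "\r\x1b[2K{}", self.status)?;
        self.out.flush()
    }

    /// Give back the underlying writer (handy for inspecting output).
    pub fn into_inner(self) -> W {
        self.out
    }
}
```

Because call sites only ever talk to the logger, nothing at the `log` or `tracing` call sites would need to change under this design; only the setup code decides where the bytes go.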
Also longer-term, I think Nosey Parker will need to move away from its current scanning operation of (1) enumerating inputs to scan to determine progress bar maximum and then (2) scanning all those inputs.
This current mode of scan operation is problematic for a few reasons:
- It requires two accesses of the filesystem, and there is time between them in which the filesystem could change. The total disk IO might be close to 2x as high as is necessary.
- It induces a blocking delay before scanning can begin: all the inputs need to be counted first.
- It assumes that you can fully enumerate the inputs! This might not be feasible in the case of huge filesystem shares, for example. In such cases it would be beneficial if Nosey Parker could still do a partial scan.
- It makes it difficult to implement an alternative scanning setup, such as a hypothetical Nosey Parker "server" mode where it would stay running until shut down, endlessly fulfilling requests to scan inputs sent to it.
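As a rough illustration of the alternative, a single-pass pipeline could hand each input to the scanner as soon as it is discovered, rather than enumerating everything up front. This is only a sketch under assumptions: `scan_streaming` is a hypothetical function, inputs are plain `String` paths, and the channel capacity of 128 is arbitrary.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical single-pass pipeline: enumeration runs on its own thread and
/// feeds the scanner through a bounded channel. Returns the number of inputs
/// scanned.
pub fn scan_streaming<I, F>(inputs: I, mut scan: F) -> u64
where
    I: IntoIterator<Item = String> + Send + 'static,
    F: FnMut(&str),
{
    // A bounded channel applies backpressure, so enumeration cannot run
    // arbitrarily far ahead of scanning.
    let (tx, rx) = mpsc::sync_channel::<String>(128);

    let enumerator = thread::spawn(move || {
        for input in inputs {
            if tx.send(input).is_err() {
                break; // the scanning side has gone away
            }
        }
        // `tx` is dropped here, which closes the channel.
    });

    let mut scanned = 0;
    // Iteration ends once the channel is closed and drained.
    for input in rx {
        scan(&input);
        scanned += 1;
    }
    enumerator.join().expect("enumerator panicked");
    scanned
}
```

Scanning starts as soon as the first input arrives, each input is visited once, and the same loop shape would work if the sender were a long-running source of scan requests instead of a finite directory walk.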
Without the "count all the inputs in advance" step, it may be impossible to provide an ETA, as Nosey Parker wouldn't have an idea how many inputs remained until it had actually scanned everything. (In this scenario, scan throughput rate and a total count of things scanned could still be reported, however.)
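For what could still be reported without a known total, here is a small sketch that tracks a running count and derives a throughput figure from elapsed time. `ScanStats` and its output format are hypothetical; the elapsed time is passed in by the caller (for example from `Instant::elapsed`) so the arithmetic stays easy to check.

```rust
use std::time::Duration;

/// Hypothetical running totals for a scan with no known endpoint.
#[derive(Default)]
pub struct ScanStats {
    items: u64,
    bytes: u64,
}

impl ScanStats {
    pub fn new() -> Self {
        Self::default()
    }

    /// Record one scanned input of the given size.
    pub fn record(&mut self, bytes: u64) {
        self.items += 1;
        self.bytes += bytes;
    }

    pub fn items(&self) -> u64 {
        self.items
    }

    /// Average throughput over `elapsed`; zero if no time has passed.
    pub fn bytes_per_sec(&self, elapsed: Duration) -> f64 {
        let secs = elapsed.as_secs_f64();
        if secs > 0.0 {
            self.bytes as f64 / secs
        } else {
            0.0
        }
    }

    /// One-line summary: total count, total size, and rate, with no ETA.
    pub fn summary(&self, elapsed: Duration) -> String {
        format!(
            "{} inputs, {:.2} MB scanned, {:.2} MB/s",
            self.items,
            self.bytes as f64 / 1e6,
            self.bytes_per_sec(elapsed) / 1e6
        )
    }
}
```

This reports an average over the whole run; a "current" rate would need a sliding window or smoothing, which is left out here.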
See also #46.
> I had noticed earlier on in the development of Nosey Parker that in parallel scan jobs, there was noticeable overhead just from updating the indicatif progress bar, and so Nosey Parker has resorted to some dirty hacks that to minimize that.
I looked at the `indicatif` documentation; wouldn't `enable_steady_tick` alleviate that issue? If it's enabled, normal operations such as `inc` won't progress the bar.