bibtex-autocomplete

How to read progress?

Open homocomputeris opened this issue 1 year ago • 5 comments

What does the percentage mean? Is the progress bar the same as the percentage? What is btac doing when it appears stuck?

Screenshot From 2024-09-29 16-07-17

homocomputeris avatar Sep 29 '24 14:09 homocomputeris

EDIT: This is no longer true, the progress bar was changed in version 1.4.0. See below for new behavior.

The progress bar is simply (number of completed queries) / (total number of queries). Each entry is queried once per source, so the total number of queries is number of entries * 8 unless you restrict the source list with the -q or -Q flags. The percentage is the same as the progress bar.

There is some bias here though: if sources respond at different rates (say most are fast but one is slow), then the progress bar will fill up to 7/8 quickly, but you will still have to wait for the slow source's queries to complete. You can use the verbose (-v) mode to see how many queries are completed per source. If you notice a source taking too long, you can:

  • exclude it with -Q
  • retry at another time
  • set a stricter timeout with -t <n>. The default is 20s per query, so btac might take up to 40s * number of entries to run (it can query entries twice in some circumstances).
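As a rough illustration of that bound, here is a minimal sketch of the arithmetic. The function name and the `max_rounds` parameter are made up for this example and are not part of btac; the only inputs taken from the comment above are the 20s default timeout and the "queried twice" worst case.

```python
def worst_case_seconds(n_entries: int, timeout: float = 20.0, max_rounds: int = 2) -> float:
    """Upper bound described above: timeout * 2 * number of entries.

    max_rounds=2 models "it can query entries twice in some circumstances";
    it is an illustrative parameter, not a btac option.
    """
    return n_entries * timeout * max_rounds


# 500 entries with the default 20s timeout: 20000 seconds, about 5.6 hours
bound = worst_case_seconds(500)
print(f"{bound:.0f} s = {bound / 3600:.1f} h")
```

Lowering the timeout with -t shrinks this bound proportionally, for example `worst_case_seconds(500, timeout=5.0)` gives 5000 seconds.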

An entry is only processed when all queries related to it have completed, which is why the entry counter is lower than the progress bar.
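To make the difference between the two counters concrete, here is a small sketch of how they could diverge. It is not btac's code: the function names are hypothetical, and it assumes every source works through the entries in the same order, so that the slowest source determines how many entries are fully processed.

```python
from typing import Dict


def query_progress(done_per_source: Dict[str, int], n_entries: int) -> float:
    """Old progress bar: completed queries / total queries."""
    total = n_entries * len(done_per_source)
    return sum(done_per_source.values()) / total if total else 1.0


def entries_processed(done_per_source: Dict[str, int]) -> int:
    """An entry is processed once every source has answered for it.

    Assumes all sources go through entries in the same order (illustrative).
    """
    return min(done_per_source.values(), default=0)


# Seven fast sources are finished, one slow source is at 200 of 500 entries.
done = {f"fast{i}": 500 for i in range(7)}
done["slow"] = 200

print(f"bar: {query_progress(done, 500):.1%}")  # most queries are done
print(f"entries: {entries_processed(done)}/500")  # limited by the slow source
```

Under these assumptions the bar is already above 90% while the entry counter is held back by the single slow source.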

Unfortunately, the ETA isn't very reliable, as it expects the progress bar to update at a constant rate, and not slow down at the end when the fast sources are done.

btac really should not remain stuck more than 40s with the default timeout (although if you have a lot of entries, an increment of one query might not be noticeable on the progress bar).

dlesbre avatar Sep 29 '24 15:09 dlesbre

> btac really should not remain stuck more than 40s with the default timeout (although if you have a lot of entries, an increment of one query might not be noticeable on the progress bar).

Unfortunately, I've seen it fail to finish in 10 hours, hence the question. It showed something like 200/500 entries, and then for hours the counter did not increase.

homocomputeris avatar Sep 29 '24 23:09 homocomputeris

Oh, I just remembered there is another source of delays: rate limits. Sometimes an API response will request that btac wait before the next query to reduce server load. This request can specify any delay. Often it's a few seconds, but there is nothing stopping a server from requesting hours... In that case btac will patiently wait for the given delay before performing its next query...

Note that these rate limits are often IP based, so using a VPN might make the delays significantly longer if other VPN users are querying the same API.

I should probably add some smarter logic to btac to allow it to end faster. Something like skipping queries if the delay gets too long or if the source is too slow compared to all the others.
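A minimal sketch of what the "skip if the delay gets too long" idea could look like. Everything here is hypothetical: it assumes the server communicates the delay in seconds through a `Retry-After` response header, and the function names and the 60s cap are invented for illustration. It is not how btac is implemented.

```python
from typing import Mapping


def requested_delay(headers: Mapping[str, str]) -> float:
    """Seconds the server asked us to wait; 0.0 if absent or not a number.

    Assumes a Retry-After header given in seconds (an assumption, the
    HTTP-date form of the header is ignored in this sketch).
    """
    raw = headers.get("Retry-After", "")
    try:
        return max(0.0, float(raw))
    except ValueError:
        return 0.0


def should_skip(headers: Mapping[str, str], max_wait: float = 60.0) -> bool:
    """Skip the source instead of waiting when the delay exceeds max_wait."""
    return requested_delay(headers) > max_wait


print(should_skip({"Retry-After": "5"}))     # short delay: wait
print(should_skip({"Retry-After": "7200"}))  # two hours: skip
```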

dlesbre avatar Sep 30 '24 07:09 dlesbre

Thanks for the explanation!

homocomputeris avatar Sep 30 '24 13:09 homocomputeris

I've changed the progress bar in version 1.4.0 since it was causing confusion. It now only counts completed entries and not queries. As a result the progress should be more predictable.

BTAC will also now skip queries if some sources are lagging behind when two thirds of the others have finished. This should improve performance a lot when a single source is overloaded and takes much longer to respond. (If needed, this can be disabled with the --ns / --no-skip flag.)
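One possible reading of that rule, sketched under stated assumptions: for a given entry, a pending source is skipped once at least two thirds of the other sources have finished. The function name, the data layout, and the exact threshold comparison are guesses for illustration and may differ from what btac actually does.

```python
from typing import Dict, List


def sources_to_skip(finished: Dict[str, bool], threshold: float = 2 / 3) -> List[str]:
    """Return pending sources whose peers have mostly finished.

    `finished` maps a source name to whether its query has completed.
    Hypothetical helper, not part of btac.
    """
    skip = []
    for name, done in finished.items():
        if done:
            continue
        others = [d for other, d in finished.items() if other != name]
        if others and sum(others) / len(others) >= threshold:
            skip.append(name)
    return skip


# Seven of eight sources are done, so the straggler gets skipped.
state = {f"fast{i}": True for i in range(7)}
state["slow"] = False
print(sources_to_skip(state))

# Only half are done: nothing is skipped yet.
half = {f"s{i}": i < 4 for i in range(8)}
print(sources_to_skip(half))
```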

As a result of these changes the progress bar ETA should now be an over-approximation of the time required, rather than an under-approximation as it was before.

dlesbre avatar Oct 28 '24 13:10 dlesbre