toil stats is way slow

this is on a smallish cactus run (10 mammals)

```
INFO:toil.utils.toilStatus:Traversing the job graph gathering jobs. This may take a couple of minutes. Of the 145093 jobs considered, there are 8627 jobs with children, 135948 jobs ready to run, 518 zombie jobs, 0 jobs with services, 0 services, and 0 jobs with log files currently in FileJobStore(/lustre/scratch/mdiekhan/cactus/debug/run/jobStore).

real    83m12.232s
user    0m47.406s
sys     2m12.397s
```




┆Issue is synchronized with this [Jira Story](https://ucsc-cgl.atlassian.net/browse/TOIL-571)
┆Epic: Improve debugging experience
┆Issue Number: TOIL-571

Jun 19 '20 02:06 diekhans

That's a rate of 1 job processed roughly every 34 ms (about 83 minutes over 145093 jobs), which seems pretty fast if there's any networking involved. How fast is this filesystem?

We might need to move over to some kind of async IO here to up throughput when there's latency involved in IO requests.
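
As a rough illustration of the kind of overlap suggested here, this is a minimal sketch that reads many small files concurrently with a thread pool, so the latency of one read overlaps with the others instead of adding up. It is not how Toil's FileJobStore works; `read_job_file`, `load_all_jobs`, and the `max_workers` value are hypothetical names and settings chosen for the example.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def read_job_file(path):
    """Read one small file; on a networked filesystem this is mostly waiting."""
    with open(path, "rb") as handle:
        return handle.read()


def load_all_jobs(directory, max_workers=32):
    """Read every regular file in `directory` concurrently.

    Hypothetical helper: threads keep many reads in flight at once, so
    per-request latency overlaps rather than accumulating serially.
    """
    paths = sorted(
        entry.path for entry in os.scandir(directory) if entry.is_file()
    )
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so contents line up with paths.
        contents = list(pool.map(read_job_file, paths))
    return dict(zip(paths, contents))
```

Threads are used rather than `asyncio` because ordinary file reads block; whether this helps on a given Lustre mount would need measuring.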

--
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/DataBiosphere/toil/issues/3089

Jun 19 '20 18:06 adamnovak

Also look at the user and sys times. That real time is almost all burned waiting on IO.
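
For reference, the arithmetic behind that reading of the `time` output, using only the numbers reported in the original post. Treating everything that is not user or sys time as waiting assumes the tool is effectively single-threaded, which is an assumption here.

```python
# Figures copied from the report in the original post.
jobs = 145093
real_s = 83 * 60 + 12.232   # 83m12.232s wall-clock
user_s = 47.406             # 0m47.406s
sys_s = 2 * 60 + 12.397     # 2m12.397s

per_job_ms = real_s / jobs * 1000
cpu_s = user_s + sys_s
# Assumes a single-threaded process, so non-CPU time is time spent waiting.
waiting_fraction = 1 - cpu_s / real_s

print(f"{per_job_ms:.1f} ms of wall-clock time per job")
print(f"{cpu_s:.1f} s of CPU out of {real_s:.1f} s elapsed")
print(f"{waiting_fraction:.1%} of the run spent off-CPU")
```

That works out to about 34 ms per job, with roughly 96% of the elapsed time spent off-CPU.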

Jun 19 '20 18:06 adamnovak

file system is Lustre

with the state splatted on the file system, I doubt if it is worth trying to speed it up.

maybe just change the message to say: This may take a couple of hours.

Jun 19 '20 20:06 diekhans

Probably we want to say it will take a certain amount of time per job, or "a while".
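
One way that could look, as a sketch only: instead of promising a fixed duration, report the observed rate as the traversal goes. `progress_message` is a hypothetical helper, not an existing Toil function, and the wording of the messages is an assumption.

```python
def progress_message(jobs_seen, elapsed_seconds):
    """Describe traversal progress using the observed per-job time.

    Hypothetical helper for illustration; not part of Toil.
    """
    if jobs_seen == 0:
        # Nothing measured yet, so avoid promising any particular duration.
        return "Traversing the job graph gathering jobs. This may take a while."
    per_job_ms = elapsed_seconds / jobs_seen * 1000
    return (
        f"Traversed {jobs_seen} jobs in {elapsed_seconds:.0f} s "
        f"(about {per_job_ms:.0f} ms per job)."
    )
```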

Jun 19 '20 23:06 adamnovak