toil stats is way slow
This is on a smallish cactus run (10 mammals).
INFO:toil.utils.toilStatus:Traversing the job graph gathering jobs. This may take a couple of minutes. Of the 145093 jobs considered, there are 8627 jobs with children, 135948 jobs ready to run, 518 zombie jobs, 0 jobs with services, 0 services, and 0 jobs with log files currently in FileJobStore(/lustre/scratch/mdiekhan/cactus/debug/run/jobStore).
real 83m12.232s user 0m47.406s sys 2m12.397s
┆Issue is synchronized with this [Jira Story](https://ucsc-cgl.atlassian.net/browse/TOIL-571)
┆Epic: Improve debugging experience
┆Issue Number: TOIL-571
That's a rate of about 1 job processed every 34 ms (4992 s of real time over 145093 jobs), which seems pretty fast if there's any networking involved. How fast is this filesystem?
We might need to move over to some kind of async IO here to increase throughput when there's latency involved in IO requests.
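One way to overlap that IO latency, as a rough sketch only: issue the per-job reads from a thread pool so that many requests are in flight at once. This is not Toil's job store API. `read_job_file` and `load_jobs_concurrently` are hypothetical names, the worker count is a guess, and whether this helps on a given filesystem is untested.

```python
from concurrent.futures import ThreadPoolExecutor


def read_job_file(path):
    """Read one file from disk (hypothetical stand-in for loading one job)."""
    with open(path, "rb") as handle:
        return handle.read()


def load_jobs_concurrently(paths, max_workers=32):
    """Read many files with overlapping IO waits.

    Threads release the GIL while blocked on IO, so a pool lets many
    requests be outstanding at once. Results come back in input order.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(read_job_file, paths))
```

Because `pool.map` preserves input order, the caller can still pair each result with the path it came from.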
-- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/DataBiosphere/toil/issues/3089
Also look at the user and sys times. That real time is almost all burned waiting on IO.
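To put numbers on that, here is a small sketch that works out the waiting fraction and the per-job time from the `time` output and job count reported above. The `parse_time` helper is just for illustration and only handles the `NmS.SSSs` form shown in this thread.

```python
import re


def parse_time(value):
    """Convert a `time`-style duration such as '83m12.232s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", value)
    if match is None:
        raise ValueError(f"unrecognized duration: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# Figures copied from the report at the top of this thread.
real = parse_time("83m12.232s")
user = parse_time("0m47.406s")
sys_time = parse_time("2m12.397s")
jobs = 145093

cpu = user + sys_time                   # time actually spent on a CPU
waiting_fraction = (real - cpu) / real  # share of wall time not on a CPU
ms_per_job = real / jobs * 1000         # average wall time per job

print(f"waiting: {waiting_fraction:.1%}, per job: {ms_per_job:.1f} ms")
```

With these inputs, roughly 96% of the wall-clock time is spent off the CPU, and the average comes out near 34 ms per job.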
The file system is Lustre.
With the state splatted across the file system, I doubt it is worth trying to speed it up.
Maybe just change the message to say: This may take a couple of hours.
Probably we want to say it will take a certain amount of time per job, or "a while".
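A minimal sketch of what such a message could look like, assuming the traversal can report how many jobs it has visited so far and how long that took. `progress_message` is a hypothetical helper, not an existing Toil function, and the wording is only a suggestion.

```python
def progress_message(jobs_done, elapsed_seconds):
    """Build a status line that reports the observed per-job rate.

    Falls back to a vague "a while" before any jobs have been visited,
    since no rate can be measured yet.
    """
    if jobs_done <= 0:
        return "Traversing the job graph gathering jobs. This may take a while."
    ms_per_job = elapsed_seconds / jobs_done * 1000
    return (
        f"Traversed {jobs_done} jobs in {elapsed_seconds:.0f} s "
        f"({ms_per_job:.1f} ms per job)."
    )
```

Reporting the measured rate avoids hard-coding "minutes" or "hours", which will be wrong for some job store or filesystem.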