coveralls-public
4+ minutes to report coverage status to GitHub
Recently the service has become slow to accept jobs and call back the status to GitHub.
I have also been seeing long times (30+ minutes) for coverage to fully filter into Coveralls itself from GitHub builds.
To oversimplify: build times include the run time of the background jobs that complete your builds by performing calculations and rendering your source file directories, and there are three (3) factors that affect performance there:
- Number of source files in your project - We route projects with greater numbers of source files to separate queues (see the sketch after this list)
- Traffic - Other projects running at the same time in the same queues / on the same resources
- General resources / need to scale - We upgraded our infrastructure recently and haven't seen metrics indicating general performance issues since.
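To make that first factor concrete, here's a minimal sketch of the routing idea. This is purely illustrative and not our actual implementation; it only assumes the 5,000-file cutoff we use to classify a project as "large":

```python
# Illustrative sketch only -- not Coveralls' actual code.
# It shows the routing idea above: builds from projects with more source
# files go to a separate queue so they don't hold up smaller projects.
LARGE_PROJECT_THRESHOLD = 5_000  # "large" project cutoff (5k+ source files)

def pick_queue(source_file_count: int) -> str:
    """Choose a processing queue based on project size."""
    if source_file_count >= LARGE_PROJECT_THRESHOLD:
        return "large-projects"  # dedicated queue for big repos
    return "shared"              # everyone else shares the same workers
```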
Today, at 3pm PST on a Friday, there's not a single job waiting in queue, so performance should be good. Your build times should reflect the nature of your individual project.
If you'd like to email the Coveralls URL for your project to [email protected], and reference this issue, I'll get you a report of your last 100 build times and we can see what the pattern is.
Thank you! I'll send details about the project to the support email address. In my case, I've experienced this with a very small Golang project.
Just want to say that you've made a really great product!
As a follow-up: the GitHub Action reports the job is completed (https://coveralls.io/jobs/80911480), but opening the job link, the UI fails with an error message.

Most likely this is the case you've explained in your message: the job is queued and no data is available yet to render.
Hi @fogfish, yes, that's right. That error indicates that the job that renders the TREE view of your project in the SOURCE FILES table has not completed.
We have been experiencing high traffic volume for the past 36 hours, so I'm sure this is the cause of your slow build. The bad news is this is hard to avoid with shared system resources. The good news is that it will resolve itself in time.
I got your email and will share some recent build times with you so you can get a sense of what is average and what is slow for your project.
We have consistently experienced lengthy delays in coverage reports. Every once in a while we get 504s too. Because of this we are considering just running Coveralls on our main branch. Some engineers were even thinking of removing Coveralls altogether. It would be really nice if the performance could be improved somehow (easier said than done, I know).
Hi, @caleb15. Indeed, thank you.
FWIW, I think your team was experiencing the effects of this incident two days ago.
I would encourage your team to visit our status page when you're experiencing issues, since we publish the incidents we're addressing there and give expectations and timeframes to the best of our ability: https://status.coveralls.io/incidents/r4wv06xnv3d6
You can also subscribe to updates there.
The last known incident before that one was Jun 29-30, when we ultimately had to perform some unplanned overnight maintenance to sort things out. That was disruptive, and I apologize for it.
If your team experiences a slowdown or significant issue and you don't find info about it on the status page, though, and they don't mind doing it, please shoot us an email at [email protected]. We may not know about it yet, and you'll be helping the entire user base.
@afinetooth I want to emphasize how consistent these issues are. We always have slowness in reporting, not just when there's an incident. For example, our last three PRs, all opened more than an hour ago, still don't have a Coveralls report. In some other PRs I checked, Coveralls took roughly an hour to report.
Example sha: b423ff3c1c9151c984fd3ee8add4bfb36276a895 (BB-4072: added cycle lock/unlock performance test) https://coveralls.io/builds/51136024
Hi @caleb15. I'm attaching a report of your last 1,000 builds for the repo in question:
last_1000_builds_jul-26-2022.csv
Here are the details I've noticed.
In summary, they point to an issue and warrant further investigation, which I've requested and will follow up on until I can provide you with further insight.
In the meantime, the details:
- Normal build times from Jul 11-18.
- Build times become bad / unacceptable (beyond the 30-minute target max build time under high traffic) from Jul 18-21.
- Build times start improving by 50%+ by end of day Jul 21.
- Build times return to slow but acceptable (<30 min) by morning of Jul 25.
- Build times become bad / unacceptable again by afternoon of Jul 25.
- Your "bad" build times are outliers and considered unacceptable (beyond the 30-minute target max build time under high traffic); in fact, they are 2-10x multiples of that on numerous occasions, indicating a problem with either the repo or the infrastructure and warranting further investigation (see the sketch after this list for a quick way to spot these in the attached CSV).
- These times are so far outside of average readings for the metrics we use to gauge build times (average job dequeue time) that it's not something we're picking up from mass-use statistics.
- We have not seen these build times since earlier this year, prior to resolving them with a major infrastructure upgrade. FWIW, we had another major infrastructure upgrade roughly three weeks ago that further improved general performance in terms of build times. (It would be interesting to see how your build times were around those times.)
- You do not have a "large" project (5k+ files), which would be routed to dedicated servers, but you do have two (2) parallel builds, resulting in processing >5k files. This doubles your time in queue on shared servers.
- We consider your use of the service to be "high-use." With 60-100 jobs per day, you have a significantly higher number of jobs per day than average, making your repo, to an extent, its own competition in terms of running on shared resources. With a commit roughly every 3 minutes, it's almost always the case that several of your jobs are waiting for a number of your previous jobs to finish.
- We will need to compare your build times to average build times to understand if the slow build times are part of a general pattern, or if they diverge.
- You may be a good candidate for a new plan tier with dedicated processing resources that we hope to pilot in the next 2-3 months. The tier will carry additional cost, but that will be in exchange for dedicated servers that only process your jobs and allow you to tune your build times to your team's workflows. (I think we've discussed this before and I think you are already on our pilot list, but if you were not, you are now.)
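If you'd like to scan the attached CSV for those outlier builds yourself, here's a minimal sketch. The column names (`created_at`, `build_time_minutes`) are assumptions on my part, so adjust them to match the report's actual header:

```python
# Minimal sketch for flagging outlier builds in the attached report.
# The column names below are assumptions -- check the CSV header and
# adjust them before running.
import csv

TARGET_MAX_MINUTES = 30  # target max build time under high traffic (see above)

with open("last_1000_builds_jul-26-2022.csv", newline="") as f:
    for row in csv.DictReader(f):
        minutes = float(row["build_time_minutes"])  # assumed column name
        if minutes > TARGET_MAX_MINUTES:
            print(f"{row['created_at']}: {minutes:.0f} min "
                  f"({minutes / TARGET_MAX_MINUTES:.1f}x the 30-min target)")
```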
$ find . -name node_modules -prune -o -name '*.py' -and -not -path './sandbox/*' -and -not -path './squads_sandbox/*' -print | wc -l
4962
.... so close :sweat_smile:
Thanks for the very thorough investigation, I appreciate it. I'll let our engineers know.
> (I think we've discussed this before and I think you are already on our pilot list, but if you were not, you are now.)
I was not on the pilot list; this is the first time I've heard about that :eyes:.
Darn! Lol.
Re: pilot list. Sorry about that; you're on it now.
@caleb15 we made some infrastructure changes that may have resolved the issues behind your long build times.
I'm attaching your last 500 build times, which, as you'll see, drop off after Aug 1 (after our changes) and never get back above 13 min, with the vast majority coming in under 7 min.
last_500_builds_aug-4-2022.csv
Let us know if this does not continue.