
Anecdotally slower performance

Open BigLep opened this issue 3 years ago • 21 comments

Hi @andrew ,

Not blocking, but just passing on that over the last couple of months I have found the ecosystem dashboard has gotten anecdotally slower. Links are taking longer to load (to the point that I open multiple links in parallel to avoid future waits). Similarly, some of the .json URLs I used to hit that would resolve within 30 seconds now don't complete before the apparent application timeout. I have worked around this by reducing the page size of my requests.
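
For example, something along these lines (a hypothetical invocation to illustrate the workaround, not my exact URLs — the `per_page`/`page` parameters are ones the dashboard's JSON endpoints already accept):

```sh
# Requesting smaller pages so each response returns before the ~30s
# application timeout. Parameter values here are illustrative.
curl "https://ipfs.ecosystem-dashboard.com/events.json?per_page=50&page=1"
curl "https://ipfs.ecosystem-dashboard.com/events.json?per_page=50&page=2"
```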

Steve

BigLep avatar Jun 03 '22 22:06 BigLep

I've got a new instance deployed here that feels much snappier: http://ipfs2.ecosystem-dashboard.com

It's not 100% ready to do the switch over, but feel free to have a click around.

andrew avatar Jun 10 '22 09:06 andrew

Agreed that it feels a lot snappier:

<1s : https://ipfs2.ecosystem-dashboard.com/all?org=libp2p&exclude_language%5B%5D=JavaScript&exclude_language%5B%5D=TypeScript&exclude_language%5B%5D=Rust&exclude_language%5B%5D=C%2B%2B&exclude_language%5B%5D=Kotlin&exclude_repo_full_name%5B%5D=libp2p%2Fhydra-booster&exclude_repo_full_name%5B%5D=libp2p%2Fgo-libp2p-kad-dht&exclude_repo_full_name%5B%5D=libp2p%2Fgo-libp2p-pubsub&exclude_repo_full_name%5B%5D=libp2p%2Fgo-libp2p-pubsub-router&exclude_repo_full_name%5B%5D=libp2p%2Fgo-libp2p-pubsub-tracer&range=365&state=open&per_page=100&sort=updated_at&order=desc&label%5B%5D=need%2Fauthor-input

many seconds: https://ecosystem-research.herokuapp.com/all?org=libp2p&exclude_language%5B%5D=JavaScript&exclude_language%5B%5D=TypeScript&exclude_language%5B%5D=Rust&exclude_language%5B%5D=C%2B%2B&exclude_language%5B%5D=Kotlin&exclude_repo_full_name%5B%5D=libp2p%2Fhydra-booster&exclude_repo_full_name%5B%5D=libp2p%2Fgo-libp2p-kad-dht&exclude_repo_full_name%5B%5D=libp2p%2Fgo-libp2p-pubsub&exclude_repo_full_name%5B%5D=libp2p%2Fgo-libp2p-pubsub-router&exclude_repo_full_name%5B%5D=libp2p%2Fgo-libp2p-pubsub-tracer&range=365&state=open&per_page=100&sort=updated_at&order=desc&label%5B%5D=need%2Fauthor-input

Anyways - no rush. We'll make do either way. Thanks!

BigLep avatar Jun 10 '22 17:06 BigLep

Everything is set up now on ipfs2, and it should be keeping in sync with changes on GitHub. Perhaps you can try it out in your next triage session?

andrew avatar Jun 11 '22 07:06 andrew

Currently making some database config tweaks; ipfs2 will be unavailable for a couple of hours.

andrew avatar Jun 13 '22 11:06 andrew

Hi @andrew - just checking in here on what you advise I do for triage sessions going forward. I was going to flip things to ipfs2, but it doesn't look to be up.

BigLep avatar Jun 17 '22 17:06 BigLep

Yeah, it looks like everything got really slow for a while, almost like the server went to sleep. I will investigate.

andrew avatar Jun 23 '22 16:06 andrew

Even on this new server the database is totally overwhelmed! I've restarted it and things are working again, but it's going to need some more tweaks to make sure it doesn't fall over again. I have a full day on Monday that I can spend on it.

andrew avatar Jun 24 '22 11:06 andrew

I'm running some background cleanup scripts on all the instances to remove a lot of unused database records. It may take a few hours and the DBs will be a bit slow, but my hope is to reduce the database size significantly and unlock some more performance without any code changes.
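
Roughly this shape (a simplified sketch only — the real scripts and the exact notion of "unused" are more involved; the table name and predicate below are stand-ins). Deleting in small batches keeps each transaction short so the web app stays responsive:

```sh
# Illustrative only: delete unused rows in small batches via heroku pg:psql
# so no single transaction holds locks for long. Re-run until 0 rows affected.
heroku pg:psql -a ecosystem-research -c "
  DELETE FROM events
  WHERE id IN (
    SELECT id FROM events
    WHERE repository_id IS NULL   -- stand-in for the real 'unused' test
    LIMIT 10000
  );"
```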

andrew avatar Jun 27 '22 10:06 andrew

Before running cleanup: [screenshot: database table and index sizes, 2022-06-27 11:13]

The events table and its indexes have grown very large and consume a lot of resources; the repository dependencies table is also very large and has a lot of indexes.
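
For reference, the sizes in the screenshot come from Postgres's own catalog views; a query like this surfaces the biggest offenders (assuming access to the Heroku app):

```sh
# List the ten largest relations, with index size broken out separately.
heroku pg:psql -a ecosystem-research -c "
  SELECT relname,
         pg_size_pretty(pg_total_relation_size(relid)) AS total_size,
         pg_size_pretty(pg_indexes_size(relid))        AS index_size
  FROM pg_catalog.pg_statio_user_tables
  ORDER BY pg_total_relation_size(relid) DESC
  LIMIT 10;"
```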

andrew avatar Jun 27 '22 10:06 andrew

Cleanup is complete, and I've also made a number of significant performance improvements across various parts of the app that should reduce database load on ipfs.ecosystem-dashboard.com (https://ecosystem-research.herokuapp.com). I'll be monitoring it closely over the next week.

Ignore ipfs2.ecosystem-dashboard.com for now

andrew avatar Jun 30 '22 12:06 andrew

@andrew : in case it wasn't known, I can't get the dashboard to load for me today (2022-07-14). I've tried multiple URLs. I'm planning to sing its praises during an IPFS Thing talk tomorrow (2022-07-15). I'm hopeful it will be up in case anyone in the audience checks it out.

Edit: I'm able to get some URLs to load now.

BigLep avatar Jul 14 '22 15:07 BigLep

There was a change earlier in the week to the pmf stats that has put a big load on the database. I'll see if I can tweak some things later tonight.

andrew avatar Jul 14 '22 17:07 andrew

@BigLep I have killed all the db connections and restarted everything. I think the next course of action will be to separate the pmf stats from the issue triage, as the database can't handle doing both in one app.

andrew avatar Jul 14 '22 17:07 andrew

Thanks @andrew for the update. Just passing on that for triages this week we have been getting "Application error" for all URLs.

BigLep avatar Jul 20 '22 15:07 BigLep

I was also getting application error a lot and almost opened a second issue, but things just recently started working, and much more quickly.

side-note: Since I have access to the heroku instance, I was trying to gather logs to determine the issue, but it was not quick/simple for me to do so. Neither of the following commands gave me any more information about what was causing the errors:

  • heroku logs --tail -a ecosystem-research | grep "503"
  • heroku logs --tail -a ecosystem-research | grep "Application Error"
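
One thing I might try next, assuming Heroku's router behaves the way its docs describe (request timeouts are logged with the error code H12, not the literal text of the error page):

```sh
# Timeouts show up in router logs as code=H12, rather than the
# "Application Error" text the browser displays.
heroku logs --tail -a ecosystem-research | grep "H12"
```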

Any tips you (@andrew) have on troubleshooting would be great =D

SgtPooki avatar Jul 20 '22 15:07 SgtPooki

I gave the whole thing a big kick about 10 mins after seeing @BigLep's comment, and by big kick I mean:

heroku pg:killall

followed by

heroku restart

The problem is that there are some overnight background tasks that are completely stomping the database, and it's not recovering on its own. Killing all the very long-running db connections is a blunt-instrument way of bringing the web app back online.
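
If we wanted something less blunt than `pg:killall`, a sketch of terminating only the long-running connections (the five-minute threshold is an arbitrary placeholder):

```sh
# Terminate only backends whose current query has run for more than 5 minutes,
# instead of killing every connection.
heroku pg:psql -a ecosystem-research -c "
  SELECT pg_terminate_backend(pid)
  FROM pg_stat_activity
  WHERE state <> 'idle'
    AND pid <> pg_backend_pid()
    AND now() - query_start > interval '5 minutes';"
```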

@SgtPooki heroku logs don't help much, as it can be hard to see what is causing the timeouts. I've added New Relic as an add-on, which has much more info on slow actions, db queries, etc.

You should be able to find the "new relic apm" link on this page: https://dashboard.heroku.com/apps/ecosystem-research/resources (the "heroku postgres" link on that page also has some basic insights that might be helpful)

I'm going to do some more investigation tomorrow morning; I haven't had a lot of free time to keep on top of this recently, as my other job has been pretty full on.

andrew avatar Jul 20 '22 17:07 andrew

Yesterday I made some significant changes to the pmf calculations, which should reduce the load on the database and keep the web UI performant.

andrew avatar Jul 22 '22 10:07 andrew

@andrew : I'm getting queries that are timing out again. I'm trying to pull down event data, and even reducing the page size to 100 is still leading to timed-out results: https://ipfs.ecosystem-dashboard.com/events.json?range=144&per_page=100&page=1

Does it need to be "kicked" again?

BigLep avatar Sep 22 '22 23:09 BigLep

The events table has grown very, very large, and query time has exceeded Heroku's 30-second timeout limit. You can get the endpoint to load by removing the range parameter, but that may not help in your case.

What I'm thinking we may need to do is move older events (say, over 1 year old) into a separate table (archived_events, for example) to keep all the website endpoints performant.
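
Roughly this shape (a sketch only — in practice I'd batch the deletes, and I'm assuming the standard Rails created_at column here):

```sh
heroku pg:psql -a ecosystem-research <<'SQL'
-- Sketch of the proposed split: clone the events schema (indexes included),
-- copy rows older than a year into the archive, then drop them from the
-- hot table. created_at is an assumption based on Rails conventions.
CREATE TABLE IF NOT EXISTS archived_events (LIKE events INCLUDING ALL);

INSERT INTO archived_events
SELECT * FROM events
WHERE created_at < now() - interval '1 year';

DELETE FROM events
WHERE created_at < now() - interval '1 year';
SQL
```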

andrew avatar Sep 23 '22 14:09 andrew

Got it - makes sense. Moving events over a year old definitely seems good/fine to me. In the last 1.5 years, I haven't needed to go back further than a year.

BigLep avatar Sep 23 '22 14:09 BigLep

I'm going on holiday tomorrow, so I won't get a chance to split the events table for a couple of weeks.

andrew avatar Oct 05 '22 10:10 andrew