parabol icon indicating copy to clipboard operation
parabol copied to clipboard

Speed up repoIntegrations query

Open mattkrick opened this issue 2 years ago • 7 comments

I want to get #5883 merged, but we need to fix the logic soon to keep the app feeling snappy. This is high priority because #5883 will introduce a much longer wait to open the menu. For users who have 100+ jira projects, the wait could now be in the seconds, which isn't good if we're trying to increase integration adoption.

The current logic runs a map/reduce on tasks for that team member to see which integrations they have used in the past. Additionally, it fetches every project that the user has access to. That requires multiple roundtrips to jira because we have to paginate the jira projects.

To fix, we have options:

  • Cache a list of used repos in redis
  • Cache a list of fetched repos in redis (fetched from all the integrations)
  • Don't fetch 3rd party integration repos until they start searching (i don't like this option)

Feel free to fight me on this, but I propose the following logic:

  • [ ] repoIntegrations checks to see if repoIntegration:${teamId}:${userId} exists in redis. if it does, return that
  • [ ] if it doesn't exist, then it runs what it currently does today & caches the result in redis
  • [ ] to invalidate: when a task is pushed to an integration, the mutation checks to see if that integration repo exists in the redis key above. if redisResult.find((repo) => repo.id === id)?.lastUpdatedAt > 0 don't invalidate. else, invalidate the redis cached value
  • [ ] to invalidate in the case that a repo was recently added in github, I propose the following. When they search for a repo by typing a search query, we filter locally (just like today). If they don't click anything within 5 second, we call repoIntegrations(force: true), which blows away the cache & fetches from integrations again.

mattkrick avatar Feb 05 '22 01:02 mattkrick

@mattkrick I'd love to get your thoughts on this idea. Users will usually want to select the repos/projects that they've previously integrated with. That query is cheap, but fetching all possible repos/projects is expensive.

Proposal:

  • split repoIntegrations into prevRepoIntegrations and allAvailableRepoIntegrations
  • in the task integration dropdown, query prevRepoIntegration and allAvailableRepoIntegrations but use the @defer directive with the allAvailableRepoIntegrations query
  • show the prevRepoIntegration results in the menu dropdown and a loading state while we're waiting for the allAvailableRepoIntegrations results

With this approach, we could still have a snappy menu without introducing Redis and worrying about stale results. The downsides are users would see a loading state, and if the previous query is still valid, the Redis cache would be a lot faster.

Another approach could be using the @defer proposal above and caching the allAvailableRepoIntegrations results in Redis. This seems like the most complex but snappiest approach.

EDIT: @defer won't work for now as we need to update the server to be able to handle this directive.

nickoferrall avatar Feb 11 '22 13:02 nickoferrall

General feedback while it's fresh:

Gee, it'd love to see whatever we decided and implement expressed as an interface for a developer to fill out for future integrations. Could be real tricky to just "know" that you should worry about caching this stuff for future integrations

jordanh avatar Feb 15 '22 19:02 jordanh

Now that we're storing the favoriteService in https://github.com/ParabolInc/parabol/pull/6331, we could sort the favoriteService integrations above other services. favoriteService could be a nullable argument passed into the repoIntegrations query.

nickoferrall avatar Apr 04 '22 10:04 nickoferrall

related: #6723

  • i love the interface
  • i love the favoriteService (i can't believe i named it that though 😞)
  • i love using defer, but that might take awhile to make the server support it
  • i'd like to get rid of allAvailableRepoIntegrations, we should have just 1 field, but multiple args

Proposed AC

  • [ ] initial load should fetch previously used repos, sort by favoriteService on the client side. if it returns 0, so be it.
  • [ ] subsequent load is triggered by user behavior (a scroll to the bottom - 50px, a search query typed, or the previous query yielding 0 results)
  • [ ] subsequent load will fetch all projects for all integrations

the 2 most common use cases will be:

  • 0 previous results
  • 1-2 previous results

the case that doesn't work for people today is:

  • 1 previous result, looking to add a 3rd

do we just add a "load more" button at the bottom of the menu for the problem case? alternative-- we don't auto-focus the search input & when they move into it we fetch all the repos so the search is done locally

mattkrick avatar Jun 13 '22 19:06 mattkrick

I like the proposed AC! Two edits I'd like to suggest:

  1. subsequent load is triggered by user behavior. I don't think the user should need to scroll down or type a query to see all of their results. My concern is they won't realise they need to do this, so they'll think their prev integrations are their only option.

    I like the "load more" button, but I prefer your final suggestion where we fetch all repos whenever the task footer menu is opened, as it saves the user a click. I'd like to show the prev repos initially. If there are less than 10, show a loading ellipsis, and send the subsequent query to get all repos.

  1. If the first ten results in the menu belong to the user's favourite service, they might not realise that they can search for more than that service.

    Instead, we could show the prev used integrations at the top of the list, and then alternate between their different integrations until we reach ten in the list? So if there are 5 prev integrations, the remaining 5 in the list would be [github, jira, gitlab, github, jira], for example.

    When querying the list, however, we'd show all the matching favourite service results first, as they already know they can search all integrations at that point.

Proposed AC:

  • [ ] initial load should fetch previously used repos, sort by favoriteService on the client side. if it returns 0, so be it.
  • [ ] if there are less than 10 previously integrated repos, show a loading ellipsis below and send a subsequent load requesting all repo integrations
  • [ ] if there are 10 or more prev integrated repos, don't send the subsequent query until the user has begun searching
  • [ ] in the first 10 items in the menu, show prev integrated repos at the top and then alternate between the repo integrations until we reach 10
  • [ ] when a search query is used, show all of the favourite service issues at the top of the results

nickoferrall avatar Jun 23 '22 16:06 nickoferrall

Fetching all possible repos from all integrations is really expensive. How can we avoid doing that in the use case that they just want to push it to their most recently used integration?

mattkrick avatar Jun 24 '22 19:06 mattkrick

Current plan of action:

  • add prevRepoIntegrations query to TeamMember
  • initially query prevRepoIntegrations in the task footer menu
  • include a Load More button in the menu
  • lazily query repoIntegrations only if prevRepoIntegrations is an empty array or the user has clicked load more
  • as the user clicks the Load More button, more results are added
  • remove allAvailableRepoIntegrations

nickoferrall avatar Sep 21 '22 18:09 nickoferrall