ember-cli-deploy-s3-index icon indicating copy to clipboard operation
ember-cli-deploy-s3-index copied to clipboard

Activate task becomes very slow when you have a lot of deployed items

Open vitch opened this issue 3 years ago • 6 comments

We've been happily using ember-cli-deploy and this plugin for over 6 years - thanks!

As a consequence of this, we've built up a large history of previously deployed versions of our apps (which are available via special URLs for testing and other purposes).

I noticed that our activation stage is taking a long time (we have a bunch of apps across a bunch of environments with a bunch of historical versions) and digging into it I found that a lot of this is due to this plugin needing to list the revisions available on S3 for various reasons.

In #119 I optimised some of the easy cases where it was possible to avoid some of this communication with S3 but we're still left with fetchInitialRevisions and fetchRevisions.

I was trying to figure out why these functions exist/ are called and whether we could optimise them in some way.

I found some relevant discussions:

  • https://github.com/ember-cli-deploy/ember-cli-deploy/pull/209
  • https://github.com/ember-cli-deploy/ember-cli-deploy/pull/323#issuecomment-164784993

From these, I wasn't too clear exactly how important these hooks might be. If they impose a high cost on some users (those with full buckets on S3) then could we change their behaviour or somehow speed them up?

It seems like the prime use-case for fetchInitialRevisions is to generate a changelog and indeed we have code which adds an entry to an audit log with the previously active revision and the new one. If other plugins only need to know the previously active revision then we can optimise by at least bailing out of the listObjectRecursively loop once we've found the active revision.

I'm interested in the appetite here for changes around this. If you think this is a problem that most people don't suffer from then we can fork the plugin and change the behaviour for our purposes. But if you think it's something that this plugin should handle then I'd love to talk about possible approaches and exactly what we need from fetchInitialRevisions and fetchRevisions

vitch avatar Jul 04 '22 15:07 vitch

@vitch I understand the issue and it seems from our discussions at the time that we had some concerns about this eventuality but decided the benefits outweighed the costs. The contract of the hooks is that they will fetch all the revisions so I am skeptical that general solutions that don't do this would be guaranteed to be backward compatible user's deployment configurations.

As a low-tech solution, have you considered deleting or archiving your 59,000 oldest revisions?

lukemelia avatar Jul 04 '22 16:07 lukemelia

Yeah - I meant to mention that we'd considered that. The same folders are used to power our special historical URLs and I wanted to avoid adding special casing there...

I agree that it would be hard to do this in a backward compatible way which is why I split it out from #119.

I guess one other option would be to have a config flag to allow people to opt in to a new behaviour (where fetchRevisions returned nothing and fetchInitialRevisions returned only the currently active revision) if they were running into performance issues.

I'm essentially going to build that out anyway and test it in our environment to see if there are any unforeseen consequences. I can put up a PR for it and we can discuss there - if you don't think it's a good match for this plugin then I'm happy to maintain a forked version for our needs...

If you have any concrete examples of fetchRevisions or fetchInitialRevisions which need access to more than I describe above then let me know and it might help me to come up with a flexible solution...

vitch avatar Jul 04 '22 16:07 vitch

I don't see any harm in having a config flag for this plugin that allows you to opt into the behavior you're describing. Then you wouldn't have the hassle of maintaining a fork.

lukemelia avatar Jul 04 '22 17:07 lukemelia

Cool. Any preferences on a name? optimisedFetchRevisions? fetchOnlyRelevantRevisions? speedOverAccuracy?

(naming things is hard - those are more in the vein of brainstorming than actual suggestions)

vitch avatar Jul 04 '22 19:07 vitch

In fact, it looks like fetchRevisions is used by deploy:list as well - I guess we don't want to break it there...

Looking at the hooks documentation and re-reading the threads above, I'm still not too clear on why fetchRevisions is called after activate. Can you clarify the use-case for me?

(it looks like I can get another ~20% speed up by changing fetchInitialRevisions to only loop over s3 until it finds the active revision but that percentage will change depending on where the active revision lies and I will be able to get a much bigger speedup by noop-ing the fetchRevisions call but I can't figure out how to do that safely)

vitch avatar Jul 04 '22 20:07 vitch

Another possible approach is to cache the result of fetchInitialRevisions and manually update it post-activation rather than looping over s3 again in fetchRevisions...

That (in combination with #119) means we're loading the list of all deployed versions once instead of three times when deploy:activate is called... I'd prefer an option for zero times but once would definitely be a step in the right direction ;)

vitch avatar Jul 04 '22 20:07 vitch