puppet-aem

Discussion: Wait for installer state to be OK before continuing.

Open zipkid opened this issue 6 years ago • 6 comments

We often see the CRX installer fail because AEM is restarting or is otherwise unable to handle the necessary queries and commands. This change adds a 'wait for OK-to-install state' step to the CRX installer provider. A similar wait may also be needed in the other providers, possibly with a different check than 'Sling+OSGi+Installer.json'. This code is certainly not ready to be merged, but we would like to discuss where and how this could be done to ensure clean Puppet runs.

Maybe this should be part of https://github.com/bstopp/crx-packmgr-api-client-gem, but that is generated from https://github.com/bstopp/swagger-aem, which I don't know how to work with.

zipkid, Feb 09 '18 08:02

This might fix #82 as well.

wimsymons, Feb 09 '18 08:02

@bstopp, I have updated .rubocop.yaml to 'TargetRubyVersion: 2.2'. Can you trigger the checks please?

zipkid, Feb 09 '18 09:02

Do you have a use case or manifest set that shows this occurring? What is causing the AEM restart: a Puppet change or a user-initiated change?

What is being experienced right now? A number of subsequent failures?

I am pretty certain I know what the issue is, and this won't solve it; the system already does a check with retries here when Puppet encounters the resource while applying.

I was pretty sure I opened a ticket somewhere on the underlying issue; if I find it, I'll link it.

bstopp, Feb 10 '18 01:02

Hi @bstopp,

An exact example of what we have observed is the following:

In our setup we have a clean AEM 6.3 installation, followed by the installation of the Service Pack 1 and Cumulative Fix Pack 2 packages. When we do a clean install, the CFP is often (but not always) only partially installed: in the package manager, a substantial number of the sub-packages are still in an uninstalled state.

When reproducing the issue on a local workstation, I observed that an install hook of one of the CFP sub-packages threw an exception saying that the Dynamic Class Loader service was no longer available. On further investigation, it turned out that the installation of the CFP package started too soon. When the Service Pack is installed and the package manager API returns, the package manager GUI shows the package as installed, but the installation is actually still in progress. This means that many OSGi services are still being reloaded due to the ongoing installation(s) when the next package installation has already started.

To try and make the package installations more robust, we are trying to add a more reliable check on the installation state. This check is based on the Sling OSGi Installer JMX MBean which is mentioned in the following AEM Gem:

https://docs.adobe.com/content/ddc/en/gems/AEM-Sustenance---Best-Practices-for-deploying-AEM-Maintenance-Releases/_jcr_content/par/download/file.res/AEM-Sustenance-Best-Practices-Gems.pdf

In the meantime I have also learned that the following endpoint provides similar information, though it appears to be undocumented; searching for it returns no Adobe results at all.

/crx/packmgr/installstatus.jsp

I've decompiled the code, and it does a very simple check on whether the ActiveResourceCount attribute of the Sling OSGi Installer JMX MBean is '0'.

I hope this clarifies the necessity for these changes; if not, I can provide more info.

FYI, there are still a few issues to tackle or think about.

  • A package installation does not necessarily result in a single installation run. This means that when you observe ActiveResourceCount=0, another run might still start. When installing a Service Pack in particular, I observed ActiveResourceCount=0 several times before the installation was completely done. So a few successful calls are probably needed before deciding the system is done (maybe also checking that the other attributes are no longer changing).
  • When the installation has failed for some reason, and some of the (new?) bundles are not starting, ActiveResourceCount remains > 0. So no other package installations will be possible until this gets resolved. Not sure whether this will always be desired.
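
The first point above could be sketched roughly as follows. This is a hypothetical helper, not code from the pull request or from puppet-aem: the method name, its parameters, and the error class are all assumptions made for illustration. The caller supplies a block that returns the current ActiveResourceCount (however it is fetched), and the helper only reports success after the count has been 0 for several consecutive polls. The attempt limit also bounds the wait in the second case, where the count never returns to 0.

```ruby
# Hypothetical error class; not part of puppet-aem.
class InstallerNotIdleError < StandardError; end

# Polls a caller-supplied block that returns the current ActiveResourceCount.
# Returns the number of polls made once the count has been 0 for
# `stable_checks` consecutive polls; raises after `max_attempts` polls.
# All names and defaults here are illustrative assumptions.
def wait_for_installer_idle(stable_checks: 3, max_attempts: 60, interval: 5,
                            sleeper: Kernel.method(:sleep))
  consecutive = 0
  max_attempts.times do |attempt|
    count = begin
      yield
    rescue StandardError
      nil # AEM may be restarting; treat any error as "not idle"
    end

    # A non-zero (or failed) reading resets the streak of idle readings.
    consecutive = count == 0 ? consecutive + 1 : 0
    return attempt + 1 if consecutive >= stable_checks

    sleeper.call(interval)
  end
  raise InstallerNotIdleError,
        "installer not idle after #{max_attempts} attempts"
end
```

A provider could call this before each package installation, passing a block that queries the installer state, and fail the resource (or carry on, depending on what is desired) when `InstallerNotIdleError` is raised.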

stevengssns, Feb 12 '18 08:02

@bstopp any input on this?

henrykuijpers, Aug 22 '19 11:08

Can you confirm this wasn't fixed with v3.0.0?

bstopp, Sep 04 '19 15:09