accesscontroltool icon indicating copy to clipboard operation
accesscontroltool copied to clipboard

Integration of the JCR Installer Install Hook in v 2.2.0 might have a race condition in combination with ReplicationReceiver

Open kwin opened this issue 6 years ago • 4 comments

#210 is causing regressions under specific circumstances. In the following deployment strategy we see errors while installing the package:

  1. Complete package containing actool package and other packages (some containing AEM osgi configs) is uploaded to the author and installed there via the Package Manager ReST API
  2. Packages being contained in the complete package are being replicated (asynchronously)

Now the package installation fails on publish with the following error message

07.03.2019 13:31:33.008 *INFO* [Apache Sling Resource Resolver Finalizer Thread] com.adobe.granite.repository Service [19643, [org.apache.jackrabbit.oak.api.jmx.SessionMBean]] ServiceEvent UNREGISTERING
07.03.2019 13:31:45.943 *INFO* [10.43.0.13 [1551965505938] POST /bin/receive HTTP/1.1] com.day.cq.replication.impl.content.durbo.DurboImporter Content size triggered creation of temp file: 2427842...
07.03.2019 13:31:45.958 *INFO* [10.43.0.13 [1551965505938] POST /bin/receive HTTP/1.1] com.day.cq.replication.impl.content.durbo.DurboImporter Temporary file created in 15ms (2427842 bytes)
07.03.2019 13:31:45.998 *INFO* [10.43.0.13 [1551965505938] POST /bin/receive HTTP/1.1] com.day.cq.replication.impl.content.durbo.DurboImporter imported content in 39ms for durbo request on path: /etc/packages/onemarketing/platform-aem-common.deploymentpackage-64.0.190307000.zip
07.03.2019 13:31:46.248 *INFO* [10.43.0.13 [1551965505938] POST /bin/receive HTTP/1.1] org.apache.jackrabbit.vault.packaging.impl.InstallHookProcessorImpl Loading Hook /META-INF/vault/hooks/org.apache.sling.installer.provider.installhook-1.0.4.jar: Main-Class = org.apache.sling.installer.provider.installhook.OsgiInstallerHook
07.03.2019 13:31:46.260 *INFO* [10.43.0.13 [1551965505938] POST /bin/receive HTTP/1.1] org.apache.jackrabbit.vault.packaging.impl.InstallHookProcessorImpl Hook /META-INF/vault/hooks/org.apache.sling.installer.provider.installhook-1.0.4.jar registered.
07.03.2019 13:31:46.324 *INFO* [10.43.0.13 [1551965505938] POST /bin/receive HTTP/1.1] org.apache.sling.installer.provider.installhook.OsgiInstallerHook OsgiInstallerHook is active in Netcentric:accesscontroltool-package:2.3.2
07.03.2019 13:31:46.328 *INFO* [10.43.0.13 [1551965505938] POST /bin/receive HTTP/1.1] org.apache.sling.installer.provider.installhook.OsgiInstallerHook Bundles in package [BundleInPackage [path=/apps/netcentric/actool/install/accesscontroltool-bundle-2.3.2.jar, symbolicId=biz.netcentric.cq.tools.accesscontroltool.bundle, version=2.3.2], BundleInPackage [path=/apps/netcentric/actool/install/snakeyaml-1.13.jar, symbolicId=org.yaml.snakeyaml, version=1.13.0]]
07.03.2019 13:31:46.329 *INFO* [10.43.0.13 [1551965505938] POST /bin/receive HTTP/1.1] org.apache.sling.installer.provider.installhook.OsgiInstallerHook Bundle biz.netcentric.cq.tools.accesscontroltool.bundle is installed with version 2.2.0 but package contains version 2.3.2
07.03.2019 13:31:46.329 *INFO* [10.43.0.13 [1551965505938] POST /bin/receive HTTP/1.1] org.apache.sling.installer.provider.installhook.OsgiInstallerHook Bundle biz.netcentric.cq.tools.accesscontroltool.bundle requires installation
07.03.2019 13:31:46.331 *INFO* [10.43.0.13 [1551965505938] POST /bin/receive HTTP/1.1] org.apache.sling.installer.provider.installhook.OsgiInstallerHook Bundle org.yaml.snakeyaml is already installed with version 1.13.0 that matches 1.13.0 as provided in package
07.03.2019 13:31:46.331 *INFO* [10.43.0.13 [1551965505938] POST /bin/receive HTTP/1.1] org.apache.sling.installer.provider.installhook.OsgiInstallerHook Config /apps/netcentric/actool/config/org.apache.sling.serviceusermapping.impl.ServiceUserMapperImpl.amended-ncActoolSystemUser.config is already installed
07.03.2019 13:31:46.332 *INFO* [10.43.0.13 [1551965505938] POST /bin/receive HTTP/1.1] org.apache.sling.installer.provider.installhook.OsgiInstallerHook Installing 1 bundles and 0 configs
07.03.2019 13:31:46.361 *INFO* [10.43.0.13 [1551965505938] POST /bin/receive HTTP/1.1] org.apache.jackrabbit.vault Service [19644, [org.apache.sling.installer.api.event.InstallationListener]] ServiceEvent REGISTERED
07.03.2019 13:31:46.361 *INFO* [10.43.0.13 [1551965505938] POST /bin/receive HTTP/1.1] org.apache.sling.installer.provider.installhook.OsgiInstallerHook Updating resources [InstallableResource, priority=2000, id=/apps/netcentric/actool/install/accesscontroltool-bundle-2.3.2.jar]
07.03.2019 13:31:46.365 *INFO* [10.43.0.13 [1551965505938] POST /bin/receive HTTP/1.1] org.apache.sling.installer.provider.installhook.OsgiInstallerHook Waiting for 1 bundles / 0 configs to be installed
07.03.2019 13:31:46.397 *INFO* [JcrInstaller.3] org.apache.sling.installer.provider.jcr.impl.JcrInstaller Registering resource with OSGi installer: [InstallableResource, priority=200, id=/apps/netcentric/actool/install/accesscontroltool-bundle-2.3.2.jar]
07.03.2019 13:31:46.397 *INFO* [JcrInstaller.3] org.apache.sling.installer.provider.jcr.impl.JcrInstaller Removing resource from OSGi installer: [/apps/netcentric/actool/install/accesscontroltool-bundle-2.2.0.jar]
07.03.2019 13:31:46.521 *INFO* [OsgiInstallerImpl] org.apache.sling.audit.osgi.installer Updated bundle biz.netcentric.cq.tools.accesscontroltool.bundle [627] from resource TaskResource(url=jcrinstall:/apps/netcentric/actool/install/accesscontroltool-bundle-2.3.2.jar, entity=bundle:biz.netcentric.cq.tools.accesscontroltool.bundle, state=INSTALL, attributes=[org.apache.sling.installer.api.tasks.ResourceTransformer=:28:25:, Bundle-SymbolicName=biz.netcentric.cq.tools.accesscontroltool.bundle, Bundle-Version=2.3.2], digest=1551965506268)
07.03.2019 13:31:46.522 *INFO* [OsgiInstallerImpl] org.apache.sling.audit.osgi.installer Refreshing 1 bundles: [biz.netcentric.cq.tools.accesscontroltool.bundle [627]]
07.03.2019 13:31:46.522 *INFO* [OsgiInstallerImpl] org.apache.sling.audit.osgi.installer Waiting up to 90 seconds for bundles refresh
07.03.2019 13:31:46.535 *INFO* [FelixFrameworkWiring] org.apache.sling.adapter Service [5487, [org.apache.sling.adapter.Adaption]] ServiceEvent UNREGISTERING

So the whole framework is being restarted due to the synchronous calling of the OSGi installer in the hook. The trigger for this is the RefreshBundlesTask calling FrameworkWiring.refreshBundles(<actoolbundle>, this);

It is unclear still why the whole framework is getting restarted though. The exact conditions are listed in https://github.com/apache/felix/blob/0bfe4ca7ebc6e81f0a9f4186a7ef58df4d92b4c9/framework/src/main/java/org/apache/felix/framework/Felix.java#L4445 (which is internally called from FrameworkWiring.refreshBundles(<actoolbundle>, this);)

But it is obvious from the Log that the JCR Installer and the Install Hook trigger the OSGi Installer for the resources more or less at the same time. That is due to the fact that package replication does not stop the JCR Installer.

kwin avatar Feb 19 '19 13:02 kwin

@ghenzler Since obviously the package manager cannot deal with certain concurrent service restarts there must be an easy toggle which allows to disable the install hook in the AC Tool. Maybe we must provide two different packages...

kwin avatar Feb 19 '19 13:02 kwin

  1. All packages being contained in the complete package are being replicated (asynchronously)

doing this in parallel sounds unreasonable... why are packages in this setup installed in parallel? Why is it not just the complete package itself that is replicated?

ghenzler avatar Feb 19 '19 17:02 ghenzler

Package replications either triggered via the Package Manager Web UI or the ReSTful service are always asynchronous (like regular page replication). Also we cannot prevent that two people trigger a package replication at the same time.

kwin avatar Feb 19 '19 18:02 kwin

Hi all,

not sure if this is directly related to this issue but the log file entries are similar.

We discovered the issue on some of our AEM instances that the Publisher is not reachable any more, after updating the ACL tool from an earlier version. This happens with an Upgrade to 2.2.0 and again from 2.2.0 with 2.3.2. This was our workaround on the publisher instance:

  1. Remove bundle biz.netcentric.cq.tools.accesscontroltool.bundle from system/console/bundles
  2. Remove node /apps/netcentric/actool/install/accesscontroltool-bundle-2.3.2.jar in CRX/DE
  3. Reinstall deploymentpackage

So only a restart of the publisher, the deinstallation of the older version and reinstallation of the newer version via Package manager was successful. As I said the issue is only reproducible on some instances.

The error log does not give us much information, but fits to the observations by @kwin :

26.02.2019 16:02:27.295 *INFO* [OsgiInstallerImpl] biz.netcentric.cq.tools.accesscontroltool.bundle Service [biz.netcentric.cq.tools.actool.configuploadlistener.impl.UploadListenerServiceImpl,7370, [biz.netcentric.cq.tools.actool.configuploadlistener.UploadListenerService, org.osgi.service.event.EventHandler]] ServiceEvent UNREGISTERING
26.02.2019 16:02:27.298 *INFO* [OsgiInstallerImpl] biz.netcentric.cq.tools.accesscontroltool.bundle Service [biz.netcentric.cq.tools.actool.installhook.impl.AcToolInstallHookServiceImpl,7371, [biz.netcentric.cq.tools.actool.installhook.impl.AcToolInstallHookService]] ServiceEvent UNREGISTERING
26.02.2019 16:02:27.300 *INFO* [OsgiInstallerImpl] biz.netcentric.cq.tools.accesscontroltool.bundle Service [biz.netcentric.cq.tools.actool.jmx.AceServiceMBeanImpl,7372, [biz.netcentric.cq.tools.actool.jmx.AceServiceMBean]] ServiceEvent UNREGISTERING
26.02.2019 16:02:27.306 *INFO* [OsgiInstallerImpl] biz.netcentric.cq.tools.accesscontroltool.bundle Service [biz.netcentric.cq.tools.actool.impl.AcInstallationServiceImpl,7369, [biz.netcentric.cq.tools.actool.api.AcInstallationService, biz.netcentric.cq.tools.actool.impl.AcInstallationServiceInternal, biz.netcentric.cq.tools.actool.aceservice.AceService]] ServiceEvent UNREGISTERING
26.02.2019 16:02:27.307 *INFO* [OsgiInstallerImpl] biz.netcentric.cq.tools.accesscontroltool.bundle Service [biz.netcentric.cq.tools.actool.aceinstaller.AceBeanInstallerClassic,7354, [biz.netcentric.cq.tools.actool.aceinstaller.AceBeanInstaller]] ServiceEvent UNREGISTERING
26.02.2019 16:02:27.309 *INFO* [OsgiInstallerImpl] biz.netcentric.cq.tools.accesscontroltool.bundle Service [biz.netcentric.cq.tools.actool.aceinstaller.AceBeanInstallerIncremental,7355, [biz.netcentric.cq.tools.actool.aceinstaller.AceBeanInstaller]] ServiceEvent UNREGISTERING
26.02.2019 16:02:27.330 *INFO* [OsgiInstallerImpl] biz.netcentric.cq.tools.accesscontroltool.bundle Service [biz.netcentric.cq.tools.actool.authorizableinstaller.impl.AuthorizableInstallerServiceImpl,7356, [biz.netcentric.cq.tools.actool.authorizableinstaller.AuthorizableInstallerService]] ServiceEvent UNREGISTERING
26.02.2019 16:02:27.338 *INFO* [OsgiInstallerImpl] biz.netcentric.cq.tools.accesscontroltool.bundle Service [biz.netcentric.cq.tools.actool.authorizableinstaller.impl.ExternalGroupInstallerServiceImpl,7357, [biz.netcentric.cq.tools.actool.authorizableinstaller.impl.ExternalGroupInstallerServiceImpl]] ServiceEvent UNREGISTERING
26.02.2019 16:02:27.340 *INFO* [OsgiInstallerImpl] biz.netcentric.cq.tools.accesscontroltool.bundle Service [biz.netcentric.cq.tools.actool.configreader.ConfigFilesRetrieverImpl,7358, [biz.netcentric.cq.tools.actool.configreader.ConfigFilesRetriever]] ServiceEvent UNREGISTERING
26.02.2019 16:02:27.346 *INFO* [OsgiInstallerImpl] biz.netcentric.cq.tools.accesscontroltool.bundle Service [biz.netcentric.cq.tools.actool.configreader.YamlConfigurationMerger,7368, [biz.netcentric.cq.tools.actool.configreader.ConfigurationMerger]] ServiceEvent UNREGISTERING
26.02.2019 16:02:27.347 *INFO* [OsgiInstallerImpl] biz.netcentric.cq.tools.accesscontroltool.bundle Service [biz.netcentric.cq.tools.actool.configreader.TestUserConfigsCreator,7359, [biz.netcentric.cq.tools.actool.configreader.TestUserConfigsCreator]] ServiceEvent UNREGISTERING
26.02.2019 16:02:27.373 *INFO* [OsgiInstallerImpl] biz.netcentric.cq.tools.accesscontroltool.bundle Service [biz.netcentric.cq.tools.actool.configreader.VirtualGroupProcessor,7360, [biz.netcentric.cq.tools.actool.configreader.VirtualGroupProcessor]] ServiceEvent UNREGISTERING
26.02.2019 16:02:27.375 *INFO* [OsgiInstallerImpl] biz.netcentric.cq.tools.accesscontroltool.bundle Service [biz.netcentric.cq.tools.actool.configreader.YamlConfigReader,7361, [biz.netcentric.cq.tools.actool.configreader.ConfigReader]] ServiceEvent UNREGISTERING
26.02.2019 16:02:27.380 *INFO* [OsgiInstallerImpl] biz.netcentric.cq.tools.accesscontroltool.bundle Service [biz.netcentric.cq.tools.actool.configreader.YamlMacroProcessorImpl,7363, [biz.netcentric.cq.tools.actool.configreader.YamlMacroProcessor]] ServiceEvent UNREGISTERING
26.02.2019 16:02:27.381 *INFO* [OsgiInstallerImpl] biz.netcentric.cq.tools.accesscontroltool.bundle Service [biz.netcentric.cq.tools.actool.configreader.YamlMacroChildNodeObjectsProviderImpl,7362, [biz.netcentric.cq.tools.actool.configreader.YamlMacroChildNodeObjectsProvider]] ServiceEvent UNREGISTERING
26.02.2019 16:02:27.400 *INFO* [OsgiInstallerImpl] biz.netcentric.cq.tools.accesscontroltool.bundle Service [biz.netcentric.cq.tools.actool.dumpservice.impl.DumpServiceImpl,7364, [biz.netcentric.cq.tools.actool.dumpservice.ConfigDumpService]] ServiceEvent UNREGISTERING
26.02.2019 16:02:27.403 *INFO* [OsgiInstallerImpl] biz.netcentric.cq.tools.accesscontroltool.bundle Service [biz.netcentric.cq.tools.actool.healthcheck.LastRunSuccessHealthCheck,7366, [org.apache.sling.hc.api.HealthCheck]] ServiceEvent UNREGISTERING
26.02.2019 16:02:27.406 *INFO* [OsgiInstallerImpl] biz.netcentric.cq.tools.accesscontroltool.bundle Service [biz.netcentric.cq.tools.actool.history.impl.AcHistoryServiceImpl,7365, [biz.netcentric.cq.tools.actool.history.AcHistoryService]] ServiceEvent UNREGISTERING
26.02.2019 16:02:27.410 *INFO* [OsgiInstallerImpl] biz.netcentric.cq.tools.accesscontroltool.bundle Service [biz.netcentric.cq.tools.actool.validators.impl.ObsoleteAuthorizablesValidatorImpl,7367, [biz.netcentric.cq.tools.actool.validators.ObsoleteAuthorizablesValidator]] ServiceEvent UNREGISTERING
26.02.2019 16:02:27.426 *INFO* [OsgiInstallerImpl] biz.netcentric.cq.tools.accesscontroltool.bundle BundleEvent STOPPING
26.02.2019 16:02:27.426 *INFO* [OsgiInstallerImpl] biz.netcentric.cq.tools.accesscontroltool.bundle BundleEvent STOPPED
26.02.2019 16:02:27.447 *INFO* [OsgiInstallerImpl] biz.netcentric.cq.tools.accesscontroltool.bundle BundleEvent UNRESOLVED
26.02.2019 16:02:27.447 *INFO* [OsgiInstallerImpl] biz.netcentric.cq.tools.accesscontroltool.bundle BundleEvent UPDATED
26.02.2019 16:02:27.448 *INFO* [OsgiInstallerImpl] org.apache.sling.audit.osgi.installer Updated bundle biz.netcentric.cq.tools.accesscontroltool.bundle [616] from resource TaskResource(url=jcrinstall:/apps/netcentric/actool/install/accesscontroltool-bundle-2.3.2.jar, entity=bundle:biz.netcentric.cq.tools.accesscontroltool.bundle, state=INSTALL, attributes=[org.apache.sling.installer.api.tasks.ResourceTransformer=:28:25:, Bundle-SymbolicName=biz.netcentric.cq.tools.accesscontroltool.bundle, Bundle-Version=2.3.2], digest=1551196946590)
26.02.2019 16:02:27.449 *INFO* [OsgiInstallerImpl] org.apache.sling.audit.osgi.installer Refreshing 1 bundles: [biz.netcentric.cq.tools.accesscontroltool.bundle [616]]
26.02.2019 16:02:27.450 *INFO* [OsgiInstallerImpl] org.apache.sling.audit.osgi.installer Waiting up to 90 seconds for bundles refresh
26.02.2019 16:03:57.450 *WARN* [OsgiInstallerImpl] org.apache.sling.installer.core.impl.util.WABundleRefresher No FrameworkEvent.PACKAGES_REFRESHED event received within 90 seconds after refresh, aborting wait.
26.02.2019 16:03:57.450 *INFO* [OsgiInstallerImpl] org.apache.sling.audit.osgi.installer Done refreshing 1 bundles
26.02.2019 17:06:28.669 *INFO* [OsgiInstallerImpl] org.apache.sling.installer.core.impl.OsgiInstallerImpl Apache Sling OSGi Installer Service started.

A separate configured logger on DEBUG level stoped logging after deinstalling the old bundles. We deploy our artefacts on the Author using the same API like the Frontend of the Package Manager. The Author is replicating the Packages to the Publisher and triggers the installation.

Please let me know, which additional information you need and maybe how to provide if there is a special logger configuration necessary.

Best, andre

r0nny8000 avatar Feb 28 '19 18:02 r0nny8000