
prevent an NPE on an uninitialised TemplateObject

Open DaanHoogland opened this issue 10 months ago • 20 comments

Description

This PR fixes an NPE seen in an environment after an upgrade from 4.15 to 4.17. It is not clear whether configuration mistakes were made, but this PR attempts to handle the NPE more gracefully.

Starting a stopped VM after an ACS upgrade from 4.15.2 to 4.17.2 failed: the VM could not start because of an NPE while starting the VR.

2024-01-02 22:40:32,765 ERROR [c.c.a.ApiAsyncJobDispatcher] (API-Job-Executor-56:ctx-d6145db6 job-312930) (logid:77c2bb82) Unexpected exception while executing org.apache.cloudstack.api.command.admin.vm.StartVMCmdByAdmin
java.lang.NullPointerException
        at org.apache.cloudstack.storage.image.store.TemplateObject.getId(TemplateObject.java:111)
        at org.apache.cloudstack.storage.volume.VolumeServiceImpl.createVolumeFromTemplateAsync(VolumeServiceImpl.java:1533)
        at org.apache.cloudstack.engine.orchestration.VolumeOrchestrator.recreateVolume(VolumeOrchestrator.java:1583)
        at org.apache.cloudstack.engine.orchestration.VolumeOrchestrator.prepare(VolumeOrchestrator.java:1689)
        at com.cloud.vm.VirtualMachineManagerImpl.orchestrateStart(VirtualMachineManagerImpl.java:1179)
        at com.cloud.vm.VirtualMachineManagerImpl.advanceStart(VirtualMachineManagerImpl.java:972)
        at com.cloud.network.router.NetworkHelperImpl.start(NetworkHelperImpl.java:315)
        at com.cloud.network.router.NetworkHelperImpl.startVirtualRouter(NetworkHelperImpl.java:394)
        at com.cloud.network.router.NetworkHelperImpl.startRouters(NetworkHelperImpl.java:379)
        at org.apache.cloudstack.network.router.deployment.RouterDeploymentDefinition.deployVirtualRouter(RouterDeploymentDefinition.java:209)
        at com.cloud.network.element.VirtualRouterElement.prepare(VirtualRouterElement.java:285)
        at org.apache.cloudstack.engine.orchestration.NetworkOrchestrator.prepareElement(NetworkOrchestrator.java:1591)
        at org.apache.cloudstack.engine.orchestration.NetworkOrchestrator.prepareNic(NetworkOrchestrator.java:1946)
        at org.apache.cloudstack.engine.orchestration.NetworkOrchestrator.prepare(NetworkOrchestrator.java:1880)
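
The top frame of the trace is `TemplateObject.getId`, which suggests the wrapper was asked for its id while its backing record was missing. The following is a minimal sketch of that failure mode, not the actual CloudStack code: `TemplateRecord`, `TemplateWrapper`, `getIdUnchecked` and `getIdChecked` are hypothetical names invented for illustration.

```java
// Hypothetical stand-in for a database row; not the real VMTemplateVO.
final class TemplateRecord {
    private final long id;

    TemplateRecord(long id) {
        this.id = id;
    }

    long getId() {
        return id;
    }
}

// Hypothetical wrapper, loosely modelled on the failing frame in the trace.
final class TemplateWrapper {
    // May be null when the lookup that produced it found nothing.
    private final TemplateRecord record;

    TemplateWrapper(TemplateRecord record) {
        this.record = record;
    }

    // Dereferences the record directly: throws NullPointerException if record is null.
    long getIdUnchecked() {
        return record.getId();
    }

    // Defensive variant: fails with a message that names the actual problem.
    long getIdChecked() {
        if (record == null) {
            throw new IllegalStateException("template wrapper has no backing record");
        }
        return record.getId();
    }
}
```

Under these assumptions, `new TemplateWrapper(null).getIdUnchecked()` throws a bare `NullPointerException`, while `getIdChecked()` reports that the backing record is missing, which may be easier to diagnose after an upgrade.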

Types of changes

  • [ ] Breaking change (fix or feature that would cause existing functionality to change)
  • [ ] New feature (non-breaking change which adds functionality)
  • [x] Bug fix (non-breaking change which fixes an issue)
  • [ ] Enhancement (improves an existing feature and functionality)
  • [ ] Cleanup (Code refactoring and cleanup, that may add test cases)
  • [ ] build/CI

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • [ ] Major
  • [x] Minor

Bug Severity

  • [ ] BLOCKER
  • [ ] Critical
  • [x] Major
  • [ ] Minor
  • [ ] Trivial

Screenshots (if appropriate):

How Has This Been Tested?

How did you try to break this feature and the system with this change?

DaanHoogland avatar Apr 10 '24 07:04 DaanHoogland

clearly something wrong in vmware; investigating

DaanHoogland avatar Apr 19 '24 07:04 DaanHoogland

[SF] Trillian test result (tid-9945)
Environment: vmware-67u3 (x2), Advanced Networking with Mgmt server r8
Total time taken: 64704 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8898-t9945-vmware-67u3.zip
Smoke tests completed. 62 look OK, 19 have errors, 0 did not run
Only failed and skipped tests results shown below:

oddly some of these test files are not in the smoke test directory

  • test_deploy_vm.py
  • test_escalations_templates.py
  • test_host_annotations.py
  • test_vm_ha
  • test_vm_sync

and some should work

  • [ ] test_affinity_groups_projects.py
  • [ ] test_deploy_vm_root_resize.py
  • [ ] test_global_settings.py
  • [ ] test_host_maintenance.py
  • [ ] test_network.py
  • [ ] test_outofbandmanagement.py
  • [ ] test_privategw_acl.py
  • [ ] test_projects.py
  • [ ] test_public_ip_range.py
  • [ ] test_pvlan.py
  • [ ] test_routers_network_ops.py
  • [ ] test_templates.py
  • [ ] test_vm_life_cycle.py (seen to fail on main lately)
  • [ ] test_vpc_redundant.py

Suspicion: the broken tests broke the environment for the rest. Doing a manual round on these.

DaanHoogland avatar Apr 24 '24 13:04 DaanHoogland

Codecov Report

Attention: Patch coverage is 0% with 8 lines in your changes missing coverage. Please review.

Project coverage is 12.23%. Comparing base (2339412) to head (3a0d1a4).

Files                                                    Patch %   Lines
...cloudstack/storage/image/store/TemplateObject.java    0.00%     6 Missing :warning:
...udstack/storage/image/TemplateDataFactoryImpl.java    0.00%     2 Missing :warning:

Additional details and impacted files
@@             Coverage Diff              @@
##               4.18    #8898      +/-   ##
============================================
- Coverage     12.23%   12.23%   -0.01%     
  Complexity     9291     9291              
============================================
  Files          4698     4698              
  Lines        414257   414265       +8     
  Branches      52895    53365     +470     
============================================
- Hits          50705    50703       -2     
- Misses       357251   357261      +10     
  Partials       6301     6301              
Flag Coverage Δ
unittests 12.23% <0.00%> (-0.01%) :arrow_down:


codecov-commenter avatar Apr 25 '24 14:04 codecov-commenter

Suggestion by @vishesh92:

diff --git a/engine/storage/image/src/main/java/org/apache/cloudstack/storage/image/TemplateDataFactoryImpl.java b/engine/storage/image/src/main/java/org/apache/cloudstack/storage/image/TemplateDataFactoryImpl.java
index 492ec74382..30c0131be8 100644
--- a/engine/storage/image/src/main/java/org/apache/cloudstack/storage/image/TemplateDataFactoryImpl.java
+++ b/engine/storage/image/src/main/java/org/apache/cloudstack/storage/image/TemplateDataFactoryImpl.java
@@ -97,6 +97,9 @@ public class TemplateDataFactoryImpl implements TemplateDataFactory {
     @Override
     public TemplateInfo getTemplate(long templateId, DataStore store) {
         VMTemplateVO templ = imageDataDao.findById(templateId);
+        if (templ == null) {
+            return null;
+        }
         if (store == null && !templ.isDirectDownload()) {
             TemplateObject tmpl = TemplateObject.getTemplate(templ, null, null);
             return tmpl;
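
One consequence of the guard in the diff above is that `getTemplate` can now return `null`, so the null check moves to the callers. The sketch below shows that pattern in a self-contained form; it is an illustration under assumptions, not CloudStack code. `TemplateEntry`, `TemplateCatalog` and `VolumeStartSketch` are hypothetical names, and the in-memory map stands in for the DAO lookup.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a template row; not the real VMTemplateVO.
final class TemplateEntry {
    private final long id;
    private final boolean directDownload;

    TemplateEntry(long id, boolean directDownload) {
        this.id = id;
        this.directDownload = directDownload;
    }

    long getId() {
        return id;
    }

    boolean isDirectDownload() {
        return directDownload;
    }
}

// Hypothetical factory; the map replaces the real DAO lookup.
final class TemplateCatalog {
    private final Map<Long, TemplateEntry> rows = new HashMap<>();

    void add(TemplateEntry entry) {
        rows.put(entry.getId(), entry);
    }

    // Mirrors the guard from the diff: return null instead of dereferencing a missing row.
    TemplateEntry getTemplate(long templateId) {
        TemplateEntry entry = rows.get(templateId);
        if (entry == null) {
            return null;
        }
        return entry;
    }
}

// Hypothetical caller showing the check that a null-returning factory requires.
final class VolumeStartSketch {
    static String describeStart(TemplateCatalog catalog, long templateId) {
        TemplateEntry template = catalog.getTemplate(templateId);
        if (template == null) {
            return "template " + templateId + " not found; cannot recreate volume";
        }
        return "recreating volume from template " + template.getId();
    }
}
```

Whether every existing caller of the real `getTemplate` tolerates a `null` result is not established in this thread; the sketch only shows what such a caller would need to do.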

DaanHoogland avatar May 02 '24 12:05 DaanHoogland

[SF] Trillian Build Failed (tid-10232)

blueorangutan avatar May 21 '24 09:05 blueorangutan

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 9659

blueorangutan avatar May 22 '24 08:05 blueorangutan

[SF] Trillian test result (tid-10251)
Environment: vmware-67u3 (x2), Advanced Networking with Mgmt server r8
Total time taken: 48420 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8898-t10251-vmware-67u3.zip
Smoke tests completed. 110 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File

blueorangutan avatar May 22 '24 23:05 blueorangutan

Finally it passes. I don't know what else to do for this corner case, so I'm leaving it at this.

DaanHoogland avatar May 23 '24 07:05 DaanHoogland

@blueorangutan package

harikrishna-patnala avatar May 24 '24 05:05 harikrishna-patnala

@harikrishna-patnala a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

blueorangutan avatar May 24 '24 05:05 blueorangutan

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 9693

blueorangutan avatar May 24 '24 06:05 blueorangutan

[SF] Trillian test result (tid-10363)
Environment: kvm-alma9 (x2), Advanced Networking with Mgmt server a9
Total time taken: 47282 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8898-t10363-kvm-alma9.zip
Smoke tests completed. 110 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File

blueorangutan avatar Jun 06 '24 02:06 blueorangutan

@blueorangutan package

DaanHoogland avatar Jun 07 '24 06:06 DaanHoogland

@DaanHoogland a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

blueorangutan avatar Jun 07 '24 06:06 blueorangutan

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 9814

blueorangutan avatar Jun 07 '24 07:06 blueorangutan

@blueorangutan test alma9 vmware-67u3

DaanHoogland avatar Jun 07 '24 08:06 DaanHoogland

@DaanHoogland a [SL] Trillian-Jenkins test job (alma9 mgmt + vmware-67u3) has been kicked to run smoke tests

blueorangutan avatar Jun 07 '24 08:06 blueorangutan

[SF] Trillian test result (tid-10389)
Environment: vmware-67u3 (x2), Advanced Networking with Mgmt server a9
Total time taken: 42698 seconds
Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8898-t10389-vmware-67u3.zip
Smoke tests completed. 110 look OK, 0 have errors, 0 did not run
Only failed and skipped tests results shown below:

Test Result Time (s) Test File

blueorangutan avatar Jun 07 '24 20:06 blueorangutan