cloudstack icon indicating copy to clipboard operation
cloudstack copied to clipboard

Changes error message when using invalid `endpoint.url`

Open lucas-a-martins opened this issue 1 year ago • 34 comments

Description

When validating the endpoint.url global setting, such as when using the createKubernetesCluster API, we encounter the exception message Global setting endpoint.url has to be set to the Management Server's API endpoint. This message lacks clarity about the problem to be solved and exposes information about the environment. Upon further analysis, it was found that this same message was used across several classes, repeating the same code in each one of them.

This pull request addresses this issue by replacing the exception with a generic one that does not expose any information about the environment. Also, it introduces a more detailed and explicit message to the logs, including the current value of endpoint.url. Furthermore, a new method has been implemented to validate the endpoint.url whenever necessary.

Types of changes

  • [ ] Breaking change (fix or feature that would cause existing functionality to change)
  • [ ] New feature (non-breaking change which adds functionality)
  • [ ] Bug fix (non-breaking change which fixes an issue)
  • [ ] Enhancement (improves an existing feature and functionality)
  • [x] Cleanup (Code refactoring and cleanup, that may add test cases)
  • [ ] build/CI

Feature/Enhancement Scale or Bug Severity

Feature/Enhancement Scale

  • [ ] Major
  • [ ] Minor

Bug Severity

  • [ ] BLOCKER
  • [ ] Critical
  • [ ] Major
  • [ ] Minor
  • [ ] Trivial

Screenshots (if appropriate):

How Has This Been Tested?

I tested by using CloudMonkey and tried to create a Kubernetes cluster. I implemented unit tests for the new method too.

Exception before changes:

(lucas) :cat: > create kubernetescluster name=test description=test zoneid=0ea94256-6969-46a8-b02e-b1604c81dd01 kubernetesversionid=ea7dc03b-40d0-4cfe-9144-4a9044cf88bd serviceofferingid=3eb8e0f1-d909-4bfe-aa55-2b9329920349 size=1
:see_no_evil: Error: (HTTP 530, error code 9999) Global setting endpoint.url has to be set to the Management Server's API end point

Exception after changes:

  • New exception:
(lucas) :cat: > create kubernetescluster name=test description=test zoneid=0ea94256-6969-46a8-b02e-b1604c81dd01 kubernetesversionid=ea7dc03b-40d0-4cfe-9144-4a9044cf88bd serviceofferingid=3eb8e0f1-d909-4bfe-aa55-2b9329920349 size=1
:see_no_evil: Error: (HTTP 530, error code 9999) Unable to complete this operation. Contact your cloud admin.
  • New log:
2024-01-31 19:16:41,219 ERROR [o.a.c.c.ApiServiceConfiguration] (qtp775386112-17:ctx-ff635773 ctx-96fc2faf) (logid:a8dc806e) Global setting endpoint.url cannot contain localhost or be blank. Current value: http://localhost:8080/client/api

lucas-a-martins avatar Feb 02 '24 18:02 lucas-a-martins

@blueorangutan package

sureshanaparti avatar Feb 05 '24 12:02 sureshanaparti

Codecov Report

Attention: 6 lines in your changes are missing coverage. Please review.

Comparison is base (49cecae) 30.37% compared to head (6b86329) 30.76%. Report is 30 commits behind head on main.

Files Patch % Lines
...bernetes/cluster/KubernetesClusterManagerImpl.java 0.00% 3 Missing :warning:
...loud/network/lb/LoadBalancingRulesManagerImpl.java 0.00% 2 Missing :warning:
...r/actionworkers/KubernetesClusterActionWorker.java 0.00% 1 Missing :warning:
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #8603      +/-   ##
============================================
+ Coverage     30.37%   30.76%   +0.39%     
- Complexity    32633    33078     +445     
============================================
  Files          5352     5353       +1     
  Lines        374419   374605     +186     
  Branches      54609    54633      +24     
============================================
+ Hits         113719   115244    +1525     
+ Misses       245523   244085    -1438     
- Partials      15177    15276      +99     
Flag Coverage Δ
simulator-marvin-tests 24.61% <7.14%> (+0.46%) :arrow_up:
uitests 4.38% <ø> (-0.01%) :arrow_down:
unit-tests 16.44% <57.14%> (+0.02%) :arrow_up:

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Feb 05 '24 12:02 codecov[bot]

@blueorangutan package

weizhouapache avatar Feb 05 '24 18:02 weizhouapache

@weizhouapache a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

blueorangutan avatar Feb 05 '24 18:02 blueorangutan

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 8536

blueorangutan avatar Feb 05 '24 20:02 blueorangutan

@blueorangutan test

DaanHoogland avatar Feb 06 '24 10:02 DaanHoogland

@DaanHoogland a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

blueorangutan avatar Feb 06 '24 10:02 blueorangutan

@weizhouapache, thanks for your review. I've made changes to incorporate your suggestions in the latest commit (4cf74fc9aa08ecefa5c2c770512f7537f6c97b75), so if you could provide another review, I would greatly appreciate it!

lucas-a-martins avatar Feb 06 '24 19:02 lucas-a-martins

@blueorangutan package

weizhouapache avatar Feb 06 '24 19:02 weizhouapache

@weizhouapache a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

blueorangutan avatar Feb 06 '24 19:02 blueorangutan

@weizhouapache, thanks you for your review. I've made changes to incorporate your suggestions in the latest commit (4cf74fc9aa08ecefa5c2c770512f7537f6c97b75), so if you could provide another review, I would greatly appreciate it!

@lucas-a-martins thanks for the update Looks good

weizhouapache avatar Feb 06 '24 19:02 weizhouapache

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 8556

blueorangutan avatar Feb 06 '24 20:02 blueorangutan

[SF] Trillian test result (tid-9096) Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7 Total time taken: 48814 seconds Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8603-t9096-kvm-centos7.zip Smoke tests completed. 122 look OK, 0 have errors, 0 did not run Only failed and skipped tests results shown below:

Test Result Time (s) Test File

blueorangutan avatar Feb 07 '24 00:02 blueorangutan

Hey, @lucas-a-martins please change your logger imports to use log4j2

JoaoJandre avatar Feb 08 '24 14:02 JoaoJandre

@lucas-a-martins do you want to target this for 4.20?

  • if so see https://github.com/apache/cloudstack/pull/8603#issuecomment-1934208751
  • if no we can retarget it.

DaanHoogland avatar Feb 09 '24 15:02 DaanHoogland

@lucas-a-martins do you want to target this for 4.20?

@DaanHoogland I do. I'm just testing the code with log4j2.

lucas-a-martins avatar Feb 09 '24 17:02 lucas-a-martins

@blueorangutan package

weizhouapache avatar Feb 16 '24 18:02 weizhouapache

@weizhouapache a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.

blueorangutan avatar Feb 16 '24 18:02 blueorangutan

Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 8694

blueorangutan avatar Feb 16 '24 19:02 blueorangutan

@blueorangutan test

DaanHoogland avatar Feb 19 '24 09:02 DaanHoogland

@DaanHoogland a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests

blueorangutan avatar Feb 19 '24 09:02 blueorangutan

[SF] Trillian test result (tid-9288) Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7 Total time taken: 42901 seconds Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8603-t9288-kvm-centos7.zip Smoke tests completed. 129 look OK, 0 have errors, 0 did not run Only failed and skipped tests results shown below:

Test Result Time (s) Test File

blueorangutan avatar Feb 19 '24 21:02 blueorangutan

To be honest, I think the old error message makes more sense for root admin

weizhouapache avatar Apr 16 '24 21:04 weizhouapache

To be honest, I think the old error message makes more sense for root admin

@weizhouapache,

One of the major issues with the old exception message is its lack of clarity. Even for root admins, it would be difficult to discern what action needs to be taken.

The new exception is more generic compared to the old one, as the intention is to give no environment information. To give any direction to operators, this PR adds a new log that is more objective, indicating to the operator where and what the problem is. IMO, the new message can also help prevent operators from making the same mistake again, as they will now know that endpoint.url can't be blank or localhost. The old message states endpoint.url has to be set, but if the endpoint.url is localhost, then it is already set, albeit with an invalid value, and the old message would not make sense.

lucas-a-martins avatar Apr 18 '24 17:04 lucas-a-martins

To be honest, I think the old error message makes more sense for root admin

@weizhouapache,

One of the major issues with the old exception message is its lack of clarity. Even for root admins, it would be difficult to discern what action needs to be taken.

The new exception is more generic compared to the old one, as the intention is to give no environment information. To give any direction to operators, this PR adds a new log that is more objective, indicating to the operator where and what the problem is. IMO, the new message can also help prevent operators from making the same mistake again, as they will now know that endpoint.url can't be blank or localhost. The old message states endpoint.url has to be set, but if the endpoint.url is localhost, then it is already set, albeit with an invalid value, and the old message would not make sense.

I know @lucas-a-martins

but the new error message Unable to complete this operation. Contact your cloud admin. is much more unclear

            LOGGER.error("Global setting [{}] cannot contain localhost or be blank. Current value: {}", ApiServletPath.key(), csUrl);
            throw new InvalidParameterValueException("Unable to complete this operation. Contact your cloud admin.");

weizhouapache avatar Apr 18 '24 18:04 weizhouapache

I know @lucas-a-martins

but the new error message Unable to complete this operation. Contact your cloud admin. is much more unclear

            LOGGER.error("Global setting [{}] cannot contain localhost or be blank. Current value: {}", ApiServletPath.key(), csUrl);
            throw new InvalidParameterValueException("Unable to complete this operation. Contact your cloud admin.");

@weizhouapache,

The exception being unclear is intentional, as this is what users will see. I understand what you mean by showing an exclusive message to ROOT admins, but is this really necessary since they have access to the logs with a clear message? Especially considering that the current exception is unclear, so this wouldn't change. I can't see a benefit in adding a "less unclear but not clear enough" exception. Also, is there any case of an exception in the UI being different between Users and Root Admins in the same situation?

lucas-a-martins avatar May 02 '24 20:05 lucas-a-martins

@weizhouapache,

The exception being unclear is intentional, as this is what users will see. I understand what you mean by showing an exclusive message to ROOT admins, but is this really necessary since they have access to the logs with a clear message? Especially considering that the current exception is unclear, so this wouldn't change. I can't see a benefit in adding a "less unclear but not clear enough" exception. Also, is there any case of an exception in the UI being different between Users and Root Admins in the same situation?

@lucas-a-martins I understand some error messages are unnecessary for normal users, not only this one, but also most other exceptions in vm the deployment, migration, etc. I agree we should display different error messages to normal users.

However I hope, as a root admin, we can get more information from error message on the UI and in the API response. Please bear in mind , not all root admins have access to the management server log and have the ability to find the root cause from massvie logs.

you can explore the options, for example in the API response or UI, display "Unable to complete this operation. Contact your cloud admin." (configurable) to normal users, and display the detailed error message to root admins. It could apply on ALL exceptions.

weizhouapache avatar May 03 '24 07:05 weizhouapache

@lucas-a-martins I understand some error messages are unnecessary for normal users, not only this one, but also most other exceptions in vm the deployment, migration, etc. I agree we should display different error messages to normal users.

However I hope, as a root admin, we can get more information from error message on the UI and in the API response. Please bear in mind , not all root admins have access to the management server log and have the ability to find the root cause from massvie logs.

you can explore the options, for example in the API response or UI, display "Unable to complete this operation. Contact your cloud admin." (configurable) to normal users, and display the detailed error message to root admins. It could apply on ALL exceptions.

Hey, @weizhouapache

Considering these scenarios, I agree that having an exception to Root Admins makes sense and could be applied to all exceptions; however, this goes out of the scope of this PR. We have a discussion about patterns about logging (#8746) that could address these cases as well; we can discuss that there.

lucas-a-martins avatar May 08 '24 17:05 lucas-a-martins

@lucas-a-martins I understand some error messages are unnecessary for normal users, not only this one, but also most other exceptions in vm the deployment, migration, etc. I agree we should display different error messages to normal users. However I hope, as a root admin, we can get more information from error message on the UI and in the API response. Please bear in mind , not all root admins have access to the management server log and have the ability to find the root cause from massvie logs. you can explore the options, for example in the API response or UI, display "Unable to complete this operation. Contact your cloud admin." (configurable) to normal users, and display the detailed error message to root admins. It could apply on ALL exceptions.

Hey, @weizhouapache

Considering these scenarios, I agree that having an exception to Root Admins makes sense and could be applied to all exceptions; however, this goes out of the scope of this PR. We have a discussion about patterns about logging (#8746) that could address these cases as well; we can discuss that there.

ok @lucas-a-martins let's hold on this PR, and continue when we have a consensus in #8746 ?

weizhouapache avatar May 08 '24 17:05 weizhouapache

ok @lucas-a-martins let's hold on this PR, and continue when we have a consensus in #8746 ?

I do not see why we need to wait that discussion to improve a log message. The previous message was misleading; the new log is way more clear. The idea of presenting different messages for operators and final users is interesting; however, is a different context from this PR. If we define something later, we can create another PR to address the concept.

GutoVeronezi avatar May 09 '24 17:05 GutoVeronezi