cloudstack
cloudstack copied to clipboard
Changes error message when using invalid `endpoint.url`
Description
When validating the endpoint.url
global setting, such as when using the createKubernetesCluster
API, we encounter the exception message Global setting endpoint.url has to be set to the Management Server's API endpoint
. This message lacks clarity about the problem to be solved and exposes information about the environment. Upon further analysis, it was found that this same message was used across several classes, repeating the same code in each one of them.
This pull request addresses this issue by replacing the exception with a generic one that does not expose any information about the environment. Also, it introduces a more detailed and explicit message to the logs, including the current value of endpoint.url
. Furthermore, a new method has been implemented to validate the endpoint.url
whenever necessary.
Types of changes
- [ ] Breaking change (fix or feature that would cause existing functionality to change)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] Enhancement (improves an existing feature and functionality)
- [x] Cleanup (Code refactoring and cleanup, that may add test cases)
- [ ] build/CI
Feature/Enhancement Scale or Bug Severity
Feature/Enhancement Scale
- [ ] Major
- [ ] Minor
Bug Severity
- [ ] BLOCKER
- [ ] Critical
- [ ] Major
- [ ] Minor
- [ ] Trivial
Screenshots (if appropriate):
How Has This Been Tested?
I tested by using CloudMonkey and tried to create a Kubernetes cluster. I implemented unit tests for the new method too.
Exception before changes:
(lucas) :cat: > create kubernetescluster name=test description=test zoneid=0ea94256-6969-46a8-b02e-b1604c81dd01 kubernetesversionid=ea7dc03b-40d0-4cfe-9144-4a9044cf88bd serviceofferingid=3eb8e0f1-d909-4bfe-aa55-2b9329920349 size=1
:see_no_evil: Error: (HTTP 530, error code 9999) Global setting endpoint.url has to be set to the Management Server's API end point
Exception after changes:
- New exception:
(lucas) :cat: > create kubernetescluster name=test description=test zoneid=0ea94256-6969-46a8-b02e-b1604c81dd01 kubernetesversionid=ea7dc03b-40d0-4cfe-9144-4a9044cf88bd serviceofferingid=3eb8e0f1-d909-4bfe-aa55-2b9329920349 size=1
:see_no_evil: Error: (HTTP 530, error code 9999) Unable to complete this operation. Contact your cloud admin.
- New log:
2024-01-31 19:16:41,219 ERROR [o.a.c.c.ApiServiceConfiguration] (qtp775386112-17:ctx-ff635773 ctx-96fc2faf) (logid:a8dc806e) Global setting endpoint.url cannot contain localhost or be blank. Current value: http://localhost:8080/client/api
@blueorangutan package
Codecov Report
Attention: 6 lines
in your changes are missing coverage. Please review.
Comparison is base (
49cecae
) 30.37% compared to head (6b86329
) 30.76%. Report is 30 commits behind head on main.
Additional details and impacted files
@@ Coverage Diff @@
## main #8603 +/- ##
============================================
+ Coverage 30.37% 30.76% +0.39%
- Complexity 32633 33078 +445
============================================
Files 5352 5353 +1
Lines 374419 374605 +186
Branches 54609 54633 +24
============================================
+ Hits 113719 115244 +1525
+ Misses 245523 244085 -1438
- Partials 15177 15276 +99
Flag | Coverage Δ | |
---|---|---|
simulator-marvin-tests | 24.61% <7.14%> (+0.46%) |
:arrow_up: |
uitests | 4.38% <ø> (-0.01%) |
:arrow_down: |
unit-tests | 16.44% <57.14%> (+0.02%) |
:arrow_up: |
Flags with carried forward coverage won't be shown. Click here to find out more.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
@blueorangutan package
@weizhouapache a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 8536
@blueorangutan test
@DaanHoogland a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests
@weizhouapache, thanks for your review. I've made changes to incorporate your suggestions in the latest commit (4cf74fc9aa08ecefa5c2c770512f7537f6c97b75), so if you could provide another review, I would greatly appreciate it!
@blueorangutan package
@weizhouapache a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
@weizhouapache, thanks you for your review. I've made changes to incorporate your suggestions in the latest commit (4cf74fc9aa08ecefa5c2c770512f7537f6c97b75), so if you could provide another review, I would greatly appreciate it!
@lucas-a-martins thanks for the update Looks good
Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 8556
[SF] Trillian test result (tid-9096) Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7 Total time taken: 48814 seconds Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8603-t9096-kvm-centos7.zip Smoke tests completed. 122 look OK, 0 have errors, 0 did not run Only failed and skipped tests results shown below:
Test | Result | Time (s) | Test File |
---|
Hey, @lucas-a-martins please change your logger imports to use log4j2
@lucas-a-martins do you want to target this for 4.20?
- if so see https://github.com/apache/cloudstack/pull/8603#issuecomment-1934208751
- if no we can retarget it.
@lucas-a-martins do you want to target this for 4.20?
@DaanHoogland I do. I'm just testing the code with log4j2.
@blueorangutan package
@weizhouapache a [SL] Jenkins job has been kicked to build packages. It will be bundled with KVM, XenServer and VMware SystemVM templates. I'll keep you posted as I make progress.
Packaging result [SF]: ✔️ el7 ✔️ el8 ✔️ el9 ✔️ debian ✔️ suse15. SL-JID 8694
@blueorangutan test
@DaanHoogland a [SL] Trillian-Jenkins test job (centos7 mgmt + kvm-centos7) has been kicked to run smoke tests
[SF] Trillian test result (tid-9288) Environment: kvm-centos7 (x2), Advanced Networking with Mgmt server 7 Total time taken: 42901 seconds Marvin logs: https://github.com/blueorangutan/acs-prs/releases/download/trillian/pr8603-t9288-kvm-centos7.zip Smoke tests completed. 129 look OK, 0 have errors, 0 did not run Only failed and skipped tests results shown below:
Test | Result | Time (s) | Test File |
---|
To be honest, I think the old error message makes more sense for root admin
To be honest, I think the old error message makes more sense for root admin
@weizhouapache,
One of the major issues with the old exception message is its lack of clarity. Even for root admins, it would be difficult to discern what action needs to be taken.
The new exception is more generic compared to the old one, as the intention is to give no environment information. To give any direction to operators, this PR adds a new log that is more objective, indicating to the operator where and what the problem is. IMO, the new message can also help prevent operators from making the same mistake again, as they will now know that endpoint.url
can't be blank or localhost
. The old message states endpoint.url has to be set
, but if the endpoint.url
is localhost
, then it is already set, albeit with an invalid value, and the old message would not make sense.
To be honest, I think the old error message makes more sense for root admin
@weizhouapache,
One of the major issues with the old exception message is its lack of clarity. Even for root admins, it would be difficult to discern what action needs to be taken.
The new exception is more generic compared to the old one, as the intention is to give no environment information. To give any direction to operators, this PR adds a new log that is more objective, indicating to the operator where and what the problem is. IMO, the new message can also help prevent operators from making the same mistake again, as they will now know that
endpoint.url
can't be blank orlocalhost
. The old message statesendpoint.url has to be set
, but if theendpoint.url
islocalhost
, then it is already set, albeit with an invalid value, and the old message would not make sense.
I know @lucas-a-martins
but the new error message Unable to complete this operation. Contact your cloud admin.
is much more unclear
LOGGER.error("Global setting [{}] cannot contain localhost or be blank. Current value: {}", ApiServletPath.key(), csUrl);
throw new InvalidParameterValueException("Unable to complete this operation. Contact your cloud admin.");
I know @lucas-a-martins
but the new error message
Unable to complete this operation. Contact your cloud admin.
is much more unclearLOGGER.error("Global setting [{}] cannot contain localhost or be blank. Current value: {}", ApiServletPath.key(), csUrl); throw new InvalidParameterValueException("Unable to complete this operation. Contact your cloud admin.");
@weizhouapache,
The exception being unclear is intentional, as this is what users will see. I understand what you mean by showing an exclusive message to ROOT admins, but is this really necessary since they have access to the logs with a clear message? Especially considering that the current exception is unclear, so this wouldn't change. I can't see a benefit in adding a "less unclear but not clear enough" exception. Also, is there any case of an exception in the UI being different between Users and Root Admins in the same situation?
@weizhouapache,
The exception being unclear is intentional, as this is what users will see. I understand what you mean by showing an exclusive message to ROOT admins, but is this really necessary since they have access to the logs with a clear message? Especially considering that the current exception is unclear, so this wouldn't change. I can't see a benefit in adding a "less unclear but not clear enough" exception. Also, is there any case of an exception in the UI being different between Users and Root Admins in the same situation?
@lucas-a-martins I understand some error messages are unnecessary for normal users, not only this one, but also most other exceptions in vm the deployment, migration, etc. I agree we should display different error messages to normal users.
However I hope, as a root admin, we can get more information from error message on the UI and in the API response. Please bear in mind , not all root admins have access to the management server log and have the ability to find the root cause from massvie logs.
you can explore the options, for example in the API response or UI, display "Unable to complete this operation. Contact your cloud admin." (configurable) to normal users, and display the detailed error message to root admins. It could apply on ALL exceptions.
@lucas-a-martins I understand some error messages are unnecessary for normal users, not only this one, but also most other exceptions in vm the deployment, migration, etc. I agree we should display different error messages to normal users.
However I hope, as a root admin, we can get more information from error message on the UI and in the API response. Please bear in mind , not all root admins have access to the management server log and have the ability to find the root cause from massvie logs.
you can explore the options, for example in the API response or UI, display "Unable to complete this operation. Contact your cloud admin." (configurable) to normal users, and display the detailed error message to root admins. It could apply on ALL exceptions.
Hey, @weizhouapache
Considering these scenarios, I agree that having an exception to Root Admins makes sense and could be applied to all exceptions; however, this goes out of the scope of this PR. We have a discussion about patterns about logging (#8746) that could address these cases as well; we can discuss that there.
@lucas-a-martins I understand some error messages are unnecessary for normal users, not only this one, but also most other exceptions in vm the deployment, migration, etc. I agree we should display different error messages to normal users. However I hope, as a root admin, we can get more information from error message on the UI and in the API response. Please bear in mind , not all root admins have access to the management server log and have the ability to find the root cause from massvie logs. you can explore the options, for example in the API response or UI, display "Unable to complete this operation. Contact your cloud admin." (configurable) to normal users, and display the detailed error message to root admins. It could apply on ALL exceptions.
Hey, @weizhouapache
Considering these scenarios, I agree that having an exception to Root Admins makes sense and could be applied to all exceptions; however, this goes out of the scope of this PR. We have a discussion about patterns about logging (#8746) that could address these cases as well; we can discuss that there.
ok @lucas-a-martins let's hold on this PR, and continue when we have a consensus in #8746 ?
ok @lucas-a-martins let's hold on this PR, and continue when we have a consensus in #8746 ?
I do not see why we need to wait that discussion to improve a log message. The previous message was misleading; the new log is way more clear. The idea of presenting different messages for operators and final users is interesting; however, is a different context from this PR. If we define something later, we can create another PR to address the concept.