azure-service-operator
azure-service-operator copied to clipboard
My feedbacks on my 1st PR
Looking back at this first experience, here are some points I struggled with:
-
Development environment setup:
-
I didn't pay attention to the fact that tags are required, I only forked 'main' initially. It took me a while to understand that tags are required (didn't read the doc ;))
-
the dev container is an awesome tool to be quickly setup, however my user settings still applied. I don't know if they could be overridden inside the container. I had to remove these settings or the azure-arm.yaml would change drastically. I believe the following vscode settings could be enforced in the container so that everyone lints files the same way:
{ "files.trimTrailingWhitespace": true, "files.insertFinalNewline": true, "files.trimFinalNewlines": true }
-
-
Adding new code (ie new resource version)
-
first, as I didn't update the submodule to a new commit, I didn't find my resource version (https://github.com/Azure/azure-service-operator/issues/4634)
-
some tests I didn't update were failing and I have no idea how they could work before (the fluxconfiguration test was not installing the flux extension before pushing the configuration, so nothing happened and the test ended in timeout)
-
I like that most of the code is auto-generated and that there is this awesome framework to generate it but on the other hands, the errors are not very explicit and the amount of generated code makes it even harder to troubleshoot (hence the many QnA we had in the PR)
-
the long documentation here could probably be summarized with a checklist on the top and links to detailed actions / troubleshootings. Something like:
To add a new resource you should:
- dev environment setup, fork
- sub-module init + change commit if needed
- update azure-arm.yaml
- add your samples
- run
task - add tests. On this particular point, the documentation is split in half between Run test for added sample and commit the recording and Testing. May be it could be worth to merge them?
- run your tests & check the recording
- open the PR!
-
-
What made this PR (https://github.com/Azure/azure-service-operator/pull/4638) possible: your help and very detailed explanations about the different issues I faced ! The way ASO works is still quite obscure to me and I would definitely have given up without your expertise.
Additionally, I would be interested to know what are the purpose of the recordings?
If I think about anything else, I will update this issue
what are the purpose of the recordings?
Early in the development of ASO v2, we ran into problems with some of our end-to-end tests.
Sometimes a test that successfully worked one day would fail the next. We'd spend time investigating, assuming that this was due to something we'd changed, only to find the failure was due to outside factors. We encountered quota issues, capacity limits, outages, and more.
The chances of any particular test failing was remote - but as the number of supported resources grew, our test suite got larger and larger and the probability of all the tests passing at the same time declined. We also noticed that the cost of running our test suite was growing.
@Porges introduced recordings to take care of these issues. The recordings capture the HTTP REST interactions between ASO and Azure.
Once a test has passed once, the recording allows us to replay that test in isolation. We get to verify that ASO issues the same REST API requests, and that it behaves consistently as it receives the responses.
Benefits include
- Consistency - tests will (mostly) only fail if we break ASO, not if there's a problem elsewhere
- Cost - because we're not creating actual Azure resources, we don't incur much in the way of actual cost
- Performance - we can replay in a minute or so a test that might take 90 minutes to run "in the real world"
(The system's not perfect - timing differences from run to run very occasionally cause a test failure.)
We extended the recordings to our samples to verify that our samples work and can be used by new ASO users as a reference. I (@theunrepentantgeek) have had some poor experiences with other projects where I spent several days trying to use their samples, failing, only to be told "Yeah, those never worked" when I reached out for help.
Thanks for the detailed feedback, very much appreciated.
the dev container is an awesome tool to be quickly setup, however my user settings still applied. I don't know if they could be overridden inside the container. I had to remove these settings or the azure-arm.yaml would change drastically. I believe the following vscode settings could be enforced in the container so that everyone lints files the same way:
We will fix this, either as you suggested or by doing a one-time format of the azure-arm.yaml and then ensuring it doesn't get any trailing whitespace added back.
first, as I didn't update the submodule to a new commit, I didn't find my resource version (https://github.com/Azure/azure-service-operator/issues/4634)
We'll make some updates to reformat our documentation and pull this item up from the "troubleshooting section" to the happy path, where we call out that you need to check the submodule version and update it if needed (in a separate PR ideally).
some tests I didn't update were failing and I have no idea how they could work before (the fluxconfiguration test was not installing the flux extension before pushing the configuration, so nothing happened and the test ended in timeout)
We'll look into this and see what we can do to improve things here.
I like that most of the code is auto-generated and that there is this awesome framework to generate it but on the other hands, the errors are not very explicit and the amount of generated code makes it even harder to troubleshoot (hence the many QnA we had in the PR)
We'll have a walk through your PR again and try to pull out the error messages that seemed most confusing. One that comes to mind is possibly (coupled with improving the "adding a resource" documentation), making the "couldn't find that type" more explicit about checking if the submodule has been updated.
the long documentation here could probably be summarized with a checklist on the top and links to detailed actions / troubleshootings. Something like:
Agreed, we'll do a reformatting of the docs to make them easier to start with + scan.
Thank you for this detailed feedback, it's super appreciated! We'll turn the above feedback into issues and work through getting them improved.