Streamline Hygieia Onboarding
As part of the Hygieia planning session at Verizon this past summer, SingleStone committed to working on a few new features. This one relates to simplifying the initial setup of Hygieia.
Problem: The initial installation and configuration of Hygieia is time-consuming and requires many steps and tools to be installed (e.g. Java, Maven, Node, gulp, Mongo, etc.). From personal experience this can take 1-3 hours depending on the installer's familiarity with all the tools listed above. Based on a discussion with @tabladrum, this is one of the largest sources of issues from the Hygieia community, and solving it would result in fewer support issues being generated.
Outcomes: A new user can get Hygieia up and running in less than 10 minutes (from initial download to dashboard created and being populated with real data).
Design Thoughts:
- Reasonable defaults should be used so that a streamlined installation can support ~100 teams or dashboards out of the box with minimal or no configuration. More sophisticated highly available and fault-tolerant installations are beyond the initial scope.
- “Running” Hygieia means that the UI, API and the default set of collectors required to populate the default dashboard are all started automatically from a single command.
- Ideally the user can configure which collectors are started in a simple way and/or can quickly shut off previously started collectors.
- The approach should be cloud-native, but support on premise / local installations too.
Design Options:
- Docker Hub - A publicly available image with all the software necessary to run Hygieia pre-installed. Users would install with “docker pull hygieia” and run with “docker run hygieia”. This image would include all the “capitalone/hygieia-*” repos currently published on hub.docker.com (basically the UI, API and a reasonable default set of collectors in a single image).
- AWS ECS & EKS - Similar to the above, this would enable a user with an AWS account to launch Hygieia in a managed environment. To launch, the user would choose the Hygieia image, create a task definition, and configure the service and cluster.
- Hygieia AMI - A baked AMI published to the AWS Marketplace with all Hygieia components installed, used to stand up a new instance. Once logged into their AWS account, the user specifies the Action Type, EC2 instance size, VPC, Subnet, Security Group and Key Pair from the marketplace, then clicks Launch to start the server.
Next Steps: This issue is meant to start a discussion in this community on how to proceed. Before writing code, we want to tease out other requirements, ideas and options for how this might be done (implementation would likely start in December).
I would say these are reasonable goals and options but the devil will be in the details. Honestly, I don’t think the deployment approach is the challenge, so any of the options mentioned above will likely work. To me, the challenge will be making the configuration of the individual components simple and easy. For example, each collector needs to be configured to point to the instance of whatever tool is being used (e.g. Jenkins, GitHub, etc.). In some cases, the external tool itself will need to be configured. For instance, it is better to use GitHub webhooks and the Jenkins Hygieia plugin than to rely on the batch collection strategy. This feels like a huge barrier to getting started, especially if the person does not have administrative access to the tools or just doesn’t know the details needed to get the dashboard up and running.
I think we should take to heart the feedback provided in #2540. I'm sure the documentation is accurate and provides all of the relevant details needed to stand up a Hygieia instance with any required collectors. However, can a person new to Hygieia consume all of this documentation and get a working version of Hygieia running in 10 minutes or less?
Here are some potential enhancements to @ryanshriver's proposal that could solve these challenges:
- Hygieia Installation Generator - What if we created a web-based generator or CLI that guided a user through the installation and configuration process? The output would be a scaffolded installation that takes advantage of one or more of the design options mentioned above, with the configuration files populated based on the user's answers.
- Unified properties file - I know it is not recommended to merge config files but this is a reasonable approach to reduce "getting started" overhead. We could provide a sample config file with as many reasonable defaults as possible and provide clear guidance in a getting started document. Hopefully we could limit the user changes to a single config file.
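To make the unified properties file idea a bit more concrete, here is a minimal sketch of merging shipped defaults with a handful of user changes into one file. The property keys and default values are illustrative placeholders, not a verified list of Hygieia settings, and `render_unified_properties` is a hypothetical helper.

```python
# Illustrative defaults only: these keys and values are assumptions for the
# sketch, not a verified list of Hygieia properties.
DEFAULTS = {
    "dbhost": "localhost",
    "dbport": "27017",
    "dbname": "dashboarddb",
    "jenkins.cron": "0 0/5 * * * *",
    "github.cron": "0 0/5 * * * *",
}


def render_unified_properties(overrides):
    """Merge user overrides onto the defaults and render one properties file."""
    merged = {**DEFAULTS, **overrides}  # user-supplied values win
    lines = [f"{key}={merged[key]}" for key in sorted(merged)]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The user only states what differs from the defaults.
    print(render_unified_properties({"dbhost": "mongo.internal"}), end="")
```

With this shape, the getting started document would only need to explain the few keys a new user is expected to change.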
Thoughts?
Love the Installation Generator idea. The generator itself can generate the properties file for the collectors or even register hooks (such as github webhook). This may be interesting https://github.com/codecentric/spring-boot-admin/issues/99
I like the Installation Generator idea too. I think one of the primary advantages of an approach like that would be the ability to vet the inputs that the user provides, alerting them when there are issues with the information provided. This approach would be somewhat limited in its ability to scale, since you need to follow the GHE appliance model where all layers exist on the same instance, image, etc., but I think it would be very good for small to medium-sized consumers.
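As a rough illustration of what vetting inputs could look like, the sketch below checks that each tool URL the user enters is a well-formed http(s) URL and reports every problem at once. The `vet_answers` helper and the answer format (tool name mapped to URL) are assumptions for the example; a real generator would probably also test connectivity and credentials.

```python
from urllib.parse import urlparse


def vet_answers(tool_urls):
    """Return a list of human-readable problems; an empty list means all inputs look sane."""
    problems = []
    for tool, url in sorted(tool_urls.items()):
        parsed = urlparse(url)
        # Require an explicit http/https scheme and a host part.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            problems.append(f"{tool}: '{url}' is not a valid http(s) URL")
    return problems


if __name__ == "__main__":
    answers = {
        "jenkins": "jenkins.example.com",        # missing scheme, will be flagged
        "github": "https://github.example.com",  # fine
    }
    for problem in vet_answers(answers):
        print(problem)
```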
So along the Installation Generator idea, I think we should focus on a few main threads, perhaps in parallel:
First, we should figure out how to ask natural questions about how the user wants Hygieia to help them and capture their answers in a meaningful format for use later. I'm happy to help here but would like to pair with @tabladrum or some others to flesh this out a bit. Might pull in Kevin Tuskey here too.
Second, we should figure out how to translate their answers into Hygieia configuration without requiring major changes to Hygieia itself. Perhaps @jayhogan can help here with assistance from others.
Third, we should figure out how this generator will be deployed and operated. Open to anyone who wants to help on this one.
For the first one, I'm envisioning a web survey/intuitive guide that prompts the user to answer certain questions about what they want Hygieia to help them manage (team, product, portfolio, pipelines, etc.) and uses their answers plus some code/automation to output 1) exact instructions/configurations for the user to make on external systems and 2) configuration file(s) that Hygieia can consume to make it work like the user expects it should work based on their answers.
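A minimal sketch of that answers-in, two-outputs-out shape might look like the following. The answer fields, the property key pattern, and the wording of the external steps are all assumptions for illustration; the two external steps are taken from the GitHub webhook and Jenkins Hygieia plugin examples mentioned earlier in this thread.

```python
from dataclasses import dataclass, field


@dataclass
class SurveyAnswers:
    team_name: str
    tools: dict  # tool name -> base URL, e.g. {"jenkins": "https://..."}


@dataclass
class GeneratorOutput:
    external_steps: list = field(default_factory=list)  # 1) things to do on external systems
    properties: dict = field(default_factory=dict)       # 2) config Hygieia can consume


# Hypothetical instruction text per tool.
EXTERNAL_STEPS = {
    "jenkins": "Install and configure the Jenkins Hygieia plugin",
    "github": "Register a GitHub webhook that points at the Hygieia API",
}


def generate(answers):
    """Translate survey answers into external instructions plus configuration."""
    out = GeneratorOutput()
    for tool, url in sorted(answers.tools.items()):
        # Hypothetical key pattern; real collectors may name this differently.
        out.properties[f"{tool}.servers[0]"] = url
        if tool in EXTERNAL_STEPS:
            out.external_steps.append(f"{EXTERNAL_STEPS[tool]} ({url})")
    return out


if __name__ == "__main__":
    result = generate(SurveyAnswers("payments", {"jenkins": "https://jenkins.example.com"}))
    print(result.external_steps)
    print(result.properties)
```

Keeping the two outputs separate means the instructions can be shown to the user while the properties go straight to disk.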
Open to feedback on any of this.
For the second and third threads I imagined something like a single Docker image (could also be an AMI) that contains the bare essentials: DB, API, UI and maybe audit-api. The first time the user goes to the UI in their browser, they are redirected to a wizard that goes through the setup process, asking the questions from your first thread. Based on the answers provided, the generator would create properties files like we use today, download the appropriate collectors and start them using the generated properties files for configuration. Then it would use the remote dashboard API to create the corresponding dashboards. Finally, it would set a flag denoting that the installation is complete, and subsequent visits to the UI would behave as usual. This could be built directly into the UI so that we don't have to do anything additional to deploy the generator.
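Roughly, the first-run flow described above could be sketched like this. The flag file name, directory layout, and function names are hypothetical, and the collector download/start and remote dashboard API calls are left as comments because they depend on details not settled in this thread.

```python
import json
from pathlib import Path

STATE_FILE_NAME = "install-state.json"  # hypothetical flag file


def next_page(install_dir: Path) -> str:
    """Send the user to the wizard until setup has been marked complete."""
    state_file = install_dir / STATE_FILE_NAME
    if state_file.exists():
        state = json.loads(state_file.read_text())
        if state.get("setup_complete"):
            return "dashboard"
    return "wizard"


def finish_setup(install_dir: Path, collector_properties: dict) -> list:
    """Write one properties file per collector, then set the completion flag."""
    written = []
    for collector, props in sorted(collector_properties.items()):
        path = install_dir / f"{collector}.properties"
        path.write_text("".join(f"{k}={v}\n" for k, v in sorted(props.items())))
        written.append(path)
    # Downloading and starting the collectors, and creating dashboards through
    # the remote dashboard API, would happen here; omitted in this sketch.
    (install_dir / STATE_FILE_NAME).write_text(json.dumps({"setup_complete": True}))
    return written
```

Because the flag is written last, an interrupted setup leaves the user in the wizard on the next visit rather than in a half-configured UI.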
After identifying and debating a few design options we’ve decided to go in this direction to implement an MVP that simplifies the initial setup of Hygieia. Target date is late March / early April to:
- Automate the setup of a Hygieia environment in AWS using CloudFormation
- Deploy Hygieia as a set of containers running on ECS (one each for UI, API and each collector)
- Fetch Hygieia containers at deployment time from public Dockyard repo
- User configuration choices including what collectors to run are passed as CloudFormation parameters at setup time
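To illustrate the last point, here is a small sketch of turning user choices into the list-of-dicts shape that the CloudFormation API takes for stack parameters (`ParameterKey` / `ParameterValue` pairs). The parameter names and defaults are hypothetical and not taken from the actual templates; only the list shape is standard.

```python
# Hypothetical parameter names and defaults, for illustration only.
DEFAULT_PARAMETERS = {
    "Collectors": "jenkins,github",  # comma-delimited list of collectors to run
    "DatabaseName": "dashboarddb",
}


def build_stack_parameters(overrides=None):
    """Merge user choices onto the defaults and return CloudFormation-style parameters."""
    overrides = overrides or {}
    unknown = set(overrides) - set(DEFAULT_PARAMETERS)
    if unknown:
        # Fail early instead of letting stack creation reject the request.
        raise ValueError(f"Unknown parameters: {sorted(unknown)}")
    merged = {**DEFAULT_PARAMETERS, **overrides}
    return [
        {"ParameterKey": key, "ParameterValue": str(value)}
        for key, value in sorted(merged.items())
    ]


if __name__ == "__main__":
    # Run only the GitHub collector, keep every other default.
    for parameter in build_stack_parameters({"Collectors": "github"}):
        print(parameter)
```

The resulting list could be passed as the `Parameters` argument when creating the stack, so a user only specifies what differs from the defaults.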
Notes:
- At some point a GUI-based way for the user to select and configure the collectors is possible, and we'd like to do it, but it would be hard to do with the current Hygieia architecture. Our proposed MVP solution doesn’t preclude adding one later, but for the MVP we decided against a GUI.
- The solution is Docker-based, so it should be relatively portable, but for simplicity we chose to deploy on AWS initially. We leverage the AWS ECS Fargate service for hosting the containers and the DocumentDB service for hosting the MongoDB database, so there are no virtual servers to maintain with the solution.
- Based on internal testing, it takes about 12 minutes to provision and configure all AWS resources to run Hygieia. At that point the Admin can start building dashboards. This is very close to our original 10-minute goal (above), and perhaps as we implement we can streamline this slightly to get under 10 minutes.
- We use reasonable defaults to “wire together” the different Hygieia services without users having to specify every configuration value. However, these can be overridden via CloudFormation parameters.
We’re in the process of implementing this solution and will create a PR with all proposed changes when done, ideally by late March / early April. More to come, feedback welcome.
@ryanshriver We have added your contributions to https://github.com/Hygieia/hygieia-aws-quickstart . Please confirm whether this issue can be closed; otherwise I can move it to the appropriate repository.