conjur icon indicating copy to clipboard operation
conjur copied to clipboard

Solution Design: CyberArk Conjur Provider for Secret Store CSI Driver

Open doodlesbykumbi opened this issue 1 year ago • 6 comments

Desired Outcome

Please describe the desired outcome for this PR. Said another way, what was the original request that resulted in these code changes? Feel free to copy this information from the connected issue.

Implemented Changes

Describe how the desired outcome above has been achieved with this PR. In particular, consider:

  • What's changed? Why were these changes made?
  • How should the reviewer approach this PR, especially if manual tests are required?
  • Are there relevant screenshots you can add to the PR description?

Connected Issue/Story

Resolves #[relevant GitHub issue(s), e.g. 76]

CyberArk internal issue ID: [insert issue ID]

Definition of Done

At least 1 todo must be completed in the sections below for the PR to be merged.

Changelog

  • [ ] The CHANGELOG has been updated, or
  • [ ] This PR does not include user-facing changes and doesn't require a CHANGELOG update

Test coverage

  • [ ] This PR includes new unit and integration tests to go with the code changes, or
  • [ ] The changes in this PR do not require tests

Documentation

  • [ ] Docs (e.g. READMEs) were updated in this PR
  • [ ] A follow-up issue to update official docs has been filed here: [insert issue ID]
  • [ ] This PR does not require updating any documentation

Behavior

  • [ ] This PR changes product behavior and has been reviewed by a PO, or
  • [ ] These changes are part of a larger initiative that will be reviewed later, or
  • [ ] No behavior was changed with this PR

Security

  • [ ] Security architect has reviewed the changes in this PR,
  • [ ] These changes are part of a larger initiative with a separate security review, or
  • [ ] There are no security aspects to these changes

doodlesbykumbi avatar Jun 21 '23 18:06 doodlesbykumbi

Code Climate has analyzed commit ef5f7364 and detected 164 issues on this pull request.

Here's the issue category breakdown:

Category Count
Complexity 1
Style 163

The test coverage on the diff in this pull request is 100.0% (50% is the threshold).

This pull request will bring the total coverage in the repository to 87.0% (-1.2% change).

View more on Code Climate.

codeclimate[bot] avatar Jun 22 '23 11:06 codeclimate[bot]

A couple of question and things I feel like are missing from this design:

  • Does the Kubernetes community include any guidance/expected interface/etc. for Secret Providers? I'm assuming a gRPC server is the correct way to accomplish this, but I don't see any references to the gRPC interface we'll be targeting.
  • It's likely my lack of knowledge about the CSI, but can you please include a diagram of how our custom gRPC service interacts with Conjur and Kubernetes to deliver secrets? It will help those with less deep knowledge better understand this design.
  • Can you please include some common workflow scenarios? From this document, I understand the happy path scenario (it "just works"). What would a scenario with bad credentials or lack of permission look like?
  • Operationally, how will we handle fixes/additional functionality in the future (aka, upgrades)?
  • What sort of messaging will be included for humans installing and maintaining this to understand and troubleshoot connection issues with Conjur (or installation issues)?

jvanderhoof avatar Aug 07 '23 18:08 jvanderhoof

@doodlesbykumbi, After reviewing the video, I have a much better sense of the approach of this work. That being said, I'd really like to see us expand this document to capture a fuller picture of the project intention and effort. The reason it's important we capture this in writing and not video is that by creating a full picture in a single design document, we provide an easy to find and reference resource for future engineers and product owners.

I feel this is really missing a crisp High Level Design overview. This should include:

  • As this is a new component, it will be very helpful to get a System High and/or a Container level diagram(s).
  • Summary of the gRPC API (this sounds like it'll be a copy/paste activity from the CSI Secrets page, but it'll help to have all the info in one place)
  • Workflows, one happy path and any likely sad-path flows. I'd love to understand how the Authn-JWT flow works without needing to dig through the PoC. I did couple of different flows for the Policy Factory SD using PlantUML.

As you've done the extensive PoC, it would also be valuable to at least summarize that work in a Low-Level Design section. A quick summary of key components of the gRPC service (even just a Compnent Diagram will help other engineers understand the code organization at a glance instead of extensive code review. To know if your PoC should be taken as a base and expanded or re-written base on your learnings, and the gaps you're aware of in functionality will be helpful for effort estimation as we bring this to fruition.

The high-level design is far more important than the low-level design (high level is limited to interfaces and interactions, low level is the code organization which implements those interfaces). The low level design makes more sense to document after feature completion rather than attempting guess exactly how the code should be organized.

One final comment. A solution design is really a story of a feature to be, in that spirit, can we figure out how to answer the questions in the context of the design instead of in a list at the end? Many of my questions were answered by those questions, but it made reviewing the solution design much more confusing.

Finally, writing is way harder than it looks (at least for me), if it's easier to just hop an a call and chat this through, let's do it! This is a great start, and to re-iterate, after reviewing the presentation, it's very obvious you have an excellent grasp of the scope and intricacies around this feature. This doc is an excellent start, and I just want to see more of those details reflected here.

jvanderhoof avatar Aug 07 '23 22:08 jvanderhoof

@doodlesbykumbi - adding questions here for visibility..

  • what parameters are required? Can any be skipped (ie: policyPath)
  • Are there any char or path length limits for policy paths
  • helm versions supported?
  • code coverage expectations - min of 80%
  • should we add or consider a baseline performance test to compare against secrets-provider-for-k8s?
  • Are secrets retrieved individually or in batch, if one fails what happens to subsequent secrets
  • Any refactoring work we want to do in authn-jwt for the csi driver
  • Can we re-use any of the secrets provider tests or test framework for this
  • Would like to discuss more around specific testing scenarios and common customer use cases
  • Support on different platforms - ie: EKS, AKS, RANCHER, TANZU..
  • Versions of K8s/OCP supported

adamouamani avatar Aug 18 '23 20:08 adamouamani

Hey @jvanderhoof .This is great! I appreciate you taking time to give this thorough feedback. I've been exploring Plant UML over the past week and it's such a great way to convey certain kinds of information. I've made some updates to the solution design to address the points you raised

doodlesbykumbi avatar Aug 21 '23 13:08 doodlesbykumbi

@adamouamani Thanks for posting these here

  1. what parameters are required? Can any be skipped (ie: policyPath)
    • Only policyPath should be optional
  2. Are there any char or path length limits for policy paths
    • With the current approach around defining secret specification in the SecretProviderClass instances, this falls under the constraints of keys and values for Kubernetes objects, see https://stackoverflow.com/a/53015758. Keys are 256 characters, and the value can go up to 1mb. So the policy path can be 1mb. While the file path can be 256 characters.
  3. helm versions supported?
    • Helm3
  4. code coverage expectations - min of 80%
    • That's fine. I think the unit tests can cover this base with relative ease
  5. should we add or consider a baseline performance test to compare against secrets-provider-for-k8s?
    • I'm not sure this is relevant
  6. Are secrets retrieved individually or in batch, if one fails what happens to subsequent secrets
    • The strategy demonstrated in the POC is in batch. If one fails then the whole request fails, and the workload never starts up. Which is something very nice about the CSI driver as compared to secrets provider.
  7. Any refactoring work we want to do in authn-jwt for the csi driver
    • Not that I'm aware of. I was able to make this work with authn-jwt as is
  8. Can we re-use any of the secrets provider tests or test framework for this
    • It's possible but I think it's more trouble than it is worth. I think this provider will get the most value from unit type tests
  9. Would like to discuss more around specific testing scenarios and common customer use cases
    • Happy to have those conversations. Though I don't imagine there'll be much novelty in the cases that need to be covered since the provider is relatively simple and really only responsible for fetching Secrets from Conjur. I don't think we need to test the CSI Driver, it's out of scope for us.
  10. Support on different platforms - ie: EKS, AKS, RANCHER, TANZU..
    • Same as below. As long as those different platforms meet the requirements of the Secrets Store CSI Driver it should just work.
  11. Versions of K8s/OCP supported
    • This will be documentation. The support will be conveniently linked to the Secrets Store CSI Driver, since the provider is merely a plugin. That support is currently 1.19+

doodlesbykumbi avatar Aug 21 '23 13:08 doodlesbykumbi