airbyte icon indicating copy to clipboard operation
airbyte copied to clipboard

🎉 Core: Support for IRSA when logging on EKS and passing role to Dest/Source Pods

Open gilandose opened this issue 3 years ago • 6 comments

What

DefaultAWSCredentialsProviderChain into the logging route for S3 this allows logs on EKS or EC2 to use standard more secure role assumption methods.

Specifically this unlocks IRSA by removing passing of access keys, and adding STS dependency to allow WebIdentityTokenProvider to be used. Note this would all allow a GKE cluster to interface with AWS using IRSA

Additionally allow the airbyte-admin ServiceAccount to be passed to connectors to allow the IRSA role to be assumed here. Note it may be preferable to assign a different service account for the dest/source pod eventually this could be a connector specific SA with restrictive permissions

resolves #10570 resolves #5942

extends excellent work done in: #10682 credit: @adamschmidt

How

For AWS access keys can automatically be read from the DefaultAWSCredentialsProviderChain, these were being passed around too eagerly and also set into system properties for logging. To change this I've removed some of the explicit passing, and allow the access keys to be passed around transparently when set

Recommended reading order

  1. deps.toml
  2. airbyte-config/config-models/src/main/java/io/airbyte/config/storage/DefaultS3ClientFactory.java
  3. airbyte-commons/src/main/resources/log4j2.xml
  4. airbyte-commons-worker/src/main/java/io/airbyte/workers/process/KubePodProcess.java
  5. build.gradle
  6. airbyte-integrations/connectors/destination-s3/build.gradle

🚨 User Impact 🚨

No user impact using ACCESS_KEYS works as before

Pre-merge Checklist

Expand the relevant checklist and delete the others.

Updating a connector

Community member or Airbyter

  • [ ] Grant edit access to maintainers (instructions)
  • [ ] Secrets in the connector's spec are annotated with airbyte_secret
  • [ ] Unit & integration tests added and passing. Community members, please provide proof of success locally e.g: screenshot or copy-paste unit, integration, and acceptance test output. To run acceptance tests for a Python connector, follow instructions in the README. For java connectors run ./gradlew :airbyte-integrations:connectors:<name>:integrationTest.
  • [ ] Code reviews completed
  • [ ] Documentation updated
    • [ ] Connector's README.md
    • [ ] Connector's bootstrap.md. See description and examples
    • [ ] Changelog updated in docs/integrations/<source or destination>/<name>.md including changelog. See changelog example
  • [ ] PR name follows PR naming conventions

Airbyter

If this is a community PR, the Airbyte engineer reviewing this PR is responsible for the below items.

  • [ ] Create a non-forked branch based on this PR and test the below items on it
  • [ ] Build is successful
  • [ ] If new credentials are required for use in CI, add them to GSM. Instructions.
  • [ ] /test connector=connectors/<name> command is passing
  • [ ] New Connector version released on Dockerhub and connector version bumped by running the /publish command described here

Tests

Unit

Task :airbyte-config:config-models:test

ConfigSchemaTest > testFile() PASSED

ConfigSchemaTest > testPrepareKnownSchemas() PASSED

DataTypeEnumTest > testConversionFromJsonSchemaPrimitiveToDataType() PASSED

EnvConfigsTest > testAirbyteVersion() PASSED

EnvConfigsTest > testGetDatabaseUrl() PASSED

EnvConfigsTest > testSharedJobEnvMapRetrieval() PASSED

EnvConfigsTest > testJobKubeNodeSelectors() PASSED

EnvConfigsTest > testWorkspaceRoot() PASSED

EnvConfigsTest > ensureGetEnvBehavior() PASSED

EnvConfigsTest > testLocalRoot() PASSED

EnvConfigsTest > testDiscoverKubeNodeSelectors() PASSED

EnvConfigsTest > testPublishMetrics() PASSED

EnvConfigsTest > testConfigRoot() PASSED

EnvConfigsTest > testGetDatabasePassword() PASSED

EnvConfigsTest > testSpecKubeNodeSelectors() PASSED

EnvConfigsTest > testAirbyteRole() PASSED

EnvConfigsTest > testCheckKubeNodeSelectors() PASSED

EnvConfigsTest > testDeploymentMode() PASSED

EnvConfigsTest > testTrackingStrategy() PASSED

EnvConfigsTest > testGetWorkspaceDockerMount() PASSED

EnvConfigsTest > testDockerNetwork() PASSED

EnvConfigsTest > testGetLocalDockerMount() PASSED

EnvConfigsTest > testRemoteConnectorCatalogUrl() PASSED

EnvConfigsTest > testGetDatabaseUser() PASSED

EnvConfigsTest > Should parse constant tags PASSED

EnvConfigsTest > testSplitKVPairsFromEnvString() PASSED

EnvConfigsTest > testworkerKubeTolerations() PASSED

EnvConfigsTest > testErrorReportingStrategy() PASSED

EnvConfigsTest > testAllJobEnvMapRetrieval() PASSED

EnvConfigsTest > CheckJobResourceSettings > checkJobCpuRequest should take precedent if set PASSED

EnvConfigsTest > CheckJobResourceSettings > checkJobMemoryLimit should take precedent if set PASSED

EnvConfigsTest > CheckJobResourceSettings > checkJobMemoryRequest should take precedent if set PASSED

EnvConfigsTest > CheckJobResourceSettings > should default to JobMainMemoryRequest if not set PASSED

EnvConfigsTest > CheckJobResourceSettings > should default to JobMainCpuRequest if not set PASSED

EnvConfigsTest > CheckJobResourceSettings > checkJobCpuLimit should take precedent if set PASSED

EnvConfigsTest > CheckJobResourceSettings > should default to JobMainCpuLimit if not set PASSED

EnvConfigsTest > CheckJobResourceSettings > should default to JobMainMemoryLimit if not set PASSED

CloudLogsClientTest > createCloudLogClientTestAws() PASSED

CloudLogsClientTest > createCloudLogClientTestGcs() PASSED

CloudLogsClientTest > createCloudLogClientTestMinio() PASSED

LogClientSingletonTest > testGetJobLogFileNullPath() PASSED

LogClientSingletonTest > testGetJobLogFileEmptyPath() PASSED

LogClientSingletonTest > testGetJobLogFileK8s() PASSED

ScheduleHelpersTest > testGetSecondsInUnit() PASSED

ScheduleHelpersTest > testAllOfTimeUnitEnumValues() PASSED

StateMessageHelperTest > testEmpty() PASSED

StateMessageHelperTest > testDuplicatedGlobalState() PASSED

StateMessageHelperTest > testLegacyInNewFormat() PASSED

StateMessageHelperTest > testLegacyStateConversion() PASSED

StateMessageHelperTest > testEmptyList() PASSED

StateMessageHelperTest > testStreamForceLegacy() PASSED

StateMessageHelperTest > testInvalidMixedState() PASSED

StateMessageHelperTest > testGlobalStateConversion() PASSED

StateMessageHelperTest > testGlobal() PASSED

StateMessageHelperTest > testLegacy() PASSED

StateMessageHelperTest > testGlobalForceLegacy() PASSED

StateMessageHelperTest > testLegacyInList() PASSED

StateMessageHelperTest > testStream() PASSED

StateMessageHelperTest > testStreamStateConversion() PASSED

YamlListToStandardDefinitionsTest > verifyAndConvertToModelList > should error out on bad data PASSED

YamlListToStandardDefinitionsTest > verifyAndConvertToModelList > should error out on duplicate name PASSED

YamlListToStandardDefinitionsTest > verifyAndConvertToModelList > should correctly read yaml file PASSED

YamlListToStandardDefinitionsTest > verifyAndConvertToModelList > should error out on duplicate id PASSED

YamlListToStandardDefinitionsTest > verifyAndConvertToModelList > should error out on empty file PASSED

YamlListToStandardDefinitionsTest > vertifyAndConvertToJsonNode > should error out on bad data PASSED

YamlListToStandardDefinitionsTest > vertifyAndConvertToJsonNode > should error out on duplicate name PASSED

YamlListToStandardDefinitionsTest > vertifyAndConvertToJsonNode > should correctly read yaml file PASSED

YamlListToStandardDefinitionsTest > vertifyAndConvertToJsonNode > should error out on duplicate id PASSED

YamlListToStandardDefinitionsTest > vertifyAndConvertToJsonNode > should error out on empty file PASSED

CloudLogsClientTest > testGcs() PASSED

CloudLogsClientTest > testGcsMissingBucket() PASSED

DefaultS3ClientFactoryTest > testS3() PASSED

DefaultS3ClientFactoryTest > testS3RegionNotSet() PASSED

MinioS3ClientFactoryTest > testMinio() PASSED

MinioS3ClientFactoryTest > testMinioEndpointMissing() PASSED

Integration

Put your integration tests output here.

Acceptance

Put your acceptance tests output here.

gilandose avatar Nov 06 '22 23:11 gilandose

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Nov 06 '22 23:11 CLAassistant

might be of interest @alafanechere as you reviewed #10682

gilandose avatar Nov 07 '22 00:11 gilandose

Hello :wave:, first thank you for this amazing contribution.

We really appreciate the effort you've made to improve the project. We ask you patience for the code review. Last month our team was focused on Hacktoberfest event and that probably left some PR without the proper feedback. And this week, due to the Thanksgiving US Holiday, most our team is out of office with their families. Another important piece of information why code won't be merge this week is: as a safety measure the core team has decided to freeze merging code to main branch to keep the release stable. Next week we'll return to you with the proper code review and update the status of your contribution.

If you have any questions feel free to send me a message in Slack! Thanks!

marcosmarxm avatar Nov 22 '22 20:11 marcosmarxm

Any update on this? We are pretty blocked waiting on this.

nhorton avatar Dec 11 '22 22:12 nhorton

Hello :wave::skin-tone-2: and thank you for your contribution!

Airbyte has instituted a code freeze between 19 and 30 December, to make sure there are no disruptions during the holidays. Because of this, reviewing and merging your contribution may take longer than usual. We apologize for the delay, but we want everyone to have a quiet and happy holiday.

If you have any questions or need further clarification, please don't hesitate to ping via Slack.

marcosmarxm avatar Dec 20 '22 19:12 marcosmarxm

Hey, any update on this?

Santhin avatar Jan 30 '23 13:01 Santhin

Hey, any update on this?

will take a look at the review comments

gilandose avatar Feb 20 '23 13:02 gilandose

+1 running into this issue with helm chart!

wizardofdevops avatar Feb 27 '23 22:02 wizardofdevops

Hello @gilandose, as you may have noticed we've just finished a pretty big overhaul of our code repositories. Most of this pull request is now located in a new repo airbytehq/airbyte-platform.

I've gone ahead and cherry picked your changes into a new PR: https://github.com/airbytehq/airbyte-platform/pull/177

I was not able to base the PR from your fork as I don't have write permissions, so I created a new branch migration/19010 in airbyte-platform for this purpose. You're more than welcome to make future changes to this work, however you will need to fork and submit a new PR due to repo permissions - if you'd like to do so, please tag me and/or reference this PR so that nothing gets lost. Either way, full authorship will be preserved in the commit history.

I've also copied over the comments/suggestions on this PR, so hopefully we can continue and merge this exciting new feature.

sh4sh avatar Feb 28 '23 20:02 sh4sh

I've added changes that are included in this pr here: https://github.com/airbytehq/airbyte/pull/23594

Alternatively, we can merge from airbytehq/airbyte master and use this PR -- however I don't have write access so I can't push changes. So, we have a new PR in the meantime 😄

sh4sh avatar Feb 28 '23 21:02 sh4sh