Allow passing of profile in Spark options instead of a profile-file (#102)

Open jacob-heldenbrand-cl opened this issue 3 years ago • 5 comments

This PR adds the ability to pass the secrets stored in the profile file directly to Spark as read options.

While I've added an integration test, I was unable to run it directly since I do not have access to the test environment. However, using a debugger, I was able to confirm that the parameters from the DeltaSharingDataSource's createRelation method contain the options, and that the method constructs the RemoteDeltaLog object correctly.

Closes #102

Signed-off-by: Jacob Heldenbrand [email protected]

jacob-heldenbrand-cl, Jan 06 '22 19:01

Thanks for the contribution. Actually, we discussed how to pass the secrets in the past and decided to use the profile file approach. Setting the secrets in the code directly is not encouraged so we don't want to support this.

zsxwing, Jan 07 '22 05:01

The reason we want to pass credentials in as read options is not so we can hardcode secrets in the code, but so we can inject them into Spark dynamically. For our use case, we are not allowed to save secrets to an arbitrary S3 file, but instead must store them in an audited secret management system (in our case HashiCorp's Vault). We follow this pattern with other technologies as well, such as JDBC.

Is there an alternative approach we could use/implement to inject these secrets into Spark dynamically?

jacob-heldenbrand-cl, Jan 07 '22 18:01

This is a fair point. Let me think about this and also how to support SQL.

> Is there an alternative approach we could use/implement to inject these secrets into Spark dynamically?

As a workaround, you can read them from your audited secret management system, write them to a temporary file, and pass that file's path when loading the table.
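A minimal Python sketch of that workaround, for illustration only. The helper name `write_temp_profile`, the endpoint, and the token value are hypothetical placeholders; in practice the endpoint and token would be fetched from the secret manager at runtime. The JSON fields follow the Delta Sharing profile file format.

```python
import json
import os
import tempfile


def write_temp_profile(endpoint: str, bearer_token: str) -> str:
    """Write a Delta Sharing profile to a private temp file and return its path.

    Hypothetical helper: the caller supplies values fetched from a secret manager.
    """
    profile = {
        "shareCredentialsVersion": 1,
        "endpoint": endpoint,
        "bearerToken": bearer_token,
    }
    # mkstemp creates the file with access restricted to the current user.
    fd, path = tempfile.mkstemp(prefix="delta-sharing-", suffix=".share")
    with os.fdopen(fd, "w") as f:
        json.dump(profile, f)
    return path


# Placeholder values standing in for secrets retrieved at runtime.
profile_path = write_temp_profile(
    "https://sharing.example.com/delta-sharing/", "<token-from-secret-manager>"
)
try:
    # Table URL format: <profile-file-path>#<share>.<schema>.<table>
    table_url = profile_path + "#<share>.<schema>.<table>"
    # With a Spark session available, the table could then be read with:
    # spark.read.format("deltaSharing").load(table_url)
finally:
    os.remove(profile_path)  # avoid leaving the secret on disk
```

This still puts the secret on local disk for the duration of the read, so it only narrows the exposure rather than removing it. It also assumes the temporary path is readable wherever the connector resolves the profile, which may not hold on every cluster setup.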

zsxwing, Jan 12 '22 07:01

I'll second the request for dynamic configuration.

The original design decision to use files seems to have been made on a very faulty assumption. Putting secrets in source code has been discouraged since the first days of source code. It's not the responsibility of this project to save developers from themselves, especially not at the cost of increasing configuration and deployment complexity.

ssimeonov, Apr 16 '22 01:04