delta-sharing
delta-sharing copied to clipboard
Allow passing of profile in Spark options instead of a profile-file (#102)
This PR provides the ability to pass in the secrets stored in the profile-file directly into Spark.
While I've added an integration test, I was unable to run it directly since I do not have access to the test environment. However, using a debugger, I was able to confirm that the parameters from the DeltaSharingDataSource's createRelation method contain the options, and that the method constructs the RemoteDeltaLog object correctly.
Closes #102
Signed-off-by: Jacob Heldenbrand [email protected]
Thanks for the contribution. Actually, we discussed how to pass the secrets in the past and decided to use the profile file approach. Setting the secrets in the code directly is not encouraged so we don't want to support this.
The reason we want to pass credentials in as read options is not so we can hardcode secrets in the code, but so we can inject them into Spark dynamically. For our use case, we are not allowed to save secrets to an arbitrary S3 file, but instead must store them in an audited secret management system (in our case HashiCorp's Vault). We follow this pattern with other technologies as well, such as JDBC.
Is there an alternative approach we could use/implement to inject these secrets into Spark dynamically?
This is a fair point. Let me think about this and also how to support SQL.
Is there an alternative approach we could use/implement to inject these secrets into Spark dynamically?
As a workaround, you can manually read them from your audited secret management system, and store as a temp file, and use the temp file path to access.
I'll second the request for dynamic configuration.
The original design decision to use files seems to have been made on a very faulty assumption. Putting secrets in source code has been discouraged since first days of source code. It's not the responsibility of this project to save developers from themselves, especially not at the cost of increasing configuration/deployment complexity.