spark-snowflake

Issue with Databricks Spark Streaming: Private key must be specified in Snowflake streaming

Open satendrakumar opened this issue 5 years ago • 7 comments

We are using Databricks Spark to load data into Snowflake. It works perfectly with batch jobs but fails with streaming. Here is the code:

val options = Map(
  "sfUrl" -> "********.snowflakecomputing.com",
  "sfUser" -> "*****",
  "sfPassword" -> "****",
  "sfDatabase" -> "TEST_DB",
  "sfSchema" -> "TEST_DOCUMENT",
  "sfWarehouse" -> "COMPUTE_WH"
)

val rawStream = spark.readStream.schema(schema).json(path)

rawStream.writeStream
  .format("snowflake")
  .options(options)
  .option("dbtable", "L_FEATURE_TEST")
  .option("checkpointLocation", checkpointRaw)
  .trigger(Trigger.Once())
  .start()

Error:

java.lang.IllegalArgumentException: requirement failed: Private key must be specified in Snowflake streaming
	at scala.Predef$.require(Predef.scala:224)
	at net.snowflake.spark.snowflake.SnowflakeSink.<init>(SnowflakeSink.scala:41)
	at net.snowflake.spark.snowflake.DefaultSource.createSink(DefaultSource.scala:137)
	at org.apache.spark.sql.execution.datasources.DataSource.createSink(DataSource.scala:305)
	at org.apache.spark.sql.streaming.DataStreamWriter.start(DataStreamWriter.scala:330)
	at line92a282bca6f44a208d621b415f7ee12490.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-2679090388403770:1)
	at line92a282bca6f44a208d621b415f7ee12490.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-2679090388403770:64)
	at line92a282bca6f44a208d621b415f7ee12490.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-2679090388403770:66)
	at line92a282bca6f44a208d621b415f7ee12490.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-2679090388403770:68)
	at line92a282bca6f44a208d621b415f7ee12490.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-2679090388403770:70)
	at line92a282bca6f44a208d621b415f7ee12490.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-2679090388403770:72)
	at line92a282bca6f44a208d621b415f7ee12490.$read$$iw$$iw$$iw$$iw$$iw$$iw.<init>(command-2679090388403770:74)
	at line92a282bca6f44a208d621b415f7ee12490.$read$$iw$$iw$$iw$$iw$$iw.<init>(command-2679090388403770:76)
	at line92a282bca6f44a208d621b415f7ee12490.$read$$iw$$iw$$iw$$iw.<init>(command-2679090388403770:78)
	at line92a282bca6f44a208d621b415f7ee12490.$read$$iw$$iw$$iw.<init>(command-2679090388403770:80)
	at line92a282bca6f44a208d621b415f7ee12490.$read$$iw$$iw.<init>(command-2679090388403770:82)
	at line92a282bca6f44a208d621b415f7ee12490.$read$$iw.<init>(command-2679090388403770:84)
	at line92a282bca6f44a208d621b415f7ee12490.$read.<init>(command-2679090388403770:86)
	at line92a282bca6f44a208d621b415f7ee12490.$read$.<init>(command-2679090388403770:90)
	at line92a282bca6f44a208d621b415f7ee12490.$read$.<clinit>(command-2679090388403770)
	at line92a282bca6f44a208d621b415f7ee12490.$eval$.$print$lzycompute(<notebook>:7)

Not sure whether this is an issue. Is it possible to load streaming data using only a username and password?

satendrakumar avatar Oct 16 '19 17:10 satendrakumar

I am also having issues when using Spark Structured Streaming. I noticed the error @satendrakumar was experiencing above, so I modified my code to supply a private key via the privateKey option. It returned the following error:

ERROR IllegalArgumentException: "A snowflake passsword or private key path must be provided with 'sfpassword or pem_private_key' parameter, e.g. 'password'"

When I also try to include the pem_private_key option, I get the following exception, despite following the code examples in the Snowflake docs:

IllegalArgumentException: 'Input PEM private key is invalid'

andregoode avatar Nov 06 '19 23:11 andregoode

The streaming mode does not currently support streaming data directly from Databricks or Qubole. However, the connector still works in non-streaming mode with both Qubole and Databricks.

rkesh-singh avatar Nov 11 '19 07:11 rkesh-singh

> The streaming mode does not currently support streaming data directly from Databricks or Qubole. However, the connector still works in non-streaming mode with both Qubole and Databricks.

To be clear @rkesh-singh, you are currently using the Spark-Snowflake connector for batch writes? I am as well... but looking to use the structured streaming SnowflakeSink published here for streaming jobs. No documentation exists 😩

andregoode avatar Nov 11 '19 19:11 andregoode

@andregoode Streaming support is still in preview. You can contact Snowflake to enable it for your account.

rkesh-singh avatar Nov 14 '19 04:11 rkesh-singh

@rkesh-singh Is there any update on this? Or is it still in preview mode?

dwai1714 avatar Apr 14 '20 12:04 dwai1714

I used the method below, and it worked.

I tried it with PySpark; it should also work with minor modifications for Spark-Scala.

Prerequisite: the public key must be assigned to the user in Snowflake.
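For context, that prerequisite is the standard Snowflake key-pair setup (ALTER USER ... SET RSA_PUBLIC_KEY). A minimal sketch of doing it with the Snowflake Python connector, assuming snowflake-connector-python is installed; the account, admin user, target user, and rsa_key.pub path are all placeholders:

import snowflake.connector

# Placeholder values -- adjust to your environment.
with open("rsa_key.pub") as f:
    # Keep only the base64 body; drop the BEGIN/END PUBLIC KEY lines.
    public_key = "".join(line.strip() for line in f if "PUBLIC KEY" not in line)

ctx = snowflake.connector.connect(
    account="<account_identifier>",
    user="<admin_user>",
    password="<admin_password>",
)
try:
    # One-time setup: attach the public key to the user that Spark connects as.
    ctx.cursor().execute(f"ALTER USER <spark_user> SET RSA_PUBLIC_KEY='{public_key}'")
finally:
    ctx.close()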

Additional libraries imported:

from cryptography.hazmat.backends import default_backend
from cryptography.hazmat.primitives import serialization

I used the following code to obtain the decrypted private key without the header and trailer:

# private_key_path and passphrase are defined elsewhere: the path to the
# encrypted PEM key file and its passphrase.
with open(private_key_path, "r") as private_key_obj:
    private_key = private_key_obj.read()

key = bytes(private_key, 'utf-8')

# Decrypt the key with its passphrase.
p_key = serialization.load_pem_private_key(
    key, password=passphrase.encode(), backend=default_backend()
)

# Re-serialize as an unencrypted PKCS8 PEM and strip the header/trailer lines,
# leaving only the base64 body that the connector expects.
pkb = p_key.private_bytes(
    encoding=serialization.Encoding.PEM,
    format=serialization.PrivateFormat.PKCS8,
    encryption_algorithm=serialization.NoEncryption()
).replace(b"-----BEGIN PRIVATE KEY-----\n", b"") \
 .replace(b"\n-----END PRIVATE KEY-----", b"") \
 .decode("utf-8")

In the options, I set 'pem_private_key' to pkb (a sketch of the resulting options is shown below).
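For illustration, the resulting options dictionary might look roughly like this, reusing placeholder connection values in the spirit of the original post; sfPassword is omitted here on the assumption that the key replaces it:

options = {
    "sfUrl": "<account>.snowflakecomputing.com",
    "sfUser": "<user>",
    "sfDatabase": "TEST_DB",
    "sfSchema": "TEST_DOCUMENT",
    "sfWarehouse": "COMPUTE_WH",
    # Decrypted, header-stripped key body produced above.
    "pem_private_key": pkb,
}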

I also added some additional options in writeStream():

rawstream.writeStream\
    .outputMode("append")\
    .option("checkpointLocation", <checkpoint location>)\
    .option("dbtable",<target table name>)\
    .options(**options)\
    .option("streaming_stage", <temp stage name>)\
    .format("snowflake")\
    .start().awaitTermination()

Mr-Fantasy avatar Oct 21 '22 12:10 Mr-Fantasy

Is streaming read still unsupported?

FightingLiu1324 avatar Oct 26 '23 08:10 FightingLiu1324