
[ppml] cannot resolve '`name`' given input columns KMS encrypted data

Open Le-Zheng opened this issue 1 year ago • 6 comments

Error of running SimpleQuery example with bigdl-ppml-spark_3.1.2-2.1.0-20220907.120744-222-jar-with-dependencies.jar

Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve '`name`' given input columns: [?b??cI?-??(?e??@??U0??Sw?:5$E?p'??Y??>??       , ???mU?1?'???u?Y?_?;&???#<?w?????1Gn??q;%?+???5??;?];
'Project ['name]
+- Relation[?b??cI?-??(?e??@??U0??Sw?:5$E?p'??Y??>??    #16,???mU?1?'???u?Y?_?;&???#<?w?????1Gn??q;%?+???5??;?#17] csv
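The binary junk in the input-columns list is the raw ciphertext of the CSV header: Spark parsed the encrypted file directly instead of decrypting it first, so there is no readable `name` column to resolve. A minimal illustration with plain openssl (not the PPML toolchain; the key and file names here are made up):

```shell
# Encrypt a tiny CSV, then look at what a naive CSV reader would see.
printf 'name,age\nalice,30\nbob,25\n' > people.csv
openssl enc -aes-128-cbc -pbkdf2 -k demo-key -in people.csv -out people.csv.enc

head -1 people.csv          # plaintext header: name,age
head -c 8 people.csv.enc    # openssl magic "Salted__", then binary bytes
```

Reading `people.csv.enc` as CSV would produce garbage column names of exactly this kind, which is why the `'name'` projection fails.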


Le-Zheng avatar Sep 09 '22 01:09 Le-Zheng

Sample spark-submit:

/opt/spark/bin/spark-submit \
--master ${RUNTIME_SPARK_MASTER} \
--deploy-mode cluster \
--name simplequery \
--conf spark.driver.memory=20g \
--conf spark.executor.cores=16 \
--conf spark.executor.memory=20g \
--conf spark.executor.instances=1 \
--conf spark.cores.max=16 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.container.image=${RUNTIME_K8S_SPARK_IMAGE} \
--conf spark.kubernetes.executor.deleteOnTermination=false \
--conf spark.network.timeout=10000000 \
--conf spark.executor.heartbeatInterval=10000000 \
--conf spark.python.use.daemon=false \
--conf spark.python.worker.reuse=false \
--conf spark.kubernetes.executor.podTemplateFile=/ppml/trusted-big-data-ml/spark-executor-template.yaml \
--conf spark.kubernetes.driver.podTemplateFile=/ppml/trusted-big-data-ml/spark-executor-template.yaml \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.nfsvolumeclaim.options.claimName=nfsvolumeclaim \
--conf spark.kubernetes.driver.volumes.persistentVolumeClaim.nfsvolumeclaim.mount.path=/bigdl2.0/data \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.options.claimName=nfsvolumeclaim \
--conf spark.kubernetes.executor.volumes.persistentVolumeClaim.nfsvolumeclaim.mount.path=/bigdl2.0/data \
--conf spark.authenticate=true \
--conf spark.authenticate.secret=intel@123 \
--conf spark.kubernetes.executor.secretKeyRef.SPARK_AUTHENTICATE_SECRET="spark-secret:secret" \
--conf spark.kubernetes.driver.secretKeyRef.SPARK_AUTHENTICATE_SECRET="spark-secret:secret" \
--conf spark.authenticate.enableSaslEncryption=true \
--conf spark.network.crypto.enabled=true --conf spark.network.crypto.keyLength=128 \
--conf spark.network.crypto.keyFactoryAlgorithm=PBKDF2WithHmacSHA1 \
--conf spark.io.encryption.enabled=true \
--conf spark.io.encryption.keySizeBits=128 \
--conf spark.io.encryption.keygen.algorithm=HmacSHA1 \
--conf spark.ssl.enabled=true \
--conf spark.ssl.port=8043 \
--conf spark.ssl.keyPassword=$secure_password \
--conf spark.ssl.keyStore=/bigdl2.0/data/keystore.jks \
--conf spark.ssl.keyStorePassword=$secure_password \
--conf spark.ssl.keyStoreType=JKS \
--conf spark.ssl.trustStore=/bigdl2.0/data/keystore.jks \
--conf spark.ssl.trustStorePassword=intel@123 \
--conf spark.ssl.trustStoreType=JKS \
--class com.intel.analytics.bigdl.ppml.examples.SimpleQuerySparkExample \
--conf spark.driver.extraClassPath=local:///bigdl2.0/data/ppml/bigdl-ppml-spark_3.1.2-2.1.0-20220612.193825-116-jar-with-dependencies.jar \
--conf spark.executor.extraClassPath=local:///bigdl2.0/data/ppml/bigdl-ppml-spark_3.1.2-2.1.0-20220612.193825-116-jar-with-dependencies.jar \
--jars local:///bigdl2.0/data/ppml/bigdl-ppml-spark_3.1.2-2.1.0-20220612.193825-116-jar-with-dependencies.jar \
local:///bigdl2.0/data/ppml/bigdl-ppml-spark_3.1.2-2.1.0-20220612.193825-116-jar-with-dependencies.jar \
--inputPath /bigdl2.0/data/ppml/people/encrypted \
--outputPath /bigdl2.0/data/ppml/people/people_encrypted_output \
--inputPartitionNum 16 \
--outputPartitionNum 16 \
--inputEncryptModeValue AES/CBC/PKCS5Padding \
--outputEncryptModeValue AES/CBC/PKCS5Padding \
--primaryKeyPath /bigdl2.0/data/ppml/20line_data_keys/primaryKey \
--dataKeyPath /bigdl2.0/data/ppml/20line_data_keys/dataKey \
--kmsType SimpleKeyManagementService \
--simpleAPPID 165172133285

When we replace bigdl-ppml-spark_3.1.2-2.1.0-20220612.193825-116-jar-with-dependencies.jar with bigdl-ppml-spark_3.1.2-2.1.0-20220907.120744-222-jar-with-dependencies.jar, the above issue occurs.

Le-Zheng avatar Sep 09 '22 01:09 Le-Zheng

I just got the same error. Here is my script:

rm -rf /ppml/trusted-big-data-ml/work/data/shansimu/simplequery/people_encrypted_output && \
export mode=client && \
secure_password=`openssl rsautl -inkey /ppml/trusted-big-data-ml/work/password/key.txt -decrypt </ppml/trusted-big-data-ml/work/password/output.bin` && \
export TF_MKL_ALLOC_MAX_BYTES=10737418240 && \
export SPARK_LOCAL_IP=$LOCAL_IP && \
./clean.sh
gramine-argv-serializer bash -c "/opt/jdk8/bin/java \
  -cp '/ppml/trusted-big-data-ml/work/data/shansimu/ppml-e2e-examples/spark-encrypt-io/target/spark-encrypt-io-0.3.0-SNAPSHOT.jar:/ppml/trusted-big-data-ml/work/spark-3.1.2/examples/jars/scopt_2.12-3.7.1.jar:/ppml/trusted-big-data-ml/work/spark-3.1.2/conf/:/ppml/trusted-big-data-ml/work/spark-3.1.2/jars/*:/ppml/trusted-big-data-ml/work/bigdl-2.1.0-SNAPSHOT/jars/*' \
    -Xmx8g \
    org.apache.spark.deploy.SparkSubmit \
    --master $RUNTIME_SPARK_MASTER \
    --deploy-mode cluster \
    --name spark-simplequery-sgx \
    --conf spark.driver.host=$LOCAL_IP \
    --conf spark.driver.port=54321 \
    --conf spark.driver.memory=32g \
    --conf spark.executor.cores=8 \
    --conf spark.executor.memory=32g \
    --conf spark.executor.instances=2 \
    --conf spark.cores.max=32 \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
    --conf spark.kubernetes.container.image=$RUNTIME_K8S_SPARK_IMAGE \
    --conf spark.kubernetes.driver.podTemplateFile=/ppml/trusted-big-data-ml/spark-driver-template.yaml \
    --conf spark.kubernetes.executor.podTemplateFile=/ppml/trusted-big-data-ml/spark-executor-template.yaml \
    --conf spark.kubernetes.executor.deleteOnTermination=false \
    --conf spark.network.timeout=10000000 \
    --conf spark.executor.heartbeatInterval=10000000 \
    --conf spark.python.use.daemon=false \
    --conf spark.python.worker.reuse=false \
    --conf spark.kubernetes.sgx.enabled=true \
    --conf spark.kubernetes.sgx.driver.mem=64g \
    --conf spark.kubernetes.sgx.driver.jvm.mem=12g \
    --conf spark.kubernetes.sgx.executor.mem=64g \
    --conf spark.kubernetes.sgx.executor.jvm.mem=12g \
    --conf spark.kubernetes.sgx.log.level=error \
    --conf spark.authenticate=true \
    --conf spark.authenticate.secret=$secure_password \
    --conf spark.kubernetes.executor.secretKeyRef.SPARK_AUTHENTICATE_SECRET="spark-secret:secret" \
    --conf spark.kubernetes.driver.secretKeyRef.SPARK_AUTHENTICATE_SECRET="spark-secret:secret" \
    --conf spark.authenticate.enableSaslEncryption=true \
    --conf spark.network.crypto.enabled=true \
    --conf spark.network.crypto.keyLength=128 \
    --conf spark.network.crypto.keyFactoryAlgorithm=PBKDF2WithHmacSHA1 \
    --conf spark.io.encryption.enabled=true \
    --conf spark.io.encryption.keySizeBits=128 \
    --conf spark.io.encryption.keygen.algorithm=HmacSHA1 \
    --conf spark.ssl.enabled=true \
    --conf spark.ssl.port=8043 \
    --conf spark.ssl.keyPassword=$secure_password \
    --conf spark.ssl.keyStore=/ppml/trusted-big-data-ml/work/keys/keystore.jks \
    --conf spark.ssl.keyStorePassword=$secure_password \
    --conf spark.ssl.keyStoreType=JKS \
    --conf spark.ssl.trustStore=/ppml/trusted-big-data-ml/work/keys/keystore.jks \
    --conf spark.ssl.trustStorePassword=$secure_password \
    --conf spark.ssl.trustStoreType=JKS \
    --conf spark.driver.extraClassPath=/ppml/trusted-big-data-ml/work/bigdl-2.1.0-SNAPSHOT/jars/*:/ppml/trusted-big-data-ml/work/spark-3.1.2/examples/jars/* \
    --conf spark.executor.extraClassPath=/ppml/trusted-big-data-ml/work/bigdl-2.1.0-SNAPSHOT/jars/*:/ppml/trusted-big-data-ml/work/spark-3.1.2/examples/jars/* \
    --class com.intel.analytics.bigdl.ppml.examples.SimpleQuerySparkExample \
    --verbose \
    --jars local:///ppml/trusted-big-data-ml/work/bigdl-2.1.0-SNAPSHOT/jars/bigdl-ppml-spark_3.1.2-2.1.0-SNAPSHOT.jar \
    local:///ppml/trusted-big-data-ml/work/bigdl-2.1.0-SNAPSHOT/jars/bigdl-ppml-spark_3.1.2-2.1.0-SNAPSHOT.jar \
    --inputPath /ppml/trusted-big-data-ml/work/data/shansimu/simplequery/people_encrypted \
    --outputPath /ppml/trusted-big-data-ml/work/data/shansimu/simplequery/people_encrypted_output \
    --inputPartitionNum 8 \
    --outputPartitionNum 8 \
    --inputEncryptModeValue AES/CBC/PKCS5Padding \
    --outputEncryptModeValue AES/CBC/PKCS5Padding \
    --primaryKeyPath /ppml/trusted-big-data-ml/work/data/shansimu/simplequery/keys/primaryKey \
    --dataKeyPath /ppml/trusted-big-data-ml/work/data/shansimu/simplequery/keys/dataKey \
    --kmsType SimpleKeyManagementService \
    --simpleAPPID 947536384638 \
    --simpleAPPKEY 884926981201" > /ppml/trusted-big-data-ml/secured_argvs
./init.sh
gramine-sgx bash 2>&1 | tee query-client-simple.log

ShanSimu avatar Sep 09 '22 02:09 ShanSimu

This may be due to the jar package path. Here is my earlier script, which does not hit this error:

rm -rf /ppml/trusted-big-data-ml/work/data/shansimu/simplequery/people_encrypted_output && \

export mode=client && \
secure_password=`openssl rsautl -inkey /ppml/trusted-big-data-ml/work/password/key.txt -decrypt </ppml/trusted-big-data-ml/work/password/output.bin` && \
export TF_MKL_ALLOC_MAX_BYTES=10737418240 && \
export SPARK_LOCAL_IP=$LOCAL_IP && \
./clean.sh
gramine-argv-serializer bash -c "/opt/jdk8/bin/java \
  -cp '/ppml/trusted-big-data-ml/work/data/shansimu/ppml-e2e-examples/spark-encrypt-io/target/spark-encrypt-io-0.3.0-SNAPSHOT.jar:/ppml/trusted-big-data-ml/work/spark-3.1.2/examples/jars/scopt_2.12-3.7.1.jar:/ppml/trusted-big-data-ml/work/spark-3.1.2/conf/:/ppml/trusted-big-data-ml/work/spark-3.1.2/jars/*:/ppml/trusted-big-data-ml/work/bigdl-2.1.0-SNAPSHOT/jars/*' \
    -Xmx8g \
    org.apache.spark.deploy.SparkSubmit \
    --master $RUNTIME_SPARK_MASTER \
    --deploy-mode client \
    --name spark-simplequery-sgx \
    --conf spark.driver.host=$LOCAL_IP \
    --conf spark.driver.port=54321 \
    --conf spark.driver.memory=32g \
    --conf spark.executor.cores=8 \
    --conf spark.executor.memory=32g \
    --conf spark.executor.instances=2 \
    --conf spark.cores.max=32 \
    --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
    --conf spark.kubernetes.container.image=$RUNTIME_K8S_SPARK_IMAGE \
    --conf spark.kubernetes.driver.podTemplateFile=/ppml/trusted-big-data-ml/spark-driver-template.yaml \
    --conf spark.kubernetes.executor.podTemplateFile=/ppml/trusted-big-data-ml/spark-executor-template.yaml \
    --conf spark.kubernetes.executor.deleteOnTermination=false \
    --conf spark.network.timeout=10000000 \
    --conf spark.executor.heartbeatInterval=10000000 \
    --conf spark.python.use.daemon=false \
    --conf spark.python.worker.reuse=false \
    --conf spark.kubernetes.sgx.enabled=true \
    --conf spark.kubernetes.sgx.executor.mem=64g \
    --conf spark.kubernetes.sgx.executor.jvm.mem=12g \
    --conf spark.kubernetes.sgx.log.level=error \
    --conf spark.authenticate=true \
    --conf spark.authenticate.secret=$secure_password \
    --conf spark.kubernetes.executor.secretKeyRef.SPARK_AUTHENTICATE_SECRET="spark-secret:secret" \
    --conf spark.kubernetes.driver.secretKeyRef.SPARK_AUTHENTICATE_SECRET="spark-secret:secret" \
    --conf spark.authenticate.enableSaslEncryption=true \
    --conf spark.network.crypto.enabled=true \
    --conf spark.network.crypto.keyLength=128 \
    --conf spark.network.crypto.keyFactoryAlgorithm=PBKDF2WithHmacSHA1 \
    --conf spark.io.encryption.enabled=true \
    --conf spark.io.encryption.keySizeBits=128 \
    --conf spark.io.encryption.keygen.algorithm=HmacSHA1 \
    --conf spark.ssl.enabled=true \
    --conf spark.ssl.port=8043 \
    --conf spark.ssl.keyPassword=$secure_password \
    --conf spark.ssl.keyStore=/ppml/trusted-big-data-ml/work/keys/keystore.jks \
    --conf spark.ssl.keyStorePassword=$secure_password \
    --conf spark.ssl.keyStoreType=JKS \
    --conf spark.ssl.trustStore=/ppml/trusted-big-data-ml/work/keys/keystore.jks \
    --conf spark.ssl.trustStorePassword=$secure_password \
    --conf spark.ssl.trustStoreType=JKS \
    --class com.intel.analytics.bigdl.ppml.examples.SimpleQuerySparkExample \
    --verbose \
    --jars local:///ppml/trusted-big-data-ml/work/data/shansimu/ppml-e2e-examples/spark-encrypt-io/target/spark-encrypt-io-0.3.0-SNAPSHOT.jar \
    local:///ppml/trusted-big-data-ml/work/data/shansimu/ppml-e2e-examples/spark-encrypt-io/target/spark-encrypt-io-0.3.0-SNAPSHOT.jar \
    --inputPath /ppml/trusted-big-data-ml/work/data/shansimu/simplequery/people_encrypted \
    --outputPath /ppml/trusted-big-data-ml/work/data/shansimu/simplequery/people_encrypted_output \
    --inputPartitionNum 8 \
    --outputPartitionNum 8 \
    --inputEncryptModeValue AES/CBC/PKCS5Padding \
    --outputEncryptModeValue AES/CBC/PKCS5Padding \
    --primaryKeyPath /ppml/trusted-big-data-ml/work/data/shansimu/simplequery/keys/primaryKey \
    --dataKeyPath /ppml/trusted-big-data-ml/work/data/shansimu/simplequery/keys/dataKey \
    --kmsType SimpleKeyManagementService \
    --simpleAPPID 947536384638 \
    --simpleAPPKEY 884926981201" > /ppml/trusted-big-data-ml/secured_argvs
./init.sh
gramine-sgx bash 2>&1 | tee spark-simplequery-sgx-driver-on-sgx.log

ShanSimu avatar Sep 09 '22 02:09 ShanSimu

It seems the encrypted file and the encryption keys do not match. Please try to generate a new encrypted file with your current primaryKey and dataKey.
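A hedged sketch with plain openssl (not SimpleKeyManagementService; `old-key` and `new-key` are made-up stand-ins for the key material) of why a key/file mismatch surfaces as unreadable data, and why re-encrypting with the current keys fixes it:

```shell
# File was encrypted under an old key...
printf 'name,age\nalice,30\n' > people.csv
openssl enc -aes-128-cbc -pbkdf2 -k old-key -in people.csv -out people.csv.enc

# ...so decrypting with a different key fails (or yields garbage):
openssl enc -d -aes-128-cbc -pbkdf2 -k new-key \
    -in people.csv.enc -out /dev/null 2>/dev/null || echo "keys do not match"

# Re-encrypting with the current key makes the data readable again:
openssl enc -aes-128-cbc -pbkdf2 -k new-key -in people.csv -out people.csv.enc
openssl enc -d -aes-128-cbc -pbkdf2 -k new-key -in people.csv.enc -out roundtrip.csv
cmp -s people.csv roundtrip.csv && echo "keys match"
```

The same principle applies to the PPML key hierarchy: the file must be re-encrypted under the primaryKey/dataKey pair that is passed to the job.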

PatrickkZ avatar Sep 09 '22 06:09 PatrickkZ

Thanks @PatrickkZ, it turned out to be a problem with my script.

ShanSimu avatar Sep 09 '22 08:09 ShanSimu

This error occurs because the encrypted file does not get decrypted. The encrypted file name now has to end with .cbc; that extension is what triggers the decryption process. So rename your encrypted file, for example from people.csv to people.csv.cbc. Note that the input file name expected by SimpleQuerySparkExample is hard-coded to people.csv, so you also need to modify SimpleQuerySparkExample's code accordingly.
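The rename can be scripted. A hypothetical sketch, assuming the encrypted part files sit in the input directory used earlier in this thread (adjust the path to yours):

```shell
# Give every encrypted part file the .cbc suffix so decryption is triggered.
for f in /ppml/trusted-big-data-ml/work/data/shansimu/simplequery/people_encrypted/*; do
    mv "$f" "$f.cbc"
done
```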

PatrickkZ avatar Sep 16 '22 05:09 PatrickkZ