delta-rs
Simple delta write in Fabric notebook failing with SSL error
Delta-rs version: Python 0.12.0
Cloud provider: Microsoft (UK South)
Environment: Fabric Notebook
Bug
What happened: When trying to write a pandas dataframe to a delta table in Microsoft Fabric it fails with an SSL error:
OSError: Generic MicrosoftAzure error: response error "request error", after 10 retries: error sending request for url (https://onelake.blob.fabric.microsoft.com/xxx/yyy.Lakehouse/Tables/Test/_delta_log/_last_checkpoint): error trying to connect: error:0A000086:SSL routines:tls_post_process_server_certificate:certificate verify failed:ssl/statem/statem_clnt.c:1889: (self-signed certificate)
How to reproduce it: Installed deltalake via Fabric library management and then ran the following in a notebook:
import pandas as pd
from deltalake.writer import write_deltalake
from trident_token_library_wrapper import PyTridentTokenLibrary
token = PyTridentTokenLibrary.get_access_token("storage")
TablePath = "abfss://xxx@onelake.dfs.fabric.microsoft.com/yyy.Lakehouse/Tables/Test"
aadToken = PyTridentTokenLibrary.get_access_token("storage")
df = pd.DataFrame({"id": [1, 2], "value": ["foo", "boo"]})
write_deltalake(TablePath, df, storage_options={"bearer_token": aadToken, "use_fabric_endpoint": "true"})
@bcdobbs is there any chance that in the environment where the notebook is running, traffic goes through some appliance with SSL inspection (I don't know much about Fabric notebooks)? https://onelake.blob.fabric.microsoft.com/ has a valid SSL certificate but the error says it has a self-signed one which may happen if SSL inspection is used.
btw, an easy way to answer my question would be running something like this in the notebook.
import requests
requests.get('https://onelake.blob.fabric.microsoft.com/').content
If you get Healthy then my earlier theory is wrong, but if you get an error - then it holds.
I was trying to replicate the issue on my side but I am getting a different error when I run from deltalake.writer import write_deltalake:
Error: /home/trusted-service-user/cluster-env/trident_env/lib/python3.10/site-packages/pyarrow/libarrow_acero.so.1200: undefined symbol: _ZN5arrow7compute4callESsSt6vectorINS0_10ExpressionESaIS2_EESt10shared_ptrINS0_15FunctionOptionsEE
Thanks @r3stl355, I'd assumed that there was some redirect going on but your test returned Healthy.
With regard to your error how did you make deltalake library available? I'd used the workspace library management GUI from workspace settings (https://learn.microsoft.com/en-us/fabric/data-science/python-guide/python-library-management), not sure if you can install them at a notebook level; still learning myself!
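Notebook-scoped installs may also work via in-line pip (an assumption on my part, not something I've verified on Fabric):
# Hypothetical notebook-level install, session-scoped, as an alternative to the workspace library management GUI
%pip install deltalake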
@r3stl355 based on your suggestion I tried running:
import requests
aadToken = PyTridentTokenLibrary.get_access_token("storage")
headersAuth = {
"Authorization": f"Bearer {aadToken}"
}
output = requests.get("https://onelake.blob.fabric.microsoft.com/xxx/yyy.Lakehouse/Tables", headers=headersAuth)
I get a 200 status code which suggests it's authenticating. (If I remove the auth header it tells me there is an authentication issue.)
OK @bcdobbs, ignore everything I wrote before 😁. This looks like a problem with the writer, because the Spark writer works and so does the direct API call (i.e. I can create a file under Files with a PUT). I'll carry on digging.
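For reference, such a PUT can be sketched roughly as follows, assuming the OneLake blob endpoint accepts a standard Azure Blob Put Blob request (the file name and x-ms-version value are illustrative):
import requests
from trident_token_library_wrapper import PyTridentTokenLibrary

# Create a small file under Files with a plain Put Blob call;
# xxx / yyy.Lakehouse mirror the placeholders used earlier in this issue.
aadToken = PyTridentTokenLibrary.get_access_token("storage")
url = "https://onelake.blob.fabric.microsoft.com/xxx/yyy.Lakehouse/Files/delta_rs_test.txt"
resp = requests.put(
    url,
    headers={
        "Authorization": f"Bearer {aadToken}",
        "x-ms-version": "2021-08-06",
        "x-ms-blob-type": "BlockBlob",
    },
    data=b"hello from the notebook",
)
print(resp.status_code, resp.text)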
As for the other error I had: I installed deltalake with pip but then figured it doesn't work with the pyarrow in the cluster. Can you please check which version of pyarrow you are running, i.e. pip list?
Lastly, this is not just a writer but also a reader problem; I get the same error if I do DeltaTable("abfss://<ws-id>@onelake.dfs.fabric.microsoft.com/<lh-id>/Tables/test", storage_options={"bearer_token": aadToken, "use_fabric_endpoint": "true"})
pyarrow is 12.0.0.
Hmm, strange, I had to lower the pyarrow version to avoid that other error I was getting. Actually, just re-installing v12.0.0 also works; maybe it comes with an incomplete install. Anyway, the _delta_log/_last_checkpoint path in the error does not actually exist, so I wonder if that could be a cause of the problem (resulting in an incorrect message, perhaps 🤷).
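If it helps, that object can be probed directly with the same bearer token (a sketch reusing the placeholders from the original report; a 404 here is normal for a table that has never been checkpointed):
import requests
from trident_token_library_wrapper import PyTridentTokenLibrary

# Probe the exact object named in the error message; 404 simply means no checkpoint exists yet.
aadToken = PyTridentTokenLibrary.get_access_token("storage")
url = "https://onelake.blob.fabric.microsoft.com/xxx/yyy.Lakehouse/Tables/Test/_delta_log/_last_checkpoint"
resp = requests.get(url, headers={"Authorization": f"Bearer {aadToken}"})
print(resp.status_code)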
there is an issue with pyarrow https://github.com/delta-io/delta-rs/pull/1743
Hmm, ok, tried with deltalake 0.13 and got the same error. I think the regression was introduced in the Fabric 1.2 runtime; for now it's better to use runtime 1.1, where it works fine.
Thanks @r3stl355 and @djouallah, really appreciate your time. Indeed, reverting the Fabric runtime lets it work fine! Really excited, as I work for a group of schools so data volumes aren't huge, and I'm always looking for ways to keep compute costs low.
Much appreciated
Ben
I think I got to the bottom of this. The issue is likely related to the way ADLS access is configured in Azure Fabric: although onelake.blob.fabric.microsoft.com resolves to a public IP in the notebook, there is an entry in /etc/hosts pointing it to the loopback IP 127.0.0.2, which uses a self-signed certificate. The same code that fails in the Fabric notebook works in other places (I tried on a local Mac and in Azure Web Terminal using the token issued in the Fabric notebook), so this is unlikely to be a delta-rs problem; it's more likely one for Microsoft to solve.
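A quick way to check this from a notebook cell (a minimal sketch that only reads /etc/hosts and asks the local resolver):
import socket

# Check whether the notebook host overrides the OneLake endpoint locally.
with open("/etc/hosts") as hosts:
    print(hosts.read())

# What the resolver actually returns inside the notebook (this honours /etc/hosts,
# unlike nslookup, which queries DNS directly).
print(socket.gethostbyname("onelake.blob.fabric.microsoft.com"))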
Some extra supporting/interesting data:
- Azure Web Terminal actually uses the same OS as the Fabric notebook: NAME="Common Base Linux Mariner", VERSION="2.0.20231004"
- curl in the Fabric server works but shows that the connection is to a loopback IP 127.0.0.2, which means it may be using a self-signed certificate. (I have a good table there named bad)
> !curl -H "Authorization: Bearer $TOKEN" https://onelake.blob.fabric.microsoft.com/.../.../Tables/bad -verbose
Connected to onelake.blob.fabric.microsoft.com (127.0.0.2) port 443
* ALPN: curl offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* CAfile: /etc/pki/tls/certs/ca-bundle.trust.crt
* CApath: /etc/pki/ca-trust/extracted/openssl
...
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN: server accepted h2
* Server certificate:
* subject: C=US; ST=Washington; L=Redmond; O=MicrosoftData; OU=SparkDepartment; [email protected]; CN=microsoft.com
* start date: Nov 6 09:14:13 2023 GMT
* expire date: Nov 5 09:14:13 2024 GMT
....
* SSL certificate verify ok.
- The same command in Web Terminal shows that it's using a public IP this time and a different certificate (e.g. different validity dates):
* Connected to onelake.blob.fabric.microsoft.com (20.50.0.27) port 443
* ALPN: curl offers h2,http/1.1
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* CAfile: /etc/pki/tls/certs/ca-bundle.trust.crt
* CApath: /etc/ssl/certs
...
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN: server accepted h2
* Server certificate:
* subject: C=US; ST=WA; L=Redmond; O=Microsoft Corporation; CN=westeurope.onelake.fabric.microsoft.com
* start date: Oct 7 14:27:33 2023 GMT
* expire date: Apr 4 14:27:33 2024 GMT
...
* SSL certificate verify ok.
- OpenSSL cert verification confirms the earlier theory about the self-signed cert:
- In the Fabric notebook
> !openssl s_client -connect onelake.blob.fabric.microsoft.com:443 -showcerts
CONNECTED(00000003)
depth=0 C = US, ST = Washington, L = Redmond, O = MicrosoftData, OU = SparkDepartment, emailAddress = [email protected], CN = microsoft.com
verify error:num=18:self signed certificate
...
---
SSL handshake has read 1867 bytes and written 407 bytes
Verification error: self signed certificate
---
- In Web Terminal
depth=1 C = US, O = Microsoft Corporation, CN = Microsoft Azure TLS Issuing CA 06
verify return:1
depth=0 C = US, ST = WA, L = Redmond, O = Microsoft Corporation, CN = westeurope.onelake.fabric.microsoft.com
verify return:1
...
---
SSL handshake has read 4564 bytes and written 799 bytes
Verification: OK
---
- Using curl with IPs works in the Fabric notebook if I use a loopback IP (e.g. curl https://127.0.0.2/...) but fails with a certificate error if I use a public IP returned by nslookup (e.g. curl https://40.82.254.113/...). Using an IP in Azure Web Terminal does not work as expected.
A shorter version of the answer: curl on Fabric runtime 1.1 seems to be using a different CA file (/etc/ssl/certs/ca-certificates.crt) than runtime 1.2 (/etc/pki/tls/certs/ca-bundle.trust.crt), and the 1.2 bundle also has an extra suffix attached to the certificate value. The /etc/ssl/certs/ca-certificates.crt file is still there on runtime 1.2, but it does not contain the certificate used by the endpoint.
Maybe openssl is trying to use /etc/ssl/certs/ca-certificates.crt, or maybe it is unable to properly find the cert in /etc/pki/tls/certs/ca-bundle.trust.crt because of that extra suffix.
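For anyone who wants to inspect this from Python, a small sketch (only the two bundle paths mentioned above are assumed; which one OpenSSL actually consults depends on how it was built):
import ssl

# Show which CA file/dir this Python/OpenSSL build defaults to
# (plus any SSL_CERT_FILE / SSL_CERT_DIR overrides in effect).
print(ssl.get_default_verify_paths())

# Compare the two bundles discussed above; the trust bundle may use
# "BEGIN TRUSTED CERTIFICATE" markers, which some clients do not parse.
for path in ("/etc/ssl/certs/ca-certificates.crt", "/etc/pki/tls/certs/ca-bundle.trust.crt"):
    try:
        with open(path) as f:
            count = sum(1 for line in f if "BEGIN" in line and "CERTIFICATE" in line)
        print(f"{path}: {count} certificates")
    except FileNotFoundError:
        print(f"{path}: not found")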
And lastly, here is a really ugly solution if you are still keen on trying runtime 1.2.
- Run !openssl s_client -connect onelake.blob.fabric.microsoft.com:443 to get the certificate.
- Copy the certificate value between -----BEGIN CERTIFICATE----- and -----END CERTIFICATE----- and write it out to a local file, e.g.
cert = """-----BEGIN CERTIFICATE-----
MIIFGzCCBAOgAwIBAgIUFO5FzvkmKVoyIlO8gQM8vkcNJ0kwDQYJKoZIhvcNAQEL
BQAwgZ8xCzAJBgNVBAYTAlVTMRMwEQYDVQQIDApXYXNoaW5ndG9uMRAwDgYDVQQH
<rest of the cert value here, shortened for brevity
-----END CERTIFICATE-----
"""
with open("ca.cert", "w") as out:
out.write(cert)
- Export the created file name into the SSL_CERT_FILE env var and things should work, e.g.
import os
from deltalake import DeltaTable

# aadToken obtained via PyTridentTokenLibrary as earlier in the thread
os.environ["SSL_CERT_FILE"] = "./ca.cert"
workspace_id = "<your workspace id here>"
lakehouse_id = "<your lakehouse id here>"
dt = DeltaTable(f"abfss://{workspace_id}@onelake.dfs.fabric.microsoft.com/{lakehouse_id}/Tables/bad", storage_options={"bearer_token": aadToken, "use_fabric_endpoint": "true"})
print(dt.version())
With this, you may actually consider closing this ticket; it's not the place for this to be resolved, imo.
If you have the option, try to reach out to the Microsoft Fabric product team directly to flag the regression.
Thanks all, will reach out to Microsoft.
Please try:
os.environ["SSL_CERT_DIR"] = "/etc/pki/ca-trust/extracted/openssl:/opt/olcclient"
The Microsoft Fabric OneLake team is fixing it.
Maybe it's good to close this, since the issue is caused by Fabric.
@RobinLin666 Could you give a link to the bug report with the MS Fabric OneLake team that I could follow, regarding the self-signed certificate problem, please? Or is it all just back channels?
re:
os.environ["SSL_CERT_DIR"] = "/etc/pki/ca-trust/extracted/openssl:/opt/olcclient"
This does not seem to work. I'm currently using a variation of the ugly solution suggested by @r3stl355
import os

if not os.path.exists("onelake_cert.crt"):
    os.system("openssl s_client -showcerts -connect onelake.blob.fabric.microsoft.com:443 | awk '/BEGIN CERTIFICATE/,/END CERTIFICATE/' >> onelake_cert.crt")
os.environ["SSL_CERT_FILE"] = "./onelake_cert.crt"
Hopefully MS will come through with a solution soon. Along with the other delta table write issue, deltalake and polars currently have severely limited usability in the Fabric environment, which is a pity since I love both.