data-diff
data-diff copied to clipboard
Presto snowflake enhancement
Changes entail: Snowflake: Allow auth by PKCS8 key for Snowflake User private key specification
Presto: Allow verify http session with certificate for Presto Allow using basic authentication for Presto Presto Port specification Presto HTTP scheme specification Presto certificate specification
For Testing: Snowflake: Create a Snowflake free trial account here,
Run the following in your local machine via CLI to generate a new SSH key pair:
cd ~/.ssh
openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out snowflake_rsa_key.p8 -nocrypt
openssl rsa -in snowflake_rsa_key.p8 -pubout -out snowflake_rsa_key.pub
cat snowflake_rsa_key.pub | sed '1d;$d' | tr -d "\n" | pbcopy
cat snowflake_rsa_key.p8 | tr '\n' '|' | sed 's/\|/\\n/g' | pbcopy
Then take what is on your Clipboard, and in Snowflake, associate it to a user you’re trying to log in as:
alter user <SNOWFLAKE_USER> set rsa_public_key='FILL_IN_VALUE_FROM_CLIPBOARD'
Presto:
- Run the following on your local via CLI
openssl genrsa -des3 -out example.com.key 2048
openssl req -new -key example.com.key -out example.com.csr #Provide 'localhost' as Common Name
cp example.com.key example.com.key.org
openssl rsa -in example.com.key.org -out example.com.key
Edit v3.ext file
vi v3.ext
Put the following in file v3.ext:
subjectKeyIdentifier = hash authorityKeyIdentifier = keyid:always,issuer:always basicConstraints = CA:TRUE keyUsage = digitalSignature, nonRepudiation, keyEncipherment, dataEncipherment, keyAgreement, keyCertSign subjectAltName = DNS:localhost issuerAltName = issuer:copy
Run: openssl x509 -req -in example.com.csr -signkey example.com.key -out example.com.crt -days 3650 -sha256 -extfile v3.ext
-
Combine cert and private keys into a single file:
cat example.com.crt example.com.key > combined.pem
-
Put that combined.pem into dev/presto-conf/standalone/
-
Put the example.com.crt into the data-diff project root
-
Creat a new user password file with the following , providing test as the password as well
htpasswd -c password.db test
htpasswd -B -C 10 password.db test
-
Put that password.db under dev/presto-conf/standalone/
-
Create a new dev/presto-conf/standalone/password-authenticator.properties file with the following contents:
password-authenticator.name=file
file.password-file=/opt/presto/etc/password.db
- Add the following to dev/presto-conf/standalone/config.properties:
http-server.https.enabled=true
http-server.https.port=8443
http-server.https.keystore.path=/opt/presto/etc/combined.pem
http-server.authentication.type=PASSWORD
- Add this line to docker-compose.yml under ‘ports’ for ‘presto’
- '8443:8443'
After the above steps are completed, mount the data-diff directory in a docker container, load sample data and test using a config.toml
file like below:
[database.presto]
driver = "presto"
host = "<presto hostname>"
port = "<presto port>"
user = "<presto user>"
password = "<presto password>"
catalog = "<presto catalog>"
schema = "<presto schema>"
cert = "<path/to/presto/certificate.cer>" (this is the certificate from step 4 above)
auth = "basic"
http_scheme = "https"
source = "odbc"
[database.snowflake]
driver = "snowflake"
account = "snowflake account name"
database = "snowflake database name"
user = "<snowflake user name>"
warehouse = "<snowflake warehouse>"
role = "<snowflake role>"
schema = "<snowflake schema>"
key = "<path/to/snowflake/ssh/key.p8>" e.g. "/.ssh/snowflake_rsa_key.p8"
# Specify the default run params
[run.default]
verbose = true
# Specify params for a run 'test_diff'.
[run.test_diff]
verbose = false
# Source 1 ("left")
1.database = "presto"
1.table = "<table name>"
# Source 2 ("right")
2.database = "snowflake"
2.table = "<schema name>.<table name>" (case sensitive)
Then run data diff in the CLI using:
poetry run python3 -m data_diff --conf <path/to/config.toml/file> --run test_diff -v
E.g.
poetry run python3 -m data_diff --conf data_diff/config.toml --run test_diff -v
Hi @matthiasekundayo-eb , I tried to follow your instructions for setup on Snowflake, but got stuck. Any chance you can walk me through it live? On slack/discord/zoom/hangouts, or anywhere else.
Hi @erezsh Will you be available anytime later today? I am on EST timezone.
@erezsh Are you in Isreal? If yes, then let's do 11:30am EST on Thursday Aug 11th or 11:30am EST on Friday Aug 12th. Will any of those times work for you?
@matthiasekundayo-eb I'm in Amsterdam, but close enough. These dates/hours are fine. Where can I reach you?
@erezsh I just sent over a Google Meet invite to your inbox.
Here is also a copy of the same invite: Data-diff: Snowflake Setup Walk-Through Thursday, August 11 · 11:30am – 12:00pm EST Google Meet joining info Video call link: https://meet.google.com/xhq-kyzk-swk Or dial: (CA) +1 778-729-8580 PIN: 323 066 228# More phone numbers: https://tel.meet/xhq-kyzk-swk?pin=5417485608915
@matthiasekundayo-eb Okay, I tested Snowflake and it works with the key
argument. However, when I also provide a password, I get -
ImportError: Failed to import test module: tests.test_database_types
Traceback (most recent call last):
File "C:\Python310\lib\site-packages\cryptography\utils.py", line 38, in _check_byteslike
memoryview(value)
TypeError: memoryview: a bytes-like object is required, not 'str'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Python310\lib\unittest\loader.py", line 436, in _find_test_path
module = self._get_module_from_name(name)
File "C:\Python310\lib\unittest\loader.py", line 377, in _get_module_from_name
__import__(name)
File "C:\code\datafold\xdiff\tests\test_database_types.py", line 30, in <module>
CONNS = {k: db.connect.connect(v, N_THREADS) for k, v in CONN_STRINGS.items()}
File "C:\code\datafold\xdiff\tests\test_database_types.py", line 30, in <dictcomp>
CONNS = {k: db.connect.connect(v, N_THREADS) for k, v in CONN_STRINGS.items()}
File "C:\code\datafold\xdiff\data_diff\databases\connect.py", line 204, in connect
return connect_with_dict(db_conf, thread_count)
File "C:\code\datafold\xdiff\data_diff\databases\connect.py", line 173, in connect_with_dict
return cls(**d)
File "C:\code\datafold\xdiff\data_diff\databases\snowflake.py", line 44, in __init__
p_key = serialization.load_pem_private_key(
File "C:\Python310\lib\site-packages\cryptography\hazmat\primitives\serialization\base.py", line 22, in load_pem_private_key
return ossl.load_pem_private_key(data, password)
File "C:\Python310\lib\site-packages\cryptography\hazmat\backends\openssl\backend.py", line 823, in load_pem_private_key
return self._load_key(
File "C:\Python310\lib\site-packages\cryptography\hazmat\backends\openssl\backend.py", line 1042, in _load_key
utils._check_byteslike("password", password)
File "C:\Python310\lib\site-packages\cryptography\utils.py", line 40, in _check_byteslike
raise TypeError("{} must be bytes-like".format(name))
TypeError: password must be bytes-like
I think it should be able to accept string passwords, no?
@matthiasekundayo-eb For Presto, I'm getting this error:
File "C:\Python310\lib\site-packages\parameterized\parameterized.py", line 533, in standalone_func
return func(*(a + p.args), **p.kwargs)
File "C:\code\datafold\xdiff\tests\test_database_types.py", line 569, in test_types
_drop_table_if_exists(dst_conn, dst_table)
File "C:\code\datafold\xdiff\tests\common.py", line 98, in _drop_table_if_exists
conn.query(f"DROP TABLE IF EXISTS {table}", None)
File "C:\code\datafold\xdiff\data_diff\databases\base.py", line 101, in query
res = self._query(sql_code)
File "C:\code\datafold\xdiff\data_diff\databases\presto.py", line 65, in _query
c = self._conn.cursor()
File "C:\Users\erez\AppData\Roaming\Python\Python310\site-packages\prestodb\dbapi.py", line 166, in cursor
request = self._create_request()
File "C:\Users\erez\AppData\Roaming\Python\Python310\site-packages\prestodb\dbapi.py", line 141, in _create_request
return prestodb.client.PrestoRequest(
File "C:\Users\erez\AppData\Roaming\Python\Python310\site-packages\prestodb\client.py", line 256, in __init__
raise ValueError("cannot use authentication with HTTP")
ValueError: cannot use authentication with HTTP
Hi @erezsh, For Snowflake, you can't use password and key at the same time to connect. You can either use basic auth (username & password) or use username & ssh.
For Presto, did you use http_scheme = "https"
?
Hi @matthiasekundayo-eb , when I add http_scheme=https
I get the following error in Python:
requests.exceptions.SSLError: HTTPSConnectionPool(host='127.0.0.1', port=8080): Max retries exceeded with url: /v1/statement (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:997)')))
On the server side, I'm getting this error repeatedly:
presto_1 | 2022-08-23T09:51:55.588Z WARN http-worker-137 org.eclipse.jetty.util.SharedBlockingCallback Blocker not complete Blocker@396571e4{null}
presto_1 | 2022-08-23T09:51:55.588Z WARN http-worker-137 org.eclipse.jetty.util.thread.strategy.EatWhatYouKill
presto_1 | java.lang.NullPointerException
presto_1 | at io.airlift.http.server.HttpServerChannelListener.onRequestEnd(HttpServerChannelListener.java:62)
presto_1 | at io.airlift.http.server.HttpServerChannelListener.onResponseBegin(HttpServerChannelListener.java:70)
presto_1 | at org.eclipse.jetty.server.HttpChannelListeners.onResponseBegin(HttpChannelListeners.java:187)
presto_1 | at org.eclipse.jetty.server.HttpChannel.sendResponse(HttpChannel.java:814)
presto_1 | at org.eclipse.jetty.server.HttpChannel.sendResponse(HttpChannel.java:842)
presto_1 | at org.eclipse.jetty.server.HttpChannel.onBadMessage(HttpChannel.java:775)
presto_1 | at org.eclipse.jetty.server.HttpChannelOverHttp.badMessage(HttpChannelOverHttp.java:283)
presto_1 | at org.eclipse.jetty.http.HttpParser.badMessage(HttpParser.java:1628)
presto_1 | at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:1610)
presto_1 | at org.eclipse.jetty.server.HttpConnection.parseRequestBuffer(HttpConnection.java:364)
presto_1 | at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:261)
presto_1 | at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
presto_1 | at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
presto_1 | at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
presto_1 | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
presto_1 | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
presto_1 | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
presto_1 | at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)
presto_1 | at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:375)
presto_1 | at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806)
presto_1 | at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938)
presto_1 | at java.base/java.lang.Thread.run(Thread.java:829)
Any idea what's causing it? It only happens for https, it seems.
As for snowflake, no problem, I can fix the error myself.
Hi @erezsh Try using port = "8443"
in the config.toml
file instead of 8080.
Could you please also share your config file with the password redacted?
Hi @matthiasekundayo-eb , the port helped, thanks! Now I'm getting prestodb.exceptions.HttpError: error 401: b'Access Denied: Invalid credentials'
I don't really know why, but tomorrow I'll try to redo the process as you described it, from scratch, just in case I made a mistake on the way.
@matthiasekundayo-eb I got it working. I think maybe I missed putting 'localhost' as Common Name. Either way, I think it's ready to merge. I will try to do so later today.
Merged. Thank you for your contribution, and sorry it took a while!
Thanks @erezsh