data-diff icon indicating copy to clipboard operation
data-diff copied to clipboard

Presto snowflake enhancement

Open matthiasekundayo-eb opened this issue 1 year ago • 5 comments

Changes entail: Snowflake: Allow auth by PKCS8 key for Snowflake User private key specification

Presto: Allow verify http session with certificate for Presto Allow using basic authentication for Presto Presto Port specification Presto HTTP scheme specification Presto certificate specification

For Testing: Snowflake: Create a Snowflake free trial account here,

Run the following in your local machine via CLI to generate a new SSH key pair:

cd ~/.ssh
openssl genrsa 2048 | openssl pkcs8 -topk8 -inform PEM -out snowflake_rsa_key.p8 -nocrypt
openssl rsa -in snowflake_rsa_key.p8  -pubout -out snowflake_rsa_key.pub
cat snowflake_rsa_key.pub | sed '1d;$d' | tr -d "\n" | pbcopy
cat snowflake_rsa_key.p8 | tr '\n' '|' | sed 's/\|/\\n/g' | pbcopy

Then take what is on your Clipboard, and in Snowflake, associate it to a user you’re trying to log in as: alter user <SNOWFLAKE_USER> set rsa_public_key='FILL_IN_VALUE_FROM_CLIPBOARD'

Presto:

  1. Run the following on your local via CLI
openssl genrsa -des3 -out example.com.key 2048
openssl req -new -key example.com.key -out example.com.csr  #Provide 'localhost' as Common Name
cp example.com.key example.com.key.org
openssl rsa -in example.com.key.org -out example.com.key

Edit v3.ext file vi v3.ext

Put the following in file v3.ext:

subjectKeyIdentifier = hash authorityKeyIdentifier = keyid:always,issuer:always basicConstraints = CA:TRUE keyUsage = digitalSignature, nonRepudiation, keyEncipherment, dataEncipherment, keyAgreement, keyCertSign subjectAltName = DNS:localhost issuerAltName = issuer:copy

Run: openssl x509 -req -in example.com.csr -signkey example.com.key -out example.com.crt -days 3650 -sha256 -extfile v3.ext

  1. Combine cert and private keys into a single file: cat example.com.crt example.com.key > combined.pem

  2. Put that combined.pem into dev/presto-conf/standalone/

  3. Put the example.com.crt into the data-diff project root

  4. Creat a new user password file with the following , providing test as the password as well

htpasswd -c password.db test
htpasswd -B -C 10 password.db test
  1. Put that password.db under dev/presto-conf/standalone/

  2. Create a new dev/presto-conf/standalone/password-authenticator.properties file with the following contents:

password-authenticator.name=file
file.password-file=/opt/presto/etc/password.db
  1. Add the following to dev/presto-conf/standalone/config.properties:
http-server.https.enabled=true
http-server.https.port=8443
http-server.https.keystore.path=/opt/presto/etc/combined.pem
http-server.authentication.type=PASSWORD
  1. Add this line to docker-compose.yml under ‘ports’ for ‘presto’ - '8443:8443'

After the above steps are completed, mount the data-diff directory in a docker container, load sample data and test using a config.toml file like below:

[database.presto]
driver = "presto"
host = "<presto hostname>"
port = "<presto port>"
user = "<presto user>"
password = "<presto password>"
catalog = "<presto catalog>"
schema = "<presto schema>"
cert = "<path/to/presto/certificate.cer>" (this is the certificate from step 4 above)
auth = "basic"
http_scheme = "https"
source = "odbc"
​
[database.snowflake]
driver = "snowflake"
account = "snowflake account name"
database = "snowflake database name"
user = "<snowflake user name>"
warehouse = "<snowflake warehouse>"
role = "<snowflake role>"
schema = "<snowflake schema>"
key = "<path/to/snowflake/ssh/key.p8>" e.g. "/.ssh/snowflake_rsa_key.p8"
​
​
# Specify the default run params
[run.default]
verbose = true
​
# Specify params for a run 'test_diff'.
[run.test_diff]
verbose = false
# Source 1 ("left")
1.database = "presto"
1.table = "<table name>"
​
# Source 2 ("right")
2.database = "snowflake"
2.table = "<schema name>.<table name>" (case sensitive)

Then run data diff in the CLI using: poetry run python3 -m data_diff --conf <path/to/config.toml/file> --run test_diff -v E.g. poetry run python3 -m data_diff --conf data_diff/config.toml --run test_diff -v

matthiasekundayo-eb avatar Jul 29 '22 15:07 matthiasekundayo-eb

Hi @matthiasekundayo-eb , I tried to follow your instructions for setup on Snowflake, but got stuck. Any chance you can walk me through it live? On slack/discord/zoom/hangouts, or anywhere else.

erezsh avatar Aug 09 '22 15:08 erezsh

Hi @erezsh Will you be available anytime later today? I am on EST timezone.

matthiasekundayo-eb avatar Aug 09 '22 16:08 matthiasekundayo-eb

@erezsh Are you in Isreal? If yes, then let's do 11:30am EST on Thursday Aug 11th or 11:30am EST on Friday Aug 12th. Will any of those times work for you?

matthiasekundayo-eb avatar Aug 09 '22 16:08 matthiasekundayo-eb

@matthiasekundayo-eb I'm in Amsterdam, but close enough. These dates/hours are fine. Where can I reach you?

erezsh avatar Aug 09 '22 16:08 erezsh

@erezsh I just sent over a Google Meet invite to your inbox.

Here is also a copy of the same invite: Data-diff: Snowflake Setup Walk-Through Thursday, August 11 · 11:30am – 12:00pm EST Google Meet joining info Video call link: https://meet.google.com/xhq-kyzk-swk Or dial: ‪(CA) +1 778-729-8580‬ PIN: ‪323 066 228‬# More phone numbers: https://tel.meet/xhq-kyzk-swk?pin=5417485608915

matthiasekundayo-eb avatar Aug 09 '22 17:08 matthiasekundayo-eb

@matthiasekundayo-eb Okay, I tested Snowflake and it works with the key argument. However, when I also provide a password, I get -

ImportError: Failed to import test module: tests.test_database_types
Traceback (most recent call last):
  File "C:\Python310\lib\site-packages\cryptography\utils.py", line 38, in _check_byteslike
    memoryview(value)
TypeError: memoryview: a bytes-like object is required, not 'str'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Python310\lib\unittest\loader.py", line 436, in _find_test_path
    module = self._get_module_from_name(name)
  File "C:\Python310\lib\unittest\loader.py", line 377, in _get_module_from_name
    __import__(name)
  File "C:\code\datafold\xdiff\tests\test_database_types.py", line 30, in <module>
    CONNS = {k: db.connect.connect(v, N_THREADS) for k, v in CONN_STRINGS.items()}
  File "C:\code\datafold\xdiff\tests\test_database_types.py", line 30, in <dictcomp>
    CONNS = {k: db.connect.connect(v, N_THREADS) for k, v in CONN_STRINGS.items()}
  File "C:\code\datafold\xdiff\data_diff\databases\connect.py", line 204, in connect
    return connect_with_dict(db_conf, thread_count)
  File "C:\code\datafold\xdiff\data_diff\databases\connect.py", line 173, in connect_with_dict
    return cls(**d)
  File "C:\code\datafold\xdiff\data_diff\databases\snowflake.py", line 44, in __init__
    p_key = serialization.load_pem_private_key(
  File "C:\Python310\lib\site-packages\cryptography\hazmat\primitives\serialization\base.py", line 22, in load_pem_private_key
    return ossl.load_pem_private_key(data, password)
  File "C:\Python310\lib\site-packages\cryptography\hazmat\backends\openssl\backend.py", line 823, in load_pem_private_key
    return self._load_key(
  File "C:\Python310\lib\site-packages\cryptography\hazmat\backends\openssl\backend.py", line 1042, in _load_key
    utils._check_byteslike("password", password)
  File "C:\Python310\lib\site-packages\cryptography\utils.py", line 40, in _check_byteslike
    raise TypeError("{} must be bytes-like".format(name))
TypeError: password must be bytes-like

I think it should be able to accept string passwords, no?

erezsh avatar Aug 11 '22 17:08 erezsh

@matthiasekundayo-eb For Presto, I'm getting this error:

  File "C:\Python310\lib\site-packages\parameterized\parameterized.py", line 533, in standalone_func
    return func(*(a + p.args), **p.kwargs)
  File "C:\code\datafold\xdiff\tests\test_database_types.py", line 569, in test_types
    _drop_table_if_exists(dst_conn, dst_table)
  File "C:\code\datafold\xdiff\tests\common.py", line 98, in _drop_table_if_exists
    conn.query(f"DROP TABLE IF EXISTS {table}", None)
  File "C:\code\datafold\xdiff\data_diff\databases\base.py", line 101, in query
    res = self._query(sql_code)
  File "C:\code\datafold\xdiff\data_diff\databases\presto.py", line 65, in _query
    c = self._conn.cursor()
  File "C:\Users\erez\AppData\Roaming\Python\Python310\site-packages\prestodb\dbapi.py", line 166, in cursor
    request = self._create_request()
  File "C:\Users\erez\AppData\Roaming\Python\Python310\site-packages\prestodb\dbapi.py", line 141, in _create_request
    return prestodb.client.PrestoRequest(
  File "C:\Users\erez\AppData\Roaming\Python\Python310\site-packages\prestodb\client.py", line 256, in __init__
    raise ValueError("cannot use authentication with HTTP")
ValueError: cannot use authentication with HTTP

erezsh avatar Aug 11 '22 17:08 erezsh

Hi @erezsh, For Snowflake, you can't use password and key at the same time to connect. You can either use basic auth (username & password) or use username & ssh.

For Presto, did you use http_scheme = "https"?

matthiasekundayo-eb avatar Aug 16 '22 14:08 matthiasekundayo-eb

Hi @matthiasekundayo-eb , when I add http_scheme=https I get the following error in Python:

requests.exceptions.SSLError: HTTPSConnectionPool(host='127.0.0.1', port=8080): Max retries exceeded with url: /v1/statement (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:997)')))

On the server side, I'm getting this error repeatedly:

presto_1    | 2022-08-23T09:51:55.588Z  WARN    http-worker-137 org.eclipse.jetty.util.SharedBlockingCallback   Blocker not complete Blocker@396571e4{null}
presto_1    | 2022-08-23T09:51:55.588Z  WARN    http-worker-137 org.eclipse.jetty.util.thread.strategy.EatWhatYouKill
presto_1    | java.lang.NullPointerException
presto_1    |   at io.airlift.http.server.HttpServerChannelListener.onRequestEnd(HttpServerChannelListener.java:62)
presto_1    |   at io.airlift.http.server.HttpServerChannelListener.onResponseBegin(HttpServerChannelListener.java:70)
presto_1    |   at org.eclipse.jetty.server.HttpChannelListeners.onResponseBegin(HttpChannelListeners.java:187)
presto_1    |   at org.eclipse.jetty.server.HttpChannel.sendResponse(HttpChannel.java:814)
presto_1    |   at org.eclipse.jetty.server.HttpChannel.sendResponse(HttpChannel.java:842)
presto_1    |   at org.eclipse.jetty.server.HttpChannel.onBadMessage(HttpChannel.java:775)
presto_1    |   at org.eclipse.jetty.server.HttpChannelOverHttp.badMessage(HttpChannelOverHttp.java:283)
presto_1    |   at org.eclipse.jetty.http.HttpParser.badMessage(HttpParser.java:1628)
presto_1    |   at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:1610)
presto_1    |   at org.eclipse.jetty.server.HttpConnection.parseRequestBuffer(HttpConnection.java:364)
presto_1    |   at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:261)
presto_1    |   at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
presto_1    |   at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:103)
presto_1    |   at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:117)
presto_1    |   at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
presto_1    |   at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
presto_1    |   at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
presto_1    |   at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:129)
presto_1    |   at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:375)
presto_1    |   at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:806)
presto_1    |   at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:938)
presto_1    |   at java.base/java.lang.Thread.run(Thread.java:829)

Any idea what's causing it? It only happens for https, it seems.

As for snowflake, no problem, I can fix the error myself.

erezsh avatar Aug 23 '22 09:08 erezsh

Hi @erezsh Try using port = "8443" in the config.toml file instead of 8080. Could you please also share your config file with the password redacted?

matthiasekundayo-eb avatar Aug 23 '22 17:08 matthiasekundayo-eb

Hi @matthiasekundayo-eb , the port helped, thanks! Now I'm getting prestodb.exceptions.HttpError: error 401: b'Access Denied: Invalid credentials'

I don't really know why, but tomorrow I'll try to redo the process as you described it, from scratch, just in case I made a mistake on the way.

erezsh avatar Aug 23 '22 19:08 erezsh

@matthiasekundayo-eb I got it working. I think maybe I missed putting 'localhost' as Common Name. Either way, I think it's ready to merge. I will try to do so later today.

erezsh avatar Aug 24 '22 10:08 erezsh

Merged. Thank you for your contribution, and sorry it took a while!

erezsh avatar Aug 24 '22 11:08 erezsh

Thanks @erezsh

matthiasekundayo-eb avatar Aug 24 '22 13:08 matthiasekundayo-eb