chdb-server-bak
Weird behaviour on requests with newlines
$ curl -vv -X POST http://a:b@localhost:8123 --data 'CREATE DATABASE test';
$ cat <<EOF | curl -vv -X POST http://a:b@localhost:8123 --data-binary @-
CREATE TABLE IF NOT EXISTS test.settings (fingerprint UInt64, type String, name String, value String, inserted_at DateTime64(9, 'UTC')) ENGINE = ReplacingMergeTree(inserted_at) ORDER BY fingerprint
EOF
$ cat <<EOF | curl -vv -X POST http://a:b@localhost:8123/?database=test --data-binary @-
INSERT INTO settings (fingerprint, type, name, value, inserted_at) VALUES (cityHash64('update_v3_5'), 'update',
'v3_1', toString(toUnixTimestamp(NOW())), NOW())
EOF
< HTTP/1.1 400 BAD REQUEST
< Server: Werkzeug/3.0.1 Python/3.8.10
< Date: Fri, 03 Nov 2023 15:40:44 GMT
< Content-Type: text/html; charset=utf-8
< Content-Length: 226
< Connection: close
<
Code: 62. DB::Exception: Code: 62. DB::Exception: Cannot parse expression of type String here: : While executing ValuesBlockInputFormat: data for INSERT was parsed from query. (SYNTAX_ERROR) (version 23.6.1.1). (SYNTAX_ERROR)
* Closing connection 0
The third request fails with HTTP 400 and a misleading error message. On the other hand:
$ cat <<EOF | curl -vv -X POST http://a:b@localhost:8123/?database=test --data-binary @-
INSERT INTO settings (fingerprint, type, name, value, inserted_at) VALUES (cityHash64('update_v3_5'), 'update', 'v3_1', toString(toUnixTimestamp(NOW())), NOW())
EOF
Note: Unnecessary use of -X or --request, POST is already inferred.
* Trying 127.0.0.1:8123...
* Connected to localhost (127.0.0.1) port 8123 (#0)
* Server auth using Basic with user 'a'
> POST /?database=test HTTP/1.1
> Host: localhost:8123
> Authorization: Basic YTpi
> User-Agent: curl/7.88.1
> Accept: */*
> Content-Length: 162
> Content-Type: application/x-www-form-urlencoded
>
< HTTP/1.1 200 OK
< Server: Werkzeug/3.0.1 Python/3.8.10
< Date: Fri, 03 Nov 2023 15:45:31 GMT
< Content-Type: text/html; charset=utf-8
< Content-Length: 0
< Connection: close
<
* Closing connection 0
When the same request is sent without the embedded newline, the data is ingested successfully. The server should ignore newlines in the middle of a request.
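As a stopgap for plain-text queries, the newline handling described above could be approximated by normalizing the body before it is handed to the engine. The following is only a minimal sketch under stated assumptions: `collapse_newlines` is a hypothetical helper (not part of chdb or chdb-server), it only tracks single-quoted string literals, and it ignores SQL comments and quoted identifiers. It must not be applied to binary payloads.

```python
def collapse_newlines(query: str) -> str:
    """Replace newlines outside single-quoted literals with a single space.

    Hypothetical helper for illustration only; not the chdb-server implementation.
    """
    out = []
    in_string = False
    i = 0
    while i < len(query):
        ch = query[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < len(query):
                # Keep a backslash-escaped character (e.g. \') verbatim.
                out.append(query[i + 1])
                i += 1
            elif ch == "'":
                # A doubled quote ('') closes and immediately reopens the
                # literal on the next character, which is fine for this purpose.
                in_string = False
        elif ch == "'":
            in_string = True
            out.append(ch)
        elif ch in "\r\n":
            # Outside a literal: turn the line break into one space.
            if out and out[-1] != " ":
                out.append(" ")
        else:
            out.append(ch)
        i += 1
    return "".join(out).strip()


if __name__ == "__main__":
    body = (
        "INSERT INTO settings (fingerprint, type, name, value, inserted_at) "
        "VALUES (cityHash64('update_v3_5'), 'update',\n"
        "'v3_1', toString(toUnixTimestamp(NOW())), NOW())\n"
    )
    print(collapse_newlines(body))
```

Newlines inside string literals are preserved on purpose, since they may be part of the inserted data.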
Current support is essentially plaintext only. To properly support POST data (including binary formats), we should expose a data hook in chdb equivalent to stdin (or --data) in clickhouse-local:
# python3 -m chdb "SELECT 1", "Native" > output.native
# python3 -m chdb "SELECT * FROM table" < output.native
1
@auxten is there a way we can expose a hook to pass data into chdb without resorting to a crude stdin hack?
A proper data pipe won't be available until a future version of chdb supports it without stdin hacks. I've implemented a workaround in 0.15.3 that should address the newline issues until then.
The workaround was weak and didn't cover binary protocols, so, as predicted, this needs a stdin hack to function. Here's a prototype to test: https://github.com/chdb-io/chdb-server/pull/11
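For readers unfamiliar with the general shape of a stdin hack, here is a rough sketch of forwarding a raw POST body to a child process byte for byte. This is not the code in the linked PR. `run_with_stdin` is a hypothetical helper, and the command is left as a parameter because it is an assumption that the target (for example clickhouse-local, or a chdb entry point) reads its input data from stdin.

```python
import subprocess
import sys
from typing import Sequence


def run_with_stdin(cmd: Sequence[str], body: bytes, timeout: float = 30.0) -> bytes:
    """Run cmd, feed body to its stdin unchanged, and return its raw stdout.

    Hypothetical helper: the body is passed as bytes, so newlines and binary
    formats (e.g. Native) are not reinterpreted on the way in.
    """
    proc = subprocess.run(
        cmd,
        input=body,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
    )
    if proc.returncode != 0:
        # Surface the child's error text so the HTTP layer can return it.
        raise RuntimeError(proc.stderr.decode("utf-8", errors="replace"))
    return proc.stdout


if __name__ == "__main__":
    # Stand-in command that echoes stdin back, used here instead of a real engine.
    echo = [
        sys.executable,
        "-c",
        "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read())",
    ]
    print(run_with_stdin(echo, b"line one\nline two\n"))
```

Spawning a process per request is expensive, which is one reason a native data hook in chdb would be preferable to this approach.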