Weird behaviour on requests with newlines

Open akvlad opened this issue 2 years ago • 3 comments

$ curl -vv -X POST http://a:b@localhost:8123 --data 'CREATE DATABASE test';
$ cat <<EOF | curl -vv -X POST http://a:b@localhost:8123 --data-binary @-
CREATE TABLE IF NOT EXISTS test.settings (fingerprint UInt64, type String, name String, value String, inserted_at DateTime64(9, 'UTC')) ENGINE = ReplacingMergeTree(inserted_at) ORDER BY fingerprint

EOF

$ cat <<EOF | curl -vv -X POST http://a:b@localhost:8123/?database=test --data-binary @-
INSERT INTO settings (fingerprint, type, name, value, inserted_at) VALUES (cityHash64('update_v3_5'), 'update',
     'v3_1', toString(toUnixTimestamp(NOW())), NOW())
EOF

< HTTP/1.1 400 BAD REQUEST
< Server: Werkzeug/3.0.1 Python/3.8.10
< Date: Fri, 03 Nov 2023 15:40:44 GMT
< Content-Type: text/html; charset=utf-8
< Content-Length: 226
< Connection: close
< 
Code: 62. DB::Exception: Code: 62. DB::Exception: Cannot parse expression of type String here: : While executing ValuesBlockInputFormat: data for INSERT was parsed from query. (SYNTAX_ERROR) (version 23.6.1.1). (SYNTAX_ERROR)
* Closing connection 0

The third request fails with HTTP 400 and a misleading error message. On the other hand:

$ cat <<EOF | curl -vv -X POST http://a:b@localhost:8123/?database=test --data-binary @-
INSERT INTO settings (fingerprint, type, name, value, inserted_at) VALUES (cityHash64('update_v3_5'), 'update', 'v3_1', toString(toUnixTimestamp(NOW())), NOW())

EOF
Note: Unnecessary use of -X or --request, POST is already inferred.
*   Trying 127.0.0.1:8123...
* Connected to localhost (127.0.0.1) port 8123 (#0)
* Server auth using Basic with user 'a'
> POST /?database=test HTTP/1.1
> Host: localhost:8123
> Authorization: Basic YTpi
> User-Agent: curl/7.88.1
> Accept: */*
> Content-Length: 162
> Content-Type: application/x-www-form-urlencoded
> 
< HTTP/1.1 200 OK
< Server: Werkzeug/3.0.1 Python/3.8.10
< Date: Fri, 03 Nov 2023 15:45:31 GMT
< Content-Type: text/html; charset=utf-8
< Content-Length: 0
< Connection: close
< 
* Closing connection 0

When the same request is sent without a newline inside the query, the data is ingested successfully. The server should tolerate newlines in the middle of a request instead of failing on them.

akvlad, Nov 03 '23 15:11
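
One way a server could tolerate newlines in a plaintext query is to collapse them before the query is executed. The following is a minimal sketch under stated assumptions, not the workaround shipped in chdb-server: the function name `collapse_newlines` is hypothetical, only single-quoted literals are recognized, and SQL comments are not handled (a `--` line comment would swallow the rest of the query once its newline is removed). It also does nothing for binary formats.

```python
def collapse_newlines(query: str) -> str:
    """Replace newlines outside single-quoted literals with spaces.

    Hypothetical helper for illustration only; it does not understand
    SQL comments or other quoting styles.
    """
    out = []
    in_string = False
    i = 0
    while i < len(query):
        ch = query[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < len(query):
                # Keep the escaped character verbatim (for example \').
                out.append(query[i + 1])
                i += 1
            elif ch == "'":
                in_string = False
        elif ch == "'":
            in_string = True
            out.append(ch)
        elif ch in "\r\n":
            # A newline between tokens becomes a plain space.
            out.append(" ")
        else:
            out.append(ch)
        i += 1
    return "".join(out).strip()
```

Newlines inside string literals are left untouched so that inserted values are not altered; only the whitespace between tokens changes.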

Current support is plaintext only, and barely that. To properly support POST data (including binary formats), we should expose a data hook in chdb equivalent to stdin (or --data) in clickhouse-local.

# python3 -m chdb "SELECT 1" "Native" > output.native
# python3 -m chdb "SELECT * FROM table" < output.native
1 

@auxten is there a way to expose a hook for passing data into chdb without resorting to a crude stdin approach?

lmangani, Nov 03 '23 16:11

The data pipe won't be available until a future version of chdb supports it without stdin hacks. I've implemented a workaround in 0.15.3 that should handle the newline issues until then.

lmangani, Nov 11 '23 18:11

The workaround was weak and didn't cover binary protocols, so as predicted this needs a stdin hack to function. Here's a prototype to test: https://github.com/chdb-io/chdb-server/pull/11

lmangani, Nov 18 '23 19:11
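
The stdin approach discussed in this thread can be sketched roughly as follows. This is not the code from the linked pull request: the helper names `chdb_argv` and `run_with_stdin` are hypothetical, and the command line simply mirrors the `python3 -m chdb "<query>" "<format>"` invocation shown earlier. The idea is to hand the raw POST body to a child process as bytes, so newlines and binary payloads pass through unmodified.

```python
import subprocess
import sys
from typing import List


def chdb_argv(query: str, fmt: str) -> List[str]:
    """Build the argument list for `python3 -m chdb "<query>" "<format>"`."""
    return [sys.executable, "-m", "chdb", query, fmt]


def run_with_stdin(argv: List[str], body: bytes, timeout: float = 30.0) -> bytes:
    """Run a child process, feed `body` to its stdin, and return its stdout.

    The body is passed as raw bytes, so it is never decoded or re-encoded.
    """
    proc = subprocess.run(
        argv,
        input=body,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        timeout=timeout,
        check=False,
    )
    if proc.returncode != 0:
        # Surface the child's error text to the caller.
        raise RuntimeError(proc.stderr.decode("utf-8", errors="replace"))
    return proc.stdout
```

Assuming chdb is installed, a request handler might call `run_with_stdin(chdb_argv("SELECT * FROM table", "CSV"), request_body)`, where `request_body` is the unmodified POST payload. Spawning a process per request is costly, which is part of why a native data hook in chdb would be preferable.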