ch icon indicating copy to clipboard operation
ch copied to clipboard

Support new JSON

Open ruslandoga opened this issue 10 months ago • 7 comments

Closes #218


Somewhat depends on the answer to https://github.com/ClickHouse/ClickHouse/issues/75768.

Rust client went the easy way: https://github.com/ClickHouse/clickhouse-rs/pull/171 -- guess we can do that too (https://github.com/plausible/ch/pull/246).

ruslandoga avatar Feb 08 '25 18:02 ruslandoga

@ruslandoga any development here? I've been using this branch for awhile and works well, just need a few hacks for querying json with ecto-ch, but that's another story 😄

kurayama avatar Mar 21 '25 00:03 kurayama

👋 @kurayama

Thank you for reminding me about this one!

I couldn't come up with a good solution right away and kind of put it off for later until Plausible or someone else asks for JSON support :)

I guess that time has come! I'll try to finish it this weekend. Initially with input_format_binary_read_json_as_string and output_format_binary_write_json_as_string and maybe later with the "true" binary JSON. My plan is to make the RowBinary encoder/decoder be aware of the settings, and if input_format_binary_read_json_as_string: 1 is present, we encode with Jason.encode_to_iodata! and similarly for output_format_binary_write_json_as_string: 1 and Jason.decode!. Otherwise we raise with a helpful error message about the current :json type limitations. This would make it possible for us to support binary JSON (instead of raising) later without any breaking changes (if no "json as string" settings are provided).

need a few hacks for querying json with ecto-ch

Feel free to share the problems you have in this thread, I would be happy to change ecto-ch to make working with JSON easier :)

ruslandoga avatar Mar 21 '25 08:03 ruslandoga

@ruslandoga thanks for the reply

My issue might be just lack of familiarity with the library. I want to dynamically receive a string from a query parser on user input and use it as a json subpath.

Right now I have something like:

  defp filter(query, path, value) do
    query
    |> where([e], fragment("?.?", e.my_json_field, literal(^path)) == ^value)
  end

I don't know if there's a better approach without using a literal, this works but seems a bit hackish 😅 My paths are e.g foo.bar.biz to filter on a nested property of the json field.

kurayama avatar Mar 21 '25 20:03 kurayama

Have you tried json_extract_path? ecto_ch implements this Ecto operation with JSON_QUERY

I wonder if that would work:

defp filter(query, path, value) do
  query
  |> where([e], json_extract_path(e.my_json_field, ^path) == ^value) # or ^[path] or [^path]
end

I'll try adding some tests around this use-case into ecto_ch to see how well it works tomorrow.

ruslandoga avatar Mar 21 '25 21:03 ruslandoga

I'm getting:

iex(6)> ClickhouseRepo.all(Demo.Entry |> where([e], json_extract_path(e.my_json_field, ^["foo", "bar"]) == "5"))
** (UndefinedFunctionError) function Ch.type/0 is undefined or private. Did you mean:

      * type/1

    (ch 0.3.1) Ch.type()
    (elixir 1.18.3) lib/enum.ex:1840: Enum."-map_reduce/3-lists^mapfoldl/2-0-"/3
    (elixir 1.18.3) lib/enum.ex:2546: Enum."-reduce/3-lists^foldl/2-0-"/3
    (elixir 1.18.3) lib/enum.ex:2546: Enum."-reduce/3-lists^foldl/2-0-"/3
    (ecto 3.12.5) lib/ecto/repo/queryable.ex:214: Ecto.Repo.Queryable.execute/4
    (ecto 3.12.5) lib/ecto/repo/queryable.ex:19: Ecto.Repo.Queryable.all/3

In any case, JSON_QUERY doesn't seem compatible with the new json type.

27b50acad3a7 :) SELECT * FROM entries WHERE JSON_QUERY(my_json_field,'$.foo.bar') = 'biz' LIMIT 1

SELECT *
FROM entries
WHERE JSON_QUERY(my_json_field, '$.foo.bar') = 'biz'
LIMIT 1

Query id: 36de9fb0-8f0d-4530-8de5-27d82e90b4ec


Elapsed: 0.003 sec.

Received exception from server (version 25.3.1):
Code: 43. DB::Exception: Received from localhost:9000. DB::Exception: JSONPath functions require first argument to be JSON of string, illegal type: JSON. (ILLEGAL_TYPE_OF_ARGUMENT)

For the query to work it needs to be:

SELECT * FROM entries WHERE my_json_field.foo.bar = 'biz'

kurayama avatar Mar 22 '25 03:03 kurayama

My plan is to make the RowBinary encoder/decoder be aware of the settings, and if input_format_binary_read_json_as_string: 1 is present, we encode with Jason.encode_to_iodata! and similarly for output_format_binary_write_json_as_string: 1 and Jason.decode!.

This approach didn't work out quite as I had expected since we might not have access to those setting options if they were passed into Ch.start_link instead of each individual Ch.query ... so one would need to pass them to each Ch.query / Repo operation:

Ch.query!(
  conn,
  "SELECT json FROM test",
  _no_params = [],
  settings: [
    output_format_binary_write_json_as_string: 1
  ]
)

Repo.all(Schema, settings: [output_format_binary_write_json_as_string: 1])

ruslandoga avatar Mar 23 '25 21:03 ruslandoga

It would be great to to have support for the native JSON type (https://clickhouse.com/docs/sql-reference/data-types/newjson), this would simply some of the usages

syepes avatar Apr 10 '25 15:04 syepes

Hey! Any updates on this PR?

arrowcircle avatar Jul 15 '25 12:07 arrowcircle

👋

Sorry for the delay! I think this PR wouldn't work as is, and since I don't want to push the decision of binary vs text JSON to the users, I think the "right" way is to implement binary JSON encoding/decoding. I plan on attempting it today after taking care of some other easier issues :)

WIP: https://github.com/plausible/ch/pull/261

ruslandoga avatar Jul 15 '25 13:07 ruslandoga

👋

I created a branch that simplifies this PR by hard-coding the assumption of "json as text" while #261 is being worked on: https://github.com/plausible/ecto_ch/pull/233 -- it uses https://github.com/plausible/ch/pull/262 "underneath"

ruslandoga avatar Jul 16 '25 12:07 ruslandoga

👋

https://hex.pm/packages/ch/0.5.0 and https://hex.pm/packages/ecto_ch/0.8.0 now have basic JSON (as text) support. The input_format_binary_read_json_as_string and output_format_binary_write_json_as_string options are added automatically in ClickHouse 24.10+ versions. Dynamic type support is planned :)

Demos:

Mix.install [{:ch, "~> 0.5.0"}]

{:ok, pid} = Ch.start_link()

[[%{}]] = Ch.query!(pid, ~s|select '{"a":null}'::JSON|).rows
[[%{"a" => "b"}]] = Ch.query!(pid, ~s|select '{"a":"b"}'::JSON|).rows
# note 42 becomes a string since arrays need to have the same type
[[%{"a" => ["b", "42"]}]] = Ch.query!(pid, ~s|select '{"a":["b", 42]}'::JSON|).rows

Ch.query!(pid, "create table ch_json(data JSON) engine Memory")
Ch.query!(pid, "insert into ch_json(data) format RowBinary", [[%{}], [nil], [%{"a" => 42}]], types: ["JSON"]).num_rows
[[%{}], [%{}], [%{"a" => "42"}]] = Ch.query!(pid, "select * from ch_json").rows
Mix.install [{:ecto_ch, "~> 0.8.0"}]

defmodule Repo do
  use Ecto.Repo, otp_app: :demo, adapter: Ecto.Adapters.ClickHouse
end

Repo.start_link()

Repo.query!("select version()").rows
#=> [["25.6.3.116"]]

# note 42 becomes a string here as well for some reason (that's how ClickHouse decides to encode it when output_format_binary_write_json_as_string)
[[%{"a" => "42"}]] = Repo.query!(~s|select '{"a":42}'::JSON|).rows

defmodule Event do
  use Ecto.Schema
  
  @primary_key false
  schema "ecto_ch_json" do
    field :data, Ch, type: "JSON"
  end
end

Repo.query!("create table ecto_ch_json(data JSON) engine Memory")

Repo.insert_all(Event, [%{data: %{"a" => 42}}, %{data: nil}, %{data: %{}}, %{data: %{a: [1, 2, 3]}}])

import Ecto.Query

Repo.all(from e in Event, select: fragment("?.a::text", e.data))
#=> ["42", "", "", "[1,2,3]"]

ruslandoga avatar Jul 17 '25 17:07 ruslandoga