starrocks icon indicating copy to clipboard operation
starrocks copied to clipboard

Support converting JSON to STRUCT

Open murphyatwork opened this issue 1 year ago • 5 comments

  • [x] https://github.com/StarRocks/starrocks/pull/47233
  • [x] https://github.com/StarRocks/starrocks/pull/46448
  • [x] https://github.com/StarRocks/starrocks/pull/45406

Feature request

Is your feature request related to a problem? Please describe.

STRUCT type is more performant that JSON in terms of query, but which use the fixed schema and annoying to construct.

If it's possible to convert JSON to STRUCT directly, like to_struct(JSON), it can be much easier to use.

Describe the solution you'd like

  1. Use a function to convert JSON to STRUCT:
    • Example: select to_struct(parse_json("{"a": 1}"))
    • Difficulty: cannot inference the schema of struct before parsing the JSON
  2. Specify the STRUCT schema and convert:
    • Example: select to_struct<a int, b int>(parse_json("a": 1, "b": 2))
    • Difficulty: not easy to use

Describe alternatives you've considered

Additional context

murphyatwork avatar Mar 25 '24 17:03 murphyatwork

Maybe, we could implement the second one first, and try to do the first one in the future (if needed), which will use the implementation for the second one.

jaogoy avatar Mar 26 '24 01:03 jaogoy

+1 on this issue. We are currently evaluating StarRocks as a replacement for Trino. However, we very much rely on being able to cast JSON values to Structs at query time which a feature supported by Trino but not by StarRocks.

Using named_struct and parse_json as well as json extract functions works for simple objects decently enough. However, json objects that contain arrays of objects become very challenging to extract if you want to preserve the relations of keys in the inner objects and not flatten the data.

kyle-goodale-klaviyo avatar Sep 26 '24 17:09 kyle-goodale-klaviyo

Thanks, @kyle-goodale-klaviyo . This issue focuses on stream load, and we have implemented the conversion of JSON to structs during data ingestion. Could you share some examples (SQL snippets) demonstrating how to use Trino to convert JSON values into structs?
CC @wangsimo0

Dshadowzh avatar Sep 27 '24 17:09 Dshadowzh

@Dshadowzh Absolutely, here is a minimal example that includes nested objects, as well as arrays of objects:

SELECT 
   CAST(json_parse(
   '{"a": 5, "b_obj": {"bb": true}, "b_arr_obj": [{"foo": "bar"}, {"foo": "baz"}]}'
   ) AS ROW(
        a int,
        b_obj ROW(
            bb boolean
        ),
        b_arr_obj ARRAY(
            ROW(
                foo varchar
            )
        )
    )
   )

Trino calls structs rows, but they are the same thing. One of the important parts is that when casting an array of objects it preserves them as arrays of structs. Currently, I can somewhat get around this issue in StarRocks by using named struct and extracting the json values manually.

StarRocks example:

named_struct(
    'first_name', get_json_string(`data`, '$.first_name'),
    'last_name', get_json_string(`data`, '$.last_name')
)

This works great for objects and even nested objects. However, this breaks down when you encounter an array of objects if you wish to preserve the inner objects as structs within an array. This method of extraction requires that you flatten everything as there is no way to traverse the json lists and extract values out while maintaining there relations to one another.

Please do let me know if I can provide any more examples or clarify the issue further!

kyle-goodale-klaviyo avatar Sep 27 '24 17:09 kyle-goodale-klaviyo

@Dshadowzh Absolutely, here is a minimal example that includes nested objects, as well as arrays of objects:

SELECT 
   CAST(json_parse(
   '{"a": 5, "b_obj": {"bb": true}, "b_arr_obj": [{"foo": "bar"}, {"foo": "baz"}]}'
   ) AS ROW(
        a int,
        b_obj ROW(
            bb boolean
        ),
        b_arr_obj ARRAY(
            ROW(
                foo varchar
            )
        )
    )
   )

Trino calls structs rows, but they are the same thing. One of the important parts is that when casting an array of objects it preserves them as arrays of structs. Currently, I can somewhat get around this issue in StarRocks by using named struct and extracting the json values manually.

StarRocks example:

named_struct(
    'first_name', get_json_string(`data`, '$.first_name'),
    'last_name', get_json_string(`data`, '$.last_name')
)

This works great for objects and even nested objects. However, this breaks down when you encounter an array of objects if you wish to preserve the inner objects as structs within an array. This method of extraction requires that you flatten everything as there is no way to traverse the json lists and extract values out while maintaining there relations to one another.

Please do let me know if I can provide any more examples or clarify the issue further!

it's better to implement a native function converting JSON ARRAY to ARRAY<STRUCT> to accomplish this, otherwise it would be pretty verbose

murphyatwork avatar Oct 17 '24 03:10 murphyatwork

We have marked this issue as stale because it has been inactive for 6 months. If this issue is still relevant, removing the stale label or adding a comment will keep it active. Otherwise, we'll close it in 10 days to keep the issue queue tidy. Thank you for your contribution to StarRocks!

github-actions[bot] avatar Apr 28 '25 11:04 github-actions[bot]

Hello, we are still looking for this functionality. I have been periodically checking release notes as we would like to use StarRocks but are stuck with Trino as they support casting json to struct types.

kyle-goodale-klaviyo avatar May 01 '25 17:05 kyle-goodale-klaviyo

Also need this function, come to remind the dev team to notice this request again lol

MichaelHan0228 avatar May 22 '25 12:05 MichaelHan0228

I'm not sure when this was added, but I tested in the latest version and I was able to cast a json value to a struct.

Would recommend adding information on this to the docs because this is huge that it is now supported!

kyle-goodale-klaviyo avatar Jul 08 '25 17:07 kyle-goodale-klaviyo