Possibly add a function that's similar to pandas json_normalize

Open MrPowers opened this issue 2 years ago • 3 comments

Apr 13 '23 21:04 MrPowers

It's kind of cheating, but a naive solution is to use pandas json_normalize to parse the JSON and then convert the resulting pandas DataFrame into a Spark DataFrame. The logic seems a bit too simple to justify a dedicated helper function, though.
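For reference, a minimal sketch of that naive approach. The sample records, the `to_spark` helper name, and the dot-to-underscore renaming are illustrative assumptions, not an existing API in this project; the sketch also assumes the data fits in driver memory, since pandas does the flattening.

```python
import pandas as pd

# Hypothetical sample of already-parsed JSON records.
records = [
    {"id": 1, "user": {"name": "a", "address": {"city": "x"}}},
    {"id": 2, "user": {"name": "b", "address": {"city": "y"}}},
]

# json_normalize flattens nested dicts into dot-separated column names,
# e.g. "user.address.city".
flat_pdf = pd.json_normalize(records)

# Spark interprets dots in column names as struct field access,
# so this sketch replaces them with underscores before converting.
flat_pdf.columns = [c.replace(".", "_") for c in flat_pdf.columns]


def to_spark(spark, pdf):
    """Convert the flattened pandas DataFrame to a Spark DataFrame.

    `spark` is assumed to be an existing SparkSession.
    """
    return spark.createDataFrame(pdf)
```

Calling `to_spark(spark, flat_pdf)` with an active session would then give a flat Spark DataFrame, at the cost of routing all the data through pandas on the driver.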

Feb 01 '24 14:02 huynguyent

@huynguyent - would be nice to create an implementation that's really performant and doesn't depend on pandas!

Feb 01 '24 14:02 MrPowers

It is possible only if you know the final schema. Otherwise, you need to infer the schema first somehow. And even with a known schema, the simplest solution is still to use UDFs. My first question: do we know the schema in such a case? If not, I would suggest starting with a function like infer_json_schema(col).
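To make the idea concrete, here is a rough sketch of what the inference step could look like. It works on plain Python strings rather than a Spark column, and the signature and return shape are assumptions for illustration only; a real version would need to run over the column (or a sample of it) and map the results to Spark types.

```python
import json


def infer_json_schema(json_strings):
    """Hypothetical helper: map each flattened key path to the set of
    Python type names observed for it across the given JSON strings."""
    schema = {}

    def walk(value, path):
        if isinstance(value, dict):
            # Recurse into nested objects, building a dotted path.
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        else:
            # Leaves (including lists, which this sketch does not expand)
            # are recorded by their Python type name.
            schema.setdefault(path, set()).add(type(value).__name__)

    for raw in json_strings:
        walk(json.loads(raw), "")
    return schema


samples = [
    '{"id": 1, "user": {"name": "a"}}',
    '{"id": 2, "user": {"name": "b", "age": 3}}',
]
inferred = infer_json_schema(samples)
```

Keys that appear in only some rows (like `user.age` here) still end up in the result, which is why the schema has to be inferred across all rows before flattening.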

Feb 01 '24 18:02 SemyonSinchenko