ChoETL icon indicating copy to clipboard operation
ChoETL copied to clipboard

Flatten Complex Json structure

Open CreaterCalvin opened this issue 2 years ago • 12 comments

I have these line of code:

using (var r = new ChoJSONReader("data.json")
                         .Configure(c => c.ThrowAndStopOnMissingField = true)
                         .Configure(c => c.DefaultArrayHandling = true)
                         .Configure(c => c.FlattenNode = true)
                         .Configure(c => c.IgnoreArrayIndex = false)
                         .Configure(c => c.NestedKeySeparator = '.')
                         .Configure(c => c.NestedColumnSeparator = '.'))
            {
                var dt = r.AsDataTable();
                Console.WriteLine(dt.DumpAsJson());
            }

and my json:

{
  "BrandId": "998877665544332211",
  "Categories": [
    "112233445566778899"
  ],
  "Contact": {
    "Phone": [
      {
        "Value": "12346789",
        "Description": {
          "vi": "Phone"
        },
        "Type": 1
      },
      {
        "Value": "987654321",
        "Description": {
          "vi": "Phone"
        },
        "Type": 1
      }
    ]
  }
}

My expected output:

[
  {
    "BrandId": "998877665544332211",
    "Contact.Phone.Value": "12346789",
    "Contact.Phone.Description.vi": "Phone",
    "Contact.Phone.Type": 1,
    "Category": "112233445566778899"
  },
  {
    "BrandId": "998877665544332211",
    "Contact.Phone.Value": "987654321",
    "Contact.Phone.Description.vi": "Phone",
    "Contact.Phone.Type": 1,
    "Category": "112233445566778899"
  }
]

After running code, I got the output like this:

[
  {
    "BrandId": "998877665544332211",
    "Contact.Phone.0.Value": "12346789",
    "Contact.Phone.0.Description.vi": "Phone",
    "Contact.Phone.0.Type": 1,
    "Contact.Phone.1.Value": "987654321",
    "Contact.Phone.1.Description.vi": "Phone",
    "Contact.Phone.1.Type": 1,
    "Category0": "112233445566778899"
  }
]

Thanks for your interest in the question Looking forward to hearing from you @Cinchoo

CreaterCalvin avatar Mar 22 '22 08:03 CreaterCalvin

As your json is nested in nature, you need to unpack and flatten into multiple rows using linq.

Sample fiddle: https://dotnetfiddle.net/PHK8LO

Cinchoo avatar Mar 22 '22 16:03 Cinchoo

@Cinchoo Is there any more generic way instead of using Linq. I would prefer it could transform via .Configure() with .WithField(), is it possible? Because I would like to make it as simple as possible, in order to ease for my more than 200 json schemas. Because of that reason, linq is seem to be not suitable for my problem image

CreaterCalvin avatar Mar 23 '22 03:03 CreaterCalvin

Add attribute to handle your req. Use FlattenByNodeName to control the flatten node. Hope it helps.

Lib version: ChoETL.JSON.Core v1.2.1.45-beta1

Sample fiddle: https://dotnetfiddle.net/ZWRd1R

Cinchoo avatar Mar 23 '22 22:03 Cinchoo

@Cinchoo Thanks a lot for your help. This lib could handle my prob smoothly.

CreaterCalvin avatar Mar 24 '22 02:03 CreaterCalvin

@Cinchoo Looking forward to hearing from you if your lib has new update & Looking forward to your help and support if I have new issue, too. Peace!!!

CreaterCalvin avatar Mar 24 '22 02:03 CreaterCalvin

@Cinchoo I have just discovered a new issue from your library When I'm testing some special case base on you package: https://dotnetfiddle.net/AdFCjU In this code, I'm flatten this json string:

@"{
  ""BrandId"": ""998877665544332211"",
  ""Categories"": [ ""112233445566778899"" ],
  ""Contact"": {
    ""Phone"": [
      {
        ""Value"": ""12346789"",
        ""Description"": { ""vi"": ""Phone"" },
        ""Type"": 1
      },
	  {
        ""Value"": ""987654321"",
        ""Description"": { ""vi"": ""Phone"" },
        ""Type"": 1
      }
    ],
	 ""Phone2"": [
      {
        ""Value"": ""999888777"",
        ""Description"": { ""vi"": ""Phone"" },
        ""Type"": 1
      }
    ]
  }
}";

It's return output:


[
  {
    "BrandId": "998877665544332211",
    "Categories": "112233445566778899",
    "Contact.Phone2.Value": "999888777",
    "Contact.Phone2.Description.vi": "Phone",
    "Contact.Phone2.Type": 1,
    "Contact.Phone.Value": "12346789",
    "Contact.Phone.Description.vi": "Phone",
    "Contact.Phone.Type": 1
  },
  {
    "BrandId": "998877665544332211",
    "Categories": "112233445566778899",
    "Contact.Phone2.Value": "999888777",
    "Contact.Phone2.Description.vi": "Phone",
    "Contact.Phone2.Type": 1,
    "Contact.Phone.Value": "987654321",
    "Contact.Phone.Description.vi": "Phone",
    "Contact.Phone.Type": 1
  }
]

Result above is exactly what I wanted

But if I change name of Phone2 into BlaBlaBlaPhone

@"{
  ""BrandId"": ""998877665544332211"",
  ""Categories"": [ ""112233445566778899"" ],
  ""Contact"": {
    ""Phone"": [
      {
        ""Value"": ""12346789"",
        ""Description"": { ""vi"": ""Phone"" },
        ""Type"": 1
      },
	  {
        ""Value"": ""987654321"",
        ""Description"": { ""vi"": ""Phone"" },
        ""Type"": 1
      }
    ],
	 ""BlablablaPhone"": [ 
      {
        ""Value"": ""999888777"",
        ""Description"": { ""vi"": ""Phone"" },
        ""Type"": 1
      }
    ]
  }
}"

It's return output without value in BlablablaPhone node:

[
  {
    "BrandId": "998877665544332211",
    "Categories": "112233445566778899",
    "Contact.Phone.Value": "12346789",
    "Contact.Phone.Description.vi": "Phone",
    "Contact.Phone.Type": 1
  },
  {
    "BrandId": "998877665544332211",
    "Categories": "112233445566778899",
    "Contact.Phone.Value": "987654321",
    "Contact.Phone.Description.vi": "Phone",
    "Contact.Phone.Type": 1
  },
  {
    "BrandId": "998877665544332211",
    "Categories": "112233445566778899",
    "Contact.Phone.Value": null,
    "Contact.Phone.Description.vi": null,
    "Contact.Phone.Type": null
  }
]

@Cinchoo

CreaterCalvin avatar Mar 30 '22 12:03 CreaterCalvin

I’m on vacation, will not be get back to you on time. Will update once back from it.

Cinchoo avatar Mar 31 '22 18:03 Cinchoo

It is a bug, putting a fix. Will let you know.

Cinchoo avatar Apr 05 '22 20:04 Cinchoo

Applied fix, released ChoETL.JSON.Core v1.2.1.45-beta2.

Cinchoo avatar Apr 06 '22 17:04 Cinchoo

@Cinchoo Hi, I have just got the issue with NestedKeySeparator from 1.2.1.45.Beta-2 onward, Configure(c => c.NestedKeySeparator = '.') is works fine but Configure(c => c.NestedKeySeparator = '~') cannot be flattened as expected.

This bug still exists in 1.2.1.47

https://dotnetfiddle.net/327X8k

zhenyuan0502 avatar Apr 22 '22 11:04 zhenyuan0502

found the issue, applied fix. released v1.2.1.48 (beta1)

Sample fiddle: https://dotnetfiddle.net/SDY3ql

Cinchoo avatar Apr 22 '22 14:04 Cinchoo

@Cinchoo Hi, today I see the issue about the flatten feature, which they will be lost root information if the flatten node whom array

@"{
  ""BrandId"": ""998877665544332211"",
  ""Categories"": [ ""112233445566778899"" ],
  ""Contact"": [{
        ""Value"": ""12346789"",
		""PostCode"":
		[
		{
		""Area"":""SA"",
		""Value"": ""+84""
		},
		{
		""Area"":""SA"",
		""Value"": ""+841111""
		}
		],
        ""Description"": { ""vi"": ""Phone"" },
        ""Type"": 1
      },
      {
        ""Value"": ""987654321"",
		""PostCode"":
		[
		{
		""Area"":""SA"",
		""Value"": ""+8412""
		}
		],
        ""Description"": { ""vi"": ""Phone"" },
        ""Type"": 1
    }]
}";

I try both FlattenByNodeName and FlattenByJsonPath, they got different results but no one could fit the expected result like below json:

[
  {
    "BrandId": "998877665544332211",
    "Value": "12346789",
    "Description.vi": "Phone",
    "Type": 1,
    "PostCode~Area": "SA",
    "PostCode~Value": "+84"
  },
  {
    "BrandId": "998877665544332211",
    "Value": "12346789",
    "Description.vi": "Phone",
    "Type": 1,
    "PostCode~Area": "SA",
    "PostCode~Value": "+841111"
  },
  {
    "BrandId": "998877665544332211",
    "Value": "987654321",
    "Description.vi": "Phone",
    "Type": 1,
    "PostCode~Area": "SA",
    "PostCode~Value": "+8412"
  }
]

If I remove the PostCode, it worked fine normally. https://dotnetfiddle.net/ZqyiI2

I am very interesting in this feature and I could help to improve it because I am building a new and dynamic ETL project based on your library, could you push your latest code then I can take a look.

zhenyuan0502 avatar Apr 25 '22 11:04 zhenyuan0502