ChoETL

Column renaming and nested structures

Open: mat-berna opened this issue 3 years ago • 1 comment

I want to create a parquet file with this schema:

```csharp
using System.Collections.Generic;

public class CompleteFile
{
    public string DataSource { get; set; }
    public long? DataType { get; set; }
    public GeneralData GeneralData { get; set; }  // GeneralData is not shown in the issue
    public Samples Samples { get; set; }
}

public class Samples
{
    public IEnumerable<Data> Samples1 { get; set; }
    public IEnumerable<Data> Samples2 { get; set; }
    public IEnumerable<Data> Samples3 { get; set; }
    public Data Sample4 { get; set; }
}

public class Data
{
    public long? Prop00 { get; set; }
    public string Prop01 { get; set; }
    public Properties Properties { get; set; }  // second nesting level
    public MyImage Image { get; set; }          // MyImage is not shown in the issue
}

public class Properties
{
    public long? Prop10 { get; set; }
    public long? Prop11 { get; set; }
}
```

I'm using this code to write the file:

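```csharp
using System.IO;
using ChoETL;

// json holds the input JSON text; path is the output file path.
using (var r = ChoJSONReader<CompleteFile>.LoadText(json).UseJsonSerialization())
{
    // Serialize all records into an in-memory Parquet payload, then write it to disk.
    var bytes = ChoParquetWriter.SerializeAll(r);
    File.WriteAllBytes(path, bytes);
}
```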

It works, but when I check the file content I see two problems:

  1. Is it possible to rename a column, for example with an attribute like [JsonProperty("MyProperty")] in the Newtonsoft library?
  2. From the second level of nested classes onward, I cannot see any data. I saw issue #139; does that mean I can't create this schema?

mat-berna, Dec 09 '21 14:12

With the latest release (ChoETL.Parquet.1.0.1.14 / ChoETL.NETStandard.1.2.1.32), you can produce the Parquet file as follows:

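```csharp
using System.Linq;
using ChoETL;

using (var r = ChoJSONReader<CompleteFile>.LoadText(json.ToString()).UseJsonSerialization())
{
    using (var w = new ChoParquetWriter("CompleteFile.parquet"))
    {
        // Flatten each nested record into a single-level dictionary so that
        // values below the first nesting level are written as their own columns.
        w.Write(r.Select(rec1 => rec1.FlattenToDictionary()));
    }
}
```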
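Why flattening makes the second-level data visible can be illustrated outside ChoETL. The sketch below is a hypothetical, reflection-based stand-in (FlattenSketch.Flatten, SketchData and SketchProperties are illustrative names, not ChoETL APIs): it walks an object graph and emits one dictionary entry per leaf value, keyed by its path. The `_` separator and the index-in-key convention for collections are assumptions; ChoETL's actual FlattenToDictionary() key format may differ.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical stand-ins for the issue's Properties/Data classes (trimmed for the sketch).
public class SketchProperties
{
    public long? Prop10 { get; set; }
    public long? Prop11 { get; set; }
}

public class SketchData
{
    public string Prop01 { get; set; }
    public SketchProperties Properties { get; set; }
    public List<SketchProperties> Items { get; set; }
}

public static class FlattenSketch
{
    // Hypothetical helper, NOT ChoETL's FlattenToDictionary(); it only illustrates the idea.
    // Assumed conventions: "_" joins path segments, collection elements use their index.
    // No cycle detection: self-referencing graphs would recurse forever.
    public static Dictionary<string, object> Flatten(object root, string separator = "_")
    {
        var result = new Dictionary<string, object>();
        Walk(root, null, separator, result);
        return result;
    }

    static string Join(string prefix, string name, string sep) =>
        prefix == null ? name : prefix + sep + name;

    static bool IsLeaf(object value) =>
        value == null || value is string || value is decimal || value is DateTime
        || value is Enum || value.GetType().IsPrimitive;

    static void Walk(object value, string prefix, string sep, Dictionary<string, object> result)
    {
        if (IsLeaf(value))
        {
            // Scalar or null: becomes one column named by its full path.
            result[prefix ?? "Value"] = value;
        }
        else if (value is IEnumerable items)
        {
            // Collection: each element's index becomes part of the key.
            int i = 0;
            foreach (var item in items)
            {
                Walk(item, Join(prefix, i.ToString(), sep), sep, result);
                i++;
            }
        }
        else
        {
            // Nested object: recurse into every readable, non-indexed public property.
            foreach (var p in value.GetType().GetProperties(BindingFlags.Public | BindingFlags.Instance))
            {
                if (p.CanRead && p.GetIndexParameters().Length == 0)
                    Walk(p.GetValue(value), Join(prefix, p.Name, sep), sep, result);
            }
        }
    }
}

public static class Program
{
    public static void Main()
    {
        var record = new SketchData
        {
            Prop01 = "a",
            Properties = new SketchProperties { Prop10 = 10, Prop11 = 11 },
            Items = new List<SketchProperties> { new SketchProperties { Prop10 = 20 } }
        };
        foreach (var kv in FlattenSketch.Flatten(record))
            Console.WriteLine($"{kv.Key} = {kv.Value ?? "null"}");
    }
}
```

For the record in Main, the keys are Prop01, Properties_Prop10, Properties_Prop11, Items_0_Prop10 and Items_0_Prop11: every nested value ends up as its own top-level column, which is what a flat Parquet row can hold.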

Use DisplayNameAttribute to rename a property (column).
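As a minimal sketch of the rename, the attribute goes on the property itself; DisplayNameAttribute lives in System.ComponentModel. The column name "prop_10" is hypothetical, and the ColumnName helper is also hypothetical: it is not ChoETL's lookup code, it only shows how an attribute-aware writer can resolve a name (DisplayName if present, otherwise the property name). Whether the display name survives the FlattenToDictionary() step is not confirmed in this thread.

```csharp
using System;
using System.ComponentModel;
using System.Reflection;

// Mirrors the issue's Properties class; "prop_10" is a hypothetical column name.
public class Properties
{
    [DisplayName("prop_10")]
    public long? Prop10 { get; set; }

    public long? Prop11 { get; set; }
}

public static class Program
{
    // Hypothetical resolver: DisplayName if the attribute is present, else the CLR property name.
    public static string ColumnName(PropertyInfo p) =>
        p.GetCustomAttribute<DisplayNameAttribute>()?.DisplayName ?? p.Name;

    public static void Main()
    {
        foreach (var p in typeof(Properties).GetProperties())
            Console.WriteLine($"{p.Name} -> {ColumnName(p)}");
    }
}
```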

Cinchoo, Dec 09 '21 21:12