ChoETL
Column renaming and nested structures
I want to create a parquet file with this schema:
public class CompleteFile
{
    public string DataSource { get; set; }
    public long? DataType { get; set; }
    public GeneralData GeneralData { get; set; }
    public Samples Samples { get; set; }
}

public class Samples
{
    public IEnumerable<Data> Samples1 { get; set; }
    public IEnumerable<Data> Samples2 { get; set; }
    public IEnumerable<Data> Samples3 { get; set; }
    public Data Sample4 { get; set; }
}

public class Data
{
    public long? Prop00 { get; set; }
    public string Prop01 { get; set; }
    public Properties Properties { get; set; }
    public MyImage Image { get; set; }
}

public class Properties
{
    public long? Prop10 { get; set; }
    public long? Prop11 { get; set; }
}
I'm using this code to write the file:
using (var r = ChoJSONReader<CompleteFile>.LoadText(json).UseJsonSerialization())
{
    // Serialize every record from the reader into an in-memory Parquet payload.
    var bytes = ChoParquetWriter.SerializeAll(r);
    File.WriteAllBytes(path, bytes);
}
It works, but when I check the file content I see two problems:
- Is it possible to rename a column, perhaps with an attribute, as you can with [JsonProperty("MyProperty")] from the Newtonsoft library?
- From the second level of nested classes onward, I cannot see any data. I saw issue #139, so does that mean I can't create my schema?
With the latest release (ChoETL.Parquet.1.0.1.14 / ChoETL.NETStandard.1.2.1.32), you can produce the Parquet file as below:
using (var r = ChoJSONReader<CompleteFile>.LoadText(json.ToString()).UseJsonSerialization())
{
    using (var w = new ChoParquetWriter("CompleteFile.parquet"))
    {
        // Flatten each nested record into a single-level dictionary before writing.
        w.Write(r.Select(rec1 => rec1.FlattenToDictionary()));
    }
}
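To illustrate why flattening makes the nested values show up as columns, here is a minimal sketch of the general idea. It is not ChoETL's implementation of FlattenToDictionary: the FlattenSketch class, the Flatten method, and the '_' key separator are all hypothetical, and the key names the library actually produces may differ.

```csharp
using System.Collections.Generic;

public static class FlattenSketch
{
    // Hypothetical helper: collapses nested dictionaries into one level,
    // joining parent and child keys with a separator (assumed to be '_').
    public static Dictionary<string, object> Flatten(
        IDictionary<string, object> source, string prefix = "", char separator = '_')
    {
        var result = new Dictionary<string, object>();
        foreach (var pair in source)
        {
            // Build the full key from the parent prefix and the current key.
            string key = prefix.Length == 0 ? pair.Key : prefix + separator + pair.Key;
            if (pair.Value is IDictionary<string, object> nested)
            {
                // Recurse into nested objects so their leaves become top-level entries.
                foreach (var inner in Flatten(nested, key, separator))
                {
                    result[inner.Key] = inner.Value;
                }
            }
            else
            {
                result[key] = pair.Value;
            }
        }
        return result;
    }
}
```

With this sketch, a record shaped like { Sample4: { Properties: { Prop10: 5 } } } would end up with a single top-level key Sample4_Properties_Prop10, which a flat column writer can store directly.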
Use DisplayNameAttribute to rename a property.
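A minimal sketch of what that could look like, assuming the attribute meant here is System.ComponentModel.DisplayNameAttribute. The CompleteFileRenamed class, the "data_source" column name, and the ColumnNameOf helper are hypothetical; the helper only reads the attribute back through reflection and does not show how ChoETL itself resolves column names.

```csharp
using System;
using System.ComponentModel;
using System.Reflection;

public class CompleteFileRenamed
{
    // Hypothetical column name; assumes the writer honors DisplayNameAttribute
    // as the reply above states.
    [DisplayName("data_source")]
    public string DataSource { get; set; }

    public long? DataType { get; set; }
}

public static class DisplayNameDemo
{
    // Returns the name declared via DisplayNameAttribute,
    // or the property's own name when no attribute is present.
    public static string ColumnNameOf(Type type, string propertyName)
    {
        PropertyInfo prop = type.GetProperty(propertyName);
        DisplayNameAttribute attr = prop.GetCustomAttribute<DisplayNameAttribute>();
        return attr != null ? attr.DisplayName : prop.Name;
    }
}
```

Calling DisplayNameDemo.ColumnNameOf(typeof(CompleteFileRenamed), "DataSource") returns the declared display name, while undecorated properties such as DataType keep their own name.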