Etl.Net

How to read XML file

Open paillave opened this issue 3 years ago • 3 comments

Discussed in https://github.com/paillave/Etl.Net/discussions/388

Originally posted by LordBigDuck November 10, 2022 Hello, I have gone through the documentation and source code but couldn't manage to write working code to read an XML file. Could you provide some samples?

paillave avatar Nov 10 '22 23:11 paillave

The XML reading system still needs improvement, but for now, here is how you read an XML file. FYI, whatever the size of the XML file (even gigabytes), the memory required to read it stays the same; of course, if you use operators that need to load the full dataset in memory (like a sort), you will run into issues. Here is the setup from the command line:

dotnet new console -o TestXml
cd TestXml
dotnet add package Paillave.EtlNet.Core
dotnet add package Paillave.EtlNet.XmlFile

Here is the content of Program.cs:

// See https://aka.ms/new-console-template for more information
using System.Text;
using Paillave.Etl.Core;
using Paillave.Etl.XmlFile;
using Paillave.Etl.XmlFile.Core;

var testXmlContent = @"<root>
    <elt1 v1=""qwe""><v2>asd</v2></elt1>
    <elt2 v3=""yxc""><v4>rtz</v4></elt2>
    <elt1 v1=""mnb""><v2>poi</v2></elt1>
</root>";

var res = await StreamProcessRunner.CreateAndExecuteAsync("dummy", DefineProcess);
Console.WriteLine(res.Failed ? $"fail: {res.ErrorTraceEvent}" : "Success");

void DefineProcess(ISingleStream<string> contextStream)
{
    var xmlNodes = contextStream
        .Select("create in memory file with content for test", _ => FileValue.Create(new MemoryStream(Encoding.UTF8.GetBytes(testXmlContent)), "example.xml", "testContent"))
        .CrossApplyXmlFile("parse xml", new MyXmlFileDefinition());
    xmlNodes.XmlNodeOfType<Elt1Node>("only Elt1").Do("write elt1", i => Console.WriteLine($"Node type 1 : {i.V1} - {i.V2}"));
    xmlNodes.XmlNodeOfType<Elt2Node>("only Elt2").Do("write elt2", i => Console.WriteLine($"Node type 2 : {i.V3} - {i.V4}"));
}

class MyXmlFileDefinition : XmlFileDefinition
{
    public MyXmlFileDefinition()
    {
        this.AddNodeDefinition(XmlNodeDefinition.Create("elt1", "/root/elt1", i => new Elt1Node
        {
            V1 = i.ToXPathQuery<string>("/root/elt1/@v1"),
            V2 = i.ToXPathQuery<string>("/root/elt1/v2"),
        }));
        // elt2 nodes map to Elt2Node so that XmlNodeOfType<Elt2Node> can pick them up.
        this.AddNodeDefinition(XmlNodeDefinition.Create("elt2", "/root/elt2", i => new Elt2Node
        {
            V3 = i.ToXPathQuery<string>("/root/elt2/@v3"),
            V4 = i.ToXPathQuery<string>("/root/elt2/v4"),
        }));
    }
}
class Elt1Node
{
    public string V1 { get; set; }
    public string V2 { get; set; }
}
class Elt2Node
{
    public string V3 { get; set; }
    public string V4 { get; set; }
}
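
On the memory remark above: the following is not Etl.Net's code, only a minimal sketch of forward-only reading with the standard `System.Xml.XmlReader`, which is one common way to keep memory flat regardless of file size. The class and method names (`StreamingXmlSketch`, `ReadElt1`) are hypothetical, and the element names come from the test content in Program.cs.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

static class StreamingXmlSketch
{
    // Yields (v1, v2) for each <elt1> element, reading the stream forward only.
    // Only the element currently being visited is held in memory.
    public static IEnumerable<(string? V1, string? V2)> ReadElt1(Stream input)
    {
        using var reader = XmlReader.Create(input);
        while (reader.Read())
        {
            // Skip everything that is not the start of an <elt1> element.
            if (reader.NodeType != XmlNodeType.Element || reader.Name != "elt1")
                continue;

            string? v1 = reader.GetAttribute("v1");
            string? v2 = null;

            // Walk this element's subtree; disposing the subtree reader leaves
            // the outer reader at the end of <elt1>.
            using (var subtree = reader.ReadSubtree())
            {
                while (subtree.Read())
                {
                    if (subtree.NodeType == XmlNodeType.Element && subtree.Name == "v2")
                        v2 = subtree.ReadElementContentAsString();
                }
            }

            // Hand the row to the caller before reading any further.
            yield return (v1, v2);
        }
    }
}
```

Because the method is an iterator, rows are produced one at a time as the caller enumerates them. An operator that has to see every row before emitting anything (a sort, for example) would still need to buffer the whole sequence.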

paillave avatar Nov 11 '22 10:11 paillave

I am also having some issues with the XML Reader. I am trying to make it work without the need to create any specific class. Is this possible? What I have tried:

class Program
{
    static void Main(string[] args)
    {
        string fileName = @"C:\path_to_my_file\file.xml";

        var xmlFileDefinition = new XmlFileDefinition();

        xmlFileDefinition.AddNodeDefinition(
            XmlNodeDefinition.Create("V2", "/ns:root", i => i.ToXPathQuery<string>("/ns:root/ns:elt1/ns:v2"))
        );

        xmlFileDefinition.AddNodeDefinition(
            XmlNodeDefinition.Create("V1", "/ns:root", i => i.ToXPathQuery<string>("/ns:root/ns:elt1/@v1"))
        );

        xmlFileDefinition.AddNameSpace("ns", "some_namespace");

        XmlObjectReader reader = new XmlObjectReader(xmlFileDefinition);

        Stream stream = null;
        reader.Read(stream, fileName, new Action<XmlNodeParsed>(), new CancellationToken());
    }
}

felipepodolan avatar Feb 24 '23 11:02 felipepodolan

Hi Felipe. At the moment, you MUST return a concrete class, because what you provide is treated not as a factory but as a mapper (it is an expression, not a delegate). Moreover, the type of the returned element is what lets you pick out the emitted elements with the XmlNodeOfType operator. This is a subject I may dig into further, since I will also be working on a fast JSON parser and I expect the two to share algorithms. The way to set up extraction from this kind of tree-structured file (XML, JSON, YAML...) may need to change compared to what I have done so far.
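
As a rough illustration of the concrete-class requirement, the two definitions from the question could be folded into a single one that returns a small class. This is an untested sketch: `Elt1Values` and `NamespacedXmlFileDefinition` are hypothetical names, `AddNameSpace` and the `ns:` prefixes are taken from the question as written, and it is an assumption that prefixed XPath expressions resolve this way.

```csharp
using Paillave.Etl.XmlFile.Core;

// Hypothetical concrete class; its type is what XmlNodeOfType<Elt1Values> would match on.
class Elt1Values
{
    public string V1 { get; set; }
    public string V2 { get; set; }
}

// Hypothetical file definition for the namespaced document from the question.
class NamespacedXmlFileDefinition : XmlFileDefinition
{
    public NamespacedXmlFileDefinition()
    {
        // "some_namespace" is the placeholder URI used in the question.
        this.AddNameSpace("ns", "some_namespace");

        // The mapper is an object initializer on a concrete class, as in the first sample.
        this.AddNodeDefinition(XmlNodeDefinition.Create("elt1", "/ns:root/ns:elt1", i => new Elt1Values
        {
            V1 = i.ToXPathQuery<string>("/ns:root/ns:elt1/@v1"),
            V2 = i.ToXPathQuery<string>("/ns:root/ns:elt1/ns:v2"),
        }));
    }
}
```

A definition like this would be passed to `CrossApplyXmlFile` and the resulting stream filtered with `XmlNodeOfType<Elt1Values>`, in the same way as `MyXmlFileDefinition` in the first sample.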

paillave avatar Feb 24 '23 13:02 paillave