node-xmlsplit

Splitting on tags more than 1 level deep confuses xmlsplit

Open BertCatsburg opened this issue 3 years ago • 5 comments

XML File

I have the following small XML file

<?xml version="1.0" encoding="utf-8"?>
<Outer>
    <Inner attr="xxx">
        <A>1</A>
    </Inner>
    <Inner otherattr="yyy">
        <A>2-0</A>
        <A>2-1</A>
        <A>2-2</A>
        <A>2-3</A>
    </Inner>
    <Inner>
        <A>
            <B attr="AA"/>
            <C>
                <D Dattr="Value"/>
            </C>
        </A>
    </Inner>
</Outer>

Program

And the following program:

import fs from 'fs';

const XmlSplit = require('xmlsplit');

const xmlsplit = new XmlSplit(1, 'A'); // Splitting on Tag <A>

const CHUNK_SIZE = 200; // bytes

const xmlfile = 'Test.xml';


async function start() {
    // A small highWaterMark forces the file to be read in several chunks
    const stream = fs.createReadStream(xmlfile, { highWaterMark: CHUNK_SIZE });
    stream.pipe(xmlsplit).on('data', function(data: any) {
        const xmlDocument = data.toString();
        console.log(xmlDocument);
        console.log('--------------------------------------');
    });
}

start();

Expected output

You would expect separate XML documents, one per A tag, either

<Outer>
	<Inner>
		<A>
			...
		</A>
	</Inner>
</Outer>

or an XML document without the Inner tag.
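To make the first expected variant concrete, here is a minimal sketch of the behaviour I have in mind. It is not how xmlsplit works internally: `splitOnTag` is a hypothetical helper that reads the whole string instead of a stream, uses a simplified regular expression rather than a real XML parser, and ignores comments, CDATA and the XML declaration. It keeps a stack of the currently open ancestors and wraps each A element in them.

```typescript
interface OpenTag { name: string; raw: string }

// Hypothetical helper (not part of xmlsplit): returns one document per <tag>
// element, each wrapped in the ancestors that were open when it was found.
function splitOnTag(xml: string, tag: string): string[] {
  // Simplified tag matcher: opening, closing and self-closing tags only.
  const tagPattern = /<(\/?)([A-Za-z_][\w.-]*)([^<>]*?)(\/?)>/g;
  const open: OpenTag[] = [];      // ancestors, outermost first
  const documents: string[] = [];
  let start = -1;                  // offset of the element being captured, or -1
  let depth = 0;                   // nesting depth of same-named elements

  const emit = (element: string): void => {
    const head = open.map((t) => t.raw);
    const tail = open.map((t) => `</${t.name}>`).reverse();
    documents.push([...head, element, ...tail].join("\n"));
  };

  let match: RegExpExecArray | null;
  while ((match = tagPattern.exec(xml)) !== null) {
    const [raw, slash, name, , selfClosing] = match;
    const isClosing = slash === "/";
    const isSelfClosing = selfClosing === "/";
    if (start >= 0) {
      // Inside a captured element: only same-named tags change the depth.
      if (name !== tag || isSelfClosing) continue;
      depth += isClosing ? -1 : 1;
      if (depth === 0) {
        emit(xml.slice(start, match.index + raw.length));
        start = -1;
      }
    } else if (isClosing) {
      open.pop();                  // an ancestor was closed
    } else if (name === tag) {
      if (isSelfClosing) emit(raw);
      else { start = match.index; depth = 1; }
    } else if (!isSelfClosing) {
      open.push({ name, raw });    // a new ancestor was opened
    }
  }
  return documents;
}

const sample = `<?xml version="1.0" encoding="utf-8"?>
<Outer>
    <Inner attr="xxx">
        <A>1</A>
    </Inner>
    <Inner otherattr="yyy">
        <A>2-0</A>
        <A><B attr="AA"/></A>
    </Inner>
</Outer>`;

const documents = splitOnTag(sample, "A");
for (const doc of documents) {
  console.log(doc);
  console.log("--------------------------------------");
}
```

With this approach each A element ends up under the Inner tag it actually belongs to, and every opened ancestor is closed again.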

Actual output

But XmlSplit returns the following:

<?xml version="1.0" encoding="utf-8"?>
<Outer>
    <Inner attr="xxx">
        <A>1</A></Outer>
--------------------------------------
<?xml version="1.0" encoding="utf-8"?>
<Outer>
    <Inner attr="xxx">

    </Inner>
    <Inner otherattr="yyy">
        <A>2-0</A></Outer>
--------------------------------------
<?xml version="1.0" encoding="utf-8"?>
<Outer>
    <Inner attr="xxx">

        <A>2-1</A></Outer>
--------------------------------------
<?xml version="1.0" encoding="utf-8"?>
<Outer>
    <Inner attr="xxx">

        <A>2-2</A></Outer>
--------------------------------------
<?xml version="1.0" encoding="utf-8"?>
<Outer>
    <Inner attr="xxx">

        <A>2-3</A></Outer>
--------------------------------------
<?xml version="1.0" encoding="utf-8"?>
<Outer>
    <Inner attr="xxx">

    </Inner>
    <Inner>
        <A>
            <B attr="AA"/>
            <C>
                <D Dattr="Value"/>
            </C>
        </A></Outer>
--------------------------------------

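If you look at the returned output, you can see that the process gets confused in several places: every document starts with the <Inner attr="xxx"> opening tag from the first match, the enclosing Inner tag is never closed before </Outer>, and the A elements of the second Inner show up under the attributes of the first one.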
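The unclosed Inner tag can be confirmed mechanically. The following is only a rough sketch: `firstTagMismatch` is a hypothetical helper with a simplified regular expression, not a real XML validator. It pushes opening tags on a stack and reports the first closing tag that does not match the top of the stack, here applied to the first document from the output above.

```typescript
// Hypothetical helper: returns a description of the first unbalanced tag,
// or null when all tags are balanced.
function firstTagMismatch(xml: string): string | null {
  // Simplified tag matcher; the <?xml ...?> declaration is not matched.
  const tagPattern = /<(\/?)([A-Za-z_][\w.-]*)[^<>]*?(\/?)>/g;
  const open: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = tagPattern.exec(xml)) !== null) {
    const [, slash, name, selfClosing] = match;
    if (selfClosing === "/") continue;          // <B/> opens and closes itself
    if (slash !== "/") { open.push(name); continue; }
    const expected = open.pop();
    if (expected === undefined) return `unexpected </${name}>`;
    if (expected !== name) return `expected </${expected}> but found </${name}>`;
  }
  return open.length > 0 ? `<${open[open.length - 1]}> is never closed` : null;
}

// First document printed by the program above
const firstChunk = `<?xml version="1.0" encoding="utf-8"?>
<Outer>
    <Inner attr="xxx">
        <A>1</A></Outer>`;

const problem = firstTagMismatch(firstChunk);
console.log(problem); // expected </Inner> but found </Outer>
```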

BertCatsburg Aug 09 '21 11:08