
XML reader

Open philrz opened this issue 5 years ago • 3 comments

A community user expressed interest in being able to read XML-format data into Brim/zq.

A data source we're aware of that uses XML is STIX.

philrz, Jan 28 '21 01:01

In the time since this issue was opened, I can see that STIX has moved from using XML in its v1 to JSON in its v2 (link). However, there's still plenty of XML out there in the world so I'm sure this enhancement will still come in handy for some when it can be implemented.

In the meantime, I surveyed some of the pipeline tools that could help bridge this functional gap. There are plenty of options, but one that looks quite capable is yq. For example, if we take this STIX 1 Sample Object from the page linked to above and put it in a file stix.xml:

<stix:TTPs>
 <stix:TTP id="attack-pattern:ttp-01" xsi:type='ttp:TTPType'
           version="1.1">
   <ttp:Title>Initial Compromise</ttp:Title>
    <ttp:Behavior>
     <ttp:Attack_Patterns>
      <ttp:Attack_Pattern capec_id="CAPEC-163">
       <ttp:Description>Spear Phishing</ttp:Description>
        </ttp:Attack_Pattern>
      </ttp:Attack_Patterns>
    </ttp:Behavior>
 </stix:TTP>
</stix:TTPs>
<stix:TTPs>
 <stix:Kill_Chains>
  <stixCommon:Kill_Chain id="stix:TTP-02"
                         name="mandiant-attack-lifecycle-model">
  <stixCommon:Kill_Chain_Phase name="initial-compromise"
                               phase_id="stix:TTP-03"/>
 </stix:Kill_Chains>
</stix:TTPs>

We can turn it into JSON and pipe it onward to zq like this:

$ zq -version
Version: v1.14.0-29-g19b2eb5d

$ yq -o=json stix.xml  | zq -Z -
{
    TTPs: {
        TTP: {
            "+@id": "attack-pattern:ttp-01",
            "+@xsi:type": "ttp:TTPType",
            "+@version": "1.1",
            Title: "Initial Compromise",
            Behavior: {
                Attack_Patterns: {
                    Attack_Pattern: {
                        "+@capec_id": "CAPEC-163",
                        Description: "Spear Phishing"
                    }
                }
            }
        }
    }
}

Of course, as many resources touch on, there's enough flexibility in XML that there's no one accepted way to do this conversion, so users should investigate the available knobs to suit their needs (e.g., yq's --xml-attribute-prefix comes to mind if having attribute names prefixed with +@ is off-putting).
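To make the "no one accepted way" point concrete, here is a minimal Python sketch of one possible XML-to-JSON mapping with a configurable attribute prefix. It is not how yq or zq work internally. The names xml_to_dict and element_to_value, the "#text" key, and the simplified sample (namespace prefixes removed so the standard-library parser accepts it) are all assumptions made for illustration.

```python
import json
import xml.etree.ElementTree as ET

# Simplified variant of the STIX sample: namespace prefixes are dropped
# because ElementTree rejects undeclared prefixes (assumption for this sketch).
SAMPLE = """<TTPs>
 <TTP id="attack-pattern:ttp-01" version="1.1">
  <Title>Initial Compromise</Title>
  <Attack_Pattern capec_id="CAPEC-163">
   <Description>Spear Phishing</Description>
  </Attack_Pattern>
 </TTP>
</TTPs>"""


def element_to_value(elem, attr_prefix="+@", text_key="#text"):
    """Map an element to a string (plain leaf) or a dict (anything else)."""
    children = list(elem)
    text = (elem.text or "").strip()
    # A leaf without attributes collapses to its text content.
    if not children and not elem.attrib:
        return text
    # Attributes are stored under prefixed keys so they are less likely
    # to collide with child element names.
    out = {attr_prefix + name: value for name, value in elem.attrib.items()}
    for child in children:
        value = element_to_value(child, attr_prefix, text_key)
        if child.tag in out:
            # Repeated sibling tags are gathered into a list.
            existing = out[child.tag]
            if isinstance(existing, list):
                existing.append(value)
            else:
                out[child.tag] = [existing, value]
        else:
            out[child.tag] = value
    if text:
        # Text alongside attributes or children goes under a hypothetical key.
        out[text_key] = text
    return out


def xml_to_dict(xml_text, attr_prefix="+@"):
    """Parse a whole document and wrap the result under the root tag."""
    root = ET.fromstring(xml_text)
    return {root.tag: element_to_value(root, attr_prefix)}


print(json.dumps(xml_to_dict(SAMPLE, attr_prefix="@"), indent=2))
```

Each choice here (how attributes are marked, when siblings become lists, where mixed text goes) is a place where converters can reasonably differ, which is why the knobs matter. This sketch also ignores text that follows a child element.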

philrz, Mar 26 '24 21:03

A community zync user recently also expressed interest in an XML reader. In their own words:

I was wondering if super would ever be able to ingest XML files? There are lots of XML content out there. I often need to transform XML into JSON to load it. it would require a streaming xml parser to limit its ram usage. The golang community has produced a few, not sure what there worth.
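As a rough illustration of the streaming idea raised in that request, the following Python sketch uses the standard library's iterparse to emit one JSON record per repeated element while pruning the tree as it goes. It is only a sketch of the general technique, not a proposal for how a reader in super would be built. The function name stream_records, the record_tag parameter, and the sample data are hypothetical.

```python
import io
import json
import xml.etree.ElementTree as ET


def stream_records(source, record_tag):
    """Yield one flat dict per <record_tag> element found in the stream."""
    context = ET.iterparse(source, events=("start", "end"))
    # The first event is the start of the root element; keep a handle on it
    # so finished subtrees can be detached from it later.
    _, root = next(context)
    for event, elem in context:
        if event == "end" and elem.tag == record_tag:
            # Attributes and the text of direct children become fields.
            record = dict(elem.attrib)
            for child in elem:
                record[child.tag] = (child.text or "").strip()
            yield record
            # Detach everything parsed so far, so the in-memory tree does
            # not grow with the size of the input.
            root.clear()


# Hypothetical sample input; a real caller would pass an open file or stdin.
data = io.BytesIO(
    b"<events>"
    b"<event id='1'><msg>first</msg></event>"
    b"<event id='2'><msg>second</msg></event>"
    b"</events>"
)

for rec in stream_records(data, "event"):
    # One JSON object per line, suitable for piping to a JSON reader.
    print(json.dumps(rec))
```

The design choice is to treat one repeated element as the record boundary, which suits log-like XML but not deeply nested documents. Fields nested more than one level below the record element are dropped in this sketch.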

philrz, Nov 08 '24 16:11