validate-xml-rust
validate-xml-rust copied to clipboard
Validate XML files against their referenced XML Schemas concurrently and fast
Validate XML against XML Schema using Rust
Prerequisites
Cargo for Rust is required.
libxml2 needs to be installed.
Installation
$ cargo install --path .
will install validate-xml into $HOME/.cargo/bin.
Usage
Basic usage:
$ validate-xml root_dir 2> log.txt
Detailed usage:
$ validate-xml --help
Validate XML files concurrently and downloading remote XML Schemas only once.
Usage:
validate-xml [--extension=<extension>] <dir>
validate-xml (-h | --help)
validate-xml --version
Options:
-h --help Show this screen.
--version Show version.
--extension=<extension> File extension of XML files [default: cmdi].
Performance
This was written to be super fast:
- Concurrency, using number of cores available.
- Use of C FFI with
libxml2library. - Downloading of remote XML Schema files only once per run, and caching to memory.
- Caching of
libxml2schema data structure for reuse in multiple concurrent parses.
On my machine, it takes a few seconds to validate a sample set of 20,000 XML files. This is hundreds of times faster than the first attempt, which was a shell script that sequentially runs xmllint (after first pre-downloading remote XML Schema files and passing the local files to xmllint with --schema). The concurrency is a big win, as is the reuse of the libxml2 schema data structure across threads and avoiding having to spawn xmllint as a heavyweight process.
Comparison
If I had known about Python's lxml binding to C libxml2, I might not have bothered this Rust program. I found out about it and wrote the Python version of this program, which looks very similar (but without all the types): https://github.com/FranklinChen/validate-xml-python
However, as the Rust ecosystem fills in gaps in available libraries, I think I would actually be pretty happy to write in Rust instead of Python.