Zander
Regular expressions for matrix information, i.e. parse structured blocks of information from CSV or Excel files (or similar 2D matrices).
Named after the fish, Zander is a small library that eases parsing structured blocks of information within a two-dimensional matrix. Typically you get this sort of data from report generators, and you might still want to extract it programmatically, hence the need for the fish.
What problem does this library solve?
It helps when you have data in a structured format that is made up of different blocks of information. A very simple example is the following:
| Report Title | 16/09/15 16:17 | Page: 1 | | | | | | | | | |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Company AB | | | | | | | | | | | |
| Some text | | | | | | | | | | | |
| that goes on and explains the report | | | | | | | | | | | |
| Id | Value | Type | Attribute 1 | Attribute 2 | | | | | | | |
| 1244 | 25 | A | | | | | | | | | |
| 1244 | 25 | B | 255 | 155 | | | | | | | |
| 1244 | 25 | C | | | | | | | | | |
| 1250 | 25 | B | 255 | 100 | | | | | | | |
| 1250 | 25 | C | | | | | | | | | |
| Report Title | 16/09/15 16:17 | Page: 2 | | | | | | | | | |
| Company AB | | | | | | | | | | | |
| Some text | | | | | | | | | | | |
| that goes on and explains the report | | | | | | | | | | | |
| Id | Value | Type | Attribute 1 | Attribute 2 | | | | | | | |
| 1251 | 25 | A | 255 | | | | | | | | |
| 1251 | 25 | B | 130 | | | | | | | | |
| 1251 | 25 | C | | | | | | | | | |
| 1260 | 25 | A | | | | | | | | | |
| 1260 | 25 | B | 255 | 15 | | | | | | | |
| 1260 | 25 | C | | | | | | | | | |
But the structure of the block layout might change from "page" to "page".
How do you match?
Match columns
- Use `_` to indicate that there should be an empty column
- Use `"Some constant"` or `constant` to indicate a column with a constant value
- Use `@Value` to indicate that you want the value in that column
- Use `( .. | .. )` to match any of the alternatives
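To make the column rules above concrete, here is a minimal hand-rolled sketch of how the four token kinds could be checked against a single row. It is not Zander's implementation: `ColumnMatchSketch` and `MatchRow` are hypothetical names, the specification is assumed to be already split into one token per column, and `@Name` is assumed to require a non-empty cell.

```csharp
using System;
using System.Collections.Generic;

// Illustrative only: a hand-rolled matcher for one row, not Zander's implementation.
public static class ColumnMatchSketch
{
    // Returns the captured values keyed by name, or null if the row does not match.
    public static Dictionary<string, string> MatchRow(string[] spec, string[] row)
    {
        if (spec.Length != row.Length) return null;
        var captures = new Dictionary<string, string>();
        for (var i = 0; i < spec.Length; i++)
        {
            if (!MatchCell(spec[i], row[i] ?? "", captures)) return null;
        }
        return captures;
    }

    private static bool MatchCell(string token, string cell, Dictionary<string, string> captures)
    {
        if (token.StartsWith("(", StringComparison.Ordinal) && token.EndsWith(")", StringComparison.Ordinal))
        {
            // ( a | b ): the first alternative that matches wins.
            foreach (var alternative in token.Substring(1, token.Length - 2).Split('|'))
            {
                if (MatchCell(alternative.Trim(), cell, captures)) return true;
            }
            return false;
        }
        if (token == "_")
        {
            // _ : the column must be empty.
            return cell.Trim().Length == 0;
        }
        if (token.StartsWith("@", StringComparison.Ordinal))
        {
            // @Name: capture the value (assumption: an empty cell is not a value).
            if (cell.Trim().Length == 0) return false;
            captures[token.Substring(1)] = cell;
            return true;
        }
        // Anything else is a constant, written either quoted or bare.
        return token.Trim('"') == cell;
    }
}
```

With the tokens `@Id`, `@Value`, `@Type`, `(@Attribute1|_)` and the row `1244`, `25`, `A`, empty, this sketch captures `Id`, `Value` and `Type` and leaves `Attribute1` out, because the empty last cell falls through to the `_` alternative.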
Match rows
To match rows, give each row specification a name by appending `: title` to it.
If you want the specification to match many rows with the same format, add a `+` after the name: `: title+`
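As a rough illustration of the naming convention, the following sketch splits one specification line into its column part, its name, and a flag for the trailing `+`. `RowSpecSketch.Parse` is a hypothetical helper written for this example, not part of Zander. It splits on the last colon so that constants containing a colon stay intact.

```csharp
using System;

// Illustrative only: splits "columns : name+" into its parts.
public static class RowSpecSketch
{
    public static (string Columns, string Name, bool Repeats) Parse(string line)
    {
        // The name always comes last, so look for the last colon.
        var colon = line.LastIndexOf(':');
        if (colon < 0) throw new FormatException("Row specification has no ': name' suffix.");

        var columns = line.Substring(0, colon).Trim();
        var name = line.Substring(colon + 1).Trim();

        // A trailing '+' marks a specification that may match many rows.
        var repeats = name.EndsWith("+", StringComparison.Ordinal);
        if (repeats) name = name.Substring(0, name.Length - 1).Trim();

        return (columns, name, repeats);
    }
}
```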
How does it look?
How do you use this library to extract the information above? You use the parser builder:
using Zander;
...
var parsed = new BlockEx( @" _ _ _ _ _ _ ""Report Title"" _ _ _ @Time @Page : report_title
""Company AB"" _ _ _ _ _ _ _ _ _ _ _ : company
@Text _ _ _ _ _ _ _ _ _ _ _ : text+
_ Id _ Value Type _ _ ""Attribute 1"" _ ""Attribute 2"" _ _ : header
_ @Id _ @Value @Type _ _ (@Attribute1|_) _ (@Attribute2|_) _ _ : row+
")
.Matches(arrayOfArrays);
This will give you structured information that will be easy to consume.
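The example does not show where `arrayOfArrays` comes from. Assuming it is a jagged `string[][]` with one inner array per row (an assumption, since this README does not state the type), a minimal sketch for building it from delimiter-separated text could look like this. `MatrixInputSketch.ToMatrix` is a hypothetical helper, and the naive split does not handle quoted fields.

```csharp
using System;
using System.Linq;

// Illustrative only: turns delimiter-separated text into a jagged array.
public static class MatrixInputSketch
{
    public static string[][] ToMatrix(string text, char separator = ';')
    {
        var rows = text
            .Split(new[] { "\r\n", "\n" }, StringSplitOptions.None)
            .Where(line => line.Length > 0)          // skip blank lines, e.g. after a trailing newline
            .Select(line => line.Split(separator))   // naive split: no support for quoted fields
            .ToArray();

        // Pad short rows with empty cells so every row has the same number of columns.
        var width = rows.Length == 0 ? 0 : rows.Max(r => r.Length);
        return rows
            .Select(r => r.Concat(Enumerable.Repeat("", width - r.Length)).ToArray())
            .ToArray();
    }
}
```

Padding matters here because a row specification lists every column, including the empty ones.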