intelmq Checking for CSV Injection

Source (read first): http://georgemauer.net/2017/10/07/csv-injection.html

IntelMQ itself and some sending components often output CSV data, e.g. intelmq-mailgen and the CSV Converter Expert. The parsers often store data from feeds 1:1, so a leading = in a string is not escaped (and it is perfectly valid for arbitrary data). We cannot really prevent this for all data; we already apply constraints on fields where useful. I'd focus on the output side here, because nobody wants to send CSV files with malicious content, but it's ok to have such malicious strings in IoC databases.
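To illustrate the output side, here is a minimal sketch using Python's standard csv module. The column names and the value =1+1 are only example data, not taken from any real feed. It shows that a cell starting with = is written unchanged, because from the CSV format's point of view it is an ordinary string:

```python
import csv
import io

# Example data only: the column names and values are illustrative.
rows = [
    ["source.ip", "comment"],
    ["192.0.2.1", "=1+1"],
]

buffer = io.StringIO()
csv.writer(buffer).writerows(rows)
output = buffer.getvalue()

# The formula-like cell is passed through as-is:
# 192.0.2.1,=1+1
print(output)
```

Quoting the field would not help either, since spreadsheet applications strip the quotes before interpreting the cell.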

To prevent unintentional CSV injections, I propose to filter out/escape these constructs by default in:

- the CSV converter expert
- intelmq-mailgen
- cert.at's intelmqcli tool

Alternatively, we could create a CSV injection sanitizer bot, which would fit better with IntelMQ's concept of "one component should only do one thing" but would violate the principle of sane default behavior.

cc @bernhardreiter

Nov 06 '20 19:11 ghost

As IntelMQ mailgen sends out CSV files in some situations and we cannot be sure how the recipient will deal with them, it makes sense to sanitize by default. Formulas are not the usual content of warning emails.

@wagner-certat Thanks for pointing this out.

Nov 20 '20 14:11 bernhardreiter

This needs some research into how many vulnerable spreadsheet applications are still out there. The article (http://georgemauer.net/2017/10/07/csv-injection.html) from 2017-10 mentions Excel and Google Sheets, so it is likely that old Excel versions with this problem are still around. (I briefly tried LibreOffice 6.1.5.2 10(Build:2), which interpreted the formula =2-5, but I could not find a way to execute shell commands within a couple of minutes.)

Nov 20 '20 14:11 bernhardreiter

Using =calc|a!z, I got a prompt from Excel 2016 basically asking if calc.exe should be started.

I was not successful with the IMPORTXML function, but that could be a localization issue (I only have a German Excel)

Nov 23 '20 15:11 ghost

Regarding the question of where in the pipeline the CSV sanitization could be placed, I see three options:

1. Legit data can start with = or similar signs (such as @, which can also be used for injection), so fixing all data by default (for example in the harmonization classes) is not an option IMO.
2. If we do it as an expert (before sending data out via mail in CSV files), we follow the one-bot-one-task principle, but then the data in the database is already changed and no longer original. However, the data does not cause any harm in the database. Not optimal either.
3. Doing the checks in the component which generates the CSV would be most specific, and doing it by default there is legit IMHO. But it needs to be done in every tool. A central helper in the intelmq core, which can be called by all tools, would help in this case.

I'm in favor of option 3. But let's first try to better understand the issue.
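A rough sketch of what such a central helper could look like, as a basis for discussion only. The names FORMULA_PREFIXES, escape_cell and write_csv are hypothetical and do not exist in IntelMQ. The prefix list beyond = and @ (+, -, tab, carriage return) and the choice of a leading single quote as the escape are assumptions based on commonly cited mitigation advice. Note that this would also prefix legitimate values such as negative numbers.

```python
import csv
import io

# Assumption: characters that spreadsheet applications may treat as the
# start of a formula. Only = and @ are named in this thread.
FORMULA_PREFIXES = ("=", "+", "-", "@", "\t", "\r")


def escape_cell(value):
    """Return value as a string, prefixed with ' if it looks like a formula."""
    text = "" if value is None else str(value)
    if text.startswith(FORMULA_PREFIXES):
        return "'" + text
    return text


def write_csv(events, fieldnames, handle, escape=True):
    """Write events (dicts) as CSV. The events themselves are not modified."""
    writer = csv.DictWriter(handle, fieldnames=fieldnames)
    writer.writeheader()
    for event in events:
        # Build a fresh row so the original event stays untouched.
        row = {name: event.get(name, "") for name in fieldnames}
        if escape:
            row = {name: escape_cell(cell) for name, cell in row.items()}
        writer.writerow(row)


# Hypothetical usage with example data.
events = [{"source.ip": "192.0.2.1", "comment": "=calc|a!z"}]
out = io.StringIO()
write_csv(events, ["source.ip", "comment"], out)
print(out.getvalue())
```

Escaping is on by default here, and a tool that knows its CSV is only read by machines could pass escape=False. Since the helper works on a copy of each row, the data in the database stays original.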

Nov 23 '20 15:11 ghost

When doing it in the components that generate the CSV (option 3), this is the place where it can be guessed what the purpose of the CSV is. And this purpose should guide what to do, according to Mr. Mauer's article. So it makes most sense to me. A good next step from my point of view would be to find out how many components we are looking at.

Nov 24 '20 07:11 bernhardreiter

In addition to the three places I mentioned in the opening post, there's also the CSV download of intelmq-fody. That conversion, however, is handled by JS in the frontend, not Python.

Nov 24 '20 19:11 ghost