json2csv icon indicating copy to clipboard operation
json2csv copied to clipboard

Sanitization method for preventing CSV injection

Open ProgramCounterr opened this issue 1 year ago • 6 comments

Issue

I see on OWASP's page on CSV injection to "prepend each cell field with a single quote". I was wondering why sanitization is done by ="<cell_data>" instead. Is there a source that confirms this sanitization method will protect against CSV injection?

To reproduce

Cell field input: =2+5 Expected output: "'=2+5" Actual output: ="=2+5" Code used:

const opts = {
    formatters: {
        string: stringExcelFormatter
    }
};
const parser = new AsyncParser(opts);
parser.parse(data);

Versions

@json2csv/whatwg: 7.06 Chrome: 123.0.6312.107

ProgramCounterr avatar Apr 07 '24 16:04 ProgramCounterr

Hi @ProgramCounterr ,

This is documented at https://juanjodiaz.github.io/json2csv/#/others/known-gotchas?id=avoiding-csv-injection

As stated there and also in the the OWASP page that you shared, CSV injection has nothing to do with the CSV format, since CSV knows nothing about formulas, but with how Excel (and similar spreadsheet software) parses CSV files.

Also looking at that OWASP page, you can see that you can mitigate a CSV attack by

  • Removing risky character (which is near impossible to get full coverage of all possible attack vectors)
  • Santizing strings
    • Wrap each cell field in double quotes
    • Prepend each cell field with a single quote
    • Escape every double quote using an additional double quote

stringExcel sanitizes strings by wrapping them in a function that produces the correct string by wrapping the original string in quotes and skip quotes.

Let's look at the example at the bottom of the OWASP page:

UserId,BillToDate,ProjectName,Description,DurationMinutes
1,2017-07-25,Test Project,Flipped the jibbet,60
2,2017-07-25,Important Client,"Bop, dop, and giglip", 240
2,2017-07-25,Important Client,"=2+5", 240

"=2+5" will look like 7. The stringExcel produces ="""""=2+5""""" which will look like "=2+5", as expected.

Of course, that looks correct on excel and similar software but it won't if you open the CSV in a text editor. But the same happen with with any other sanitization techniques.

Let me know if this is still unclear. Or if you think that I should add more details to the docs.

juanjoDiaz avatar Apr 07 '24 21:04 juanjoDiaz

Hi @juanjoDiaz,

Thanks for the quick response. I understand that CSV injection has to due with how Excel parses CSV files and that after sanitization, it won't look correct on a text editor but will in Excel.

I'm still unclear about why stringExcel sanitizes strings in the way you described instead of sanitizing the string in the way recommended on the OWASP page. Prepending each cell field with a single quote seems to be Excel's recommended method to indicate a cell field should not be a formula: CSVInjection

On another note, I thought stringExcel would turn "=2+5" -> "=""=2+5"""?

The docs do seem a bit outdated (referring to stringExcelFormatter instead of excelString would have been more clear to me). An explanation of how stringExcel sanitizes strings might be good to have on the docs as well.

ProgramCounterr avatar Apr 08 '24 00:04 ProgramCounterr

It seems that MS Excel recommends the apostrophe option. However, do we know if all other spreadsheet editors support that as well?

Also, the apostrophe only covers a small portion of the potential attack vector. Specifically, when the content of your string is a formula.

What if it's just a plain string? Then you'll see an apostrophe that shouldn't be there.

What if the injection is double quotes and a coma (my_data","my_injected_data). Then you'll see an apostrophe that shouldn't be there and you won't prevent the injection.

stringExcel uses the safest option, which is to always wrap the string in a formula that returns just the string correctly escaped.

In any case, the beauty of formatters is that you can do whatever... You want the apostrophe? Sure! Knock yourself out!

function apostropheStringExcel(value: string) {
  return `"=""'${value}"""`;
}

As said, that example won't be safe as you are not escaping quotes, the file separator, the eol, etc... You can, of course, complicate it as much as you want.

juanjoDiaz avatar Apr 08 '24 21:04 juanjoDiaz

Yes, you are right that just an apostrophe won't prevent the injection.

I didn't mean using only the apostrophe to sanitize. I moreso meant using the apostrophe in conjunction with wrapping the cell field in double quotes and escaping every double quote with an additional double quote, as recommended on the OWASP page. I assumed that doing these three things together would be the safest option, so I was wondering why it is not the method used by stringExcel?

Are you saying that the current sanitization method used by stringExcel is used because it is as secure as the one recommended by OWASP as well as preserves the original cell appearance (no extra apostrophe like in OWASP method) in Excel? I just want to be sure that the sanitization method used by stringExcel is at least as secure as the OWASP method.

ProgramCounterr avatar Apr 08 '24 22:04 ProgramCounterr

I don't see any mitigation for injection in this library. I can see appropriate warnings about how users need to consider mitigating CSV injection.

Warnings: https://juanjodiaz.github.io/json2csv/#/others/known-gotchas?id=avoiding-csv-injection

No mention of CSV injection mitigation at formatters string excel, only normalisation: https://juanjodiaz.github.io/json2csv/#/advanced-options/formatters?id=string-excel

The OWASP docs referenced by the warning provide two solutions to prevent Excel from becoming an attack vector. The first solution is fuzzily defined and involves escaping every field value with a lot of caveats and considerations. The second solution involves wrapping every field so that each field is treated only as a non-formula text value in Excel.

The excel string formatter in this library doesn't seem to apply either of the OWASP mitigations, it seems to focus only on producing well readable csvs, when viewed in either Excel or as a flat file. The Excel code throughout seems to concentrate only on making sure that the csv can be read by Excel, which is a challenge in itself.

Making a formatter which implements the first of the OWASP recommendation looks like a lot of work. Implementing the second OWASP recommendation means that all fields become text non-formula values, which is probably not ideal for subsequent data science - it's nice to have numbers as numbers.

Depending on what you want to use your Excel spreadsheet for, you may find that the option 2 formatter is easy to write, but that it might cause a string of secondary issues when you come to use the spreadsheet. Option 1 also hasn't been written in this repo and you would have to write it. It's very easy to add custom formatters.

Does that help at all? What's your use-case?

OWASP mitigation 1:

... attack is difficult to mitigate, and explicitly disallowed from quite a few bug bounty programs. To remediate it, ensure that no cells begin with any of the following characters:

....is not sufficient to make sure that the untrusted user input does not start with these characters. You also need to take care of the field separator (e.g., ‘,’, or ‘;’) and quotes (e.g., ', or "), as attackers could use this to start a new cell and then have the dangerous character in the middle of the user input, but at the beginning of a cell

OWASP mitigation 2:

Alternatively, apply the following sanitization to each field of the CSV, so that their content will be read as text by the spreadsheet editor:

oliverfoster avatar Apr 09 '24 04:04 oliverfoster

Hi guy,

Sorry for the silence.

The stringExcel formatter does exactly the second solution: wrapping every field so that each field is treated only as a non-formula text value in Excel.

As stated in the documentation, you can apply formatters to all the basic types: undefined, boolean, number, bigint, string, symbol, function and object And a special type that only applies to headers: headers

If you only want to avoid injection, you should apply the stringExcel formatter to string adnd that won't affect cell that are number.

I can agree that the documentation would be better. I will look into it when I have some time. But the mitigation for injection is there.

juanjoDiaz avatar Apr 22 '24 19:04 juanjoDiaz