
Have "dynamic" output format comply with RFC 4180?

Open rgayon opened this issue 2 years ago • 3 comments

Description of problem:

Running psteal.py on browser history produces output that is not valid RFC 4180 CSV: double quotes can appear in the URL + title field without the field being quoted or the quotes being escaped.

2021-09-07T06:36:45.000000+00:00,Last Visited Time,WEBHIST,Chrome History,<SOME URL> "shouldn't have quotes here" [count: 0] Visit from: Type: [LINK - User clicked a link] (URL not typed directly),sqlite/chrome_27_history,OS:/usr/local/google/home/romaing/Google/Chrome/User Data/Default/History,-
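For reference, RFC 4180 expects a field containing a double quote to be wrapped in double quotes, with each embedded quote doubled. A minimal sketch of that rule follows; the helper name `quote_rfc4180_field` is hypothetical and this is not how plaso's "dynamic" output module is implemented.

```python
def quote_rfc4180_field(value: str) -> str:
    """Quote a field per RFC 4180 (hypothetical helper, not plaso code).

    A field is wrapped in double quotes only when it contains a comma,
    a double quote, CR or LF; embedded double quotes are doubled.
    """
    if any(character in value for character in ',"\r\n'):
        return '"' + value.replace('"', '""') + '"'
    return value


# The message field from the report, shortened for illustration.
message = '<SOME URL> "shouldn\'t have quotes here" [count: 0]'
print(quote_rfc4180_field(message))
```

With this rule the message field would be emitted as `"<SOME URL> ""shouldn't have quotes here"" [count: 0]"`, which a CSV parser reads back as the original text.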

Command line and arguments:

'Chrome' is a Windows Chrome directory

psteal.py --source Chrome -w chrome.csv

Source data:

Chrome browser history

Plaso version:

$ psteal.py --version
plaso - psteal version 20210606

Operating system Plaso is running on:

linux

rgayon avatar Oct 18 '21 14:10 rgayon

https://datatracker.ietf.org/doc/html/rfc4180#page-2

While there are various specifications and implementations for the
   CSV format (for ex. [4], [5], [6] and [7]), there is no formal
   specification in existence, which allows for a wide variety of
   interpretations of CSV files.  This section documents the format that
   seems to be followed by most implementations:

joachimmetz avatar Oct 18 '21 16:10 joachimmetz

Looks like RFC 4180 also requires a CRLF for each row (not just LF).

   1.  Each record is located on a separate line, delimited by a line
       break (CRLF).
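As an illustration of both requirements together, Python's standard `csv` module can emit CRLF-terminated rows with doubled quotes. This is only a sketch under the assumption that rows are available as lists of strings; the function name `rows_to_rfc4180` and the sample rows are made up and do not reflect plaso's output code.

```python
import csv
import io


def rows_to_rfc4180(rows):
    """Serialize rows as CSV with CRLF line breaks and doubled quotes."""
    # newline='' keeps the buffer from translating the line terminator.
    buffer = io.StringIO(newline='')
    writer = csv.writer(
        buffer, quoting=csv.QUOTE_MINIMAL, lineterminator='\r\n')
    writer.writerows(rows)
    return buffer.getvalue()


# Illustrative rows only; the field names are not plaso's actual header.
text = rows_to_rfc4180([
    ['datetime', 'message'],
    ['2021-09-07T06:36:45.000000+00:00', 'title "quoted"'],
])
print(repr(text))
```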

joachimmetz avatar Oct 18 '21 16:10 joachimmetz

Spoke with @rgayon: the issue here is mainly escaping double quotes, less so complying with the RFC.

joachimmetz avatar Oct 19 '21 13:10 joachimmetz