gocsv Grouping/stacking datasets

Open geekscrapy opened this issue 5 years ago • 1 comment

Feature request: It would be amazing to be able to perform the following analysis with gocsv - unless of course I've missed something!

https://www.fireeye.com/blog/threat-research/2012/11/indepth-data-stacking.html

Dec 10 '19 09:12 geekscrapy

Thanks for the link, I always enjoy a read about aggregating/grouping, forensics, and statistical thinking.

From what I read, nothing magical or special in the tooling jumped out at me. They didn't show any of the tools, though, so maybe there's some magic in how they efficiently handle large datasets.

Considering the first example... if the services data looked something like:

big_data.csv

Service Name	Path	Service DLL
Seclogon	system32\svchost.exe	system32\seclogon.dll
Seclogon	system32\svchost.exe	system32\seclogon.dll
Seclogon	system32\svchost.exe	system32\seclogon.dll
... 5595 more	... rows	... like this
Seclogon	system32\svchost.exe	system32\selogon.dll
Seclogon	system32\svchost.exe	system32\selogon.dll
iprip	system32\svchost.exe	system32\iprip.dll
... 5234 more	... rows	... like this
iprip	system32\svchost.exe	system32\iprinp.dll
iprip	system32\svchost.exe	system32\iprinp.dll
iprip	system32\svchost.exe	temp\iprip.dll
iprip	system32\svchost.exe	temp\iprip.dll
iprip	system32\svchost.exe	temp\iprip.dll

This GoCSV pipeline:

gocsv unique -c Service\ Name,Path,Service\ DLL --count big_data.csv | gocsv select -c Count,Service\ Name,Path,Service\ DLL

would produce a table like:

Count	Service Name	Path	Service DLL
5598	Seclogon	system32\svchost.exe	system32\seclogon.dll
2	Seclogon	system32\svchost.exe	system32\selogon.dll
5235	iprip	system32\svchost.exe	system32\iprip.dll
2	iprip	system32\svchost.exe	system32\iprinp.dll
3	iprip	system32\svchost.exe	temp\iprip.dll

For unique, choose the columns that capture whatever you want to check for differences. Any two rows that differ in even one of those columns land in separate groups, and each group gets its own count. Then pass the result through select to pare it down to those columns (with Count first) for visual inspection.
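
To make the grouping rule concrete, here is a minimal Python sketch of the same idea: count rows per distinct combination of the chosen columns. It is not how gocsv implements unique; the `stack` helper and the seven-row, comma-separated sample are made up for illustration.

```python
import csv
import io
from collections import Counter

# Hypothetical, scaled-down stand-in for big_data.csv (assumed comma-separated).
SAMPLE = r"""Service Name,Path,Service DLL
Seclogon,system32\svchost.exe,system32\seclogon.dll
Seclogon,system32\svchost.exe,system32\seclogon.dll
Seclogon,system32\svchost.exe,system32\seclogon.dll
Seclogon,system32\svchost.exe,system32\selogon.dll
iprip,system32\svchost.exe,system32\iprip.dll
iprip,system32\svchost.exe,system32\iprip.dll
iprip,system32\svchost.exe,temp\iprip.dll
"""


def stack(rows, columns):
    """Count rows per distinct combination of values in `columns`.

    Groups are returned in first-seen order as (values, count) pairs.
    """
    counts = Counter(tuple(row[c] for c in columns) for row in rows)
    return list(counts.items())


rows = list(csv.DictReader(io.StringIO(SAMPLE)))

# All three columns: a single differing value puts a row in its own group.
stacked = stack(rows, ["Service Name", "Path", "Service DLL"])
for values, count in stacked:
    print(count, *values, sep="\t")

# Fewer columns: rows that differ only in Service DLL now share a group.
by_service = stack(rows, ["Service Name", "Path"])
for values, count in by_service:
    print(count, *values, sep="\t")
```

Dropping Service DLL from the column list folds the misspelled and relocated DLLs back into their service's group, which is why the choice of columns decides what stands out.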

May 25 '21 22:05 zacharysyoung
