spark-netflow

Test files for version 9

Open sadikovi opened this issue 7 years ago • 8 comments

NetFlow version 9 sample file.

nfcapd.201801311702.gz

sadikovi avatar Jan 31 '18 04:01 sadikovi

Thank you!

2 questions:

a. If you have test data that is not labeled with its flow version (e.g. v5 or v7), is there a way to determine the version using a readily available tool such as your library or nfdump?

b. Do you have test data for v5 or v7 available?

Thanks again

natedogs911 avatar Jan 31 '18 20:01 natedogs911

Hi,

Yes, each file encodes its version in the header, including version 9. The package checks the magic bytes and the version number to make sure that files are read consistently in Spark; it does not rely on the file name.

I do have test files that are used in unit tests (see https://github.com/sadikovi/spark-netflow/tree/master/src/test/resources/correct). They are generated files. For manual quality testing I have a real-world dataset locally.

sadikovi avatar Jan 31 '18 20:01 sadikovi

You can use my library to check the version. Unfortunately, it only recognises version 5 and version 7; any other version will throw an exception (I think). See the example (https://github.com/sadikovi/spark-netflow#using-netflowlib-library-separately) for more information.

sadikovi avatar Jan 31 '18 20:01 sadikovi

Thank you, the samples will be helpful. I was just looking at getHeader() to see if I can identify why the test files I have are throwing "bad magic".

natedogs911 avatar Jan 31 '18 20:01 natedogs911

All I can say is that the file is most likely not version 5 or version 7. If you are convinced that your files are version 5 or version 7, you can check by removing the magic check in the library and trying again. The magic numbers are for Cisco NetFlow. If your files were generated using something else, then, I assume, the magic will be different.

I think we might need to remove the magic check, or add a list of magic numbers that are supported by the package.
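A list of supported magic numbers could be sketched roughly as below. This is only an illustration in Python, not code from the package: the function name `classify_magic` is hypothetical, and the magic values are assumptions (0xCF 0x10 for the Cisco NetFlow export files the package reads, 0xA50C for nfdump files, 0x1F 0x8B for gzip) that should be verified against the formats actually in use.

```python
# Assumed magic values, keyed by the first two bytes of the file.
# These are illustrative and should be checked against real files.
KNOWN_MAGIC = {
    b"\xcf\x10": "cisco-netflow-export",  # assumed: format read by the package
    b"\x0c\xa5": "nfdump",                # assumed: 0xA50C stored little-endian
    b"\xa5\x0c": "nfdump",                # assumed: 0xA50C stored big-endian
    b"\x1f\x8b": "gzip",                  # compressed wrapper, decompress first
}


def classify_magic(first_bytes: bytes) -> str:
    """Return a format label for the leading two bytes, or 'unknown'."""
    return KNOWN_MAGIC.get(first_bytes[:2], "unknown")


def classify_file(path: str) -> str:
    """Read the first two bytes of a file and classify them."""
    with open(path, "rb") as f:
        return classify_magic(f.read(2))
```

With a table like this, a "bad magic" error could report which known format the file resembles instead of only rejecting it.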

You can attach your file, I can have a look.

sadikovi avatar Jan 31 '18 20:01 sadikovi

I just zipped one of the smallest files. These are synthetically generated files and I suspect the header is the problem. I'll try looking at the header shortly as well. nfcapd.201601280215.zip

natedogs911 avatar Jan 31 '18 20:01 natedogs911

I get a similar byte layout to the file I included in the issue, so I guess it is version 9. The package currently does not support version 9.

sadikovi avatar Jan 31 '18 21:01 sadikovi

It looks like those files are nfdump-specific, not Cisco NetFlow. I successfully parsed them by following the structs in https://github.com/pmorch/nfdump/blob/621674bc751437741ca367b7c7b170fca6106764/bin/nffile.h.

Code needs to be written specifically to handle those types of files.
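As a rough idea of what such code might start with, here is a minimal Python sketch that reads the leading file header. It is not taken from the package or from nfdump. The layout is an assumption based on the file header struct in nffile.h (a 16-bit magic of 0xA50C, a 16-bit layout version, 32-bit flags, a 32-bit block count and a 128-byte ident string, stored little-endian), and `parse_nfdump_header` is a hypothetical name. Check the field order and sizes against the header file before relying on it.

```python
import struct

NFDUMP_MAGIC = 0xA50C          # assumed magic value from nffile.h
HEADER_FORMAT = "<HHII128s"    # assumed layout: magic, version, flags, blocks, ident
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 140 bytes, no padding with "<"


def parse_nfdump_header(data: bytes) -> dict:
    """Parse the assumed nfdump file header from the start of `data`."""
    if len(data) < HEADER_SIZE:
        raise ValueError("not enough bytes for an nfdump file header")
    magic, version, flags, num_blocks, ident = struct.unpack_from(HEADER_FORMAT, data)
    if magic != NFDUMP_MAGIC:
        raise ValueError(f"bad magic: 0x{magic:04X}")
    return {
        "layout_version": version,
        "flags": flags,
        "num_blocks": num_blocks,
        # ident is a NUL-padded string; keep only the part before the first NUL
        "ident": ident.split(b"\x00", 1)[0].decode("ascii", errors="replace"),
    }
```

Note that the version field here would be the nfdump file layout version, which is not the same thing as the NetFlow export version (5, 7 or 9) of the records stored inside.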

sadikovi avatar Feb 02 '18 00:02 sadikovi