spark-netflow
NetFlow data source for Spark SQL and DataFrames
Hi @sadikovi, I'd like to report a vulnerable dependency in **com.github.sadikovi:spark-netflow_2.12:2.1.0**. ### Issue Description I noticed that **com.github.sadikovi:spark-netflow_2.12:2.1.0** directly depends on **org.apache.spark:spark-core_2.12:3.0.1** in the [pom](https://repo1.maven.org/maven2/com/github/sadikovi/spark-netflow_2.12/2.1.0/spark-netflow_2.12-2.1.0.pom). However, as shown in...
Hi @sadikovi, are you planning to add version 9 support anytime soon?
NetFlow version 9 sample file: [nfcapd.201801311702.gz](https://github.com/sadikovi/spark-netflow/files/1680378/nfcapd.201801311702.gz)
Aggregation should be flexible, e.g. it should allow specifying a groupBy and aggregations on numeric columns. We also need to investigate why `flow-tools` drops records when generating reports in some cases.
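A minimal sketch of what "flexible" aggregation could mean: the caller picks the group-by columns and a list of (column, function) pairs. It is written in plain Python rather than against Spark so it runs standalone; the `aggregate` function, the `AGGREGATES` table, and the column names (`srcip`, `dstport`, `octets`, `packets`) are all hypothetical. In Spark SQL the same idea maps onto `DataFrame.groupBy(...).agg(...)`.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Mapping, Sequence, Tuple

# Hypothetical table of aggregate functions applied to one group's column values.
AGGREGATES: Dict[str, Callable[[List[Any]], Any]] = {
    "sum": sum,
    "min": min,
    "max": max,
    "count": len,
    "avg": lambda values: sum(values) / len(values),
}


def aggregate(
    records: Iterable[Mapping[str, Any]],
    group_by: Sequence[str],
    aggs: Sequence[Tuple[str, str]],
) -> Dict[Tuple[Any, ...], Dict[str, Any]]:
    """Group records by the `group_by` columns, then apply each (column, function) pair."""
    for _, fn in aggs:
        if fn not in AGGREGATES:
            raise ValueError(f"unsupported aggregate: {fn}")

    # Bucket rows by the tuple of their group-by column values.
    groups: Dict[Tuple[Any, ...], List[Mapping[str, Any]]] = defaultdict(list)
    for record in records:
        groups[tuple(record[col] for col in group_by)].append(record)

    # Apply every requested aggregate to every group; output keys look like "sum(octets)".
    result: Dict[Tuple[Any, ...], Dict[str, Any]] = {}
    for key, rows in groups.items():
        result[key] = {
            f"{fn}({col})": AGGREGATES[fn]([row[col] for row in rows])
            for col, fn in aggs
        }
    return result


if __name__ == "__main__":
    # Hypothetical flow records; real column names depend on the NetFlow version.
    flows = [
        {"srcip": "10.0.0.1", "dstport": 80, "octets": 1500, "packets": 3},
        {"srcip": "10.0.0.1", "dstport": 443, "octets": 500, "packets": 1},
        {"srcip": "10.0.0.2", "dstport": 80, "octets": 200, "packets": 2},
    ]
    print(aggregate(flows, ["srcip"], [("octets", "sum"), ("packets", "max")]))
```

Grouping by `srcip` returns one entry per source address, each holding `sum(octets)` and `max(packets)`; switching to `["dstport"]` regroups the same records without any change to the aggregation code.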
Currently we support predicate resolution based on statistics and when one of the filters is trivial. We should also support the situation where the filter is `And(GreaterThan(port, 10), GreaterThan(port, 12))`, which should...
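The sentence above is cut off, so the intended resolution is not stated. One possible approach, offered only as an assumption, relies on the fact that `port > 10 AND port > 12` is logically equivalent to `port > 12`, so same-column bounds under an `And` can be folded into the tightest one before statistics are consulted. The sketch below uses hypothetical Python stand-ins modeled on Spark's `org.apache.spark.sql.sources` filters (`And`, `GreaterThan`, `LessThan`); `simplify` is a hypothetical helper, not part of spark-netflow.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical stand-ins for Spark's source filters (attribute name, literal value).
@dataclass(frozen=True)
class GreaterThan:
    attribute: str
    value: int


@dataclass(frozen=True)
class LessThan:
    attribute: str
    value: int


@dataclass(frozen=True)
class And:
    left: "Filter"
    right: "Filter"


Filter = Union[GreaterThan, LessThan, And]


def simplify(f: Filter) -> Filter:
    """Fold redundant same-column bounds that meet under a single And."""
    if not isinstance(f, And):
        return f
    left, right = simplify(f.left), simplify(f.right)
    same_column = getattr(left, "attribute", None) == getattr(right, "attribute", None)
    # col > a AND col > b  ==  col > max(a, b)
    if isinstance(left, GreaterThan) and isinstance(right, GreaterThan) and same_column:
        return GreaterThan(left.attribute, max(left.value, right.value))
    # col < a AND col < b  ==  col < min(a, b)
    if isinstance(left, LessThan) and isinstance(right, LessThan) and same_column:
        return LessThan(left.attribute, min(left.value, right.value))
    # Anything else (mixed bounds, different columns) is left as a conjunction.
    return And(left, right)


if __name__ == "__main__":
    print(simplify(And(GreaterThan("port", 10), GreaterThan("port", 12))))
```

This only folds two bounds when they end up as the direct children of one `And` after recursion; flattening nested conjunctions first would catch more cases, and an unsatisfiable pair such as `port > 12 AND port < 5` could be resolved to an always-false filter in the same pass.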