Devign
Devign - Implementation
In this repository, we provide a lightweight implementation of Devign: Effective Vulnerability Identification by Learning Comprehensive Program Semantics via Graph Neural Networks.
Requirements
- Python=3.6
- PyTorch==1.4.0
- Deep Graph Library
Usage
python main.py \
--dataset <name_of_the_dataset> \
--input_dir <directory_of_the_input>;
Dataset
The input_dir should contain three JSON files, namely:
- train_GGNNinput.json
- valid_GGNNinput.json
- test_GGNNinput.json
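Before starting a run, it can help to confirm that the directory actually has this layout. The following is a minimal sketch, not part of this repository: the helper name missing_input_files is hypothetical, and only the three file names listed above come from this README.

```python
from pathlib import Path

# The three file names this README expects inside input_dir.
EXPECTED_FILES = (
    "train_GGNNinput.json",
    "valid_GGNNinput.json",
    "test_GGNNinput.json",
)


def missing_input_files(input_dir):
    """Return the expected file names that are absent from input_dir.

    Hypothetical helper for illustration; it is not provided by main.py.
    """
    root = Path(input_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

An empty return value means all three files are present; otherwise the returned names show which splits still need to be generated.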
Each JSON file should contain a list of JSON objects with the following structure:
{
"node_features": <A list of feature vectors, one for every node in the graph>,
"graph": <A list of edges>,
"target": <0 or 1 representing the vulnerability>
}
- Let's assume the n nodes in the graph are indexed from 0 to n-1. The length of the node_features list should be n. Each feature vector should be 100 elements long. Thus the node_features list should be a 2D list of shape (n, 100).
- The length of the graph list should be the number of edges. Each edge should be represented as a three-element tuple [source, edge_type, destination], where source and destination are indices of the corresponding nodes in the node_features list. Edge types should be from 0 to max_edge_types.
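The constraints above can be checked mechanically before training. The following is a minimal sketch under stated assumptions, not code from this repository: the function validate_sample, the toy graph, and the value max_edge_types=4 are hypothetical, and the upper bound on edge types is treated as inclusive because the text above does not say whether it is.

```python
import json

FEATURE_SIZE = 100  # length of each node feature vector, as stated above


def validate_sample(sample, max_edge_types):
    """Raise ValueError if a sample breaks one of the constraints above.

    Hypothetical helper for illustration; it is not provided by main.py.
    """
    features = sample["node_features"]
    n = len(features)
    for i, vector in enumerate(features):
        if len(vector) != FEATURE_SIZE:
            raise ValueError(f"node {i}: expected {FEATURE_SIZE} features, got {len(vector)}")
    for source, edge_type, destination in sample["graph"]:
        # source and destination must index into node_features (0 to n-1).
        if not (0 <= source < n and 0 <= destination < n):
            raise ValueError(f"edge {[source, edge_type, destination]}: node index out of range")
        # Assumption: the upper bound max_edge_types is inclusive.
        if not 0 <= edge_type <= max_edge_types:
            raise ValueError(f"edge {[source, edge_type, destination]}: bad edge type")
    if sample["target"] not in (0, 1):
        raise ValueError("target must be 0 or 1")


# A toy graph with 3 nodes and 2 edges; all values are made up.
toy_sample = {
    "node_features": [[0.0] * FEATURE_SIZE for _ in range(3)],
    "graph": [[0, 0, 1], [1, 1, 2]],
    "target": 1,
}
validate_sample(toy_sample, max_edge_types=4)

# Each input file holds a JSON list of such objects.
serialized = json.dumps([toy_sample])
```

Writing serialized to a file such as train_GGNNinput.json would give a one-sample input in the described shape; real feature vectors would come from whatever node embedding step precedes this format.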
Note
- This implementation follows the Devign paper [1]. However, we could NOT reproduce the results reported in the original paper.
Reference
[1] Zhou, Yaqin, et al. "Devign: Effective vulnerability identification by learning comprehensive program semantics via graph neural networks." arXiv preprint arXiv:1909.03496 (2019).