MalGraph
Help wanted
Hello, I noticed that the file train_external_function_name_vocab.jsonl is needed when training the model. Is it generated by 1BuildExternalVocab.py? If so, could you please provide this file? If not, could you give me some details on how to generate train_external_function_name_vocab.jsonl? Thank you.
As we have described in Section IV.A.2):
For each node representing the external function in FCG, it is one-hot encoded based on its function name and we limit the vocabulary size of external functions to 10,000 that are most frequently used in the training dataset.
Therefore, it is quite easy to obtain train_external_function_name_vocab.jsonl: simply count the frequency of each external function name across all samples in the training set, rank the names by frequency, and select the top-K names (K = 10,000 in our setting).
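The counting and ranking step described above could be sketched roughly as follows. This is only an illustration, not the repository's actual script: the function name `build_external_vocab`, the input shape (one list of external function names per training sample), and the tie-breaking rule are all assumptions.

```python
from collections import Counter
from typing import Iterable, List


def build_external_vocab(samples: Iterable[Iterable[str]], top_k: int = 10000) -> List[str]:
    """Return the top_k most frequent external function names.

    `samples` is assumed (hypothetically) to yield, for each training
    sample, the external function names found in its FCG.
    """
    counter: Counter = Counter()
    for names in samples:
        # Count every occurrence of each name across the training set.
        counter.update(names)
    # Rank by descending frequency; break ties alphabetically (an assumption)
    # so that the result is deterministic.
    ranked = sorted(counter.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:top_k]]


if __name__ == "__main__":
    # Toy data standing in for the real training set.
    toy_samples = [
        ["CreateFileA", "ReadFile"],
        ["CreateFileA", "WriteFile"],
        ["CreateFileA", "ReadFile"],
    ]
    print(build_external_vocab(toy_samples, top_k=2))
```

How the resulting list is serialized into train_external_function_name_vocab.jsonl is not specified in this thread, so the on-disk layout should be matched to whatever the training code reads.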