code-docstring-corpus

Question about creating a dataset format for NeuralCodeSum

Open: SAMMY-KIM opened this issue 3 years ago • 1 comment

Hello @Avmb, I have a question about the dataset format used by NeuralCodeSum. When I checked https://github.com/wasiahmad/NeuralCodeSum/tree/master/data, it looked like that dataset comes from this repository.

Could you tell me how the dataset was put into the NeuralCodeSum format? It looks like a list of word tokens with underscores and other symbols removed. If there is a script or some other way to parse code into that format, I would like to know about it.

Thank you:)

SAMMY-KIM, Sep 26 '22 12:09

I'm not sure what processing NeuralCodeSum uses. The dataset in this repository was created using these scripts: https://github.com/EdinburghNLP/code-docstring-corpus/tree/master/scripts

Avmb, Sep 26 '22 12:09
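
For readers with the same question, here is a minimal sketch of the kind of splitting described in the question: turning Python code into a flat list of word tokens with underscores removed. It is an assumption about what such a step might look like, not the actual preprocessing of NeuralCodeSum or of the scripts linked above. The function names `split_identifier` and `code_to_tokens` are hypothetical, and only the standard library is used.

```python
import io
import re
import tokenize

# Hypothetical helpers for illustration; not taken from either repository.
# Matches acronym runs (HTTP), capitalized or lowercase words, and digit runs.
_WORD_RE = re.compile(r"[A-Z]+(?=[A-Z][a-z])|[A-Z]?[a-z]+|[A-Z]+|\d+")

# Token types that carry layout or comments rather than code words.
_SKIP = {
    tokenize.NEWLINE, tokenize.NL, tokenize.INDENT,
    tokenize.DEDENT, tokenize.COMMENT, tokenize.ENDMARKER,
}


def split_identifier(name):
    """Split a snake_case or camelCase name into lowercase sub-tokens."""
    parts = []
    for chunk in name.split("_"):
        parts.extend(_WORD_RE.findall(chunk))
    return [p.lower() for p in parts]


def code_to_tokens(source):
    """Tokenize Python source and split every name into sub-tokens."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in _SKIP:
            continue
        if tok.type == tokenize.NAME:
            tokens.extend(split_identifier(tok.string))
        else:
            # Operators, numbers, and string literals are kept unchanged.
            tokens.append(tok.string)
    return tokens


example = "def get_userName(self):\n    return self.user_name\n"
print(" ".join(code_to_tokens(example)))
# prints: def get user name ( self ) : return self . user name
```

Whether punctuation is kept, whether everything is lowercased, and how string literals are handled are all choices made here for illustration; comparing the output against a few lines of the published data files would show which choices match.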