
How to change the code to work for multi-label classification?

Open • allomy opened this issue 4 years ago • 5 comments

I'm trying to use code2vec for multi-label classification, where one sample belongs to several labels. Could you give some suggestions on what to change in the model?

Thank you in advance for your help!

allomy, Nov 15 '21 09:11

Hi @allomy, thank you for your interest in code2vec!

I think you can change the loss here: https://github.com/tech-srl/code2vec/blob/master/tensorflow_model.py#L228 from the standard cross entropy to sigmoid cross entropy: https://www.tensorflow.org/api_docs/python/tf/compat/v1/nn/sigmoid_cross_entropy_with_logits
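A minimal pure-Python sketch of why this swap matters (not code2vec's code; the function names are hypothetical). A single-label softmax cross entropy scores exactly one correct class per example. A sigmoid cross entropy treats every label as an independent yes/no decision, so the target becomes a 0/1 (multi-hot) vector. The per-label formula below is the numerically stable form given in the TensorFlow documentation for `sigmoid_cross_entropy_with_logits`.

```python
import math

def sigmoid_cross_entropy_with_logits(labels, logits):
    """Per-label loss in the stable form max(x, 0) - x * z + log(1 + exp(-|x|)),
    where x is a logit and z its 0/1 target."""
    return [
        max(x, 0.0) - x * z + math.log1p(math.exp(-abs(x)))
        for x, z in zip(logits, labels)
    ]

def sparse_softmax_cross_entropy(target_index, logits):
    """Single-label loss: -log(softmax(logits)[target_index])."""
    m = max(logits)  # subtract the max logit to avoid overflow in exp
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum_exp - logits[target_index]

def multi_label_loss(multi_hot, logits):
    """Mean of the per-label sigmoid losses for one example."""
    per_label = sigmoid_cross_entropy_with_logits(multi_hot, logits)
    return sum(per_label) / len(per_label)

logits = [2.0, -1.0, 0.5, -3.0]
# Single-label: exactly one correct class (index 0).
print(sparse_softmax_cross_entropy(0, logits))
# Multi-label: classes 0 and 2 both apply.
print(multi_label_loss([1.0, 0.0, 1.0, 0.0], logits))
```

In the TensorFlow model the analogous call would presumably be `tf.nn.sigmoid_cross_entropy_with_logits(labels=..., logits=...)`, with `labels` a float multi-hot tensor of the same shape as the logits, followed by a `tf.reduce_mean` (or sum). Prediction changes as well. Instead of taking the top-scoring class, one common choice is to keep every label whose sigmoid probability exceeds a threshold such as 0.5.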

But you will also need to change the pipeline to support reading multi-labeled examples. Follow the variable target_index here: https://github.com/tech-srl/code2vec/blob/master/path_context_reader.py and modify it to get a list of targets for every example.
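A rough sketch of what "a list of targets for every example" could look like on the data side. It assumes a hypothetical format where the target field holds several labels joined by `|` (e.g. `sort|parse`). The delimiter, the vocabulary, the OOV index, and the helper names are all illustrative assumptions, not code2vec's actual reader. The list is then turned into a fixed-length multi-hot vector, which is the `labels` shape a sigmoid cross-entropy loss expects.

```python
from typing import Dict, List

# Hypothetical format: several labels per example, joined by '|'.
LABEL_DELIMITER = "|"
OOV_INDEX = 0  # hypothetical index reserved for unknown labels

def target_indices(target_field: str, label_to_index: Dict[str, int]) -> List[int]:
    """Map one raw target field to a sorted, de-duplicated list of label indices."""
    labels = [t for t in target_field.split(LABEL_DELIMITER) if t]
    return sorted({label_to_index.get(t, OOV_INDEX) for t in labels})

def multi_hot(indices: List[int], num_labels: int) -> List[float]:
    """Turn a variable-length list of indices into a fixed-length 0/1 vector."""
    vector = [0.0] * num_labels
    for i in indices:
        vector[i] = 1.0
    return vector

label_to_index = {"<OOV>": 0, "sort": 1, "search": 2, "parse": 3}
idx = target_indices("sort|parse", label_to_index)
print(idx)               # [1, 3]
print(multi_hot(idx, 4)) # [0.0, 1.0, 0.0, 1.0]
```

The fixed-length vector is the main design point. Examples carry different numbers of labels, but the tensors in a batch need a common shape, and a multi-hot vector of size equal to the label vocabulary gives that.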

Best, Uri

urialon, Nov 15 '21 19:11

Hi @urialon, thank you for your quick response. I'll try it soon.

allomy, Nov 17 '21 02:11

Hi @urialon, sorry for the delayed response. I have tried to modify the code related to target_index, but got lost in the code... Could you give more information about how to modify it to get a list of targets for every sample? Thank you in advance for your help.

allomy, Dec 03 '21 06:12

Hi @allomy, actually it might be easiest for you to use code2seq (https://code2seq.org/). It predicts a sequence of labels rather than a set of labels, but it may either be a good approximation or be easier to adapt for multi-label (you would only need to change the loss computation, not the entire data-reading pipeline).
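If you go the code2seq route, one way to approximate a label set with a sequence (an assumption, not a built-in code2seq feature) is to give each set a canonical order for training and to read the predicted sequence back as a set. The helper names and the `<EOS>` end token below are hypothetical; adapt them to however your code2seq targets are actually encoded.

```python
from typing import Iterable, List, Set

def labels_to_sequence(labels: Iterable[str]) -> List[str]:
    """Canonical target sequence: de-duplicated and sorted alphabetically,
    so the same label set always maps to the same sequence."""
    return sorted(set(labels))

def sequence_to_labels(predicted: List[str], end_token: str = "<EOS>") -> Set[str]:
    """Read a predicted sequence as a label set: stop at the (hypothetical)
    end token and drop repeated labels."""
    labels = set()
    for token in predicted:
        if token == end_token:
            break
        labels.add(token)
    return labels

print(labels_to_sequence(["sort", "io", "sort"]))  # ['io', 'sort']
print(sorted(sequence_to_labels(["sort", "io", "sort", "<EOS>", "parse"])))  # ['io', 'sort']
```

A fixed ordering matters because otherwise the same set could appear as several different target sequences, and the model would be penalized for predicting a correct set in a "wrong" order.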

Best, Uri

urialon, Dec 06 '21 14:12

Thank you @urialon, I will take a look at code2seq.

allomy, Dec 07 '21 02:12