semi-supervised learning choices

Open yuanqing-wang opened this issue 4 years ago • 3 comments

@karalets

Sorry for the delay; I'm also working on a few other projects.

I've been looking into ways of doing semi-supervised learning. The paragraph-vector approach in this paper (https://pubs.rsc.org/en/content/articlepdf/2019/sc/c9sc00616h), which builds on https://arxiv.org/abs/1711.10168, gives me numerical stability issues, since it involves a log(sigmoid(·)) term. I switched the dot-product similarity in that paper to cosine similarity, but found that initializing from this pre-training made the ride even bumpier (https://github.com/choderalab/pinot/tree/master/pinot/app/2020-04-01-171836719500) compared to random initialization (https://github.com/choderalab/pinot/tree/master/pinot/app/2020-04-01-120856865376).
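To illustrate the instability (a minimal sketch of my own, not the pinot code): for strongly negative logits, sigmoid underflows to zero in float32, so the naive composition log(sigmoid(x)) returns -inf, whereas torch.nn.functional.logsigmoid evaluates the same quantity stably; normalizing to cosine similarity instead bounds the logits to [-1, 1], which sidesteps the underflow differently.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-1000.0, 0.0, 1000.0])

# Naive composition: sigmoid(-1000) underflows to 0 in float32, so the log is -inf.
print(torch.log(torch.sigmoid(x)))  # tensor([-inf, -0.6931, 0.])

# Fused, numerically stable version of the same quantity.
print(F.logsigmoid(x))              # tensor([-1000.0000, -0.6931, 0.])

# Alternatively, cosine similarity keeps logits in [-1, 1],
# at the cost of discarding the embedding norms.
u, v = torch.randn(8, 16), torch.randn(8, 16)
cos = (F.normalize(u, dim=-1) * F.normalize(v, dim=-1)).sum(dim=-1)
```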

I will continue to explore more semi-supervised algorithms; at the same time, in terms of structure, I didn't find it hard to work with the existing scripts.

I wrote my semi-supervised loss function here (https://github.com/choderalab/pinot/blob/master/pinot/metrics/semi_supervised.py) and used it to produce weights, which were then fed into the --representation_parameter argument to initialize the supervised learning model.
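For context, the handoff between the two stages is just saving and reloading a state dict; a minimal sketch below (module and file names are placeholders, not the actual pinot API):

```python
import torch
import torch.nn as nn

# Stand-in for the trained representation network; in practice this is
# whatever module the semi-supervised loss was trained through.
representation = nn.Linear(117, 128)  # hypothetical input/hidden sizes

# Persist the pre-trained weights after the semi-supervised run...
torch.save(representation.state_dict(), "representation.th")

# ...and the supervised run restores them before training, which is what
# passing the saved path via --representation_parameter amounts to.
representation.load_state_dict(torch.load("representation.th"))
```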

What other ways would you recommend to further make things convenient?

yuanqing-wang avatar Apr 01 '20 21:04 yuanqing-wang

I don't have any high-level suggestions.

Reviewing the code you linked https://github.com/choderalab/pinot/blob/6785a4edc1ee2cfcd3ebd8588c8213d517ae7bea/pinot/metrics/semi_supervised.py#L37-L56, I do have a couple of low-level comments:

  • keyword argument k is unused (is this important?)
  • unsure why cosine similarity is used rather than just the dot product (what does the normalization do?)
  • unsure why a random permutation is used (could you clarify the connection between this implementation and the expectation appearing in Eq. 2 of the arXiv link? is that the equation we should be looking at?)

maxentile avatar Apr 02 '20 13:04 maxentile

@maxentile

k is in Eq. 2 of the paper. It was supposed to be a hyperparameter; I dropped it (set it to 1) while trying to get things running. I'll put it back.

The normalization is there because Eq. 2 is approximated using the negative-sampling trick, which introduces a log(sigmoid(·)) term; with unnormalized dot products this leads to numerical stability issues.
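To make the connection to Eq. 2 concrete, here is a minimal sketch of the objective as I've been computing it (illustrative code, not the pinot implementation): positive (graph, substructure) pairs are scored with log sigmoid of their cosine similarity, and negatives are drawn by permuting the batch k times as a cheap stand-in for sampling from the noise distribution.

```python
import torch
import torch.nn.functional as F

def negative_sampling_loss(graph_emb, substruct_emb, k=5):
    """Eq. 2-style objective: maximize log sigmoid(sim) over true pairs and
    log sigmoid(-sim) over k negative pairs per example."""
    # Unit-normalize so the dot product is the cosine similarity, keeping
    # logits in [-1, 1] and log(sigmoid(.)) well behaved.
    g = F.normalize(graph_emb, dim=-1)
    s = F.normalize(substruct_emb, dim=-1)

    # Positive pairs: matching rows of the two batches.
    loss = -F.logsigmoid((g * s).sum(dim=-1)).mean()

    # Negative pairs: permute the batch k times so each graph is (mostly)
    # scored against substructures from other examples. (A permutation can
    # map an index to itself; true noise sampling would avoid this.)
    for _ in range(k):
        perm = torch.randperm(s.size(0))
        loss = loss - F.logsigmoid(-(g * s[perm]).sum(dim=-1)).mean()

    return loss

# Example with random embeddings: batch of 32, embedding dimension 64.
loss = negative_sampling_loss(torch.randn(32, 64), torch.randn(32, 64), k=5)
```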

yuanqing-wang avatar Apr 02 '20 14:04 yuanqing-wang

I will look into this over the next couple of days. I was hoping you would start getting results off the shelf for the initial pass, so that we can work on the infrastructure around the models first.

karalets avatar Apr 10 '20 19:04 karalets