semi-supervised learning choices
@karalets
Sorry for the delay; I'm also working on a few other projects.
I've been looking into approaches to semi-supervised learning. The paragraph-vector approach in this paper (https://pubs.rsc.org/en/content/articlepdf/2019/sc/c9sc00616h), which comes from https://arxiv.org/abs/1711.10168, gives me numerical stability issues, since it involves a `log(sigmoid(·))` term. I switched the dot-product measure in that paper to cosine similarity, but found that this initialization made the ride even bumpier (https://github.com/choderalab/pinot/tree/master/pinot/app/2020-04-01-171836719500, compared to random initialization: https://github.com/choderalab/pinot/tree/master/pinot/app/2020-04-01-120856865376).
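For reference, here is a minimal sketch of the stability problem (assuming PyTorch, which pinot uses): composing `log` and `sigmoid` naively underflows for large-magnitude negative logits, while the fused `torch.nn.functional.logsigmoid` evaluates the same quantity stably.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-200.0, 0.0, 200.0])

# Naive composition: sigmoid(-200) underflows to 0 in float32, so log() returns -inf.
naive = torch.log(torch.sigmoid(x))  # tensor([-inf, -0.6931, 0.])

# Fused form computes log(sigmoid(x)) = -softplus(-x) without underflow.
stable = F.logsigmoid(x)             # tensor([-200.0000, -0.6931, 0.])
```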
I will continue to explore more semi-supervised algorithms. At the same time, in terms of structure, I didn't find it hard to work with the existing scripts.
I wrote my semi-supervised loss function here (https://github.com/choderalab/pinot/blob/master/pinot/metrics/semi_supervised.py) and used it to produce the weights, which were then used to initialize the supervised learning model by feeding them into the `--representation_parameter` argument.
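For anyone retracing the workflow, here is a rough sketch of the hand-off (the module and file names are illustrative, not pinot's exact API; only the `--representation_parameter` flag comes from the scripts above):

```python
import torch

# Placeholder for the representation network trained with the semi-supervised loss.
representation = torch.nn.Linear(128, 128)

# Save the pre-trained weights ...
torch.save(representation.state_dict(), "pretrained_representation.pt")

# ... then hand them to the supervised run, e.g. (hypothetical invocation):
#   python <supervised_training_script> --representation_parameter pretrained_representation.pt
```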
What else would you recommend to make things more convenient?
I don't have any high-level suggestions.
Reviewing the code you linked (https://github.com/choderalab/pinot/blob/6785a4edc1ee2cfcd3ebd8588c8213d517ae7bea/pinot/metrics/semi_supervised.py#L37-L56), I do have a couple of low-level comments:

- the keyword argument `k` is unused (is this important?)
- unsure why cosine similarity is used rather than just the dot product (what does the normalization do?)
- unsure why a random permutation is used (could you clarify the connection between this implementation and the expectation appearing in Eq. 2 of the arXiv link? Is that the equation we should be looking at?)
@maxentile
`k` is in Eq. 2 of the paper. It was supposed to be a hyperparameter; I dropped it (set it to 1) while trying to get things running. I'll put it back.
The normalization is there because Eq. 2 is approximated using the negative sampling trick, which introduces a `log(sigmoid(·))` term; with unnormalized dot products this leads to numerical stability issues.
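For concreteness, here is a minimal sketch of a negative-sampling objective along these lines (function and variable names are hypothetical, not the exact `semi_supervised.py` code): matched rows of the two embedding matrices form the positive pairs, a random permutation of the batch supplies mismatched pairs as a Monte Carlo estimate of the expectation over negatives in Eq. 2, `k` counts the negative draws, cosine similarity keeps the logits bounded in [-1, 1], and `F.logsigmoid` replaces the unstable `log(sigmoid(·))` composition.

```python
import torch
import torch.nn.functional as F

def negative_sampling_loss(z_anchor, z_context, k=5):
    """Sketch of an Eq. 2-style objective with the negative sampling trick.

    z_anchor, z_context: (batch, dim) embeddings whose matched rows
    are the positive pairs.
    """
    # Positive term: matched pairs should get high similarity.
    pos = F.cosine_similarity(z_anchor, z_context, dim=-1)
    loss = -F.logsigmoid(pos).mean()

    # Negative term: each of the k draws permutes the batch so that
    # anchors are scored against mismatched contexts.
    for _ in range(k):
        idx = torch.randperm(z_anchor.shape[0])
        neg = F.cosine_similarity(z_anchor, z_context[idx], dim=-1)
        loss = loss - F.logsigmoid(-neg).mean()
    return loss
```

One side effect of the normalization: the logits can never leave [-1, 1], so the sigmoid never saturates, though the dynamic range of the scores is compressed accordingly.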
I will look into this over the next couple of days. I was hoping you would start getting results off the shelf for the initial pass, so we can work on the infrastructure around the models first.