[label bot] Embeddings Service should use GraphQL API to fetch issue data
Right now the embedding code uses BeautifulSoup to extract the title and body from a GitHub issue's HTML page: https://github.com/kubeflow/code-intelligence/blob/9bbdce34fc0d81bfb9a63493941763771d2a0746/py/code_intelligence/embeddings.py#L36
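A minimal sketch of fetching the title and raw body through GitHub's GraphQL API (v4) instead, using only the Python standard library. The helper names (`build_issue_request`, `parse_issue_response`, `fetch_issue_text`) and reading the token from a `GITHUB_TOKEN` environment variable are assumptions for illustration, not existing code in this repo.

```python
import json
import os
import urllib.request

# GitHub's GraphQL (v4) endpoint.
GITHUB_GRAPHQL_URL = "https://api.github.com/graphql"

# Ask only for the issue's title and its raw Markdown body.
ISSUE_QUERY = """
query($owner: String!, $name: String!, $number: Int!) {
  repository(owner: $owner, name: $name) {
    issue(number: $number) {
      title
      body
    }
  }
}
"""


def build_issue_request(owner, name, number):
    """Build the JSON payload for the issue query (hypothetical helper)."""
    return {
        "query": ISSUE_QUERY,
        "variables": {"owner": owner, "name": name, "number": number},
    }


def parse_issue_response(response):
    """Return (title, body) from a decoded GraphQL response dict."""
    if response.get("errors"):
        raise RuntimeError(f"GraphQL errors: {response['errors']}")
    issue = response["data"]["repository"]["issue"]
    return issue["title"], issue["body"]


def fetch_issue_text(owner, name, number, token=None):
    """POST the query to GitHub and return (title, body).

    Assumes a personal access token is passed in or set in GITHUB_TOKEN.
    """
    token = token or os.environ["GITHUB_TOKEN"]
    payload = json.dumps(build_issue_request(owner, name, number)).encode("utf-8")
    request = urllib.request.Request(
        GITHUB_GRAPHQL_URL,
        data=payload,
        headers={
            "Authorization": f"bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return parse_issue_response(json.load(resp))
```

For example, `fetch_issue_text("kubeflow", "katib", 1062)` should return the raw Markdown body, `\r\n` line endings included, so the embedding input would match what GraphQL and BigQuery report. Keeping request building and response parsing separate from the network call makes both easy to unit test.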
I'm noticing that scraping the rendered HTML leads to slight discrepancies in how whitespace is encoded in the resulting body compared with the data we get from the GraphQL API and/or BigQuery.
As an example, consider the issue https://github.com/kubeflow/katib/issues/1062.
Here's the body returned by the GraphQL API:
```
/kind feature\r\n\r\nKatib should have functionality to save Suggestion state somewhere besides Suggestion pod. \r\nSome users would like to resume Experiments, but they don't want to have always running Suggestion deployment. For example we can use PV.\r\n\r\nWe can use `ResumeExperiment` flag from here: https://github.com/kubeflow/katib/issues/1061 to specify resuming experiment mechanism.\r\n\r\n/cc @johnugeorge @gaocegege @hougangliu @richardsliu \r\n
```
Here's the value returned by `get_issue_text`:
```
/kind feature\nKatib should have functionality to save Suggestion state somewhere besides Suggestion pod.\nSome users would like to resume Experiments, but they don't want to have always running Suggestion deployment. For example we can use PV.\nWe can use ResumeExperiment flag from here: #1061 to specify resuming experiment mechanism.\n/cc @johnugeorge @gaocegege @hougangliu @richardsliu
```
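So the whitespace is encoded differently: the GraphQL body is raw Markdown with `\r\n` line endings, blank lines between paragraphs, and a few trailing spaces, whereas `get_issue_text` separates paragraphs with a single `\n`. The scraped text is also rendered rather than raw, e.g. the backticks around `ResumeExperiment` are gone and the issue URL appears as `#1061`.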
Ideally this shouldn't matter: even if whitespace differences produce slightly different embeddings, the network should arguably learn to be invariant to these kinds of perturbations.
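If consistent inputs are wanted regardless of which source the text came from, one option is to normalize whitespace before computing embeddings. The sketch below uses a hypothetical `normalize_whitespace` helper (not part of the repo) that converts line endings to `\n`, strips trailing spaces, and drops blank lines, using the katib#1062 body from this issue as input.

```python
def normalize_whitespace(text):
    """Give text from different sources the same line layout (hypothetical helper)."""
    # Convert Windows (CRLF) and old Mac (CR) line endings to LF.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Strip trailing spaces; leading indentation is kept so Markdown
    # structure such as nested lists is not flattened.
    lines = (line.rstrip() for line in text.split("\n"))
    # Drop blank lines so paragraph breaks collapse to a single newline.
    return "\n".join(line for line in lines if line)


graphql_body = (
    "/kind feature\r\n\r\nKatib should have functionality to save Suggestion "
    "state somewhere besides Suggestion pod. \r\nSome users would like to resume "
    "Experiments, but they don't want to have always running Suggestion "
    "deployment. For example we can use PV.\r\n\r\nWe can use `ResumeExperiment` "
    "flag from here: https://github.com/kubeflow/katib/issues/1061 to specify "
    "resuming experiment mechanism.\r\n\r\n/cc @johnugeorge @gaocegege "
    "@hougangliu @richardsliu \r\n"
)
normalized = normalize_whitespace(graphql_body)
```

This only reconciles whitespace: both versions end up with the same five lines, but the rendered-versus-raw Markdown differences (backticks, `#1061` versus the full URL) remain, so it complements rather than replaces fetching the raw body via GraphQL.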
Issue-Label Bot is automatically applying the labels:
Label | Probability |
---|---|
kind/feature | 0.69 |
Please mark this comment with :thumbsup: or :thumbsdown: to give our bot feedback! Links: app homepage, dashboard and code for this bot.