Create terms glossary for sourced.ml

Open zurk opened this issue 6 years ago • 3 comments

We constantly confuse terms ourselves, so other developers surely do too. I do not want the glossary to be exhaustive, just to have a start.

Here is the list of terms to explain in the first iteration:

  1. Bag-of-words
  2. Weighted bag-of-words
  3. Model
  4. Algorithm
  5. Transformer
  6. Document
  7. Features
    1. identifier
    2. token
    3. literal
    4. graphlet

Googleable terms we may comment on:

  1. quantization
  2. TF-IDF
  3. topic
  4. co-occurrence matrix

@src-d/machine-learning please take a look and add any confusing terms you remember.

zurk avatar Jun 14 '18 15:06 zurk

If we're gonna define identifiers and tokens, we might as well also add literals, graphlets and ~quantification~ quantization. I think we could divide the glossary into:

  • terms that we use in a more specific sense than usual, or that are vague to start with, e.g. "model" meaning a modelforge model, "words" in a BOW being any feature extracted from a document, "document" meaning a repo, file or function, etc.
  • terms that we use in the standard way but that may not be well known. Of course people have Google, but we might as well drop a couple of lines to explain each concept, e.g. COOC, quantization, topics, TF-IDF.
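
To make the first kind of term concrete, here is a minimal sketch of a bag-of-words and a weighted (TF-IDF) bag-of-words where the "words" are identifier features and each "document" is a function. The toy corpus and the helper names `bag_of_words` and `tf_idf` are made up for illustration; this is not how sourced.ml implements it, and the weighting shown is just the plain `count * log(N / df)` variant.

```python
import math
from collections import Counter

# Hypothetical toy corpus: each "document" is a function, and its "words"
# are identifier features rather than natural-language words.
documents = {
    "read_file": ["path", "open", "read", "path"],
    "write_file": ["path", "open", "write"],
    "parse_args": ["argv", "parse"],
}


def bag_of_words(features):
    """Plain bag-of-words: map each feature to its count in one document."""
    return Counter(features)


def tf_idf(docs):
    """Weighted bag-of-words: scale each count by log(N / document frequency)."""
    n_docs = len(docs)
    bags = {name: bag_of_words(feats) for name, feats in docs.items()}
    # Document frequency: in how many documents each feature appears.
    doc_freq = Counter()
    for bag in bags.values():
        doc_freq.update(bag.keys())
    return {
        name: {
            feat: count * math.log(n_docs / doc_freq[feat])
            for feat, count in bag.items()
        }
        for name, bag in bags.items()
    }


weighted = tf_idf(documents)
```

In this sketch a feature that occurs in fewer documents (like `argv`) gets a larger weight per occurrence than one shared by several documents (like `path`).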

r0mainK avatar Jun 14 '18 16:06 r0mainK

Linking to https://github.com/src-d/apollo/blob/master/doc/GLOSSARY.md

vmarkovtsev avatar Jun 14 '18 16:06 vmarkovtsev

Thanks, @r0mainK, I updated the description.

zurk avatar Jun 15 '18 07:06 zurk