Create terms glossary for sourced.ml
We constantly confuse these terms ourselves, so other developers surely do too. The glossary does not need to be complete; I just want to have a start.
Here is the list of terms to explain in the first iteration:
- Bag-of-words
- Weighted bag-of-words
- Model
- Algorithm
- Transformer
- Document
- Features
- Identifier
- Token
- Literal
- Graphlet
Googleable terms we may comment on:
- quantization
- TF-IDF
- topic
- co-occurrence matrix
@src-d/machine-learning please take a look and add any confusing terms you remember.
If we're going to define identifiers and tokens, we might as well also add literals, graphlets and quantization. I think we could divide the glossary into:
- terms that mean something more specific than would usually be the case, or that are vague to start with, e.g. model meaning a modelforge model, words in BOW being any feature extracted from a document, document meaning a repo, file or function, etc.
- terms that we use in the way they are intended but that may not be well known. Of course people have Google, but we might as well drop a couple of lines to explain the concept, e.g. COOC, quantization, topics, TF-IDF.
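To make the bag-of-words vs. weighted bag-of-words distinction concrete, here is a minimal Python sketch. It assumes the textbook `tf * log(N / df)` weighting, which is not necessarily what sourced.ml does; the file names, features and function names are all made up for illustration.

```python
import math
from collections import Counter

# Hypothetical toy "documents": each one is a list of extracted features
# (here, identifiers). Names and contents are invented for illustration.
documents = {
    "a.py": ["read", "file", "read", "line"],
    "b.py": ["write", "file"],
}


def bag_of_words(features):
    """Plain bag-of-words: feature -> raw count within one document."""
    return Counter(features)


def tfidf_bags(docs):
    """Weighted bag-of-words: feature -> tf * log(N / df) (textbook TF-IDF)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each feature appears.
    df = Counter()
    for features in docs.values():
        df.update(set(features))
    return {
        name: {
            feature: tf * math.log(n_docs / df[feature])
            for feature, tf in bag_of_words(features).items()
        }
        for name, features in docs.items()
    }


bags = {name: bag_of_words(features) for name, features in documents.items()}
weighted = tfidf_bags(documents)
```

With this weighting, a feature that appears in every document (`file` here) gets weight zero, while features specific to one document keep a positive weight.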
Linking to https://github.com/src-d/apollo/blob/master/doc/GLOSSARY.md
Thanks, @r0mainK, I updated the description.