[DOC-META] Text analysis content needed
This meta issue tracks the new pages we need for the text analyzers that are currently missing documentation. Note: The language analyzer is documented, as is the concepts page Optimizing text for searches with text analyzers.
Analyzers (10)
- [ ] Standard
- [ ] Simple
- [ ] Whitespace
- [ ] Stop
- [ ] Keyword
- [ ] Pattern
- [ ] Fingerprint
- [ ] Custom
- [ ] Stemming
- [ ] Token graphs
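For reference when drafting these pages: each analyzer page will presumably include an example request. A minimal sketch using the `_analyze` API with the standard analyzer (the index name and sample text are placeholders):

```json
GET /_analyze
{
  "analyzer": "standard",
  "text": "OpenSearch text analysis example"
}
```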
Language analyzers (24)
- [ ] A page for each language (24 total). See the Language analyzer section currently on the concepts page.
Tokenizers (14 + index page)
- [ ] Index page
- [ ] Character group
- [ ] Classic
- [ ] Edge n-gram
- [ ] Keyword
- [ ] Letter
- [ ] Lowercase
- [ ] N-gram
- [ ] Path hierarchy
- [ ] Pattern
- [ ] Simple pattern
- [ ] Simple pattern split
- [ ] Standard
- [ ] Thai
- [ ] UAX URL email
- [ ] Whitespace
Token filters (48)
- [ ] A page for each token filter (48 total)
Character filters (3 + index page)
- [ ] Index page
- [ ] HTML strip
- [ ] Mapping
- [ ] Pattern replace
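The character filter pages could follow the same pattern. A minimal sketch showing the `html_strip` character filter via the `_analyze` API (the sample text is a placeholder):

```json
GET /_analyze
{
  "tokenizer": "keyword",
  "char_filter": ["html_strip"],
  "text": "<p>Hello <b>world</b></p>"
}
```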
Normalizers
- [ ] Normalizers
Hi @kolchfa-aws, Heather asked me to reassign this ticket to you. Thanks!
Still missing:
- Stemming
- Token graphs
- Language analyzers for each language
- Configuring built-in analyzers
- Custom analyzers
- Built-in analyzer reference (8 pages)
- Tokenizers
- Tokenizer reference (15 pages)
- Token filter reference (48 pages)
- Character filter reference (3 pages)
- Normalizers
I'm working on the Standard analyzer page. cc @hdhalter
I'm going to work on:
- Normalizers https://github.com/opensearch-project/documentation-website/pull/8192
- Character filters (3 + index page) https://github.com/opensearch-project/documentation-website/pull/8206