Dorukhan Afacan issues

Results: 26 issues of Dorukhan Afacan

Conversion fails with supported type of DictVectorizer

### Bug I wanted to try conversion with the various supported mappings listed in the DictVectorizer [spec](https://github.com/onnx/onnx/blob/master/docs/Operators-ml.md#aionnxmldictvectorizer). Apparently `map(int64, string)` is supported. However, the conversion fails. ### Code ```python _,...
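For reference, the operator semantics described in the linked spec can be sketched in plain Python: each input key is looked up in the vocabulary, its value is written at that index, and vocabulary entries absent from the input stay zero. This is a minimal illustration of the spec's behaviour, not the ONNX runtime or converter code; `dict_vectorize` and its `fill` parameter are hypothetical names.

```python
from typing import Any, Dict, Hashable, List, Sequence


def dict_vectorize(x: Dict[Hashable, Any], vocabulary: Sequence[Hashable], fill: Any = 0) -> List[Any]:
    """Reference sketch of ai.onnx.ml DictVectorizer semantics (hypothetical helper).

    The output has one slot per vocabulary entry. Each key of ``x`` is looked up
    in ``vocabulary`` and its value is written at that index; vocabulary entries
    missing from ``x`` keep ``fill`` (0 in the spec's example).
    """
    index = {key: i for i, key in enumerate(vocabulary)}
    out = [fill] * len(vocabulary)
    for key, value in x.items():
        if key not in index:
            # The spec requires every input key to be present in the vocabulary.
            raise KeyError(f"key {key!r} not in vocabulary")
        out[index[key]] = value
    return out


# Example from the spec: string_vocabulary = ["a", "c", "b", "z"]
print(dict_vectorize({"a": 4, "c": 8}, ["a", "c", "b", "z"]))  # [4, 8, 0, 0]

# A map(int64, string) input pairs with an int64 vocabulary. Using "" as the
# fill for missing keys is an assumption of this sketch, not taken from the spec.
print(dict_vectorize({3: "x"}, [1, 3], fill=""))  # ['', 'x']
```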

Implement IR based Supervised Sentence Ranker

- This PR includes a retrieval-based supervised summarizer implemented using `lightgbm ranker`.
- `sadedegel.dataset.annotated` is used, with its (sentence, relevance) pairs, to train the ranker.
- Evaluation is done by...

cleanup-stay
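A learning-to-rank model such as LightGBM's `LGBMRanker` expects the training rows to be grouped by query, and `fit(X, y, group=...)` takes the size of each contiguous group. Treating each document as a query and its sentences as the items to rank, the group sizes could be derived as below. This is a sketch assuming rows shaped as (doc_id, sentence, relevance); it is not the PR's code, and `group_sizes` is a hypothetical helper.

```python
from itertools import groupby
from typing import List, Sequence, Tuple

# (doc_id, sentence, relevance): an assumed row layout, not the dataset's actual schema.
Row = Tuple[int, str, int]


def group_sizes(rows: Sequence[Row]) -> List[int]:
    """Count consecutive rows per document, in order (hypothetical helper).

    LightGBM ranking expects the rows of each query group to be contiguous,
    so a document whose rows are split apart is rejected.
    """
    runs = [(doc_id, len(list(members)))
            for doc_id, members in groupby(row[0] for row in rows)]
    doc_ids = [doc_id for doc_id, _ in runs]
    if len(doc_ids) != len(set(doc_ids)):
        raise ValueError("rows of each document must be contiguous")
    return [size for _, size in runs]


rows = [
    (0, "First sentence of doc 0.", 1),
    (0, "Second sentence of doc 0.", 0),
    (1, "Only sentence of doc 1.", 2),
]
print(group_sizes(rows))  # [2, 1]

# With lightgbm installed, the sizes would be passed as, e.g.:
#   lightgbm.LGBMRanker().fit(X, y, group=group_sizes(rows))
```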

Increase Test Coverage

- Test coverage has been decreasing for some time with every new feature.
- A separate, dedicated effort is needed to go back and implement tests with:
  - Rigorous cases on...

cleanup-stay

Populate Docstrings

- Usage guides and tutorials only cover the highlighted use cases.
- The library needs proper documentation and, before that, elaborate docstrings for its objects and methods.
- The `numpy` docstring format is chosen.
- ...

cleanup-stay
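For illustration, a docstring following the `numpy` convention, with its `Parameters`, `Returns`, and `Examples` sections, looks like the following. `count_tokens` is a hypothetical function used only to show the format; it is not part of SadedeGel.

```python
from typing import Optional


def count_tokens(text: str, sep: Optional[str] = None) -> int:
    """Count the tokens in a text by splitting it on a separator.

    Parameters
    ----------
    text : str
        Raw input text.
    sep : str, optional
        Separator passed to ``str.split``. If None (default), runs of
        whitespace are used.

    Returns
    -------
    int
        Number of tokens.

    Examples
    --------
    >>> count_tokens("Merhaba dünya")
    2
    >>> count_tokens("a,b,c", sep=",")
    3
    """
    return len(text.split(sep))
```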

Customize Pre-Trained Vectorizer for all HuggingFace Hub models

Currently, `PreTrainedVectorizer` only works with the readily available Turkish models on the **HuggingFace Hub**. Extend model name parsing so that users can retrieve their own custom pre-trained or fine-tuned language models from the HF Hub.

lowprio
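One possible way to extend the name parsing is to keep a table of short aliases for the built-in Turkish models and pass any namespaced `owner/model` identifier straight through to the Hub. The sketch below assumes that design; the alias table, its single entry, the loose id pattern, and `resolve_model_name` are all hypothetical and do not reflect SadedeGel's actual model names.

```python
import re
from typing import Dict

# Hypothetical alias table; the vectorizer's real short names may differ.
KNOWN_ALIASES: Dict[str, str] = {
    "bert_cased": "dbmdz/bert-base-turkish-cased",
}

# Namespaced Hub repo ids look like "owner/model"; this pattern is a loose approximation.
HUB_ID = re.compile(r"^[A-Za-z0-9][\w.-]*/[\w.-]+$")


def resolve_model_name(name: str) -> str:
    """Map a short alias or a custom "owner/model" id to a Hub repo id (sketch)."""
    if name in KNOWN_ALIASES:
        return KNOWN_ALIASES[name]
    if HUB_ID.match(name):
        # A user's own pre-trained or fine-tuned model: use the id as given.
        return name
    raise ValueError(f"unknown model name {name!r}; use an alias or 'owner/model'")


print(resolve_model_name("bert_cased"))                # dbmdz/bert-base-turkish-cased
print(resolve_model_name("my-org/my-finetuned-bert"))  # my-org/my-finetuned-bert
```

The resolved id could then be handed to a loader such as `transformers.AutoModel.from_pretrained`. This sketch deliberately rejects un-namespaced ids other than the known aliases, which keeps typos from silently reaching the Hub.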

Doc2Vec Training CLI

Implement a CLI for [Gensim-based Doc2Vec](https://radimrehurek.com/gensim/models/doc2vec.html) training, like the one already built for Word2Vec. CLI parameters:
- `tokenizer`
- `corpus`
- `model name`
- `epochs`
- `DBOW`
- `retrain-from`

lowprio

cleanup-remove
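A minimal argument parser covering the listed parameters might look like the sketch below, built on `argparse`. The flag spellings, defaults, and tokenizer choices are assumptions, and the actual Gensim training call is left out. In Gensim's `Doc2Vec`, PV-DBOW is selected with `dm=0` (and PV-DM with `dm=1`), so a boolean `--dbow` flag can be mapped onto that parameter.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # Flag names mirror the issue's parameter list; spellings and defaults are assumptions.
    parser = argparse.ArgumentParser(description="Train a Gensim Doc2Vec model (sketch).")
    parser.add_argument("--tokenizer", choices=["bert", "simple"], default="bert",
                        help="Tokenizer used to split documents.")
    parser.add_argument("--corpus", required=True, help="Corpus to train on.")
    parser.add_argument("--model-name", required=True, help="Name to save the model under.")
    parser.add_argument("--epochs", type=int, default=10, help="Number of training epochs.")
    parser.add_argument("--dbow", action="store_true",
                        help="Use PV-DBOW (Gensim dm=0) instead of PV-DM (dm=1).")
    parser.add_argument("--retrain-from", default=None,
                        help="Existing model to continue training from.")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    args = build_parser().parse_args(argv)
    # Translate the boolean flag into Gensim's `dm` parameter.
    args.dm = 0 if args.dbow else 1
    return args
```

For example, `parse(["--corpus", "raw", "--model-name", "d2v", "--dbow"])` yields a namespace with `dm == 0`, `epochs == 10`, and `retrain_from` set to `None`.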

Doc2Vec Implementation

After BERT and TfIdf vectors, Word2Vec will come into play. Just as TfIdf provides document-level vectors, implement a Doc2Vec model, the document-level counterpart of Word2Vec, trained on the current extended corpus.

enhancement

lowprio

cleanup-remove

Users can use their own tokenizer.

When users:
- opt out of the existing `bert` and `simple` tokenizers, and
- are not community contributors adding a new tokenizer to SadedeGel as a feature,

they should be able to feed their...

enhancement

interesting

cleanup-stay
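One way to support this is to accept, wherever a tokenizer name is accepted today, any callable that maps a string to a list of tokens. The sketch below assumes that design; `Tokenizer`, `resolve_tokenizer`, and the placeholder built-in are hypothetical and are not SadedeGel's API.

```python
from typing import Callable, Dict, List, Union

# A tokenizer is anything that turns a string into a list of token strings.
Tokenizer = Callable[[str], List[str]]

# Placeholder built-ins keyed by name; the library's real `bert`/`simple` tokenizers differ.
BUILTIN_TOKENIZERS: Dict[str, Tokenizer] = {
    "simple": str.split,
}


def resolve_tokenizer(spec: Union[str, Tokenizer]) -> Tokenizer:
    """Return a built-in tokenizer by name, or the user's own callable as-is."""
    if isinstance(spec, str):
        try:
            return BUILTIN_TOKENIZERS[spec]
        except KeyError:
            raise ValueError(f"unknown tokenizer {spec!r}") from None
    if callable(spec):
        return spec
    raise TypeError("tokenizer must be a name or a callable str -> list[str]")


def my_tokenizer(text: str) -> List[str]:
    # A user-supplied tokenizer that also splits off commas.
    return text.replace(",", " ,").split()


tokenize = resolve_tokenizer(my_tokenizer)
print(tokenize("Merhaba, dünya"))  # ['Merhaba', ',', 'dünya']
```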

Sentence Polarity Annotation

- While annotating `POSITIVE` and `NEGATIVE` labels for the `product_review` corpus, I realized that some sentences carry positive intent, others carry negative intent, and some are neutral.
- I...

enhancement

help wanted

question

dataset

cleanup-remove

Add Epoch to prebuilt model training flow

- Pre-built models use `partial_fit`.
- Each batch passed to `partial_fit` corresponds to a `step` in `keras` NN training logic.
- By that logic, the current training flow only trains for...

enhancement

lowprio

prebuilt

cleanup-remove
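Adding epochs can be sketched as an outer loop that replays every batch through `partial_fit` several times, so one epoch is one full pass over the batches and the number of steps becomes epochs times batches. This assumes a scikit-learn-style estimator exposing `partial_fit(X, y, classes=...)`; `CountingModel` and `train_with_epochs` are hypothetical stand-ins so the example runs without scikit-learn.

```python
import random
from typing import Any, List, Optional, Sequence, Tuple

Batch = Tuple[Sequence[Any], Sequence[Any]]  # (X, y)


class CountingModel:
    """Stub with a scikit-learn-like partial_fit; it only counts the calls."""

    def __init__(self) -> None:
        self.steps = 0

    def partial_fit(self, X, y, classes=None):
        self.steps += 1
        return self


def train_with_epochs(model, batches: List[Batch], classes, epochs: int = 1,
                      shuffle: bool = True, seed: Optional[int] = None):
    """Run `epochs` full passes over `batches`, one partial_fit call (step) per batch."""
    rng = random.Random(seed)
    order = list(range(len(batches)))
    for _ in range(epochs):
        if shuffle:
            rng.shuffle(order)  # visit the batches in a new order every epoch
        for i in order:
            X, y = batches[i]
            # scikit-learn classifiers need `classes` on the first call; passing
            # the same value on later calls is accepted.
            model.partial_fit(X, y, classes=classes)
    return model


batches = [([[0.0], [1.0]], [0, 1]), ([[2.0]], [1]), ([[3.0]], [0])]
model = train_with_epochs(CountingModel(), batches, classes=[0, 1], epochs=3, seed=0)
print(model.steps)  # 9 = 3 epochs x 3 batches
```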