Askar Bozcan

Results: 14 comments by Askar Bozcan

There are several ways to implement vectorization, each with its own pros and cons. Below I am going to list some of the paths we can take, for us to discuss....

Note to self: Either remove spelling correction entirely or revamp it completely with techniques that work better.

Note to self: Modify the symspellpy distance calculation in such a way that changing Turkish diacritic characters to their English counterparts (ü -> u, ç -> c) and vice versa (u -> ü, c...
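
The note above is cut off, so the intended change is not fully known. One possible reading, offered only as an assumption, is that swapping a Turkish letter for its ASCII look-alike should cost less than an ordinary substitution. The sketch below shows that idea as a plain weighted Levenshtein distance. It does not use or modify symspellpy itself; the names `DIACRITIC_PAIRS`, `substitution_cost`, `weighted_distance` and the cost of 0.5 are all hypothetical.

```python
# Assumption: Turkish letters paired with their ASCII look-alikes.
_BASE_PAIRS = {("ü", "u"), ("ö", "o"), ("ç", "c"), ("ş", "s"), ("ğ", "g"), ("ı", "i")}
# Make the mapping symmetric so that u -> ü is treated like ü -> u.
DIACRITIC_PAIRS = _BASE_PAIRS | {(b, a) for a, b in _BASE_PAIRS}


def substitution_cost(a: str, b: str, diacritic_cost: float = 0.5) -> float:
    """Cost of replacing character a with character b."""
    if a == b:
        return 0.0
    if (a, b) in DIACRITIC_PAIRS:
        return diacritic_cost  # hypothetical reduced cost
    return 1.0


def weighted_distance(source: str, target: str, diacritic_cost: float = 0.5) -> float:
    """Levenshtein distance with cheaper diacritic substitutions."""
    # previous[j] holds the distance between source[:i-1] and target[:j].
    previous = [float(j) for j in range(len(target) + 1)]
    for i, s in enumerate(source, start=1):
        current = [float(i)]
        for j, t in enumerate(target, start=1):
            current.append(
                min(
                    previous[j] + 1.0,      # delete s
                    current[j - 1] + 1.0,   # insert t
                    previous[j - 1] + substitution_cost(s, t, diacritic_cost),
                )
            )
        previous = current
    return previous[-1]
```

With these assumed costs, `weighted_distance("gul", "gül")` is 0.5 while `weighted_distance("gul", "gol")` is 1.0, so a candidate differing only in diacritics would rank ahead of one with an unrelated substitution.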

As an extra note, see this: https://towardsdatascience.com/spelling-correction-how-to-make-an-accurate-and-fast-corrector-dc6d0bcbba5f

ToDo: After strict typing is enforced.

Note to self: Tokenizer interface should be easily extendable so that users can add their custom tokenizers if they so desire.
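
A minimal sketch of what such an extendable interface could look like, assuming an abstract base class plus a name-based registry. `Tokenizer`, `register_tokenizer`, `get_tokenizer` and both example tokenizers are hypothetical names for illustration, not the project's actual API.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Type


class Tokenizer(ABC):
    """Hypothetical base class that every tokenizer implements."""

    @abstractmethod
    def tokenize(self, text: str) -> List[str]:
        """Split text into a list of tokens."""


# Registry mapping a short name to a tokenizer class.
_TOKENIZERS: Dict[str, Type[Tokenizer]] = {}


def register_tokenizer(name: str) -> Callable[[Type[Tokenizer]], Type[Tokenizer]]:
    """Class decorator that makes a tokenizer available under a name."""
    def decorator(cls: Type[Tokenizer]) -> Type[Tokenizer]:
        _TOKENIZERS[name] = cls
        return cls
    return decorator


def get_tokenizer(name: str) -> Tokenizer:
    """Instantiate a registered tokenizer by name."""
    return _TOKENIZERS[name]()


@register_tokenizer("whitespace")
class WhitespaceTokenizer(Tokenizer):
    def tokenize(self, text: str) -> List[str]:
        return text.split()


# A user-defined tokenizer is added the same way, without touching library code.
@register_tokenizer("comma")
class CommaTokenizer(Tokenizer):
    def tokenize(self, text: str) -> List[str]:
        return [part.strip() for part in text.split(",") if part.strip()]
```

Under this design a user would call, for example, `get_tokenizer("comma").tokenize("a, b, c")` after defining their own class; the registry is one option among several, and plain subclassing without a registry would also satisfy the note.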

ToDo: More data for supervised ranker summarizers.

Note to self: Should be done in a general way, allowing users to add their own custom preprocessing steps if necessary.
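
One general way to allow custom steps, sketched under the assumption that each preprocessing step is just a function from string to string. `Preprocessor`, `add_step`, `strip_urls` and `collapse_whitespace` are hypothetical names used only for illustration.

```python
import re
from typing import Callable, Iterable, List

# Assumption: a preprocessing step takes a string and returns a string.
Step = Callable[[str], str]


class Preprocessor:
    """Applies a user-configurable sequence of steps in order."""

    def __init__(self, steps: Iterable[Step] = ()) -> None:
        self.steps: List[Step] = list(steps)

    def add_step(self, step: Step) -> "Preprocessor":
        """Append a custom step; returns self so calls can be chained."""
        self.steps.append(step)
        return self

    def __call__(self, text: str) -> str:
        for step in self.steps:
            text = step(text)
        return text


def strip_urls(text: str) -> str:
    """Remove http(s) URLs."""
    return re.sub(r"https?://\S+", "", text)


def collapse_whitespace(text: str) -> str:
    """Replace runs of whitespace with a single space and trim the ends."""
    return " ".join(text.split())
```

A user could then build `Preprocessor([strip_urls]).add_step(collapse_whitespace)` and append any function of their own with the same signature.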