Adrien Barbaresi comments

Results: 412 comments of Adrien Barbaresi

docs: add mkdocs page for documentation

Thanks, I agree that moving most of the docs into a separate folder would be better. Regarding hosting, I'm going to have a look at the links you provided. So...

docs: add mkdocs page for documentation

We could use links to the relevant sections.

docs: add mkdocs page for documentation

The way I see it, there would be a small readme file in the future: a reduced version of the current one containing links to additional documentation hosted somewhere else,...

docs: add mkdocs page for documentation

I'd be in favor of Sphinx because the [autodoc](https://www.sphinx-doc.org/en/master/usage/extensions/autodoc.html) extension will prove useful to automatically reflect changes made to the functions or classes. I'm open to trying mkdocs but I...
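To make the autodoc argument concrete, here is a minimal sketch of the kind of docstring it would pick up. The function `lemmatize_token` and its behavior are hypothetical placeholders, not code from the project; the point is only that the documentation lives next to the code, so a directive such as `.. autofunction:: mypackage.lemmatize_token` (with `mypackage` being an assumed module name) would follow any change to the signature or docstring.

```python
import inspect


def lemmatize_token(token: str, lang: str = "en") -> str:
    """Return a lemma for ``token`` (hypothetical placeholder function).

    :param token: the word form to look up.
    :param lang: language code used to select the lemma dictionary.
    :returns: the lemma, or the lowercased token if nothing is found.
    """
    # Placeholder logic for illustration only: no real dictionary lookup.
    return token.lower()


# The cleaned-up docstring Python exposes at runtime; autodoc imports the
# module and reads docstrings like this one to build the reference pages.
doc = inspect.getdoc(lemmatize_token)
```

If the signature or the docstring changes, the rendered page changes with the next build, without editing a separate documentation file.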

docs: add mkdocs page for documentation

OK, then I think I'll just merge the PR as it is and try to get familiar with the process.

Simple Tokenizer not separating punctuation correctly

Thanks for the feedback! The tokenizer does something slightly different from what is usually expected: it clusters characters together while segmenting the input. Since the output only consists of lemmata, the idea...

Simple Tokenizer not separating punctuation correctly

Yes, your PR has the priority now!

Simple Tokenizer not separating punctuation correctly

Yes, it's faster and simpler. Otherwise you would have to tokenize punctuation accurately (which is a different task) and run the lemmatizer on it (which is useless in the current...
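The clustering behavior described in this thread can be illustrated with a small regex-based sketch. The pattern and the function name `cluster_tokenize` are assumptions made for illustration; this is not the library's actual tokenizer, only one way to segment text so that runs of punctuation stay together instead of being split mark by mark.

```python
import re
from typing import List

# Hypothetical pattern: a token is either a run of word characters (optionally
# joined by internal apostrophes or hyphens) or a run of any other non-space
# characters, so consecutive punctuation marks end up in a single token.
TOKEN_PATTERN = re.compile(r"\w+(?:['-]\w+)*|[^\w\s]+")


def cluster_tokenize(text: str) -> List[str]:
    """Segment text into word tokens and clustered punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


tokens = cluster_tokenize("Wait... what?!")
# Only tokens starting with a letter or digit would need to go through a
# lemmatizer; punctuation clusters can be passed along unchanged.
words = [t for t in tokens if t[0].isalnum()]
```

Because the punctuation clusters never reach the lemmatizer, they do not need to be tokenized precisely, which is the trade-off in favor of speed and simplicity mentioned above.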

Establish linting and quality tools

Yes, here are some ideas:

- we could switch to `mypy --strict`
- `flake8` is used to detect obvious mistakes without starting the whole pipeline, do you have another configuration...

Greedy option seems inconsistent

Hi @dysby, good catch! My guess would be that the results are cached internally, which affects the results of `text_lemmatizer()`. In any case it is worth looking further into the...
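The caching guess above can be made concrete with a toy sketch. Everything here is hypothetical (the lookup table, the suffix rule, and the class `CachedLemmatizer`); it does not reproduce the library's internals, it only shows how a cache keyed on the token alone would make a `greedy` flag appear inconsistent across calls.

```python
from typing import Dict, Tuple

# Toy dictionary, purely illustrative.
LEMMA_TABLE: Dict[str, str] = {"cats": "cat"}


def lookup(token: str, greedy: bool) -> str:
    """Dictionary lookup with a naive suffix rule applied only in greedy mode."""
    if token in LEMMA_TABLE:
        return LEMMA_TABLE[token]
    if greedy and token.endswith("s"):
        return token[:-1]
    return token


class CachedLemmatizer:
    def __init__(self) -> None:
        self._by_token: Dict[str, str] = {}
        self._by_token_and_mode: Dict[Tuple[str, bool], str] = {}

    def lemmatize_stale(self, token: str, greedy: bool = False) -> str:
        # Problem being illustrated: the cache key ignores `greedy`, so the
        # first call decides the result for every later call.
        if token not in self._by_token:
            self._by_token[token] = lookup(token, greedy)
        return self._by_token[token]

    def lemmatize(self, token: str, greedy: bool = False) -> str:
        # Including the flag in the key keeps both modes independent.
        key = (token, greedy)
        if key not in self._by_token_and_mode:
            self._by_token_and_mode[key] = lookup(token, greedy)
        return self._by_token_and_mode[key]


lem = CachedLemmatizer()
first = lem.lemmatize_stale("dogs", greedy=False)
second = lem.lemmatize_stale("dogs", greedy=True)  # served from the cache
fixed = lem.lemmatize("dogs", greedy=True)
```

With the stale variant, the order of calls determines the output, which would match the symptom of results depending on what was lemmatized earlier; whether this is the actual cause still needs to be checked against the code.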