preprocessor
fix hashtag recognition pattern for zwnj
Regarding issue #30, this is my proposal.
Notice that there is a ZWNJ (\u200c) between w and *.
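To illustrate why the ZWNJ matters, here is a minimal sketch. It does not use the library's actual pattern: `NAIVE_HASHTAG`, `ZWNJ_AWARE_HASHTAG`, and `find_hashtags` are hypothetical names chosen for this example. It only shows that Python's `\w` does not match U+200C, so a hashtag containing a ZWNJ is cut short unless the character class allows it explicitly.

```python
import re

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER, common inside Persian words

# Hypothetical patterns for illustration only, not the library's own regex.
NAIVE_HASHTAG = re.compile(r"#\w+")
ZWNJ_AWARE_HASHTAG = re.compile(r"#[\w\u200c]+")


def find_hashtags(text, pattern=ZWNJ_AWARE_HASHTAG):
    """Return every hashtag in `text` that matches `pattern`."""
    return pattern.findall(text)


# A Persian hashtag whose two parts are joined by a ZWNJ.
sample = "test #\u0645\u06cc" + ZWNJ + "\u062e\u0648\u0627\u0647\u0645 end"

# The naive pattern stops at the ZWNJ and returns only the first part.
print(find_hashtags(sample, NAIVE_HASHTAG))
# The ZWNJ-aware pattern returns the whole hashtag.
print(find_hashtags(sample, ZWNJ_AWARE_HASHTAG))
```

One caveat of this sketch: a character class like this would also swallow a ZWNJ that trails the hashtag, so a real fix may want to require a word character after each ZWNJ.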
Hello @ahangarha, thank you for your report. Would you be willing to add tests for this PR? I'd like to merge it once we make sure it doesn't break the existing tests and that the use case you mentioned is covered by tests as well. Thanks!
I am not sure if I can write tests. I would love to explore the code and find out how to do so, but it would take time. I can ask friends to look into the issue and the code and see if they can do it. Should I?
Hey @ahangarha, whichever way works for you is fine by me. I quickly checked out your changes, and they seem to break some of the tests. If you could push further changes to your PR to fix them, that'd be great. One way to run the tests is as follows:
$ cd preprocessor/
$ python3 -m pytest
And then you should see something like this:
➜ preprocessor git:(master) python3 -m pytest
============================ test session starts =============================
platform darwin -- Python 3.7.6, pytest-5.3.5, py-1.8.1, pluggy-0.13.1
rootdir: /path/to/preprocessor
plugins: hypothesis-5.5.4, arraydiff-0.3, remotedata-0.3.2, openfiles-0.4.0, doctestplus-0.5.0, astropy-header-0.1.2
collected 25 items
tests/test_api.py ........ [ 32%]
tests/test_clean_numbers.py .......... [ 72%]
tests/test_utils.py ....... [100%]
============================= 25 passed in 0.08s =============================
➜ preprocessor git:(master)
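For the tests requested in this thread, a pytest-style file could look roughly like the sketch below. It is an assumption-laden outline: `HASHTAG_PATTERN` and `extract_hashtags` are hypothetical stand-ins so the example runs on its own, and a real test would call the library's own hashtag handling instead. It pairs a regression check for ordinary hashtags with a check for the ZWNJ case.

```python
import re

# Hypothetical stand-in for the pattern under test; the real test
# would exercise the library's own hashtag handling instead.
HASHTAG_PATTERN = re.compile(r"#[\w\u200c]+")


def extract_hashtags(text):
    """Hypothetical helper: return all hashtags found in `text`."""
    return HASHTAG_PATTERN.findall(text)


def test_plain_hashtags_still_match():
    # Regression check: ordinary hashtags must keep working.
    text = "I love #python and #regex_101"
    assert extract_hashtags(text) == ["#python", "#regex_101"]


def test_hashtag_with_zwnj_is_kept_whole():
    # The use case from the report: a ZWNJ inside the hashtag.
    tag = "#\u0645\u06cc\u200c\u062e\u0648\u0627\u0647\u0645"
    assert extract_hashtags("start " + tag + " end") == [tag]


def test_zwnj_outside_hashtag_is_ignored():
    # A ZWNJ in ordinary text must not produce a hashtag.
    text = "\u0645\u06cc\u200c\u062e\u0648\u0627\u0647\u0645"
    assert extract_hashtags(text) == []
```

Saved under `tests/` with a `test_` prefix in the file name, functions like these are picked up by `python3 -m pytest` through pytest's default discovery rules.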
I will look into it soon.
Oh! I can clearly see a mistake in my regex.
I don't know why I made that mistake. I will fix it soon.