Dashes in category/file names make retrieval difficult

Open serin-delaunay opened this issue 8 years ago • 5 comments

At the moment there are categories in corpora like "film-tv" and files like "materials/abridged-body-fluids" which cannot be accessed using the standard syntax of pycorpora.category_name.file_name['key'], because - is not a legal character in Python identifiers. I can work around this as follows:

    getattr(pycorpora, 'film-tv').tv_shows['tv_shows']
    pycorpora.materials.get_file('abridged-body-fluids')['abridged body fluids']

However, this isn't ideal and probably either pycorpora should perform these workarounds internally (translating - to _, for instance), or corpora should restrict category and file names to valid JS/Python/C (for example) identifiers. I've opened a similar issue in corpora: https://github.com/dariusk/corpora/issues/236.
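
To make the "translating - to _" idea concrete, here is a rough sketch of how attribute lookup could fall back to dashed names. It is not pycorpora's actual code: the `Category` class and `to_identifier` helper are hypothetical stand-ins, and only the `get_file` name is borrowed from the workaround above.

```python
def to_identifier(name):
    """Translate dashes to underscores, e.g. 'film-tv' -> 'film_tv'."""
    return name.replace("-", "_")


class Category:
    """Hypothetical stand-in for a pycorpora category (not the real class)."""

    def __init__(self, files):
        # files maps the on-disk name (which may contain dashes) to parsed data.
        self._files = dict(files)
        # Alias each name to its underscore form, skipping any name that
        # still would not be a legal Python identifier after translation.
        self._aliases = {
            to_identifier(name): name
            for name in self._files
            if to_identifier(name).isidentifier()
        }

    def get_file(self, name):
        # Exact lookup by original name, dashes and all.
        return self._files[name]

    def __getattr__(self, attr):
        # Only called when normal attribute lookup fails. Reading through
        # __dict__ avoids recursing back into __getattr__.
        aliases = self.__dict__.get("_aliases", {})
        if attr in aliases:
            return self.__dict__["_files"][aliases[attr]]
        raise AttributeError(attr)


materials = Category(
    {"abridged-body-fluids": {"abridged body fluids": ["blood", "sweat"]}}
)
fluids = materials.abridged_body_fluids["abridged body fluids"]
```

One caveat with this approach: two names that differ only in - versus _ would collide on the same attribute, so `get_file` with the exact name would still be needed as a fallback.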

serin-delaunay avatar Nov 26 '16 17:11 serin-delaunay

Generally speaking, it'd be good to have corpora all nice and consistent, but a great thing about that project is it gets contributions from people who aren't familiar with Git in the first place, which is already quite a hurdle.

So it's probably better to have this tool deal with it.

(It might be an idea to have a guideline to avoid dashes over in corpora. It may be worth converting existing filenames, but then it may break code already using them. And both those things are unnecessary if these tools deal with it.)

hugovk avatar Nov 26 '16 19:11 hugovk

I merged a fix for this in #9 a few weeks ago, actually. It just hasn't made it to PyPI yet. For now, you can take advantage of the fix by installing directly from github. I'll leave this open until I have a chance to make a new release and close when the fix is generally available.

aparrish avatar Nov 28 '16 16:11 aparrish

@aparrish Is this on PyPI now? I took a look at it but it seems like it's still at 0.1.2 which is from before the change you referenced. But also I'm not 100% sure how to read the versioning versus the commit log.

dariusk avatar Sep 25 '17 20:09 dariusk

@dariusk not on pypi yet, unfortunately. I'd sorta been waiting until I'd found a good fix for #8 before pushing another pypi release. :(

aparrish avatar Sep 25 '17 20:09 aparrish

Ok!

dariusk avatar Sep 25 '17 20:09 dariusk