djl
Part of Speech Tagging Dataset
Description
This task is to add at least one part-of-speech (POS) tagging dataset. Such datasets provide an example of an NLP token classification task and are also useful for training multi-purpose NLP models. A good candidate would be one from Universal Dependencies.
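For context, Universal Dependencies treebanks are distributed in the CoNLL-U format: one token per line with ten tab-separated columns (the word form is column 2 and the universal POS tag is column 4), `#` comment lines, and a blank line between sentences. Below is a minimal sketch of reading (word, tag) pairs from such text. The class `ConlluPosReader` and its `parse` method are hypothetical names used only for illustration; they are not part of DJL and not necessarily how the eventual dataset was implemented.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not a DJL class): reads {FORM, UPOS} pairs from CoNLL-U lines. */
class ConlluPosReader {

    /** Returns one list per sentence; each entry is a two-element array {FORM, UPOS}. */
    static List<List<String[]>> parse(List<String> lines) {
        List<List<String[]>> sentences = new ArrayList<>();
        List<String[]> current = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                // A blank line ends the current sentence.
                if (!current.isEmpty()) {
                    sentences.add(current);
                    current = new ArrayList<>();
                }
                continue;
            }
            if (line.startsWith("#")) {
                continue; // sentence-level comment, e.g. "# text = ..."
            }
            String[] cols = line.split("\t");
            // Skip malformed rows, multiword token ranges ("1-2"), and empty nodes ("1.1").
            if (cols.length < 4 || cols[0].contains("-") || cols[0].contains(".")) {
                continue;
            }
            current.add(new String[] {cols[1], cols[3]});
        }
        if (!current.isEmpty()) {
            sentences.add(current); // the input may not end with a blank line
        }
        return sentences;
    }

    public static void main(String[] args) {
        // Tiny hand-written sample in CoNLL-U layout (not taken from a real treebank).
        List<String> sample = Arrays.asList(
                "# text = Dogs bark",
                "1\tDogs\tdog\tNOUN\t_\t_\t2\tnsubj\t_\t_",
                "2\tbark\tbark\tVERB\t_\t_\t0\troot\t_\t_",
                "");
        for (List<String[]> sentence : parse(sample)) {
            for (String[] token : sentence) {
                System.out.println(token[0] + "\t" + token[1]);
            }
        }
    }
}
```

A dataset class built on top of this would still need to map words and tags to indices and batch them; that part depends on the training framework and is left out here.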
Hi, are there any other websites that also host this dataset? The Penn Treebank dataset from the Linguistic Data Consortium costs $1700.
That's a good point, @AKAGIwyf. I changed it to use a different POS dataset, which should be freely available.
We've found a version of the Penn Treebank that is free on GitHub, the same pre-processed version that Torchtext uses, but it comes without POS tags. I've already written the code for it. Would this kind of dataset still be useful to you?
@AKAGIwyf If you want to add it, more datasets are always good. They give users a choice of which one to train on. The main goal of this issue was to add at least one POS dataset.
Hi @zachgk, I'm interested in this issue and I want to work on it, so I wonder if you can assign it to me? Thanks!
@AKAGIwyf, were you working on this besides the Penn Treebank? @LanAtGitHub is interested in working on this, but I don't want to give it to a second person if you have already started.
Hello @zachgk, I'm a teammate of @AKAGIwyf. We discussed it and decided that I will take on this issue.
Just wanted to make sure. It's assigned to you, @LanAtGitHub.
The above PR added a POS dataset.