produce_labeled_data.py appears to only use Italian stopwords

Open phdowling opened this issue 9 years ago • 3 comments

See produce_labeled_data, line 68:

                    for diz in val:
                        # Filter out linked stopwords
                        if diz['chunk'].lower() in stopwords.StopWords.words('italian'):
                            continue

I'm not really involved in this project, but I was just skimming through the code and it seems like the hardcoded selection of Italian stopwords might be a bug. Feel free to close this issue if that's not the case.

phdowling, Dec 15 '15 14:12

Thanks for reporting, @phdowling! You are right, the language should be parametrized. I labeled this issue as refactoring.

marfox, Dec 15 '15 18:12
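
The parametrization discussed above could look roughly like the sketch below. It is only an illustration, not the project's actual fix: the function name `filter_linked_stopwords`, the `language` argument, and the tiny `SAMPLE_STOPWORDS` table are all hypothetical, and the table stands in for whatever `stopwords.StopWords.words(...)` returns in the real module.

```python
# Hypothetical stand-in for the project's stopword lists. The real code
# would presumably call stopwords.StopWords.words(language) instead.
SAMPLE_STOPWORDS = {
    'italian': {'il', 'la', 'di', 'che'},
    'english': {'the', 'of', 'and', 'that'},
}


def filter_linked_stopwords(chunks, language, stopword_lists=SAMPLE_STOPWORDS):
    """Return the linked chunks whose text is not a stopword in `language`."""
    if language not in stopword_lists:
        raise ValueError('No stopword list for language: {}'.format(language))
    stop = stopword_lists[language]
    # Same check as the original loop, but the language is an argument
    # rather than the hardcoded string 'italian'.
    return [diz for diz in chunks if diz['chunk'].lower() not in stop]


chunks = [{'chunk': 'La'}, {'chunk': 'Roma'}, {'chunk': 'the'}]
italian_kept = filter_linked_stopwords(chunks, 'italian')
english_kept = filter_linked_stopwords(chunks, 'english')
```

Looking up the stopword set once per call, rather than inside the loop condition, also avoids rebuilding the list for every chunk.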

@marfox I'd love to make the necessary changes, if you could just elaborate on the details. :-)

kartiksibal, Dec 21 '16 19:12