fact-extractor
produce_labeled_data.py appears to only use Italian stopwords
See `produce_labeled_data.py`, line 68:
```python
for diz in val:
    # Filter out linked stopwords
    if diz['chunk'].lower() in stopwords.StopWords.words('italian'):
        continue
```
I'm not really involved in this project, but I was just skimming through the code and it seems like the hardcoded selection of Italian stopwords might be a bug. Feel free to close this issue if that's not the case.
Thanks for reporting, @phdowling! You are right, the language should be parametrized. I labeled this issue as `refactoring`.
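One possible shape for that parametrization is shown below. It is a minimal sketch, not the project's code: `filter_linked_stopwords`, `parse_args`, the `--language` flag and the `stopwords_for` callable are all hypothetical names. The stopword source is passed in as a function so the sketch does not assume anything about the repository's `stopwords` module. The default stays `'italian'` so the current behaviour is preserved.

```python
import argparse
from typing import Callable, Dict, Iterable, Iterator

# Hypothetical type: any callable mapping a language name to its stopword list,
# e.g. a thin wrapper around whatever stopwords module the project already uses.
StopwordLookup = Callable[[str], Iterable[str]]


def filter_linked_stopwords(
    chunks: Iterable[Dict[str, str]],
    language: str,
    stopwords_for: StopwordLookup,
) -> Iterator[Dict[str, str]]:
    """Yield only the chunks whose text is not a stopword in `language`."""
    # Build the stopword set once instead of re-fetching the list per chunk.
    stop = {word.lower() for word in stopwords_for(language)}
    for diz in chunks:
        # Filter out linked stopwords, now for the configured language
        if diz['chunk'].lower() in stop:
            continue
        yield diz


def parse_args(argv=None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description='Produce labeled data')
    # Hypothetical flag; defaults to Italian to keep the current behaviour.
    parser.add_argument('--language', default='italian',
                        help='language of the input corpus, used to pick stopwords')
    return parser.parse_args(argv)


if __name__ == '__main__':
    # Toy stopword table standing in for the real stopwords module.
    toy = {'italian': ['il', 'la', 'di'], 'english': ['the', 'of', 'a']}
    args = parse_args(['--language', 'english'])
    sample = [{'chunk': 'The'}, {'chunk': 'Rome'}, {'chunk': 'il'}]
    kept = filter_linked_stopwords(sample, args.language, toy.__getitem__)
    print([d['chunk'] for d in kept])  # ['Rome', 'il']
```

With `--language english`, "The" is dropped while the Italian article "il" is kept. With the default, the result is the reverse. Building a set up front also avoids calling the stopword lookup once per chunk, as the original loop does.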
@marfox I'd love to do the necessary changes, if you could just elaborate on the details. :-)