dostoevsky
Incorrect prediction
from typing import Dict

from dostoevsky.models import FastTextToxicModel
from dostoevsky.tokenization import RegexTokenizer

tokenizer = RegexTokenizer()
toxic_model = FastTextToxicModel(tokenizer=tokenizer)

messages = ['привет', 'я люблю тебя!!', 'малолетние дебилы']

# k=2 asks for the top two labels with their scores
results = toxic_model.predict(messages, k=2)

for message, sentiment in zip(messages, results):
    print(message, '->', sentiment)
Output:
привет -> {'normal': 0.9972950220108032, 'toxic': 0.0026416745968163013}
я люблю тебя!! -> {'toxic': 1.0000100135803223, 'normal': 1.0000003385357559e-05}
малолетние дебилы -> {'toxic': 1.0000100135803223, 'normal': 1.0000003385357559e-05}
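'я люблю тебя!!' ("I love you!!") receives exactly the same toxic score as 'малолетние дебилы' ("underage morons"), so the affectionate message is classified as toxic.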
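For anyone who wants to spot this kind of result automatically, here is a minimal sketch that flags predictions whose top score is saturated. It works on the plain dictionaries printed in the output, so it does not need dostoevsky installed. The helper name `saturated_predictions` and the 0.999 threshold are my own assumptions for illustration, not part of the library.

```python
from typing import Dict, List, Tuple


def saturated_predictions(
    messages: List[str],
    results: List[Dict[str, float]],
    threshold: float = 0.999,  # arbitrary cut-off chosen for this sketch
) -> List[Tuple[str, str, float]]:
    """Return (message, top_label, top_score) for scores at or above threshold.

    Hypothetical helper, not part of dostoevsky.
    """
    flagged = []
    for message, scores in zip(messages, results):
        # Pick the label with the highest score for this message.
        label, score = max(scores.items(), key=lambda item: item[1])
        if score >= threshold:
            flagged.append((message, label, score))
    return flagged


# Scores copied from the output reported in this issue.
messages = ['привет', 'я люблю тебя!!', 'малолетние дебилы']
results = [
    {'normal': 0.9972950220108032, 'toxic': 0.0026416745968163013},
    {'toxic': 1.0000100135803223, 'normal': 1.0000003385357559e-05},
    {'toxic': 1.0000100135803223, 'normal': 1.0000003385357559e-05},
]

for message, label, score in saturated_predictions(messages, results):
    print(message, '->', label, score)
```

With the scores from this issue, both the second and the third message are flagged with the label 'toxic', which makes the identical, saturated predictions easy to see.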
Hi,
I am aware of this issue. FastTextToxicModel is currently not ready for use.
Thank you for the response, I will be waiting eagerly.