twitter-sent-dnn
Running Sentences One at a Time Gives a Different Result Than Batching Them
Running the tweets in self.data through sentiment_score one at a time gives different results than passing the same data through sentiment_scores_of_sents in batches of 25 or 100.
It's not just floating-point noise either. I ran a list of 9068 tweets and found that the largest difference was 0.9579128974724299, and that for 161 tweets the batched score differed from the single-run score by more than 0.5!
Code for running them one at a time:
out = []
for tweet in self.data:
    out.append(sentiment_score(tweet))
Code for running the data in batches:
out = []
for batch in self.batch_data:
    out.extend(sentiment_scores_of_sents(batch))
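For anyone trying to reproduce the numbers, here is a minimal sketch of how the two result lists can be compared. It assumes both lists hold one numeric score per tweet in the same order; `compare_scores` and the sample values are hypothetical and not part of twitter-sent-dnn.

```python
def compare_scores(single, batched, threshold=0.5):
    """Return (largest absolute difference, count of differences above threshold).

    `single` and `batched` are assumed to be equal-length sequences of
    numeric scores for the same tweets in the same order.
    """
    if len(single) != len(batched):
        raise ValueError("score lists must have the same length")
    diffs = [abs(float(s) - float(b)) for s, b in zip(single, batched)]
    largest = max(diffs, default=0.0)
    count_over = sum(1 for d in diffs if d > threshold)
    return largest, count_over


# Made-up scores purely for illustration, not real model output.
single = [0.10, 0.90, 0.50]
batched = [0.10, 0.20, 0.55]
largest, count_over = compare_scores(single, batched)
print(largest, count_over)
```

Checking the lengths first guards against a batching bug that silently drops or duplicates tweets, which would misalign the two lists and inflate the differences.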
Code for batching the data:
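temp_list = []
count = 0  # number of tweets in the current batch
for x in cls.data:
    if count >= 25:
        # Current batch is full: store it and start a new one.
        cls.batch_data.append(temp_list)
        temp_list = []
        count = 0
    temp_list.append(x)
    count += 1
# Store the final, possibly smaller, batch.
if len(temp_list) > 0:
    cls.batch_data.append(temp_list)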
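The same batching can be written with list slicing, which makes it easier to rule out the batching step itself as the source of the mismatch. This is only a sketch; `make_batches` is a hypothetical helper and not part of twitter-sent-dnn.

```python
def make_batches(items, batch_size=25):
    """Split `items` into consecutive lists of at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# Sanity check: flattening the batches should give back the original order.
data = list(range(60))
batches = make_batches(data, 25)
flattened = [x for batch in batches for x in batch]
print([len(b) for b in batches], flattened == data)
```

If the flattened batches match the original list, the order and count of tweets are preserved, so any remaining score difference comes from how the batch is scored rather than how it is built.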