twitter-sent-dnn

Running Sentences One at a Time Gives a Different Result Than Batching Them

Open AndersonHappens opened this issue 5 years ago • 0 comments

Running each tweet in self.data through sentiment_score one at a time gives different results than passing the same data through sentiment_scores_of_sents in batches of 25 or 100.

It's not just floating-point noise either. I ran a list of 9068 tweets and found that the largest difference was 0.9579128974724299, and that 161 tweets in total had a batched score that differed from the single-run score by more than 0.5!
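For reference, here is a minimal sketch of how those two numbers (largest difference, and count of differences above 0.5) could be computed. The function name `compare_scores` and its parameters are hypothetical, not part of the library; it only assumes both runs produce one float score per tweet, in the same order.

```python
def compare_scores(single_scores, batched_scores, threshold=0.5):
    """Compare two equally long lists of scores (hypothetical helper).

    Returns (largest absolute difference, number of items whose
    absolute difference is strictly greater than threshold).
    """
    if len(single_scores) != len(batched_scores):
        raise ValueError("both runs must score the same number of tweets")

    # Absolute per-tweet difference between the two runs.
    diffs = [abs(s - b) for s, b in zip(single_scores, batched_scores)]

    largest = max(diffs) if diffs else 0.0
    over_threshold = sum(1 for d in diffs if d > threshold)
    return largest, over_threshold
```

Calling `compare_scores(out_single, out_batched)` on the two `out` lists would give the pair of figures quoted above, assuming the lists are aligned tweet for tweet.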

Code for running them one at a time:

    out = []
    for tweet in self.data:
      out.append(sentiment_score(tweet))

Code for running the data in batches:

    out = []
    for batch in self.batch_data:
      out.extend(sentiment_scores_of_sents(batch))

Code for batching the data:

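    temp_list = []
    count = 0  # number of tweets in the current batch
    for x in cls.data:
      # Once the current batch holds 25 tweets, store it and start a new one
      if count >= 25:
        cls.batch_data.append(temp_list)
        temp_list = []
        count = 0

      temp_list.append(x)
      count += 1

    # Keep the final, possibly smaller, batch
    if len(temp_list) > 0:
      cls.batch_data.append(temp_list)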
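The same batching can be sketched more compactly with list slicing. This is an equivalent illustration only; `make_batches` and `batch_size` are hypothetical names, and it assumes the data is a list (or another sliceable sequence).

```python
def make_batches(items, batch_size=25):
    """Split items into consecutive lists of at most batch_size elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Step through the list in strides of batch_size; the last slice
    # is shorter when len(items) is not a multiple of batch_size.
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
```

Because the slices are taken in order, flattening the batches reproduces the original order of the tweets, so the batched scores should line up index for index with the single-run scores.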

AndersonHappens, Mar 29 '19 04:03