
empty lines

Open adirc opened this issue 9 years ago • 2 comments

There are a lot of empty lines in the gs files, /STS2015-gold/STS.gs.headlines.txt for example. Do they mean something, or are the labels just missing?

adirc, Apr 17 '16 18:04

I had the same question when I started playing around with the dataset. The empty lines just mean that those sentence pairs were not annotated by the STS organizers. Not all sentence pairs are used in the evaluation.

Daniel Cer explains this at https://groups.google.com/d/msg/sts-semeval/js-Y0e92YuM/jJUi5beJBwAJ
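If you only want the annotated pairs, something like the following should work. It is just a minimal sketch with the standard library, assuming each gs file lines up one-to-one with a tab-separated input file of sentence pairs (e.g. STS.input.headlines.txt next to STS.gs.headlines.txt). The function name `load_annotated_pairs` and the sample sentences are made up for illustration.

```python
import io

def load_annotated_pairs(input_lines, gs_lines):
    """Pair each sentence pair with its gold score, skipping unannotated pairs.

    Hypothetical helper: assumes line i of the input file corresponds to
    line i of the gs file, and that the input file is tab-separated.
    """
    annotated = []
    for pair_line, score_line in zip(input_lines, gs_lines):
        score_text = score_line.strip()
        if not score_text:
            # Empty gold-standard line: this pair was not annotated.
            continue
        sent1, sent2 = pair_line.rstrip('\n').split('\t')[:2]
        annotated.append((sent1, sent2, float(score_text)))
    return annotated

# Tiny in-memory stand-ins for the two files (made-up data, not from STS).
sample_input = io.StringIO(
    "A man plays guitar.\tA man is playing a guitar.\n"
    "Stocks fell sharply.\tThe dog barked.\n"
    "Rain is expected.\tRain is forecast.\n"
)
sample_gs = io.StringIO("4.8\n\n4.2\n")

pairs = load_annotated_pairs(sample_input, sample_gs)
for sent1, sent2, score in pairs:
    print(score, sent1, sent2, sep='\t')
```

With the real data you would pass two opened file objects instead of the `io.StringIO` stand-ins; the second sample pair is dropped because its gs line is empty.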

alvations, Apr 17 '16 21:04

To load the STS data into an SFrame, I usually do this:

import sframe
# Read the STS 2012-2015 dataset from a tab-separated file.
# quote_char='\0' effectively disables quote handling, so quotation marks inside sentences are kept as-is.
sts_train = sframe.SFrame.read_csv('sts.csv', delimiter='\t', column_type_hints=[str, str, float, str, str], quote_char='\0')
# Throw away the sentence pairs with empty annotations (missing 'Score').
sts_train = sts_train.dropna(columns=['Score'])

Take a look at https://github.com/alvations/stasis/blob/master/notebooks/SWORD.ipynb and https://github.com/alvations/stasis/blob/master/notebooks/SHIELD.ipynb for more details =)

alvations, Apr 17 '16 21:04