twitter-sentiment-analysis
Tidy up preprocess.py with pandas
In the preprocess_csv function in preprocess.py (link), pandas can be used to parse the csv more efficiently and with far less code. The machine I was using while developing the project did not have pandas installed.
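A rough sketch of what a pandas-based version might look like. The column layout follows the README (no header row); the preprocess_csv signature and the preprocess_tweet helper shown here are assumptions for illustration, not the repo's actual code.

```python
import io

import pandas as pd


def preprocess_tweet(tweet):
    # Hypothetical stand-in for the per-tweet cleaning done in preprocess.py:
    # lowercase the text and collapse runs of whitespace.
    return " ".join(tweet.lower().split())


def preprocess_csv(csv_file, processed_file, test_file=False):
    # Assumed signature. Per the README, the csv has no header row.
    names = ["tweet_id", "tweet"] if test_file else ["tweet_id", "sentiment", "tweet"]
    df = pd.read_csv(csv_file, header=None, names=names)
    df["tweet"] = df["tweet"].astype(str).map(preprocess_tweet)
    # Write the result back out, again without a header row.
    df.to_csv(processed_file, header=False, index=False)
    return df


# Made-up sample rows, read from an in-memory buffer instead of a real file.
sample = io.StringIO('1,1,"I   LOVE this"\n2,0,"So   BAD"\n')
out = io.StringIO()
df = preprocess_csv(sample, out)
```

read_csv handles the quoted tweet field and the comma splitting, so the manual line parsing is no longer needed.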
I ran your code and got [Errno 2] No such file or directory: '../train-processed-freqdist.pkl'. Can you help me solve this problem? Thank you.
@GongQin721 This is off-topic. Please read the Readme properly.
OK, thank you very much!
Can you help me with headers of the csv, if any? If not, some idea about the structure of csv would be of great help.
Hi @chaiitanyasangani88
The csv structure is in the Dataset Information section:
We use and compare various different methods for sentiment analysis on tweets (a binary classification problem). The training dataset is expected to be a csv file of type
tweet_id,sentiment,tweet
where tweet_id is a unique integer identifying the tweet, sentiment is either 1 (positive) or 0 (negative), and tweet is the tweet enclosed in "". Similarly, the test dataset is a csv file of type tweet_id,tweet. Please note that csv headers are not expected and should be removed from the training and test datasets.
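To make the expected layout concrete, here is a small sketch that reads training rows in that format and rejects a leftover header line. The read_training_rows name and the sample rows are made up for illustration and are not part of the repo.

```python
import csv
import io


def read_training_rows(handle):
    """Parse rows of the form tweet_id,sentiment,tweet (no header row)."""
    rows = []
    for line_no, record in enumerate(csv.reader(handle), start=1):
        if len(record) != 3:
            raise ValueError(f"line {line_no}: expected 3 fields, got {len(record)}")
        tweet_id, sentiment, tweet = record
        # A header such as "tweet_id,sentiment,tweet" fails this check.
        if not tweet_id.strip().isdigit():
            raise ValueError(f"line {line_no}: tweet_id is not an integer (header left in?)")
        if sentiment.strip() not in ("0", "1"):
            raise ValueError(f"line {line_no}: sentiment must be 0 or 1")
        rows.append((int(tweet_id), int(sentiment), tweet))
    return rows


# Two made-up rows; the tweet is enclosed in "" so it may contain commas.
sample = io.StringIO('1,1,"Great day, loving it"\n2,0,"Worst service ever"\n')
rows = read_training_rows(sample)
```

For the test dataset the same idea applies with two fields per row (tweet_id,tweet).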
In your lstm.py code, the '.csv' and '.pkl' files are referenced as FREQ_DIST_FILE = '../train-processed-freqdist.pkl', TRAIN_PROCESSED_FILE = '../train-processed.csv', and so on. How can I generate these files from 'positive-words.txt' and 'negative-words.txt' in the dataset? Could you please help me with this?
@Carolinecrl
'positive-words.txt' and 'negative-words.txt' are not the dataset. They're just for the baseline. The dataset is not included in the repo.
In stats.py, which csv file should be passed in: train, test, or any other (random) sample?