
About the Connector Class

Open Choptdei opened this issue 5 years ago • 8 comments

Hi!

Several people on my team are using the requests module to collect tweets from the Twitter API, and we might use several computers to make our requests. As it stands, we will end up with several log files. Is that a problem? And should we hand in our log files?

Best regards

Choptdei avatar Aug 23 '19 15:08 Choptdei

That is fine; you should just merge the files for the analysis. It would be good to hand in the logs as well, but the most important thing is to include an analysis of the log in the project.
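A minimal sketch of the merge, assuming each machine wrote a delimited log file that pandas can read (the glob pattern and delimiter here are placeholders, not the course's exact log format):

import glob
import pandas as pd

# gather the per-machine log files
log_files = glob.glob('logs/*.csv')

# stack them into one frame and save the merged log for the analysis
logs = pd.concat((pd.read_csv(f, sep=';') for f in log_files), ignore_index=True)
logs.to_csv('merged_log.csv', sep=';', index=False)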

snorreralund avatar Aug 25 '19 16:08 snorreralund

Hello Snorre

Thank you. We are having another problem: the Twitter API demands a special format when requesting data. It looks like this:

response = requests.post(endpoint, data=data, headers=headers)
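For reference, the variables in that call might look roughly like this (a hypothetical illustration only; the endpoint label, token, and query below are placeholders, not values from our actual project):

import json

# placeholder 30-day premium search endpoint ('dev' is a hypothetical environment label)
endpoint = 'https://api.twitter.com/1.1/tweets/search/30day/dev.json'
headers = {'Authorization': 'Bearer <your-bearer-token>',
           'Content-Type': 'application/json'}
data = json.dumps({'query': 'from:realdonaldtrump', 'maxResults': 100})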

How do we use the Connector class with a POST request? We get an error when using it with the Twitter API.

Best regards

Choptdei avatar Aug 25 '19 17:08 Choptdei

@snorreralund regarding the log file: should we include an analysis of the log with all the connections we have made during our project, or should we, at the end of the project, reset the log and run the whole code so we get the log from the final data collection and analyse that?

BjornCilleborg avatar Aug 26 '19 07:08 BjornCilleborg

Okay, I just added a post method to the connector in the following version: https://github.com/snorreralund/scraping_seminar/blob/master/logging_requests.py

The API is slightly changed: instead of providing a URL, provide a dictionary of arguments to the requests.get or requests.post method.

e.g.

# define auth method
# load keys and secrets
from requests_oauthlib import OAuth1
import pickle

consumer_key, consumer_secret, oauth_token, oauth_token_secret = pickle.load(open('twitter_credentials.pkl', 'rb'))
auth = OAuth1(consumer_key, consumer_secret, oauth_token, oauth_token_secret)

# define query
q = 'https://api.twitter.com/1.1/statuses/user_timeline.json?screen_name=realdonaldtrump&count=200&tweet_mode=extended'

connector.get({'url': q, 'auth': auth})
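For a POST request like the Twitter example above, the same dictionary style should carry over; a sketch, assuming the new post method forwards its dictionary to requests.post the same way get forwards to requests.get:

# endpoint, data, and headers as in the earlier requests.post example
connector.post({'url': endpoint, 'data': data, 'headers': headers, 'auth': auth})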

snorreralund avatar Aug 27 '19 10:08 snorreralund

Regarding the log: you should report the log that generated the dataset that you analyze, not necessarily your whole process, test calls, etc.
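One way to cut the log down to the final run, assuming the log is a delimited file with a timestamp column (the file name, delimiter, and column name 't' are assumptions, not the confirmed log format):

import pandas as pd

log = pd.read_csv('log.csv', sep=';')
log['t'] = pd.to_datetime(log['t'])

# keep only calls made after the final collection run started (placeholder date)
cutoff = pd.Timestamp('2019-08-27')
final_log = log[log['t'] >= cutoff]
final_log.to_csv('final_log.csv', sep=';', index=False)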

snorreralund avatar Aug 27 '19 10:08 snorreralund

@snorreralund can you make your changes in this repo? https://github.com/elben10/ScrapingClass That would make it easy for Jakob to push an updated version to PyPI.

kristianolesenlarsen avatar Aug 27 '19 10:08 kristianolesenlarsen

just asked Jakob @elben10 to do it.

snorreralund avatar Aug 27 '19 11:08 snorreralund

see issue #41

snorreralund avatar Aug 28 '19 09:08 snorreralund