
amazing R tool! can we get intraday timestamps?

Open randomgambit opened this issue 7 years ago • 12 comments

Hello @hrbrmstr, this is great!

I just wonder, is there any way to query the data at the intraday level? Or to get any sort of intraday timestamps?

Thanks!

randomgambit avatar Jan 28 '17 20:01 randomgambit

for instance, could you aggregate the counts at the hour-level instead of the daily level? That would help match the data more precisely with data coming from other timezones.

randomgambit avatar Jan 29 '17 01:01 randomgambit

except all I'm doing is calling anytime::anytime(date_start) (etc) and the result of that call is returning only day resolution. lemme look at the raw API values tho

hrbrmstr avatar Jan 29 '17 03:01 hrbrmstr

20161212T000000Z 20161212T235959Z are examples of the start/end times for the timeline structure so you're out of luck there. but 20161221T050000Z is what comes back for show_date in the top_matches structure and anytime is not converting that properly so lemme see what i can do for at least that one.
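
For reference, a minimal base-R sketch of parsing timestamps in that compact form, using the three values quoted above. It assumes the trailing Z means UTC and is only an illustration, not necessarily how the package ends up handling it:

```r
# Compact ISO 8601 "basic" timestamps, as quoted above.
raw_ts <- c("20161212T000000Z", "20161212T235959Z", "20161221T050000Z")

# Assumption: the trailing "Z" denotes UTC, so it is matched literally
# in the format string and the timezone is set explicitly.
parsed <- as.POSIXct(raw_ts, format = "%Y%m%dT%H%M%SZ", tz = "UTC")

# Show the parsed values with full hour/minute/second resolution.
print(format(parsed, "%Y-%m-%d %H:%M:%S"))
```

An explicit format string avoids relying on automatic format detection for the compact layout.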

hrbrmstr avatar Jan 29 '17 03:01 hrbrmstr

thanks! if you don't find any workaround, then mailing the guy at GDELT could be a solution I guess

randomgambit avatar Jan 29 '17 03:01 randomgambit

show_date in top_matches should have hms resolution now in 0.3.1 I just pushed. The others don't have such resolution.

dplyr::glimpse(df$top_matches)
## Observations: 1,000
## Variables: 8
## $ preview_url   <chr> "https://archive.org/details/FBC_20161223_140000_Varney__Company#start/...
## $ ia_show_id    <chr> "FBC_20161223_140000_Varney__Company", "CNNW_20161128_180000_Wolf", "FO...
## $ date          <date> 2016-12-23, 2016-11-28, 2016-12-27, 2016-12-23, 2016-12-20, 2016-11-29...
## $ station       <chr> "FOX Business", "CNN", "FOX News", "FOX Business", "FOX Business", "FOX...
## $ show          <chr> "Varney  Company", "Wolf", "FOX  Friends", "Varney  Company", "Making M...
## $ show_date     <dttm> 2016-12-23 14:00:00, 2016-11-28 18:00:00, 2016-12-27 11:00:00, 2016-12...
## $ preview_thumb <chr> "https://archive.org/download/FBC_20161223_140000_Varney__Company/FBC_2...
## $ snippet       <chr> "only at td ameritrade. the berlin terror suspect is debt. what else ha...
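
With show_date at that resolution, hour-level counts can be derived from top_matches on the client side. A rough sketch, using a made-up three-row stand-in for df$top_matches and assuming show_date is in UTC (the target timezone is just an example):

```r
# Made-up rows standing in for df$top_matches; only the columns used here.
top_matches <- data.frame(
  station = c("FOX Business", "CNN", "FOX Business"),
  show_date = as.POSIXct(
    c("2016-12-23 14:00:00", "2016-11-28 18:00:00", "2016-12-23 14:00:00"),
    tz = "UTC"
  )
)

# Bucket each match by hour (UTC assumed) and count matches per bucket.
top_matches$hour <- format(top_matches$show_date, "%Y-%m-%d %H:00", tz = "UTC")
hourly <- aggregate(station ~ hour, data = top_matches, FUN = length)
names(hourly)[2] <- "n"

# The same buckets expressed in another timezone, for matching local data.
hourly$hour_ny <- format(
  as.POSIXct(hourly$hour, format = "%Y-%m-%d %H:%M", tz = "UTC"),
  "%Y-%m-%d %H:00", tz = "America/New_York"
)
print(hourly)
```

Keep in mind this only counts the matches the API returned, not every mention.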

hrbrmstr avatar Jan 29 '17 03:01 hrbrmstr

amazing! I am looking at your documentation and I am not sure what top_matches returns for a given request. For instance, if I search for hrbrmstr over 2015, what is then the output of top_matches? The days with the most counts?

randomgambit avatar Jan 29 '17 03:01 randomgambit

That's a gd GDELT/Internet Archive TV search question. I'm assuming (from various testing) that it's the caption text from the top "n" (for large date ranges it maxes at 1K) out of all of the other possible ones it could return. You won't get more than that from the API tho.

hrbrmstr avatar Jan 29 '17 03:01 hrbrmstr

that's great. Thanks again for your help. I'll play a bit with this for a while. But the raw data has to be somewhere, right?

randomgambit avatar Jan 29 '17 03:01 randomgambit

It depends on what GDELT & IA put in their DB. You can clone the code and return the JSON before it gets processed and you'll see that the other structures don't have the resolution you want. Or go to the GUI web interface on their site, generate CSVs and JSONs, and validate there, too.

hrbrmstr avatar Jan 29 '17 03:01 hrbrmstr

@hrbrmstr coincidence? http://blog.gdeltproject.org/television-explorer-hourly-timeline-boolean-or-and-increased-json-cap/

:D

randomgambit avatar Jan 29 '17 18:01 randomgambit

but as you can see the data can only be downloaded over a 7-day period. It would be amazing if your package could take a date range as an input, break it down into slices of 7 days, download the data for each week and then combine everything into a tibble.

That way would allow everyone to recover the full intraday history. What do you think? Is that doable on your side?

Thanks again!

randomgambit avatar Jan 29 '17 19:01 randomgambit

+100 for the heads-up on their API changes. #ty!!!

Step 1 was making it work with the new API changes ;-) Longer results were causing errors in httr so I had to remove it and use curl. Also, there are issues with the JSON being returned (embedded NULLs) in large result sets so I had to handle that as well.
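
A rough illustration of the NUL handling, assuming "embedded NULLs" here means stray 0x00 bytes in the raw response body (the payload below is made up, and this is a sketch rather than the package's actual code):

```r
# A stand-in for a raw HTTP response body with a stray NUL (0x00) byte in it.
body_raw <- c(
  charToRaw('{"station":"CNN",'),
  as.raw(0),
  charToRaw('"count":3}')
)

# rawToChar() fails on embedded NULs, so drop those bytes before decoding.
clean_raw <- body_raw[body_raw != as.raw(0)]
json_txt <- rawToChar(clean_raw)

print(json_txt)
```

The cleaned text can then be handed to a JSON parser such as jsonlite::fromJSON().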

Rather than have the main function intuit caller intentions, I'll probably add a helper function to do the date breaks as suggested IF they don't change their API again soon (I'll give them some time to let the dust settle on these changes).
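
Such a date-break helper could look roughly like the sketch below. Both date_windows() and fetch_window() are hypothetical names made up for illustration; fetch_window() is a placeholder for whatever call actually queries the API, and it returns dummy rows so the example runs offline:

```r
# Split [from, to] into consecutive windows of at most `days` days.
# `date_windows()` is a hypothetical helper, not part of the package.
date_windows <- function(from, to, days = 7) {
  from <- as.Date(from)
  to <- as.Date(to)
  stopifnot(from <= to, days >= 1)
  starts <- seq(from, to, by = days)
  ends <- pmin(starts + days - 1, to)  # clip the last window to `to`
  data.frame(start = starts, end = ends)
}

# Placeholder for the real API call (hypothetical); it returns one dummy
# row per window instead of downloading anything.
fetch_window <- function(start, end) {
  data.frame(start = start, end = end, n_days = as.integer(end - start) + 1L)
}

windows <- date_windows("2016-12-01", "2016-12-17")

# Fetch each window in turn, then stack the pieces into one data frame.
pieces <- lapply(seq_len(nrow(windows)), function(i) {
  fetch_window(windows$start[i], windows$end[i])
})
combined <- do.call(rbind, pieces)
print(combined)
```

Keeping the splitting separate from the fetching means the window length is a single argument to change if the 7-day limit moves.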

hrbrmstr avatar Jan 29 '17 21:01 hrbrmstr