Error in query_tv when not using default filter_network

Open pssguy opened this issue 7 years ago • 7 comments

When I run this code, slightly amended from the blog post:

library(newsflash)
library(ggalt)  
library(hrbrmisc) 
library(DT)
library(plotly)
library(tidyverse)
starts <- seq(as.Date("2015-01-01"), (as.Date("2017-01-26")-30), "30 days") # splitting into 30 day chunks 25
ends <- as.character(starts + 29)
ends[length(ends)] <- ""

pb <- progress_estimated(length(starts))  # from dplyr takes app 1min
emails <- map2(starts, ends, function(x, y) {
  pb$tick()$print()
  query_tv("clinton", "email,emails,server", timespan="custom", start_date=x, end_date=y, filter_network = "AFFNETALL") 
})

This appears in the console

|====                                                                                                        |  4% ~1 s remaining     
No results found
|========                                                                                                    |  8% ~5 s remaining     
No results found
|========================================================                                                    | 52% ~9 s remaining     
Error: lexical error: inside a string, '\' occurs before a character which it may not.
          h! `/xx tt4w`t2n`qt'' mnh! `_\8 tt4w`t2n`qt'' nz(l `-'8 tt4w
                     (right here) ------^
sessionInfo:

> sessionInfo()
R version 3.3.2 (2016-10-31)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)

locale:
[1] LC_COLLATE=English_Canada.1252
[2] LC_CTYPE=English_Canada.1252
[3] LC_MONETARY=English_Canada.1252
[4] LC_NUMERIC=C
[5] LC_TIME=English_Canada.1252

attached base packages:
[1] stats graphics grDevices utils datasets
[6] methods base

other attached packages:
[1] dplyr_0.5.0 purrr_0.2.2
[3] readr_1.0.0 tidyr_0.6.1
[5] tibble_1.2 tidyverse_1.1.1
[7] plotly_4.5.6.9000 DT_0.2
[9] hrbrmisc_0.2.0 fastmatch_1.1-0
[11] ggalt_0.4.0 ggplot2_2.2.1.9000
[13] newsflash_0.4.2

loaded via a namespace (and not attached):
[1] Rcpp_0.12.9 lubridate_1.6.0
[3] lattice_0.20-34 assertthat_0.1
[5] digest_0.6.12 proj4_1.0-8
[7] psych_1.6.12 R6_2.2.0
[9] plyr_1.8.4 httr_1.2.1
[11] seleniumPipes_0.3.7 readxl_0.1.1
[13] lazyeval_0.2.0 curl_2.3
[15] extrafontdb_1.0 whisker_0.3-2
[17] Matrix_1.2-8 devtools_1.12.0
[19] extrafont_0.17 tidytext_0.1.2
[21] stringr_1.2.0 foreign_0.8-67
[23] htmlwidgets_0.8 munsell_0.4.3
[25] broom_0.4.2 modelr_0.1.0
[27] janeaustenr_0.1.4 base64enc_0.1-3
[29] mnormt_1.5-5 htmltools_0.3.5
[31] viridisLite_0.1.3 withr_1.0.2
[33] MASS_7.3-45 SnowballC_0.5.1
[35] grid_3.3.2 txtplot_1.0-3
[37] nlme_3.1-131 jsonlite_1.3
[39] Rttf2pt1_1.3.4 gtable_0.2.0
[41] DBI_0.6 magrittr_1.5
[43] formatR_1.4 scales_0.4.1
[45] tokenizers_0.1.4 KernSmooth_2.23-15
[47] stringi_1.1.2 reshape2_1.4.2
[49] xml2_1.1.1 ash_1.0-15
[51] RColorBrewer_1.1-2 tools_3.3.2
[53] forcats_0.2.0 hms_0.3
[55] maps_3.1.1 parallel_3.3.2
[57] colorspace_1.3-2 rvest_0.3.2
[59] memoise_1.0.0 knitr_1.15.1
[61] haven_1.0.0


pssguy, Mar 14 '17 16:03

There's an issue with the JSON returned for the timespan of 2015-12-27 to 2016-01-25 for your query. In other words, GDELT is returning invalid JSON.
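
One way to check this kind of diagnosis is to validate the raw response text before parsing it. A minimal sketch, assuming jsonlite is installed; the two strings are made-up stand-ins, not actual GDELT output:

```r
library(jsonlite)

# Made-up stand-ins for a response body (NOT real GDELT output).
good_json <- '{"station": "CNN", "value": 3}'
bad_json  <- '{"station": "CNN\\q", "value": 3}'  # "\q" is not a legal JSON escape

# validate() returns TRUE or FALSE; on failure the parser message
# is attached as the "err" attribute of the result.
validate(good_json)
bad_check <- validate(bad_json)
if (!bad_check) message("Invalid JSON: ", attr(bad_check, "err"))
```

Running the same check on the text downloaded for each 30-day chunk would show which timespan is affected without aborting the whole loop.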

yeedle, Mar 14 '17 20:03

Beat me to it, @Yeedle ;-) I compensated for some of this with https://github.com/hrbrmstr/newsflash/blob/master/R/newsflash.r#L131 (despite httr using similar methods, some of its post-processing was causing other data loss), but the API has issues. If you do a similar query on the web site, do you get decent JSON after downloading? If so, I'm going to be almost stumped, since this is just calling the same thing their browser clicky bits do.
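
Until the API side is sorted out, a possible workaround on the caller's side is to wrap each request so that one bad chunk does not abort the whole loop. This is only a sketch: `query_chunk()` is a hypothetical stand-in for `query_tv()` that fails on purpose for one start date, so the example runs without network access.

```r
library(purrr)

# Hypothetical stand-in for query_tv(): errors for one chunk, as the API did.
query_chunk <- function(start, end) {
  if (start == "2015-12-27") stop("lexical error: invalid JSON")
  list(start = start, end = end)
}

# Three 30-day chunks, passed as character strings.
start_dates <- seq(as.Date("2015-11-27"), by = "30 days", length.out = 3)
starts <- as.character(start_dates)
ends   <- as.character(start_dates + 29)

# safely() returns list(result = ..., error = ...) instead of throwing.
safe_query <- safely(query_chunk, otherwise = NULL)
results <- map2(starts, ends, safe_query)

# Indices of the chunks that failed, and the results that succeeded.
failed <- which(map_lgl(results, ~ !is.null(.x$error)))
ok     <- compact(map(results, "result"))
```

Here `failed` identifies the chunk to retry or skip, and `ok` holds only the successful results.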

hrbrmstr, Mar 14 '17 20:03

OK, I tried removing that time period with

starts <- starts[-13]
ends <- ends[-13]

pb <- progress_estimated(length(starts))  # from dplyr takes app 1min
emails <- map2(starts, ends, function(x, y) {
  pb$tick()$print()
  query_tv("clinton", "email,emails,server", timespan="custom", start_date=x, end_date=y, filter_network = "AFFNETALL") 
})

|==                                                                  |  4% ~5 m remaining     
No results found
|=====                                                               |  8% ~3 m remaining     
No results found
|====================================================================|100% ~0 s remaining     
> clinton_timeline <- map_df(emails, "timeline") #4836
Error: `x` must be a vector (not a NULL)

I have not updated the package since yesterday

pssguy, Mar 14 '17 21:03

Hmm, that seems like an issue with map_df. Try clinton_timeline <- map_df(emails, ~.x[["timeline"]]) (I know it should be the same, but it worked for me this way).

yeedle, Mar 14 '17 21:03

Oh, I see now. The issue is that the first two elements of emails are NULL. Seems like map_df doesn't know how to deal with NULL elements when it's only provided a character as .f.
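
Another option that avoids the question entirely is to drop the NULL elements before extracting. A minimal sketch, assuming purrr and dplyr are installed; the `emails` list and its column names are invented stand-ins for what `query_tv()` returned, not its real structure:

```r
library(purrr)
library(dplyr)

# Invented stand-in for the map2() output: the first two chunks came back NULL.
emails <- list(
  NULL,
  NULL,
  list(timeline = data.frame(date_start = as.Date("2015-03-02"), value = 3L)),
  list(timeline = data.frame(date_start = as.Date("2015-04-01"), value = 5L))
)

# compact() removes NULL (and zero-length) elements, so the character
# shortcut for .f only ever sees real results.
clinton_timeline <- map_df(compact(emails), "timeline")
```

The result is a single data frame with one row per remaining timeline entry.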

yeedle, Mar 14 '17 21:03

@Yeedle Thanks for the alternative. It does work for me. I'm not that well-informed on purrr, so I'm not quite sure what "only provided a character as .f" means. Does this suggest a bug?

pssguy, Mar 15 '17 15:03

@pssguy Not sure if it's a bug, but to me it's inconsistent behavior, unless there's something I missed about purrr. I filed it as an issue: https://github.com/tidyverse/purrr/issues/306

yeedle, Mar 15 '17 15:03