datapasta
Line breaks in column headers not handled well
Copying the "Double Dissolution Triggers" table here results in
tribble(
~Second.rejection.by,
"the Senate Bill",
"18 June 2014 Clean Energy Finance Corporation (Abolition) Bill 2013 [No. 2]",
"17 August 2015 Fair Work (Registered Organisations) Amendment Bill 2014 [No. 2]",
"18 April 2016 Building and Construction Industry (Consequential and Transitional Provisions) Bill 2013 [No. 2]",
"18 April 2016 Building and Construction Industry (Improving Productivity) Bill 2013 [No. 2]"
)
rather than
tribble(
~`Second rejection by the Senate`, ~Bill,
"18 June 2014", "Clean Energy Finance Corporation (Abolition) Bill 2013 [No. 2]",
"17 August 2015", "Fair Work (Registered Organisations) Amendment Bill 2014 [No. 2]",
"18 April 2016", "Building and Construction Industry (Consequential and Transitional Provisions) Bill 2013 [No. 2]",
"18 April 2016", "Building and Construction Industry (Improving Productivity) Bill 2013 [No. 2]"
)
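The output above is consistent with the wrapped header cell arriving as two separate lines of clipboard text. The following is a minimal sketch of that reading and of one possible repair, not a description of how datapasta parses input. The `clip` vector is an assumed reconstruction of the clipboard contents (shortened to two data rows), and the merge step is a hypothetical workaround.

```r
# Assumed clipboard text: the header cell wraps, so the header spans two lines.
clip <- c(
  "Second rejection by",
  "the Senate\tBill",
  "18 June 2014\tClean Energy Finance Corporation (Abolition) Bill 2013 [No. 2]",
  "17 August 2015\tFair Work (Registered Organisations) Amendment Bill 2014 [No. 2]"
)

# Count tab-separated fields per line; the wrapped header line has only one.
n_fields <- lengths(strsplit(clip, "\t", fixed = TRUE))

# Hypothetical repair: glue a short first line onto the line that follows it.
if (n_fields[1] < max(n_fields)) {
  clip <- c(paste(clip[1], clip[2]), clip[-(1:2)])
}

# After the merge, the first line splits into the two intended column names.
fixed_header <- strsplit(clip[1], "\t", fixed = TRUE)[[1]]
```

This only covers a header that wraps once in the first cell; a header that wraps in a later cell, or more than once, would need a more careful merge.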
This! OK, so I was about to ask about another Wikipedia table. I noticed that many tables are separated by spaces or tabs in a way that could, conceptually, be put into a tibble. As a new user of this (amazing) package, I find myself doing something like the following in this case:
- paste with datapasta as a data frame to get
data.frame(
stringsAsFactors = FALSE,
Second.rejection.by = c("the Senate\tBill",
"18 June 2014\tClean Energy Finance Corporation (Abolition) Bill 2013 [No. 2]",
"17 August 2015\tFair Work (Registered Organisations) Amendment Bill 2014 [No. 2]",
"18 April 2016\tBuilding and Construction Industry (Consequential and Transitional Provisions) Bill 2013 [No. 2]",
"18 April 2016\tBuilding and Construction Industry (Improving Productivity) Bill 2013 [No. 2]")
)
- tidy some things up to get
headers <- c("Second rejection by the Senate", "Bill")
theRest <- c("18 June 2014\tClean Energy Finance Corporation (Abolition) Bill 2013 [No. 2]",
"17 August 2015\tFair Work (Registered Organisations) Amendment Bill 2014 [No. 2]",
"18 April 2016\tBuilding and Construction Industry (Consequential and Transitional Provisions) Bill 2013 [No. 2]",
"18 April 2016\tBuilding and Construction Industry (Improving Productivity) Bill 2013 [No. 2]")
- process `theRest` (there is probably a better way to do it) like this:
library(tidyverse) # provides %>%, str_split(), first() and as_tibble()
splitToken <- "\t" # could be "\\s+" if, for example, no cell contains spaces
tb <- theRest %>%
  sapply(USE.NAMES = FALSE, function(row) str_split(row, splitToken) %>% first()) %>%
  t() %>%
  as.data.frame(stringsAsFactors = FALSE)
names(tb) <- headers # I sometimes make this manually, sometimes get it from the initial datapasta
tb %>% as_tibble()
So maybe there's a way to improve on the above process? I think I will make step 3 into a snippet and modify it as needed.
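One possible simplification of the processing step is to let base R's `read.table()` do the splitting. This is a sketch only: `rows_to_df` is a hypothetical helper name and is not part of datapasta. Note that `read.table()` accepts only a single-character `sep` (or `""` for any run of whitespace), so a regex token such as `"\\s+"` would not carry over.

```r
headers <- c("Second rejection by the Senate", "Bill")
theRest <- c(
  "18 June 2014\tClean Energy Finance Corporation (Abolition) Bill 2013 [No. 2]",
  "17 August 2015\tFair Work (Registered Organisations) Amendment Bill 2014 [No. 2]",
  "18 April 2016\tBuilding and Construction Industry (Consequential and Transitional Provisions) Bill 2013 [No. 2]",
  "18 April 2016\tBuilding and Construction Industry (Improving Productivity) Bill 2013 [No. 2]"
)

# Hypothetical helper (not part of datapasta): parse delimited rows with base R.
rows_to_df <- function(rows, headers, sep = "\t") {
  read.table(
    text = paste(rows, collapse = "\n"), # treat the rows as one block of text
    sep = sep,
    header = FALSE,
    col.names = headers,
    check.names = FALSE,      # keep spaces in the column names
    quote = "",               # do not treat quotes or apostrophes specially
    comment.char = "",        # do not treat '#' as a comment marker
    stringsAsFactors = FALSE  # keep every column as character
  )
}

tb <- rows_to_df(theRest, headers)
```

The result is a plain data frame; it can be passed to `tibble::as_tibble()` if a tibble is preferred.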