datapasta: Line breaks in column headers not handled well

Open • samclifford opened this issue • 1 comment

Copying the "Double Dissolution Triggers" table from Wikipedia and pasting it as a tribble results in

```r
tribble(
  ~Second.rejection.by,
  "the Senate	Bill",
  "18 June 2014	Clean Energy Finance Corporation (Abolition) Bill 2013 [No. 2]",
  "17 August 2015	Fair Work (Registered Organisations) Amendment Bill 2014 [No. 2]",
  "18 April 2016	Building and Construction Industry (Consequential and Transitional Provisions) Bill 2013 [No. 2]",
  "18 April 2016	Building and Construction Industry (Improving Productivity) Bill 2013 [No. 2]"
)
```

rather than

```r
tribble(
  ~`Second rejection by the Senate`, ~Bill,
  "18 June 2014", "Clean Energy Finance Corporation (Abolition) Bill 2013 [No. 2]",
  "17 August 2015", "Fair Work (Registered Organisations) Amendment Bill 2014 [No. 2]",
  "18 April 2016", "Building and Construction Industry (Consequential and Transitional Provisions) Bill 2013 [No. 2]",
  "18 April 2016", "Building and Construction Industry (Improving Productivity) Bill 2013 [No. 2]"
)
```
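The output suggests the line break inside the first header cell splits the header row in two: "Second rejection by" is taken as the only column name, and "the Senate	Bill" becomes the first data row. A minimal sketch of one possible repair, assuming the clipboard text really does put the tail of the broken header on its own line: keep appending lines to the header until it has as many tab-separated fields as a data row, then parse. `merge_header_lines()` is a hypothetical helper for illustration, not part of datapasta.

```r
# Hypothetical helper, not part of datapasta: re-join header lines that were
# split by a line break inside a header cell.
merge_header_lines <- function(lines, sep = "\t") {
  n_fields <- function(x) length(strsplit(x, sep, fixed = TRUE)[[1]])
  # Assume the last line is a complete data row; use its field count as the target
  target <- n_fields(lines[length(lines)])
  header <- lines[1]
  i <- 1
  # Append following lines (joined with a space) until the header is complete
  while (n_fields(header) < target && i < length(lines)) {
    i <- i + 1
    header <- paste(header, lines[i])
  }
  c(header, lines[-seq_len(i)])
}

# Assumed clipboard contents for the first rows of the table
raw <- c(
  "Second rejection by",
  "the Senate\tBill",
  "18 June 2014\tClean Energy Finance Corporation (Abolition) Bill 2013 [No. 2]",
  "17 August 2015\tFair Work (Registered Organisations) Amendment Bill 2014 [No. 2]"
)

fixed <- merge_header_lines(raw)
# check.names = FALSE keeps the spaces in the repaired column name
df <- read.delim(text = fixed, check.names = FALSE, stringsAsFactors = FALSE)
```

Using the field count of the last line is a simplifying assumption; a table with ragged rows or trailing empty cells would need a more careful target, such as the most common field count.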

Nov 17 '16 04:11 samclifford

This! OK, so I was about to ask about another Wikipedia table. I noticed that many tables are separated by spaces or tabs in a way that could, in principle, be parsed into a tibble. As a new user of this (amazing) package, I find myself doing something like the following in this case:

1) Paste with datapasta as a data.frame to get

```r
data.frame(
  stringsAsFactors = FALSE,
  Second.rejection.by = c("the Senate\tBill",
                          "18 June 2014\tClean Energy Finance Corporation (Abolition) Bill 2013 [No. 2]",
                          "17 August 2015\tFair Work (Registered Organisations) Amendment Bill 2014 [No. 2]",
                          "18 April 2016\tBuilding and Construction Industry (Consequential and Transitional Provisions) Bill 2013 [No. 2]",
                          "18 April 2016\tBuilding and Construction Industry (Improving Productivity) Bill 2013 [No. 2]")
)
```

2) Tidy things up by hand to get

```r
headers <- c("Second rejection by the Senate", "Bill")
theRest <- c("18 June 2014\tClean Energy Finance Corporation (Abolition) Bill 2013 [No. 2]",
             "17 August 2015\tFair Work (Registered Organisations) Amendment Bill 2014 [No. 2]",
             "18 April 2016\tBuilding and Construction Industry (Consequential and Transitional Provisions) Bill 2013 [No. 2]",
             "18 April 2016\tBuilding and Construction Industry (Improving Productivity) Bill 2013 [No. 2]")
```
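The hand-tidying step can also be derived from the pasted data.frame itself. A sketch, assuming the broken header always ends up split between the column name and the first row, and that the real header contained no dots of its own (since `make.names()` turned its spaces into dots):

```r
# The data.frame that datapasta produced in step 1 (first three rows only)
df <- data.frame(
  stringsAsFactors = FALSE,
  Second.rejection.by = c(
    "the Senate\tBill",
    "18 June 2014\tClean Energy Finance Corporation (Abolition) Bill 2013 [No. 2]",
    "17 August 2015\tFair Work (Registered Organisations) Amendment Bill 2014 [No. 2]"
  )
)

col <- df[[1]]
# The first row holds the tail of the broken header plus the remaining headers
first_row <- strsplit(col[1], "\t", fixed = TRUE)[[1]]
# Undo make.names(): turn the dots in the column name back into spaces
header_start <- gsub(".", " ", names(df)[1], fixed = TRUE)
headers <- c(paste(header_start, first_row[1]), first_row[-1])
# Everything after the first row is real data
theRest <- col[-1]
```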

3) Process `theRest` like this (there is probably a better way to do it):

```r
library(dplyr)    # %>% and first()
library(stringr)  # str_split()
library(tibble)   # as_tibble()

splitToken <- '\t' # could be '\\s+' if there are no spaces in the cells, for example
tb <- theRest %>% sapply(USE.NAMES = FALSE, function(row) {str_split(row, splitToken) %>% first()}) %>% t() %>% as.data.frame()
names(tb) <- headers  # I sometimes make this manually, sometimes get it from the initial datapasta
tb %>% as_tibble()
```
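One possibly simpler alternative to the `sapply()`/`t()` round trip is `stringr::str_split_fixed()`, which splits a whole character vector at once into a matrix with a fixed number of columns. A sketch using the first two rows of `theRest`:

```r
library(stringr)  # str_split_fixed(), fixed()
library(tibble)   # as_tibble()

headers <- c("Second rejection by the Senate", "Bill")
theRest <- c(
  "18 June 2014\tClean Energy Finance Corporation (Abolition) Bill 2013 [No. 2]",
  "17 August 2015\tFair Work (Registered Organisations) Amendment Bill 2014 [No. 2]"
)

# Split every row into a character matrix with exactly one column per header.
# Text after the last expected separator stays in the final column; short rows are padded with "".
m <- str_split_fixed(theRest, fixed("\t"), n = length(headers))
colnames(m) <- headers
tb <- as_tibble(m)
```

Fixing `n` to the number of headers means a stray extra separator cannot create a ragged row. For a whitespace split like `'\\s+'`, pass the regex directly instead of wrapping it in `fixed()`.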

Is there perhaps a way to improve on this process? In the meantime, I think I will turn step 3 into a snippet and modify it as needed.

Sep 07 '21 07:09 jfunction