openxlsx icon indicating copy to clipboard operation
openxlsx copied to clipboard

Function to copy sheets between workbook objects

Open williFiebranz opened this issue 5 years ago • 23 comments

When processing the construction of several sheets in parallel, I ended up with a list of workbook objects containing one heavily formatted sheet each. Apparently there is no way to combine all those sheets into one workbook object.

I propose a addSheetFromWorkbook() function, that does just that - or any other way of concatenating sheets from workbook objects.

Current alternatives involve either losing all sheet information beyond the raw cell data, combining the sheets manually or using an external VBA script, which involves additional complications when you are on UNIX-based systems.

williFiebranz avatar Jan 15 '20 08:01 williFiebranz

A new function as you proposed seems reasonable.

It would be great if you could provide an pull request.

ycphs avatar Jan 20 '20 18:01 ycphs

I would love to see this function!

CubicTom avatar Apr 28 '20 15:04 CubicTom

Yes, please, this enhancement would be very valuable. Thanks.

CPA-wrk avatar Jun 20 '20 17:06 CPA-wrk

This would be great. Would it be possible to reuse some of the cloneWorksheet code to speed up the development of this enhancement request? Seem like that is very similar use case, except now you are talking about two workbooks.

sclewis23 avatar Jul 25 '20 11:07 sclewis23

Is this function available in the latest version?

hjia222 avatar Oct 27 '20 13:10 hjia222

@williFiebranz in lieu of this function being made available, how were you able to combine manually? Let's say i load two workbooks, containing one sheet each, each having unique formatting, etc. How did you manage to combine those sheets into a merged workbook? I have a project i'm working on and i'm about to throw in the towel and just do this in vba, or stand up a windows instance of R and use COM objects.

cryptobiotic avatar Dec 23 '20 17:12 cryptobiotic

@cryptobiotic sorry, I never managed to find a solution. My case was about using parallelisation for performance improvements - which I have been able to gain in a different way. I your case on the other hand I could only imagine a function that re-constructs individual sheets from the ground up based on a set of given imported workbooks.

williFiebranz avatar Dec 23 '20 17:12 williFiebranz

Had a quick look and inserted the following into wrapper.R. Requires a roxygen run to add the function to the NAMESPACE. If you want this integrated you could clean it further, document it, write tests and provide it as pull request? It works for this case, but was not really tested and might have some quirks further down the road and this was my final contribution before the new year :smile:

Just to clarify, this was not newly written. It is the cloneWorksheet code modified for copying between worksheets.

> library(openxlsx)
> 
> m1 <- matrix(1:4, 2, 2)
> m2 <- t(m1)
> 
> wb <- createWorkbook()
> addWorksheet(wb, "One")
> writeData(wb, startCol = 1, startRow = 1, x = m1, sheet = "One")
> 
> wb
A Workbook object.
 
Worksheets:
 Sheet 1: "One"
 

 
 Worksheet write order: 1> 
> wb2 <- createWorkbook()
> addWorksheet(wb2, "One")
> writeData(wb2, startCol = 1, startRow = 1, x = m2, sheet = "One")
> 
> wb2
A Workbook object.
 
Worksheets:
 Sheet 1: "One"
 

 
 Worksheet write order: 1> 
> copyWorksheet(wb, wb2, sheetName = "Two", clonedSheet = "One")
> 
> wb2
A Workbook object.
 
Worksheets:
 Sheet 1: "One"
 

 Sheet 2: "Two"
 

 
 Worksheet write order: 1, 2> 
> saveWorkbook(wb2, "/tmp/test.xlsx", overwrite = TRUE)

#' @export
copyWorksheet <- function(from, to, sheetName, clonedSheet) {
  
  if (!"Workbook" %in% class(wb)) {
    stop("First argument must be a Workbook.")
  }
  
  if (tolower(sheetName) %in% tolower(wb$sheet_names)) {
    stop("A worksheet by that name already exists! Sheet names must be unique case-insensitive.")
  }
  
  if (nchar(sheetName) > 31) {
    stop("sheetName too long! Max length is 31 characters.")
  }
  
  if (!is.character(sheetName)) {
    sheetName <- as.character(sheetName)
  }
  
  ## Invalid XML characters
  sheetName <- replaceIllegalCharacters(sheetName)
  
  clonedSheet <- from$validateSheet(clonedSheet)
  if (!missing(sheetName)) {
    if (grepl(pattern = ":", x = sheetName)) {
      stop("colon not allowed in sheet names in Excel")
    }
  }
  newSheetIndex <- length(to$worksheets) + 1L
  if (newSheetIndex > 1) {
    sheetId <-
      max(as.integer(regmatches(
        to$workbook$sheets,
        regexpr('(?<=sheetId=")[0-9]+', to$workbook$sheets, perl = TRUE)
      ))) + 1L
  } else {
    sheetId <- 1
  }
  
  
  ## copy visibility from cloned sheet!
  visible <-
    regmatches(
      to$workbook$sheets[[clonedSheet]],
      regexpr('(?<=state=")[^"]+', to$workbook$sheets[[clonedSheet]], perl = TRUE)
    )
  
  ##  Add sheet to workbook.xml
  to$workbook$sheets <-
    c(
      to$workbook$sheets,
      sprintf(
        '<sheet name="%s" sheetId="%s" state="%s" r:id="rId%s"/>',
        sheetName,
        sheetId,
        visible,
        newSheetIndex
      )
    )

    ## append to worksheets list
    to$worksheets <-
      append(to$worksheets, from$worksheets[[clonedSheet]]$copy())


    ## update content_tyes
    ## add a drawing.xml for the worksheet
    to$Content_Types <-
      c(
        to$Content_Types,
        sprintf(
          '<Override PartName="/xl/worksheets/sheet%s.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"/>',
          newSheetIndex
        ),
        sprintf(
          '<Override PartName="/xl/drawings/drawing%s.xml" ContentType="application/vnd.openxmlformats-officedocument.drawing+xml"/>',
          newSheetIndex
        )
      )

    ## Update xl/rels
    to$workbook.xml.rels <- c(
      to$workbook.xml.rels,
      sprintf(
        '<Relationship Id="rId0" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="worksheets/sheet%s.xml"/>',
        newSheetIndex
      )
    )

    ## create sheet.rels to simplify id assignment
    to$worksheets_rels[[newSheetIndex]] <-
      genBaseSheetRels(newSheetIndex)
    to$drawings_rels[[newSheetIndex]] <- to$drawings_rels[[clonedSheet]]

    # give each chart its own filename (images can re-use the same file, but charts can't)
    to$drawings_rels[[newSheetIndex]] <-
      sapply(to$drawings_rels[[newSheetIndex]], function(rl) {
        chartfiles <-
          regmatches(
            rl,
            gregexpr("(?<=charts/)chart[0-9]+\\.xml", rl, perl = TRUE)
          )[[1]]
        for (cf in chartfiles) {
          chartid <- length(charts) + 1
          newname <- stri_join("chart", chartid, ".xml")
          fl <- charts[cf]

          # Read the chartfile and adjust all formulas to point to the new
          # sheet name instead of the clone source
          # The result is saved to a new chart xml file
          newfl <- file.path(dirname(fl), newname)
          charts[newname] <- newfl
          chart <- readUTF8(fl)
          chart <-
            gsub(
              stri_join("(?<=')", sheet_names[[clonedSheet]], "(?='!)"),
              stri_join("'", sheetName, "'"),
              chart,
              perl = TRUE
            )
          chart <-
            gsub(
              stri_join("(?<=[^A-Za-z0-9])", sheet_names[[clonedSheet]], "(?=!)"),
              stri_join("'", sheetName, "'"),
              chart,
              perl = TRUE
            )
          writeLines(chart, newfl)
          # file.copy(fl, newfl)
          Content_Types <-
            c(
              Content_Types,
              sprintf(
                '<Override PartName="/xl/charts/%s" ContentType="application/vnd.openxmlformats-officedocument.drawingml.chart+xml"/>',
                newname
              )
            )
          rl <- gsub(stri_join("(?<=charts/)", cf), newname, rl, perl = TRUE)
        }
        rl
      }, USE.NAMES = FALSE)
    # The IDs in the drawings array are sheet-specific, so within the new cloned sheet
    # the same IDs can be used => no need to modify drawings
    to$drawings[[newSheetIndex]] <- to$drawings[[clonedSheet]]

    to$vml_rels[[newSheetIndex]] <- to$vml_rels[[clonedSheet]]
    to$vml[[newSheetIndex]] <- to$vml[[clonedSheet]]

    to$isChartSheet[[newSheetIndex]] <- to$isChartSheet[[clonedSheet]]
    to$comments[[newSheetIndex]] <- to$comments[[clonedSheet]]

    to$rowHeights[[newSheetIndex]] <- to$rowHeights[[clonedSheet]]
    to$colWidths[[newSheetIndex]] <- to$colWidths[[clonedSheet]]

    to$colOutlineLevels[[newSheetIndex]] <- to$colOutlineLevels[[clonedSheet]]
    to$outlineLevels[[newSheetIndex]] <- to$outlineLevels[[clonedSheet]]

    to$sheetOrder <- c(to$sheetOrder, as.integer(newSheetIndex))
    to$sheet_names <- c(to$sheet_names, sheetName)


    ############################
    ## STYLE
    ## ... objects are stored in a global list, so we need to get all styles
    ## assigned to the cloned sheet and duplicate them
    sheetStyles <- Filter(function(s) {
      s$sheet == to$sheet_names[[clonedSheet]]
    }, to$styleObjects)
    to$styleObjects <- c(
      to$styleObjects,
      Map(function(s) {
        s$sheet <- sheetName
        s
      }, sheetStyles)
    )


    ############################
    ## TABLES
    ## ... are stored in the $tables list, with the name and sheet as attr
    ## and in the worksheets[]$tableParts list. We also need to adjust the
    ## worksheets_rels and set the content type for the new table

    tbls <- to$tables[attr(to$tables, "sheet") == clonedSheet]
    for (t in tbls) {
      # Extract table name, displayName and ID from the xml
      oldname <- regmatches(t, regexpr('(?<= name=")[^"]+', t, perl = TRUE))
      olddispname <- regmatches(t, regexpr('(?<= displayName=")[^"]+', t, perl = TRUE))
      oldid <- regmatches(t, regexpr('(?<= id=")[^"]+', t, perl = TRUE))
      ref <- regmatches(t, regexpr('(?<= ref=")[^"]+', t, perl = TRUE))

      # Find new, unused table names by appending _n, where n=1,2,...
      n <- 0
      while (stri_join(oldname, "_", n) %in% attr(tables, "tableName")) {
        n <- n + 1
      }
      newname <- stri_join(oldname, "_", n)
      newdispname <- stri_join(olddispname, "_", n)
      newid <- as.character(length(to$tables) + 3L)

      # Use the table definition from the cloned sheet and simply replace the names
      newt <- t
      newt <-
        gsub(
          stri_join(" name=\"", oldname, "\""),
          stri_join(" name=\"", newname, "\""),
          newt
        )
      newt <-
        gsub(
          stri_join(" displayName=\"", olddispname, "\""),
          stri_join(" displayName=\"", newdispname, "\""),
          newt
        )
      newt <-
        gsub(
          stri_join("(<table [^<]* id=\")", oldid, "\""),
          stri_join("\\1", newid, "\""),
          newt
        )

      oldtables <- to$tables
      to$tables <- c(oldtables, newt)
      names(tables) <- c(names(oldtables), ref)
      attr(tables, "sheet") <-
        c(attr(oldtables, "sheet"), newSheetIndex)
      attr(to$tables, "tableName") <-
        c(attr(oldtables, "tableName"), newname)

      oldparts <- to$worksheets[[newSheetIndex]]$tableParts
      to$worksheets[[newSheetIndex]]$tableParts <-
        c(oldparts, sprintf('<tablePart r:id="rId%s"/>', newid))
      attr(to$worksheets[[newSheetIndex]]$tableParts, "tableName") <-
        c(attr(oldparts, "tableName"), newname)
      names(attr(to$worksheets[[newSheetIndex]]$tableParts, "tableName")) <-
        c(names(attr(oldparts, "tableName")), ref)

      to$Content_Types <-
        c(
          to$Content_Types,
          sprintf(
            '<Override PartName="/xl/tables/table%s.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.table+xml"/>',
            newid
          )
        )
      to$tables.xml.rels <- append(to$tables.xml.rels, "")

      to$worksheets_rels[[newSheetIndex]] <-
        c(
          to$worksheets_rels[[newSheetIndex]],
          sprintf(
            '<Relationship Id="rId%s" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/table" Target="../tables/table%s.xml"/>',
            newid,
            newid
          )
        )
    }

    # TODO: The following items are currently NOT copied/duplicated for the cloned sheet:
    #   - Comments
    #   - Pivot tables

    invisible(newSheetIndex)
}

JanMarvin avatar Dec 23 '20 19:12 JanMarvin

@JanMarvin great work. I hope to have some time over the holidays to enable the already existing cloneWorksheet(wb, sheetName, clonedSheet) function with an optional second wb item as source for copying.

Then the current behaivior should be the same. If only one workbook is provided, then the "old" function is used. But if two or even multiple workbooks as target are provided, then the it would be great if the function you drafted does the work.

cloneWorksheet(wb, # source
target_wb , # optional parameter (one or vector)
sheetName, 
clonedSheet # one (recycle) or vector(but the same length as the target_wb)
)

ycphs avatar Dec 23 '20 23:12 ycphs

Thanks @ycphs . The code above is just a copy and paste of old code with an added from$ and to$ to be able to select where to copy what. Though the original code is somewhat buried therefore did I copy it entirely. Happy holidays to you

JanMarvin avatar Dec 23 '20 23:12 JanMarvin

@JanMarvin this is great, thank you. Your simple two data frame example works as expected. However, when i run this on a set of workbooks with any formatting at all, the resulting combined workbook file is corrupt, and strangely shown as locked by user. THe workbooks i am trying to combine are not complicated, they just have a myriad of formats: bold, borders,fill, etc.

cryptobiotic avatar Dec 24 '20 19:12 cryptobiotic

@cryptobiotic Thank you for the confirmation and the report. Maybe some layout information has to be copied over too or has to be combined differently (the old code makes a copy within an existing workbook, therefore it might not have been needed). As a starting point you could have a look, if I made a typo and some from$ in the code above needs to be a to$ (it was a 90 minutes stab at this chunk of code) and it would be helpful if you could provide a simple xlsx file where the issue becomes imminent. It does not have to be a file from your actual work, but one were the behavior can be observed. This will be helpful once @ycphs or myself get back to this issue. Though, just to let you know, it does not have a high priority for me, as I do not need this, and I do not intend to do any coding for the rest of the year.

JanMarvin avatar Dec 25 '20 12:12 JanMarvin

@JanMarvin I think i found the issue. I stepped through and made an increasingly complex xlsx file until i found something that repro'd the error. I started with the basic stuff, fonts, underlines, strikethrough, fill, etc. That is fine. I then tried a reference to another sheet, one that was not copied. That throws the standard excel error dialog which is expected. I then created named ranges on one worksheet, referenced those in a dropdown in the sheet i intended to copy. That causes a corrupt excel file to be created. I actually think that makes sense. Unless there was a copy option which pasted all values and formats, but skipped formulas and references, this probably should not be handled by copyWorksheet.

cryptobiotic avatar Dec 28 '20 20:12 cryptobiotic

thanks, this will be helpful. in addition you could have a look at the function above e.g. here

    # The IDs in the drawings array are sheet-specific, so within the new cloned sheet
    # the same IDs can be used => no need to modify drawings
    to$drawings[[newSheetIndex]] <- to$drawings[[clonedSheet]]

the right hand side most likely needs to be from and not to. since the copied code probably is not yet available. There are a few of these lines in the above code. You could check for these and see if they solve the problem else I'll do it (possibly end of next week).

JanMarvin avatar Dec 28 '20 21:12 JanMarvin

I have some scratch code from a for loop, to combine multiple workbooks with only one sheet:

#create empty workbook
wbCombine <- openxlsx::createWorkbook()
wbtoCopy  <- #workbook to copy
# get Sheet Name to copy
sheetName <- wbtoCopy$sheet_names
# add worksheet to wbCombine
openxlsx::addWorksheet(wbCombine, sheetName)
# read dataframe from worksheet to copy, replacing spaces with  in column names
tmpDF <- openxlsx::read.xlsx(wbtoCopy, sheetName, sep.names = "")
# remove special char , from column names
colnames(tmpDF) <- gsub("\\", " ", colnames(tmpDF))
# write dataframe to new sheet
openxlsx::writeData(wbCombine, sheetName, tmpDF)
# combine styleObjects from wbCombine and workbook to copy
wbCombine$styleObjects <- append(wbCombine$styleObjects, wbtoCopy$styleObjects)
# get sheet number of sheet to copy
copySheet <- wbtoCopy$validateSheet(sheetName)
# get sheet number of new sheet
newSheet <- wbCombine$validateSheet(sheetName)
# copy autoFilter from sheet to copy to new sheet
wbCombine$worksheets[[newSheet]]$autoFilter <- wbtoCopy$worksheets[[copySheet]]$autoFilter

maybe this will help someone.

sclewis23 avatar Dec 28 '20 22:12 sclewis23

@JanMarvin I made a few replacements but something else is off. Two simple workbooks attached. When i run the copyWorksheet function, with and without the to-from changes you suggested, the data is somehow being corrupted. its almost like data from both is being combined in some cases. Very odd. I will continue to debug it but this may have to wait until after the new year. I appreciate you looking into this.

Interesting side note, a blank workboook, wb, is required even though it is not called explicity anywhere.

        wb1 <- loadWorkbook('tester1.xlsx')
        wb2 <- loadWorkbook('tester2.xlsx')
        wb <- createWorkbook()
        
        copyWorksheet(from = wb2,to = wb1,sheetName = "tester2",clonedSheet = "tester2")
        saveWorkbook(wb1,'testing.xlsx',overwrite = T)

testing.xlsx tester2.xlsx tester1.xlsx

cryptobiotic avatar Dec 29 '20 17:12 cryptobiotic

@JanMarvin Did you get a chance to look at this? I've played a bit with your code from above and really haven't found a solution yet.

cryptobiotic avatar Jan 19 '21 14:01 cryptobiotic

no sorry, I don't have time for this currently. I'm working on another pull request for openxlsx, which is partly about better handling of styles, but can not give any due date

JanMarvin avatar Jan 19 '21 16:01 JanMarvin

@JanMarvin @williFiebranz bump here, does anyone have time to implement this feature?

cryptobiotic avatar Mar 15 '21 16:03 cryptobiotic

no

JanMarvin avatar Mar 15 '21 16:03 JanMarvin

Not in the near future, but the direction @JanMarvin pointed seems like the way to go.

I totally support the requested feature and I am happy to support a draft submitted via pr.

ycphs avatar Mar 31 '21 17:03 ycphs

@ycphs thank you. i wish i could be more helpful in the dev work, but i wasn;t able to get very far

cryptobiotic avatar Apr 05 '21 17:04 cryptobiotic

@ycphs were you able to implement the change you suggested to cloneWorksheet? cloneWorksheet(wb, # source target_wb , # optional parameter (one or vector) sheetName, clonedSheet # one (recycle) or vector(but the same length as the target_wb) )

cryptobiotic avatar Oct 27 '21 20:10 cryptobiotic

This issue is stale because it has been open 365 days with no activity. Remove stale label or comment or this will be closed in 7 days.

github-actions[bot] avatar Oct 29 '22 03:10 github-actions[bot]

This issue was closed because it has been stalled for 7 days with no activity.

github-actions[bot] avatar Nov 12 '22 03:11 github-actions[bot]