openxlsx
openxlsx copied to clipboard
Function to copy sheets between workbook objects
When processing the construction of several sheets in parallel, I ended up with a list of workbook objects containing one heavily formatted sheet each. Apparently there is no way to combine all those sheets into one workbook object.
I propose a addSheetFromWorkbook() function, that does just that - or any other way of concatenating sheets from workbook objects.
Current alternatives involve either losing all sheet information beyond the raw cell data, combining the sheets manually or using an external VBA script, which involves additional complications when you are on UNIX-based systems.
A new function as you proposed seems reasonable.
It would be great if you could provide an pull request.
I would love to see this function!
Yes, please, this enhancement would be very valuable. Thanks.
This would be great. Would it be possible to reuse some of the cloneWorksheet code to speed up the development of this enhancement request? Seem like that is very similar use case, except now you are talking about two workbooks.
Is this function available in the latest version?
@williFiebranz in lieu of this function being made available, how were you able to combine manually? Let's say i load two workbooks, containing one sheet each, each having unique formatting, etc. How did you manage to combine those sheets into a merged workbook? I have a project i'm working on and i'm about to throw in the towel and just do this in vba, or stand up a windows instance of R and use COM objects.
@cryptobiotic sorry, I never managed to find a solution. My case was about using parallelisation for performance improvements - which I have been able to gain in a different way. I your case on the other hand I could only imagine a function that re-constructs individual sheets from the ground up based on a set of given imported workbooks.
Had a quick look and inserted the following into wrapper.R
. Requires a roxygen run to add the function to the NAMESPACE
. If you want this integrated you could clean it further, document it, write tests and provide it as pull request? It works for this case, but was not really tested and might have some quirks further down the road and this was my final contribution before the new year :smile:
Just to clarify, this was not newly written. It is the cloneWorksheet
code modified for copying between worksheets.
> library(openxlsx)
>
> m1 <- matrix(1:4, 2, 2)
> m2 <- t(m1)
>
> wb <- createWorkbook()
> addWorksheet(wb, "One")
> writeData(wb, startCol = 1, startRow = 1, x = m1, sheet = "One")
>
> wb
A Workbook object.
Worksheets:
Sheet 1: "One"
Worksheet write order: 1>
> wb2 <- createWorkbook()
> addWorksheet(wb2, "One")
> writeData(wb2, startCol = 1, startRow = 1, x = m2, sheet = "One")
>
> wb2
A Workbook object.
Worksheets:
Sheet 1: "One"
Worksheet write order: 1>
> copyWorksheet(wb, wb2, sheetName = "Two", clonedSheet = "One")
>
> wb2
A Workbook object.
Worksheets:
Sheet 1: "One"
Sheet 2: "Two"
Worksheet write order: 1, 2>
> saveWorkbook(wb2, "/tmp/test.xlsx", overwrite = TRUE)
#' @export
copyWorksheet <- function(from, to, sheetName, clonedSheet) {
if (!"Workbook" %in% class(wb)) {
stop("First argument must be a Workbook.")
}
if (tolower(sheetName) %in% tolower(wb$sheet_names)) {
stop("A worksheet by that name already exists! Sheet names must be unique case-insensitive.")
}
if (nchar(sheetName) > 31) {
stop("sheetName too long! Max length is 31 characters.")
}
if (!is.character(sheetName)) {
sheetName <- as.character(sheetName)
}
## Invalid XML characters
sheetName <- replaceIllegalCharacters(sheetName)
clonedSheet <- from$validateSheet(clonedSheet)
if (!missing(sheetName)) {
if (grepl(pattern = ":", x = sheetName)) {
stop("colon not allowed in sheet names in Excel")
}
}
newSheetIndex <- length(to$worksheets) + 1L
if (newSheetIndex > 1) {
sheetId <-
max(as.integer(regmatches(
to$workbook$sheets,
regexpr('(?<=sheetId=")[0-9]+', to$workbook$sheets, perl = TRUE)
))) + 1L
} else {
sheetId <- 1
}
## copy visibility from cloned sheet!
visible <-
regmatches(
to$workbook$sheets[[clonedSheet]],
regexpr('(?<=state=")[^"]+', to$workbook$sheets[[clonedSheet]], perl = TRUE)
)
## Add sheet to workbook.xml
to$workbook$sheets <-
c(
to$workbook$sheets,
sprintf(
'<sheet name="%s" sheetId="%s" state="%s" r:id="rId%s"/>',
sheetName,
sheetId,
visible,
newSheetIndex
)
)
## append to worksheets list
to$worksheets <-
append(to$worksheets, from$worksheets[[clonedSheet]]$copy())
## update content_tyes
## add a drawing.xml for the worksheet
to$Content_Types <-
c(
to$Content_Types,
sprintf(
'<Override PartName="/xl/worksheets/sheet%s.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.worksheet+xml"/>',
newSheetIndex
),
sprintf(
'<Override PartName="/xl/drawings/drawing%s.xml" ContentType="application/vnd.openxmlformats-officedocument.drawing+xml"/>',
newSheetIndex
)
)
## Update xl/rels
to$workbook.xml.rels <- c(
to$workbook.xml.rels,
sprintf(
'<Relationship Id="rId0" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/worksheet" Target="worksheets/sheet%s.xml"/>',
newSheetIndex
)
)
## create sheet.rels to simplify id assignment
to$worksheets_rels[[newSheetIndex]] <-
genBaseSheetRels(newSheetIndex)
to$drawings_rels[[newSheetIndex]] <- to$drawings_rels[[clonedSheet]]
# give each chart its own filename (images can re-use the same file, but charts can't)
to$drawings_rels[[newSheetIndex]] <-
sapply(to$drawings_rels[[newSheetIndex]], function(rl) {
chartfiles <-
regmatches(
rl,
gregexpr("(?<=charts/)chart[0-9]+\\.xml", rl, perl = TRUE)
)[[1]]
for (cf in chartfiles) {
chartid <- length(charts) + 1
newname <- stri_join("chart", chartid, ".xml")
fl <- charts[cf]
# Read the chartfile and adjust all formulas to point to the new
# sheet name instead of the clone source
# The result is saved to a new chart xml file
newfl <- file.path(dirname(fl), newname)
charts[newname] <- newfl
chart <- readUTF8(fl)
chart <-
gsub(
stri_join("(?<=')", sheet_names[[clonedSheet]], "(?='!)"),
stri_join("'", sheetName, "'"),
chart,
perl = TRUE
)
chart <-
gsub(
stri_join("(?<=[^A-Za-z0-9])", sheet_names[[clonedSheet]], "(?=!)"),
stri_join("'", sheetName, "'"),
chart,
perl = TRUE
)
writeLines(chart, newfl)
# file.copy(fl, newfl)
Content_Types <-
c(
Content_Types,
sprintf(
'<Override PartName="/xl/charts/%s" ContentType="application/vnd.openxmlformats-officedocument.drawingml.chart+xml"/>',
newname
)
)
rl <- gsub(stri_join("(?<=charts/)", cf), newname, rl, perl = TRUE)
}
rl
}, USE.NAMES = FALSE)
# The IDs in the drawings array are sheet-specific, so within the new cloned sheet
# the same IDs can be used => no need to modify drawings
to$drawings[[newSheetIndex]] <- to$drawings[[clonedSheet]]
to$vml_rels[[newSheetIndex]] <- to$vml_rels[[clonedSheet]]
to$vml[[newSheetIndex]] <- to$vml[[clonedSheet]]
to$isChartSheet[[newSheetIndex]] <- to$isChartSheet[[clonedSheet]]
to$comments[[newSheetIndex]] <- to$comments[[clonedSheet]]
to$rowHeights[[newSheetIndex]] <- to$rowHeights[[clonedSheet]]
to$colWidths[[newSheetIndex]] <- to$colWidths[[clonedSheet]]
to$colOutlineLevels[[newSheetIndex]] <- to$colOutlineLevels[[clonedSheet]]
to$outlineLevels[[newSheetIndex]] <- to$outlineLevels[[clonedSheet]]
to$sheetOrder <- c(to$sheetOrder, as.integer(newSheetIndex))
to$sheet_names <- c(to$sheet_names, sheetName)
############################
## STYLE
## ... objects are stored in a global list, so we need to get all styles
## assigned to the cloned sheet and duplicate them
sheetStyles <- Filter(function(s) {
s$sheet == to$sheet_names[[clonedSheet]]
}, to$styleObjects)
to$styleObjects <- c(
to$styleObjects,
Map(function(s) {
s$sheet <- sheetName
s
}, sheetStyles)
)
############################
## TABLES
## ... are stored in the $tables list, with the name and sheet as attr
## and in the worksheets[]$tableParts list. We also need to adjust the
## worksheets_rels and set the content type for the new table
tbls <- to$tables[attr(to$tables, "sheet") == clonedSheet]
for (t in tbls) {
# Extract table name, displayName and ID from the xml
oldname <- regmatches(t, regexpr('(?<= name=")[^"]+', t, perl = TRUE))
olddispname <- regmatches(t, regexpr('(?<= displayName=")[^"]+', t, perl = TRUE))
oldid <- regmatches(t, regexpr('(?<= id=")[^"]+', t, perl = TRUE))
ref <- regmatches(t, regexpr('(?<= ref=")[^"]+', t, perl = TRUE))
# Find new, unused table names by appending _n, where n=1,2,...
n <- 0
while (stri_join(oldname, "_", n) %in% attr(tables, "tableName")) {
n <- n + 1
}
newname <- stri_join(oldname, "_", n)
newdispname <- stri_join(olddispname, "_", n)
newid <- as.character(length(to$tables) + 3L)
# Use the table definition from the cloned sheet and simply replace the names
newt <- t
newt <-
gsub(
stri_join(" name=\"", oldname, "\""),
stri_join(" name=\"", newname, "\""),
newt
)
newt <-
gsub(
stri_join(" displayName=\"", olddispname, "\""),
stri_join(" displayName=\"", newdispname, "\""),
newt
)
newt <-
gsub(
stri_join("(<table [^<]* id=\")", oldid, "\""),
stri_join("\\1", newid, "\""),
newt
)
oldtables <- to$tables
to$tables <- c(oldtables, newt)
names(tables) <- c(names(oldtables), ref)
attr(tables, "sheet") <-
c(attr(oldtables, "sheet"), newSheetIndex)
attr(to$tables, "tableName") <-
c(attr(oldtables, "tableName"), newname)
oldparts <- to$worksheets[[newSheetIndex]]$tableParts
to$worksheets[[newSheetIndex]]$tableParts <-
c(oldparts, sprintf('<tablePart r:id="rId%s"/>', newid))
attr(to$worksheets[[newSheetIndex]]$tableParts, "tableName") <-
c(attr(oldparts, "tableName"), newname)
names(attr(to$worksheets[[newSheetIndex]]$tableParts, "tableName")) <-
c(names(attr(oldparts, "tableName")), ref)
to$Content_Types <-
c(
to$Content_Types,
sprintf(
'<Override PartName="/xl/tables/table%s.xml" ContentType="application/vnd.openxmlformats-officedocument.spreadsheetml.table+xml"/>',
newid
)
)
to$tables.xml.rels <- append(to$tables.xml.rels, "")
to$worksheets_rels[[newSheetIndex]] <-
c(
to$worksheets_rels[[newSheetIndex]],
sprintf(
'<Relationship Id="rId%s" Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/table" Target="../tables/table%s.xml"/>',
newid,
newid
)
)
}
# TODO: The following items are currently NOT copied/duplicated for the cloned sheet:
# - Comments
# - Pivot tables
invisible(newSheetIndex)
}
@JanMarvin great work.
I hope to have some time over the holidays to enable the already existing cloneWorksheet(wb, sheetName, clonedSheet)
function with an optional second wb item as source for copying.
Then the current behaivior should be the same. If only one workbook is provided, then the "old" function is used. But if two or even multiple workbooks as target are provided, then the it would be great if the function you drafted does the work.
cloneWorksheet(wb, # source
target_wb , # optional parameter (one or vector)
sheetName,
clonedSheet # one (recycle) or vector(but the same length as the target_wb)
)
Thanks @ycphs . The code above is just a copy and paste of old code with an added from$
and to$
to be able to select where to copy what. Though the original code is somewhat buried therefore did I copy it entirely.
Happy holidays to you
@JanMarvin this is great, thank you. Your simple two data frame example works as expected. However, when i run this on a set of workbooks with any formatting at all, the resulting combined workbook file is corrupt, and strangely shown as locked by user. THe workbooks i am trying to combine are not complicated, they just have a myriad of formats: bold, borders,fill, etc.
@cryptobiotic Thank you for the confirmation and the report. Maybe some layout information has to be copied over too or has to be combined differently (the old code makes a copy within an existing workbook, therefore it might not have been needed). As a starting point you could have a look, if I made a typo and some from$
in the code above needs to be a to$
(it was a 90 minutes stab at this chunk of code) and it would be helpful if you could provide a simple xlsx file where the issue becomes imminent. It does not have to be a file from your actual work, but one were the behavior can be observed. This will be helpful once @ycphs or myself get back to this issue.
Though, just to let you know, it does not have a high priority for me, as I do not need this, and I do not intend to do any coding for the rest of the year.
@JanMarvin I think i found the issue. I stepped through and made an increasingly complex xlsx file until i found something that repro'd the error. I started with the basic stuff, fonts, underlines, strikethrough, fill, etc. That is fine. I then tried a reference to another sheet, one that was not copied. That throws the standard excel error dialog which is expected. I then created named ranges on one worksheet, referenced those in a dropdown in the sheet i intended to copy. That causes a corrupt excel file to be created. I actually think that makes sense. Unless there was a copy option which pasted all values and formats, but skipped formulas and references, this probably should not be handled by copyWorksheet.
thanks, this will be helpful. in addition you could have a look at the function above e.g. here
# The IDs in the drawings array are sheet-specific, so within the new cloned sheet
# the same IDs can be used => no need to modify drawings
to$drawings[[newSheetIndex]] <- to$drawings[[clonedSheet]]
the right hand side most likely needs to be from
and not to
. since the copied code probably is not yet available. There are a few of these lines in the above code. You could check for these and see if they solve the problem else I'll do it (possibly end of next week).
I have some scratch code from a for loop, to combine multiple workbooks with only one sheet:
#create empty workbook
wbCombine <- openxlsx::createWorkbook()
wbtoCopy <- #workbook to copy
# get Sheet Name to copy
sheetName <- wbtoCopy$sheet_names
# add worksheet to wbCombine
openxlsx::addWorksheet(wbCombine, sheetName)
# read dataframe from worksheet to copy, replacing spaces with in column names
tmpDF <- openxlsx::read.xlsx(wbtoCopy, sheetName, sep.names = "")
# remove special char , from column names
colnames(tmpDF) <- gsub("\\", " ", colnames(tmpDF))
# write dataframe to new sheet
openxlsx::writeData(wbCombine, sheetName, tmpDF)
# combine styleObjects from wbCombine and workbook to copy
wbCombine$styleObjects <- append(wbCombine$styleObjects, wbtoCopy$styleObjects)
# get sheet number of sheet to copy
copySheet <- wbtoCopy$validateSheet(sheetName)
# get sheet number of new sheet
newSheet <- wbCombine$validateSheet(sheetName)
# copy autoFilter from sheet to copy to new sheet
wbCombine$worksheets[[newSheet]]$autoFilter <- wbtoCopy$worksheets[[copySheet]]$autoFilter
maybe this will help someone.
@JanMarvin I made a few replacements but something else is off. Two simple workbooks attached. When i run the copyWorksheet function, with and without the to-from changes you suggested, the data is somehow being corrupted. its almost like data from both is being combined in some cases. Very odd. I will continue to debug it but this may have to wait until after the new year. I appreciate you looking into this.
Interesting side note, a blank workboook, wb, is required even though it is not called explicity anywhere.
wb1 <- loadWorkbook('tester1.xlsx')
wb2 <- loadWorkbook('tester2.xlsx')
wb <- createWorkbook()
copyWorksheet(from = wb2,to = wb1,sheetName = "tester2",clonedSheet = "tester2")
saveWorkbook(wb1,'testing.xlsx',overwrite = T)
@JanMarvin Did you get a chance to look at this? I've played a bit with your code from above and really haven't found a solution yet.
no sorry, I don't have time for this currently. I'm working on another pull request for openxlsx, which is partly about better handling of styles, but can not give any due date
@JanMarvin @williFiebranz bump here, does anyone have time to implement this feature?
no
Not in the near future, but the direction @JanMarvin pointed seems like the way to go.
I totally support the requested feature and I am happy to support a draft submitted via pr.
@ycphs thank you. i wish i could be more helpful in the dev work, but i wasn;t able to get very far
@ycphs were you able to implement the change you suggested to cloneWorksheet? cloneWorksheet(wb, # source target_wb , # optional parameter (one or vector) sheetName, clonedSheet # one (recycle) or vector(but the same length as the target_wb) )
This issue is stale because it has been open 365 days with no activity. Remove stale label or comment or this will be closed in 7 days.
This issue was closed because it has been stalled for 7 days with no activity.