excelize
Performance problem with rich text /xl/sharedStrings.xml deduplication
Description
The deduplication in SetCellRichText() is quite CPU intensive when exporting many cells together: at least O(n²s) overall, where n is the number of cells and s is the size of the rich text in each cell. Every call scans the whole shared string table:
func (f *File) SetCellRichText(sheet, cell string, runs []RichTextRun) error {
	...
	for idx, strItem := range sst.SI { // O(n)
		if reflect.DeepEqual(strItem, si) { // > O(s)
			cellData.T, cellData.V = "s", strconv.Itoa(idx)
			return err
		}
	}
	...
}
func main() {
	cells := ...
	f := excelize.NewFile()
	sheetName := f.GetSheetName(0)
	for index, cellValue := range cells { // O(n)
		// Rows are 1-based, and CoordinatesToCellName also returns an error.
		cellName, err := excelize.CoordinatesToCellName(1, index+1)
		if err != nil {
			...
		}
		if err := f.SetCellRichText(sheetName, cellName, cellValue); err != nil {
			...
		}
	}
}
Steps to reproduce the issue:
Set a number of cells with SetCellRichText().
Describe the results you received:
The reflect.DeepEqual() loop caused high CPU usage. Please see the pprof result for details.
Describe the results you expected:
The CPU usage should be comparable to SetCellStr(), SetCellDefault(), etc.
Possible solutions:
- Provide an option to switch off the reflect.DeepEqual() comparisons, or
- Marshal the rich text structure and use a hash map or Bloom filter instead of the brute-force loop, reducing the time complexity to O(n·s). This is my suggested approach, although serializing and then hashing the (possibly large) serialized result would both add noticeable overhead.
- Get rid of SetCellRichText() and stick to the good old SetCellStr() where performance matters. In my own case I only need correct line breaks (#976) and do not need fancy rich text.
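To illustrate the hash map idea from the second option, here is a minimal, self-contained sketch. It is not excelize code: the run and sharedStrings types, the intern method, and the choice of JSON plus SHA-256 for the lookup key are all assumptions made for the example. Treating equal digests as equal items also relies on SHA-256 collisions being negligible in practice.

```go
package main

import (
	"crypto/sha256"
	"encoding/json"
	"fmt"
)

// run is a hypothetical stand-in for one rich text run (text plus formatting).
// It is NOT excelize's RichTextRun type.
type run struct {
	Text  string
	Bold  bool
	Color string
}

// sharedStrings is a hypothetical stand-in for the shared string table.
type sharedStrings struct {
	items [][]run                   // entries in insertion order, like sst.SI
	index map[[sha256.Size]byte]int // digest of serialized entry -> position in items
}

func newSharedStrings() *sharedStrings {
	return &sharedStrings{index: make(map[[sha256.Size]byte]int)}
}

// intern returns the position of runs in the table, appending it if absent.
// Serializing and hashing cost O(s); the map lookup replaces the O(n) scan.
func (s *sharedStrings) intern(runs []run) (int, error) {
	data, err := json.Marshal(runs)
	if err != nil {
		return 0, err
	}
	key := sha256.Sum256(data)
	if idx, ok := s.index[key]; ok {
		return idx, nil // already stored: reuse the existing entry
	}
	s.items = append(s.items, runs)
	idx := len(s.items) - 1
	s.index[key] = idx
	return idx, nil
}

func main() {
	sst := newSharedStrings()
	a, _ := sst.intern([]run{{Text: "line 1\nline 2", Bold: true}})
	b, _ := sst.intern([]run{{Text: "other"}})
	c, _ := sst.intern([]run{{Text: "line 1\nline 2", Bold: true}})
	fmt.Println(a, b, c, len(sst.items)) // prints: 0 1 0 2
}
```

With n calls this does O(n·s) work in total, at the price of one serialization and one hash per call, which is the overhead mentioned above.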
Output of go version:
go version go1.17rc1 linux/amd64
Excelize version or commit ID:
v2.4.0 (d42834f3a82cebe6b54fd67b1f7f50582ea287dc)
Thanks for your feedback. I'll consider optimizing this, but it may take a while.