
Performance problem with rich text /xl/sharedStrings.xml deduplication

Open Arnie97 opened this issue 3 years ago • 1 comment

Description

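The deduplication in SetCellRichText() is quite CPU intensive when many cells are exported together: at least O(n²·s), where n is the number of cells and s is the size of the rich text in each cell. Each call scans the whole shared string table (O(n)) and compares every entry with reflect.DeepEqual() (at least O(s)), and the caller makes n such calls: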

func (f *File) SetCellRichText(sheet, cell string, runs []RichTextRun) error {
	...
	for idx, strItem := range sst.SI {  // O(n)
		if reflect.DeepEqual(strItem, si) {  // > O(s)
			cellData.T, cellData.V = "s", strconv.Itoa(idx)
			return err
		}
	}
	...
}

func main() {
	cells := ...
	f := excelize.NewFile()
	sheetName := f.GetSheetName(0)
	for index, cellValue := range cells {  // O(n)
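		// CoordinatesToCellName returns (string, error), and rows are 1-based.
		cellName, err := excelize.CoordinatesToCellName(1, index+1)
		if err != nil {
			...
		}
		if err := f.SetCellRichText(sheetName, cellName, cellValue); err != nil {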
			...
		}
	}
}

Steps to reproduce the issue:

Set a number of cells with SetCellRichText().

Describe the results you received:

The reflect.DeepEqual() loop caused high CPU usage. Please see the pprof result for details.

Describe the results you expected:

The CPU usage should be comparable to SetCellStr(), SetCellDefault() etc.

Possible solutions:

  • Provide an option to switch off the reflect.DeepEqual() comparisons, or
  • Serialize each rich text structure and look it up in a hash map (or bloom filter) instead of scanning the table with a brute-force loop, which reduces the time complexity to O(n·s). This is my suggested approach, although serializing and hashing the (possibly large) result would each add some noticeable overhead.
  • Avoid SetCellRichText() and stick to the good old SetCellStr() where performance matters. In my own case I only need correct line breaks (#976), not fancy rich text.
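
A minimal sketch of the hash map approach from the second bullet, not excelize's actual implementation. The types run and sharedStrings and the method intern are hypothetical stand-ins for the rich text item and the shared string table, and JSON is an arbitrary choice of serialization. Each lookup costs O(s) for serialization plus an expected O(s) map access, so n cells cost O(n·s) overall:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// run is a hypothetical stand-in for one rich text run (text plus formatting).
type run struct {
	Text string
	Bold bool
}

// sharedStrings is a hypothetical stand-in for the shared string table.
type sharedStrings struct {
	items [][]run        // entries in insertion order, like sst.SI
	index map[string]int // serialized entry -> position in items
}

func newSharedStrings() *sharedStrings {
	return &sharedStrings{index: make(map[string]int)}
}

// intern returns the position of runs in the table, appending it if absent.
func (s *sharedStrings) intern(runs []run) (int, error) {
	key, err := json.Marshal(runs) // O(s) serialization
	if err != nil {
		return 0, err
	}
	if idx, ok := s.index[string(key)]; ok { // expected O(s) lookup
		return idx, nil
	}
	idx := len(s.items)
	s.items = append(s.items, runs)
	s.index[string(key)] = idx
	return idx, nil
}

func main() {
	sst := newSharedStrings()
	a, _ := sst.intern([]run{{Text: "hello", Bold: true}})
	b, _ := sst.intern([]run{{Text: "world"}})
	c, _ := sst.intern([]run{{Text: "hello", Bold: true}}) // duplicate of the first
	fmt.Println(a, b, c, len(sst.items)) // prints "0 1 0 2"
}
```

Using the full serialized form as the map key keeps the comparison exact, so there are no false matches from hash collisions, at the price of holding a second copy of every entry in memory. Keying on a fixed-size digest instead would save memory but would need a fallback equality check on collision.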

Output of go version:

go version go1.17rc1 linux/amd64

Excelize version or commit ID:

v2.4.0 (d42834f3a82cebe6b54fd67b1f7f50582ea287dc)

Arnie97 avatar Aug 02 '21 05:08 Arnie97

Thanks for your feedback. I'll consider optimizing this, but it may take a while.

xuri avatar Aug 19 '21 15:08 xuri