High memory when writing 1million number of rows
Hello,
I am facing an issue. I am trying to write large amount of data(approx 1million rows) into the excel and saving the excel at the end after writing all the lines. But the memory usage is very high in the case. It is taking more than 1.5GB of memory. I assume as each row stays in memory and all the data is dumped into the file at the end is causing the issue. Do we have any method so that we append rows in excel files without taking so much of memory ?
Please using the StreamWriter for generating a worksheet with huge amounts of data. Note that the streaming API doesn't support append to the existing worksheets currently.
Thanks @xuri for suggesting it but still the memory usage is on higher side. I believe this issue can be resolved only if have some append method for writing it in worksheets. I'll probably investigate more about it.
@viv2793 @xuri I am facing the same issue. When I am trying to write a large amount of data i.e more than a million records, it is taking more than 5 hrs to process. I am not sure about memory usage. Is there any workaround for it? @viv2793 how much time is taking for you to write million of records?
Hi @rakesh-wrx, thanks for your feedback, could you provide any code and attachments to reproduce this issue?
Writing 17,000 rows takes about 7 minutes. In my case it's not a high memory thing (it can use more if it'd be faster), but the performance is really slow. I'm trying to give my users an option of having an XLSX file instead of CSV, so I'm used to writing [][]string.
I can write the same CSV file in 7 seconds what takes 7 minutes with XLSX. Are there things I can do on the data-generation side before passing into the library that will make it more performant?
func writeLineForOutput(line []string, excelLineNum int, lineInterface []interface{}, excelWriter *excelize.StreamWriter) error { lineInterface = lineInterface[:0] for x := range line { lineInterface = append(lineInterface, line[x]) } cell, _ := excelize.CoordinatesToCellName(1, excelLineNum) return excelWriter.SetRow(cell, lineInterface) }
Hi @mzimmerman, thanks for your feedback, could you provide more details about your environment information, how many columns each row, which version of the Go language and excelize library you are using, what hardware and OS info? Please also reference the performance benchmark report.
Looks like there's a lot of garbage created/used in referencing the cell addresses... e.g., A10 vs ZZ56 -- it looks like you're converting to/from that a few times depending on where it's needed. Also a lot done in XML serialization.
git clone https://github.com/mzimmerman/excelizetest
go test -bench=. -cpuprofile cpu.out -memprofile mem.out goos: linux goarch: amd64 pkg: github.com/mzimmerman/excelizetest cpu: Intel(R) Xeon(R) Platinum 8160 CPU @ 2.10GHz BenchmarkExcelize10x10-192 610 1882966 ns/op BenchmarkExcelize100x100-192 82 13992091 ns/op BenchmarkExcelize1000x1000-192 1 1179630242 ns/op BenchmarkExcelize1000x10-192 78 14885501 ns/op BenchmarkExcelize10000x10-192 8 138420402 ns/op BenchmarkExcelize100000x10-192 1 1386176914 ns/op BenchmarkExcelize100000x100-192 1 13480967919 ns/op BenchmarkExcelize10000x1000-192 1 13490364172 ns/op PASS ok github.com/mzimmerman/excelizetest 36.209s
pprof excelizetest.test cpu.out
Total: 37.44s
ROUTINE ======================== github.com/xuri/excelize/v2.(*StreamWriter).SetRow in /home/mzimmerman/go/pkg/mod/github.com/xuri/excelize/[email protected]/stream.go
280ms 16.45s (flat, cum) 43.94% of Total
. . 308:// 'Flush' method to end the streaming writing process.
. . 309://
. . 310:// As a special case, if Cell is used as a value, then the Cell.StyleID will be
. . 311:// applied to that cell.
. . 312:func (sw *StreamWriter) SetRow(axis string, values []interface{}, opts ...RowOpts) error {
. 30ms 313: col, row, err := CellNameToCoordinates(axis)
. . 314: if err != nil {
. . 315: return err
. . 316: }
. . 317: if !sw.sheetWritten {
. . 318: if len(sw.cols) > 0 {
. . 319: _, _ = sw.rawData.WriteString("<cols>" + sw.cols + "</cols>")
. . 320: }
. . 321: _, _ = sw.rawData.WriteString(`<sheetData>`)
. . 322: sw.sheetWritten = true
. . 323: }
. . 324: attrs, err := marshalRowAttrs(opts...)
. . 325: if err != nil {
. . 326: return err
. . 327: }
. 60ms 328: fmt.Fprintf(&sw.rawData, `<row r="%d"%s>`, row, attrs)
90ms 90ms 329: for i, val := range values {
40ms 4.41s 330: axis, err := CoordinatesToCellName(col+i, row)
. . 331: if err != nil {
. . 332: return err
. . 333: }
20ms 80ms 334: c := xlsxC{R: axis}
20ms 20ms 335: if v, ok := val.(Cell); ok {
. . 336: c.S = v.StyleID
. . 337: val = v.Value
. . 338: setCellFormula(&c, v.Formula)
50ms 50ms 339: } else if v, ok := val.(*Cell); ok && v != nil {
. . 340: c.S = v.StyleID
. . 341: val = v.Value
. . 342: setCellFormula(&c, v.Formula)
. . 343: }
20ms 2.03s 344: if err = sw.setCellValFunc(&c, val); err != nil {
. . 345: _, _ = sw.rawData.WriteString(`</row>`)
. . 346: return err
. . 347: }
40ms 9.17s 348: writeCell(&sw.rawData, c)
. . 349: }
. 10ms 350: _, _ = sw.rawData.WriteString(`</row>`)
. 500ms 351: return sw.rawData.Sync()
. . 352:}
. . 353:
. . 354:// marshalRowAttrs prepare attributes of the row by given options.
. . 355:func marshalRowAttrs(opts ...RowOpts) (attrs string, err error) {
. . 356: var opt *RowOpts
Total: 37.44s
ROUTINE ======================== github.com/xuri/excelize/v2.writeCell in /home/mzimmerman/go/pkg/mod/github.com/xuri/excelize/[email protected]/stream.go
140ms 9.01s (flat, cum) 24.07% of Total
. . 481: default:
. . 482: }
. . 483: return
. . 484:}
. . 485:
20ms 20ms 486:func writeCell(buf *bufferedWriter, c xlsxC) {
. 140ms 487: _, _ = buf.WriteString(`<c`)
. . 488: if c.XMLSpace.Value != "" {
. . 489: fmt.Fprintf(buf, ` xml:%s="%s"`, c.XMLSpace.Name.Local, c.XMLSpace.Value)
. . 490: }
10ms 3.23s 491: fmt.Fprintf(buf, ` r="%s"`, c.R)
10ms 10ms 492: if c.S != 0 {
. . 493: fmt.Fprintf(buf, ` s="%d"`, c.S)
. . 494: }
. . 495: if c.T != "" {
50ms 2.85s 496: fmt.Fprintf(buf, ` t="%s"`, c.T)
. . 497: }
10ms 300ms 498: _, _ = buf.WriteString(`>`)
10ms 10ms 499: if c.F != nil {
. . 500: _, _ = buf.WriteString(`<f>`)
. . 501: _ = xml.EscapeText(buf, []byte(c.F.Content))
. . 502: _, _ = buf.WriteString(`</f>`)
. . 503: }
10ms 10ms 504: if c.V != "" {
. 110ms 505: _, _ = buf.WriteString(`<v>`)
. 1.94s 506: _ = xml.EscapeText(buf, []byte(c.V))
. 150ms 507: _, _ = buf.WriteString(`</v>`)
. . 508: }
. 220ms 509: _, _ = buf.WriteString(`</c>`)
20ms 20ms 510:}
. . 511:
. . 512:// Flush ending the streaming writing process.
. . 513:func (sw *StreamWriter) Flush() error {
. . 514: if !sw.sheetWritten {
. . 515: _, _ = sw.rawData.WriteString(`<sheetData>`)```
```(pprof) list CoordinatesToCellName
Total: 37.44s
ROUTINE ======================== github.com/xuri/excelize/v2.CoordinatesToCellName in /home/mzimmerman/go/pkg/mod/github.com/xuri/excelize/[email protected]/lib.go
120ms 4.40s (flat, cum) 11.75% of Total
. . 262:// Example:
. . 263://
. . 264:// excelize.CoordinatesToCellName(1, 1) // returns "A1", nil
. . 265:// excelize.CoordinatesToCellName(1, 1, true) // returns "$A$1", nil
. . 266://
10ms 10ms 267:func CoordinatesToCellName(col, row int, abs ...bool) (string, error) {
. . 268: if col < 1 || row < 1 {
10ms 10ms 269: return "", fmt.Errorf("invalid cell coordinates [%d, %d]", col, row)
. . 270: }
. . 271: sign := ""
. . 272: for _, a := range abs {
. . 273: if a {
. . 274: sign = "$"
. . 275: }
. . 276: }
20ms 2.33s 277: colname, err := ColumnNumberToName(col)
80ms 2.05s 278: return sign + colname + sign + strconv.Itoa(row), err
. . 279:}
. . 280:
. . 281:// areaRefToCoordinates provides a function to convert area reference to a
. . 282:// pair of coordinates.
. . 283:func areaRefToCoordinates(ref string) ([]int, error) {
(pprof) list ColumnNumberToName
Total: 37.44s
ROUTINE ======================== github.com/xuri/excelize/v2.ColumnNumberToName in /home/mzimmerman/go/pkg/mod/github.com/xuri/excelize/[email protected]/lib.go
170ms 2.31s (flat, cum) 6.17% of Total
. . 219://
. . 220:// Example:
. . 221://
. . 222:// excelize.ColumnNumberToName(37) // returns "AK", nil
. . 223://
40ms 40ms 224:func ColumnNumberToName(num int) (string, error) {
10ms 10ms 225: if num < 1 {
. . 226: return "", fmt.Errorf("incorrect column number %d", num)
. . 227: }
10ms 10ms 228: if num > TotalColumns {
. . 229: return "", ErrColumnNumber
. . 230: }
. . 231: var col string
30ms 30ms 232: for num > 0 {
30ms 2.17s 233: col = string(rune((num-1)%26+65)) + col
20ms 20ms 234: num = (num - 1) / 26
. . 235: }
30ms 30ms 236: return col, nil
. . 237:}
. . 238:
. . 239:// CellNameToCoordinates converts alphanumeric cell name to [X, Y] coordinates
. . 240:// or returns an error.
. . 241://```
pprof excelizetest.test mem.out
```(pprof) list writeCell
Total: 3.33GB
ROUTINE ======================== github.com/xuri/excelize/v2.writeCell in /home/mzimmerman/go/pkg/mod/github.com/xuri/excelize/[email protected]/stream.go
950.51MB 1.34GB (flat, cum) 40.16% of Total
. . 482: }
. . 483: return
. . 484:}
. . 485:
. . 486:func writeCell(buf *bufferedWriter, c xlsxC) {
. 24.15MB 487: _, _ = buf.WriteString(`<c`)
. . 488: if c.XMLSpace.Value != "" {
. . 489: fmt.Fprintf(buf, ` xml:%s="%s"`, c.XMLSpace.Name.Local, c.XMLSpace.Value)
. . 490: }
356.01MB 459.51MB 491: fmt.Fprintf(buf, ` r="%s"`, c.R)
. . 492: if c.S != 0 {
. . 493: fmt.Fprintf(buf, ` s="%d"`, c.S)
. . 494: }
. . 495: if c.T != "" {
356.51MB 426.10MB 496: fmt.Fprintf(buf, ` t="%s"`, c.T)
. . 497: }
. . 498: _, _ = buf.WriteString(`>`)
. . 499: if c.F != nil {
. . 500: _, _ = buf.WriteString(`<f>`)
. . 501: _ = xml.EscapeText(buf, []byte(c.F.Content))
. . 502: _, _ = buf.WriteString(`</f>`)
. . 503: }
. . 504: if c.V != "" {
. 59.90MB 505: _, _ = buf.WriteString(`<v>`)
238MB 339.98MB 506: _ = xml.EscapeText(buf, []byte(c.V))
. 20.93MB 507: _, _ = buf.WriteString(`</v>`)
. . 508: }
. 37.23MB 509: _, _ = buf.WriteString(`</c>`)
. . 510:}
. . 511:
. . 512:// Flush ending the streaming writing process.
. . 513:func (sw *StreamWriter) Flush() error {
. . 514: if !sw.sheetWritten {```
I'm not sure how to fix/adjust/help the excelize library -- it supports a lot of additional features/functions that I don't want, but that would still be needed to be supported.
Because of that, I implemented an XLSX writer of [][]string -- doesn't support any other Excel type but is more performant in large file cases. Uses all cores on a machine but is generally still limited to the flate compression process.
https://github.com/mzimmerman/xlsxwriter
BenchmarkExcelize10x10-192 1819354 5708179 +213.75%
BenchmarkExcelize100x100-192 13970914 9574846 -31.47%
BenchmarkExcelize1000x1000-192 1177549846 267765228 -77.26%
BenchmarkExcelize10000x10000-192 125017992200 27324051886 -78.14%
BenchmarkExcelize1000x10-192 14653077 10480947 -28.47%
BenchmarkExcelize10000x10-192 133965033 55374300 -58.67%
BenchmarkExcelize100000x10-192 1363888368 582779774 -57.27%
BenchmarkExcelize100000x100-192 13342011189 2711677473 -79.68%
BenchmarkExcelize10000x1000-192 13136815755 2647669308 -79.85%```