xlsReader

Index out of range

Open dfurmanov opened this issue 4 years ago • 7 comments

Version 0.9.7

Stack trace

panic: runtime error: index out of range [18963] with length 17388
goroutine 75 [running]:
github.com/shakinm/xlsReader/xls/record.(*LabelSSt).GetString(0xc0004033a0, 0xc15b6c, 0x10)
	/gocode/pkg/mod/github.com/shakinm/xls!reader@v0.9.7/xls/record/labelSst.go:43 +0x70
github.com/shakinm/xlsReader/xls/record.(*Format).GetFormatString(0xc0000aaa20, 0x1043a20, 0xc0004033a0, 0x16d6740, 0x1)
	/gocode/pkg/mod/github.com/shakinm/xls!reader@v0.9.7/xls/record/format.go:125 +0xa85
github.com/dfurmanov/myapp/officetotxt.XLStoCSV(0x102dee0, 0xc0003025a0, 0xc0003c6630, 0x29, 0x0, 0xc0003c664f, 0x6)
	/gocode/src/github.com/dfurmanov/myapp/officetotxt/xls2csv.go:30 +0x30a

My code (sheetIndex = 0)

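package officetotxt

import (
	"encoding/csv"
	"io"

	"github.com/shakinm/xlsReader/xls"
)

// XLStoCSV writes one sheet of an .xls file to w as CSV.
func XLStoCSV(w io.Writer, excelFileName string, sheetIndex int) error {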
	workbook, err := xls.OpenFile(excelFileName)
	if err != nil {
		return err
	}

	sheet, err := workbook.GetSheet(sheetIndex)
	if err != nil {
		return err
	}

	cw := csv.NewWriter(w)

	for i := 0; i <= sheet.GetNumberRows(); i++ {
		if row, err := sheet.GetRow(i); err == nil {
			cols := row.GetCols()
			values := make([]string, len(cols))
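			for j, cell := range cols {
				// Resolve the cell's XF record, then its number format,
				// and render the cell value through that format.
				xfIndex := cell.GetXFIndex()
				formatIndex := workbook.GetXFbyIndex(xfIndex)
				format := workbook.GetFormatByIndex(formatIndex.GetFormatIndex())
				values[j] = format.GetFormatString(cell)
			}
			// TrimLatterEmptyValues is a helper defined elsewhere in this package (not shown).
			trimmedValues := TrimLatterEmptyValues(values)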
			if len(trimmedValues) > 0 {
				err = cw.Write(trimmedValues)
				if err != nil {
					return err
				}
			}
		}
	}

	cw.Flush()
	return cw.Error()
}
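TrimLatterEmptyValues is the reporter's own helper and is not shown in the issue. Judging only by its name and by the `len(trimmedValues) > 0` check, it presumably drops trailing empty cells; a minimal, hypothetical version could look like this:

```go
package main

import "fmt"

// TrimLatterEmptyValues (hypothetical reconstruction) drops trailing empty
// strings, so a row like ["a", "", "b", "", ""] is written as "a,,b" and a
// row with no values at all becomes empty and is skipped by XLStoCSV.
func TrimLatterEmptyValues(values []string) []string {
	end := len(values)
	for end > 0 && values[end-1] == "" {
		end--
	}
	return values[:end]
}

func main() {
	fmt.Printf("%q\n", TrimLatterEmptyValues([]string{"a", "", "b", "", ""})) // ["a" "" "b"]
	fmt.Println(len(TrimLatterEmptyValues([]string{"", ""})))                 // 0
}
```

Returning a subslice avoids copying; inner empty cells are kept so column positions stay aligned in the CSV.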

Input file error.xls.zip

dfurmanov, Oct 29 '20 17:10

Haven't tested it, but I think your for loop should use < instead of <=, since row indexes are zero-based.

dougwinsby, Oct 29 '20 17:10

@dougwinsby Interesting, I'll check that. However, this loop comes from the example the author gives in the README.

dfurmanov, Oct 29 '20 18:10

I would try upgrading to 0.9.8 (see https://github.com/shakinm/xlsReader/issues/8).

dougwinsby, Oct 29 '20 19:10

@dougwinsby I have tried both: using < instead of <= and upgrading to v0.9.8. The same error persists:

github.com/shakinm/xlsReader/xls/record.(*LabelSSt).GetString(0xc00046d0e0, 0xc5004c, 0x10)
	/opt/pr/gocode/pkg/mod/github.com/shakinm/xls!reader@v0.9.8/xls/record/labelSst.go:43 +0x70
github.com/shakinm/xlsReader/xls/record.(*Format).GetFormatString(0xc0000f08b8, 0x1099ba0, 0xc00046d0e0, 0x1733f00, 0x1)
	/opt/pr/gocode/pkg/mod/github.com/shakinm/xls!reader@v0.9.8/xls/record/format.go:125 +0xa85
github.com/dfurmanov/myapp/officetotxt.XLStoCSV(0x1082740, 0xc00077c330, 0xc00038d560, 0x29, 0x0, 0xc00038d57f, 0x6)

dfurmanov, Oct 29 '20 20:10

@dfurmanov Thanks for finding this bug! I tested the application with your XLS file and found that the SST records are not read correctly; if the file is re-saved and the test is rerun, everything works. Fixing this will take me some time, because I don't have much free time right now, but I will try to do it as quickly as possible.

shakinm, Nov 03 '20 16:11

@shakinm Unfortunately I have no control over the files I process with this library, so I hope the fix will be out soon. Thank you very much for working on this!

dfurmanov, Nov 03 '20 22:11

I've been looking into this issue for the past day and found a couple of things. First, there is a bug on this line: https://github.com/shakinm/xlsReader/blob/cb2bf4031fc7b9d539e3d07ab15219ff240630d7/xls/record/sst.go#L83

The header size here is hardcoded to 3, when it actually varies depending on which option flags are set. With that fixed, @dfurmanov's file opens correctly.
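The linked sst.go line is not quoted in the thread, so this is not a patch to it. It is a standalone sketch of the string header layout that the MS-XLS specification gives for SST strings (XLUnicodeRichExtendedString): a 2-byte character count and a 1-byte flags field, then a 2-byte formatting-run count only if fRichSt (0x08) is set, then a 4-byte extended-data size only if fExtSt (0x04) is set. The header is therefore 3, 5, 7 or 9 bytes. The type, constant and function names are hypothetical.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// Option flag bits of a BIFF8 SST string header.
const (
	flagHighByte = 0x01 // characters are UTF-16LE (2 bytes) rather than compressed (1 byte)
	flagExtSt    = 0x04 // a 4-byte cbExtRst field is present
	flagRichSt   = 0x08 // a 2-byte cRun field is present
)

type stringHeader struct {
	cch      uint16 // character count
	flags    byte
	cRun     uint16 // number of 4-byte formatting runs after the characters
	cbExtRst uint32 // size of the extended data after the runs
	size     int    // header length in bytes: 3, 5, 7 or 9
}

// parseStringHeader decodes the variable-length header at the start of b.
func parseStringHeader(b []byte) (stringHeader, error) {
	if len(b) < 3 {
		return stringHeader{}, errors.New("short string header")
	}
	h := stringHeader{cch: binary.LittleEndian.Uint16(b), flags: b[2], size: 3}
	if h.flags&flagRichSt != 0 {
		if len(b) < h.size+2 {
			return stringHeader{}, errors.New("short cRun field")
		}
		h.cRun = binary.LittleEndian.Uint16(b[h.size:])
		h.size += 2
	}
	if h.flags&flagExtSt != 0 {
		if len(b) < h.size+4 {
			return stringHeader{}, errors.New("short cbExtRst field")
		}
		h.cbExtRst = binary.LittleEndian.Uint32(b[h.size:])
		h.size += 4
	}
	return h, nil
}

// charBytes is the size of the character data if it is not split by a CONTINUE record.
func (h stringHeader) charBytes() int {
	if h.flags&flagHighByte != 0 {
		return 2 * int(h.cch)
	}
	return int(h.cch)
}

func main() {
	// cch=5, flags=fRichSt, cRun=2, followed by 5 compressed characters.
	h, err := parseStringHeader([]byte{0x05, 0x00, 0x08, 0x02, 0x00, 'h', 'e', 'l', 'l', 'o'})
	fmt.Println(h.size, h.cch, h.cRun, h.charBytes(), err) // 5 5 2 5 <nil>
}
```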

However, this did not fix all of my problems: for some documents the library still fails to read strings that are split by a CONTINUE record. Microsoft's docs say a CONTINUE record has to start with the option flags byte, but I found documents where that byte is missing when the CONTINUE record begins inside the formatting runs. I checked the OpenOffice.org Excel file format documentation and it does mention this quirk (section 5.21):

Formatting runs (➜2.5.1) cannot be split between their components (character index and FONT record index). If a string is split between two formatting runs, the option flags field will not be repeated in the CONTINUE record.

This seems to be poor documentation on Microsoft's part. After fixing this as well, I no longer had any problems reading SST records.

Going to do a PR.

kleeon, Oct 13 '22 17:10