excelize icon indicating copy to clipboard operation
excelize copied to clipboard

Allow changing built in number formats

Open jje42 opened this issue 2 years ago • 1 comments

Using excelize v2.5.0

excelize is a fantastic library, thank for putting the effort in and making it available.

For all of the excel workbooks I work with, it seems that excel uses the in built number format 14 for dates by default. Excel displays these in localised short date format. In excelize, this has been hard coded as mm-dd-yy. While this is correct for en_US, it is not for other locales.

I know that I can extract the raw value from cells and convert them myself, but defaulting to mm-dd-yy, which is never correct for me, makes functions like GetRows() practically useless for me. (Incidentally, GetCellType() always returns CellTypeUnset for date formatted cells, which I know is technically correct as excel doesn't seem to set the t attribute for these cells, but is not helpful in these cases).

Would you consider adding a function such as:

func (f *File) SetInBuiltNumFmt(num int, value string) {
        builtInNumFmt[num] = value
}

so that, I and others could simple add f.SetInBuiltNumFmt(14, "dd/mm/yyyy") or whatever is needed to get correct localisation? It would obviously be nice is this was automated some how, but this is the simplest solution I could come up with to resolve my issue.

Thanks for considering.

jje42 avatar Apr 07 '22 05:04 jje42

Thanks for your advice. Since current, the library does not support parsing and formatting with all user-defined custom number format expressions, so I think this feature should be added after the number format parser implementation is completed.

xuri avatar Apr 15 '22 08:04 xuri

Thanks for your advice. Since current, the library does not support parsing and formatting with all user-defined custom number format expressions, so I think this feature should be added after the number format parser implementation is completed.

Is it time to add this feature? I can impl this feature

fudali113 avatar Apr 18 '23 12:04 fudali113

Contributions are welcome! Because the number format evaluator still working in progress, which doesn't support converting all number format expressions, you can't get formatted results even changing the build-in number format table. I don't suggest adding a new function to change the built-in number format table. I have made a design draft for this.

  • Currently, I have created a number format parser named NFP for parsing any number format expression, it can be used for parsing build-in number format and custom number format, and convert it to the abstract syntax token.
  • Next, we need to read these tokens to implement each part of the number format expression: positive part, negative part, zero part, text part, color part, etc, this evaluator converts it to the final formatted text. This has been working in progress in the source code numfmt.go,
  • After this evaluator has been implemented, add localization and time/date format (long time, sort time, etc,.) options in the Options data type, user can specify these settings when opening or reading the workbook
  • Since the build-in number format was related to the operating system localization, and time format settings in the spreadsheet applications, we need to map the build-in number format index and number format expression with options in the library. After that, the user can read the formatted cell value by specified localization and date time formats.

xuri avatar Apr 19 '23 02:04 xuri

i think the NLP lib and map the build-in number format index and number format is two things;

  1. the NLP lib support more format features;
  2. numFmtId map support user can customize mapping relationships based on requirements (Because different software customization may be different )

So I was wondering if we could support mapping first?

such as:

add a customize function to openFIle Options;

// NumFmtCodeGetter user customize numFmtId To numFmtCode
// @param fileInfo file info to this middleware
// @param numFmtId current cell numFmtId
// @return numFmtCode map res
// @return match Match or not,if not match,Use the original library logic for processing
type NumFmtCodeGetter func(fileInfo *File, numFmtId int) (numFmtCode string, match bool)

type Options struct {
	MaxCalcIterations uint
	Password          string
	RawCellValue      bool
	UnzipSizeLimit    int64
	UnzipXMLSizeLimit int64
        // user customize numFmtId To numFmtCode function
        NumFmtCodeGetter NumFmtCodeGetter
}

Is it feasible? @xuri

fudali113 avatar Apr 19 '23 03:04 fudali113

and in https://github.com/qax-os/excelize/blob/fb6ce60bd56f4ef80d9fda76dc8accb4bfdee4ff/cell.go#L1327

The changes are roughly as follows:

func (f *File) formattedValue(s int, v string, raw bool) (string, error) {
	if raw {
		return v, nil
	}
	if s == 0 {
		return v, nil
	}
	styleSheet, err := f.stylesReader()
	if err != nil {
		return v, err
	}
	if styleSheet.CellXfs == nil {
		return v, err
	}
	if s >= len(styleSheet.CellXfs.Xf) || s < 0 {
		return v, err
	}
	var numFmtID int
	if styleSheet.CellXfs.Xf[s].NumFmtID != nil {
		numFmtID = *styleSheet.CellXfs.Xf[s].NumFmtID
	}
	date1904 := false
	wb, err := f.workbookReader()
	if err != nil {
		return v, err
	}
	if wb != nil && wb.WorkbookPr != nil {
		date1904 = wb.WorkbookPr.Date1904
	}

        // ===================    change start      =========================

        if file.Options.NumFmtCodeGetter != nil {
              numFmtCode, ok := file.Options.NumFmtCodeGetter(file, numFmtID)
              if ok {
                     return format(v, numFmtCode, date1904), err
              }
        }

        // =====================       change end        =============================

	if ok := builtInNumFmtFunc[numFmtID]; ok != nil {
		return ok(v, builtInNumFmt[numFmtID], date1904), err
	}
	if styleSheet.NumFmts == nil {
		return v, err
	}
	for _, xlsxFmt := range styleSheet.NumFmts.NumFmt {
		if xlsxFmt.NumFmtID == numFmtID {
			return format(v, xlsxFmt.FormatCode, date1904), err
		}
	}
	return v, err
}

fudali113 avatar Apr 19 '23 03:04 fudali113

Because the number format evaluator was required for both build-in number format and custom number format expressions, which read the tokens returns from NFP (Number Format Parser) and apply the formats for the cell value. The work of implements for it was the premise for applying formats for any number format expression. If we allow the user to change the built-in number format table, but there is no evaluator for the user-defined built-in number format, the user still can't get the correct formatted cell value. In addition, I think letting's users know the built-in number format was not a good idea, mapping that by given localization and date time options, just like a spreadsheet application will be a better design.

xuri avatar Apr 19 '23 09:04 xuri

So how do you make that distinction? Is there any documentation I can refer to?

i don‘t find system info or lang in excel xml files;and i find https://learn.microsoft.com/en-us/dotnet/api/documentformat.openxml.spreadsheet.numberingformat?view=openxml-2.8.1 document description diff area but not find diff system ;

For example, numFmtId = 14 value has diff

fudali113 avatar Apr 19 '23 11:04 fudali113

The localization code used in the numfmt.go source code could be found in the LCID reference manual, and we need to create each local and date time's number format mapping manually.

xuri avatar Apr 20 '23 05:04 xuri

like styles.go langNumFmt ?

image

So is it now possible to override the default builtInNumFmt using langNumFmt in your formattedValue?

In that case? How to deal with numFmtId = 14?

fudali113 avatar Apr 20 '23 07:04 fudali113

I've implemented another locale-based design; https://github.com/qax-os/excelize/pull/1525

@xuri PTAL

fudali113 avatar Apr 20 '23 09:04 fudali113

refs:

https://learn.microsoft.com/en-us/openspecs/office_standards/ms-oi29500/17d11129-219b-4e2c-88db-45844d21e528 https://learn.microsoft.com/en-us/dotnet/api/documentformat.openxml.spreadsheet.numberingformat?view=openxml-2.8.1

fudali113 avatar Apr 20 '23 10:04 fudali113

I used the following code to generate constants and all the LCID information

package main

import (
    "encoding/xml"
    "fmt"
    "os"
    "strings"
)

type Table struct {
    Trs []Tr `xml:"tr"`
}

type Tr struct {
    Tds []Td `xml:"td"`
}

type Td struct {
    V string `xml:"p"`
}

func main() {
    file, err := os.Open("all_lcid.html")
    if err != nil {
        panic(err)
    }
    table := Table{}
    decoder := xml.NewDecoder(file)
    err = decoder.Decode(&table)
    if err != nil {
        panic(err)
    }
    fmt.Println(table)
    newFile, err := os.Create("locales.go")
    if err != nil {
        panic(err)
    }
    newFile.Write([]byte(`
package main

type LCID struct {
    Language         string
    Location         string
    LanguageID       int32
    LanguageTag      Locale
    SupportedVersion string
}


`))

    localeConstMap := map[string]string{}
    newFile.Write([]byte(`type Locale string

`))
    newFile.Write([]byte(`const (
`))
    for _, tr := range table.Trs {
        locale := tr.Tds[3].V
        localeConst := toLocalConstName(locale)
        existLocale, ok := localeConstMap[locale]
        if ok {
            panic(fmt.Sprintf("%s == %s", existLocale, locale))
        }
        localeConstMap[locale] = localeConst
        newFile.Write([]byte(fmt.Sprintf("%s Locale = `%s` \n", localeConst, locale)))
    }
    newFile.Write([]byte(`)

`))

    newFile.Write([]byte("var AllLCIDList = []LCID{"))
    for _, tr := range table.Trs {
        s := fmt.Sprintf(
            `
{
    %s: %s,
    %s: %s,
    %s: %s,
    %s: %s,
    %s: %s,
},`,
            "Language", "`"+tr.Tds[0].V+"`",
            "Location", "`"+tr.Tds[1].V+"`",
            "LanguageID", tr.Tds[2].V,
            "LanguageTag", localeConstMap[tr.Tds[3].V],
            "SupportedVersion", "`"+tr.Tds[4].V+"`",
        )
        newFile.Write([]byte(s))
    }
    newFile.Write([]byte(`
}`))
    newFile.Close()
}

func toLocalConstName(s string) string {
    s = strings.ToUpper(s)
    s = strings.ReplaceAll(s, "-", "_")
    return s
}

image image

Is that OK?

fudali113 avatar Apr 21 '23 07:04 fudali113

the all_clid.html file is tbody from https://learn.microsoft.com/en-us/openspecs/windows_protocols/ms-lcid/a9eac961-e77d-41a6-90a5-ce1a8b0cdb9c

image

fudali113 avatar Apr 21 '23 07:04 fudali113

And I still think the name for the language tag is locale is good

Reference to go-playground/locales is also named this way https://github.com/go-playground/locales/blob/ce315c8672599942003599943a1e64288f55b03f/af/af.go#L13

fudali113 avatar Apr 21 '23 07:04 fudali113

Since the commit 49234fb, the library support to specified system date and time format code in the options when creating or opening the workbook. The built-in number format 14 and some date formats that begin with an asterisk (*) in the spreadsheet application's number format setting window were effect by system short date, long date, and long time settings. Now we can set these settings by ShortDatePattern, LongDatePattern, and LongTimePattern fields in the Options. For example, now you can open a workbook with custom date and time options like this:

f, err := excelize.OpenFile("Book1.xlsx", excelize.Options{
    ShortDatePattern: "dd/mm/yyyy",
})

After that, if you get a cell value with built-in number format ID 14, the number will be formatted with dd/mm/yyyy. This feature will be released in the next version.

xuri avatar May 11 '23 01:05 xuri

thank you! :clap:

jje42 avatar May 11 '23 06:05 jje42