PyExcelerate icon indicating copy to clipboard operation
PyExcelerate copied to clipboard

Cannot save text that begins with equals

Open austinlau opened this issue 4 years ago • 4 comments

When a string begins with equal sign (=), it will enforce to be a formula. Would it provide an option to change the data type to string?

import pyexcelerate
from pyexcelerate import Style, Format

wb = pyexcelerate.Workbook()
ws = wb.new_sheet("test", data=[["=abcde"]])
ws.set_cell_style(1, 1, Style(format=Format('@')))
wb.save(r"test.xlsx")

austinlau avatar Aug 17 '21 08:08 austinlau

Escape the string by prefixing it with '. We follow Excel's rules here.

ws = wb.new_sheet("test", data=[["'=abcde"]])

kevmo314 avatar Aug 17 '21 14:08 kevmo314

I came across this same issue when I was dealing with emoticon data (since a lot of them start with equal sign(=)).

On MS Excel, escaping the cell value with a preceding single quote(and then hitting enter) makes the cell to be treated as string not formula but the preceding quote itself disappears.

But in pyexcelerate, making =abcde into '=abcde leaves the preceding quote in the value.

I had to override a function of Worksheet class to make my data work as I intended, and it kinda worked.

import six
from typing_extensions import override
from xml.sax.saxutils import escape

from pyexcelerate.DataTypes import DataTypes
from pyexcelerate.Range import Range
from pyexcelerate.Utility import to_unicode
from pyexcelerate.Worksheet import (
    Worksheet as PyExcelerateWorksheet,
    _illegal_xml_chars_RE,
)

class NoFormulaWorksheet(PyExcelerateWorksheet):

    @override
    def _Worksheet__get_cell_data(self, cell, x, y, style):
        if DataTypes.get_type(cell) == DataTypes.FORMULA:
            z = '" t="inlineStr"><is><t xml:space="preserve">%s</t></is></c>' % escape(
                _illegal_xml_chars_RE.sub(u"\uFFFD", to_unicode(cell if isinstance(cell, six.string_types) else str(cell)))
            )
            if style and hasattr(style, "id"):
                return '<c r="%s" s="%d%s' % (
                    Range.coordinate_to_string((x, y)),
                    style.id,
                    z,
                )
            else:
                return '<c r="%s%s' % (Range.coordinate_to_string((x, y)), z)
        else:
            return super()._Worksheet__get_cell_data(cell, x, y, style)

I'm not sure whether pyexcelerate should actually care about this or not.

Can we find a way to provide this kind of behavior with, maybe, optional args of Worksheet.__init__() or something?

TintypeMolly avatar Mar 16 '23 07:03 TintypeMolly

But in pyexcelerate, making =abcde into '=abcde leaves the preceding quote in the value.

This sounds like a bug, preceding the string with a '= should result in a raw = prefix.

More broadly though, maybe we can provide an API to change the data type of a range. Following some of the more recent tickets, it seems that a common use case is bulk importing data, so setting the data type of a column or something is reasonable. We may want to adjust our pandas import scheme (which is somewhat implicit right now) to also pull these data types.

So there are two potential ideas here, one bug and one additional feature.

kevmo314 avatar Mar 16 '23 13:03 kevmo314

Thanks @TintypeMolly for providing your solution.

On second thought, using Format('@') to enforce the cell value as Text is not a solid solution because it cannot distinct the following situations:

  1. a formula where the output format is Text. To perform in Excel, input =1+1 and then change the number format to Text. The cell value will then align to left. Currently pyexcelerate is handling this case.

  2. a = prefix string where the number format is Text. To perform in Excel, change the number format to Text and then input =abcde. pyexcelerate cannot handle this case and I think all users with = prefix string want to handle this case. In addition, if the number format is not Text, when an user edit the cell (double click on the cell and then hitting enter), Excel will auto reformat it as formula which is not preferred.

Adding ' to data value will destroy the data

For example, we are writing an Excel and someone else will read the Excel. We want to put the string =.=, '.' and a formula =1+1 into excel. Obviously we cannot add ' to all strings because it will break the formula. But if we only add ' to the first string, how someone else knows which cells should remove the '.

One option is to add ' to both = and ' prefix strings and the reader always remove the prefix '. However, this custom logic to encode/decode is not preferred.

Option 1: allow user to set the data_type

Similar solution to openpyxl: https://stackoverflow.com/questions/63747180/openpyxl-write-string-beginning-with-equals

I think adding a option in Style to tell pyexcelerate the data_type is a better solution and easy to implement.

Worksheet.py

class Worksheet(object):
    def __get_cell_data(self, cell, x, y, style):

        ......

        if style.data_type:
            type = style.data_type
        else:
            type = DataTypes.get_type(cell)

        if type == DataTypes.NUMBER:
        
        ......

sample code

import pyexcelerate
from pyexcelerate import Style, Format
from pyexcelerate.DataTypes import DataTypes

wb = pyexcelerate.Workbook()
ws = wb.new_sheet("test", data=[["=abcde", "=abcde"]])
ws.set_cell_style(1, 1, Style(format=Format('@'), data_type=DataTypes.INLINE_STRING))
ws.set_col_style(2, Style(format=Format('@'), data_type=DataTypes.INLINE_STRING))
wb.save(r"test.xlsx")

Option 2: use quotePrefix

Actually, Excel use additional format attribute to indicate a single quote is prefixed. So to follow the Excel we need to add an option quotePrefix in Style.

styles.xml


    <cellXfs count="1">
        <xf numFmtId="0" fontId="0" fillId="0" borderId="0" xfId="0" quotePrefix="1" />
    </cellXfs>

We can implement both options at the same time.

austinlau avatar Mar 13 '24 15:03 austinlau

Any plans to release a new version with this fix?

eduardorost avatar Apr 09 '24 18:04 eduardorost

Any plans to release a new version with this fix?

I've published 0.11.0 with this fix.

kevmo314 avatar Apr 15 '24 14:04 kevmo314