sheet2dict Empty cells in xlsx become the non-empty string "None" in the dictionary

Thank you for writing this. I have recently moved from using csv files to excel files and this has made the transition easier.

One issue I have encountered is that empty cells are not being returned as a Falsey value like None, or the empty string ''.

From what I have been able to understand, openpyxl reads the empty cell as None, and the following code in sheet2dict uses str to convert this into the string 'None'.

https://github.com/Pytlicek/sheet2dict/blob/93bf731d327d620755ffd5585aedb0b23a17a6a8/sheet2dict/main.py#L30-L34

I have filtered my own dictionaries to turn 'None' into '' but that will cause issues when any of my sheet really do contain the word None.

I just did some quick testing, and it seems that the csv.DictReader() in the python standard library uses the empty string '' for missing values in the a csv file, so that might be a good thing to do for these cases.

Apr 24 '22 12:04 JoshuaCapel

Good idea @JoshuaCapel 👍 thanks for pointing this out 🙏 I hope I will have some spare time this weekend and I can work on this.

Apr 27 '22 11:04 Pytlicek

@JoshuaCapel thank you for raising this. I dusted off sheet2dict today for a new project and tried the latest version and experienced this. I had a un-merged a lot of cells and was shocked to see them full of "None". It's a real shame that openpxyl doesn't support merged cells... but then again, I have no idea how sheet2dict would handle them.

@Pytlicek - I lost your contact information somehow, please ping me. In the meantime, let's talk about a solution. I need to re-set up my IDE/computer to push/pull correctly.

Maybe we can add an if just to check to see if the result is None. I did not modify the sheet2dict source, I made allowances in my code.

This is untested.
I cannot recall if the current Python convention is to use equality checks or is/is not. Is there a more elegant solution? Is type checking the way to go?
I know this implementation will add CPU cycles... I'm not sure if we should use if/else instead since every cell is checked twice with my implementation.

 def item(row, col):
     if type(sheet.cell(row=row, column=col).value) is None:
        return (sheet.cell(row=1, column=col).value, '')
    if type(sheet.cell(row=row, column=col).value) is not None:
        return (sheet.cell(row=1, column=col).value, str(sheet.cell(row=row, column=col).value))

Also, whatever we do, we should add some test cases to check for this (You taught me well @Pytlicek!)

Oct 20 '22 10:10 bai-yi-bai

@bai-yi-bai do you have an example file? I want to look at this and better understand what is the case, what exactly we need and so on. Thanks

Oct 23 '22 18:10 Pytlicek

@Pytlicek:

I created a PR https://github.com/Pytlicek/sheet2dict/pull/9 (Actually there were 2 PRs, my IDE didn't like the paths hard-coded into the test files.)

inventory.xlsx is sufficient since it has empty cells.
The existing tests were insufficient to detect this bug.
I'm not sure why empty headers are not affected, but this uncovered a new bug: if two header columns are empty (None), then one of the column's data will be ignored/overwritten. We should examine this further; meaning we need to generate a new excel file with two empty columns.
main.py

        def item(row, col):
            if sheet.cell(row=row, column=col).value is None:
                return (sheet.cell(row=1, column=col).value, '')
            else:
                return (sheet.cell(row=1, column=col).value, str(sheet.cell(row=row, column=col).value))

\tests\test_3_xlsx.py

def test_parse_xlsx_sheet_items(worksheet):
      ws = worksheet
      ws.xlsx_to_dict(path="inventory.xlsx", select_sheet='SJ3')
      ws_items = ws.sheet_items
      assert "Taipei" in str(ws_items)
      assert "Bratislava" not in str(ws_items)
      assert "None:" in str(ws_items)
      assert "'None'," not in str(ws_items) # Tests whether empty cells were converted to "" strings and not "None"
      assert None in ws_items[0]
      assert len(ws_items) > 1
      assert len(ws_items) == 6

I inserted a print(ws_items) and compared the output with https://www.diffchecker.com/diff
Compare the difference between the d8d363b and my proposed changes. I tried to use BOLD on 'None' and ''. Theref are four changes.
d8d363b:
test_3_xlsx.py::test_parse_xlsx_sheet_items [{'country': 'Taiwan', 'city': 'Taipei', 'area (km)': '271.8', None: '88', 'rand_field': 'ee'}, {'country': 'South Korea', 'city': 'Seoul', 'area (km)': '605.2', None: '77', 'rand_field': 'ee'}, {'country': 'USA', 'city': 'New York', 'area (km)': '783.8', None: '66', 'rand_field': 'ee'}, {'country': 'None', 'city': 'None', 'area (km)': 'None', None: '55', 'rand_field': 'ee'}, {'country': 'USA', 'city': 'Miami', 'area (km)': '143.1', None: 'None', 'rand_field': 'ee'}, {'country': ' ', 'city': ' ', 'area (km)': ' ', None: ' ', 'rand_field': ' '}]
My change:
test_3_xlsx.py::test_parse_xlsx_sheet_items [{'country': 'Taiwan', 'city': 'Taipei', 'area (km)': '271.8', None: '88', 'rand_field': 'ee'}, {'country': 'South Korea', 'city': 'Seoul', 'area (km)': '605.2', None: '77', 'rand_field': 'ee'}, {'country': 'USA', 'city': 'New York', 'area (km)': '783.8', None: '66', 'rand_field': 'ee'}, {'country': '', 'city': '', 'area (km)': '', None: '55', 'rand_field': 'ee'}, {'country': 'USA', 'city': 'Miami', 'area (km)': '143.1', None: '', 'rand_field': 'ee'}, {'country': ' ', 'city': ' ', 'area (km)': ' ', None: ' ', 'rand_field': ' '}]

Oct 31 '22 05:10 bai-yi-bai

Thank you @bai-yi-bai 🙏 Merged and published on PyPI with version: sheet2dict 0.1.5 @JoshuaCapel pls test if you are still interested in this issue. THX 👍

Oct 31 '22 12:10 Pytlicek

sheet2dict sheet2dict copied to clipboard

Empty cells in xlsx become the non-empty string "None" in the dictionary

sheet2dict
sheet2dict copied to clipboard