excel-streaming-reader icon indicating copy to clipboard operation
excel-streaming-reader copied to clipboard

Blank cells are ignored by parser

Open margintop15px opened this issue 2 years ago • 3 comments

Hello PJ. Thanks for pointing me to your fork I'm trying to parse the xlsx file which have some blank cells. My initial thought was that they have to be returned as null or empty strings but seems they are completely ignored by parser I'm using this library from Clojure but it shouldn't be an issue in theory

Here's my code

(ns xsls-test.core
  (:require
   [clojure.java.io :as io])
  (:import
   [com.github.pjfanning.xlsx StreamingReader]
   [org.apache.poi.ss.usermodel Cell CellType]))


(defn cell-data [^Cell cell]
  (let [cell-type (.getCellType cell)]
    (cond
      (= cell-type CellType/NUMERIC) (.getNumericCellValue cell)
      (= cell-type CellType/BOOLEAN) (.getBooleanCellValue cell)
      :otherwise (.getStringCellValue cell))))


(let [stream   (io/input-stream (io/resource "trialling_sergey_change.xlsx"))
      workbook (-> (StreamingReader/builder)
                   (.rowCacheSize 100)
                   (.bufferSize 4096)
                   (.open stream))
      sheet    (first workbook)]
  (->> (seq sheet)
       (mapv (fn [row]
               (let [cells (seq row)]
                 (mapv cell-data cells))))))

And it gives my such result

[["person" "role'" "depart-ment"]
 ["harbs" "head" "devOps" "kkmjksnkja"]
 ["ridders" "product"]
 ["pascal" "head" "afsaaa" "product"]
 ["Dos" "legend" "engineering"]]

Actual file looks like

Снимок экрана 2023-10-25 в 11 44 39

I'm using the latest version of the library [com.github.pjfanning/excel-streaming-reader "4.2.0"]

Is there a way of parsing empty cells as null's?

margintop15px avatar Oct 25 '23 09:10 margintop15px

The code streams through the file and reports what it sees. The xlsx format files usually only add rows and cells to the XML stored in the file if it needs to.

Surely, you can just look at the cell address to work out if cells are skipped?

pjfanning avatar Oct 25 '23 09:10 pjfanning

I don't want to change the iterators and lists to contain nulls. I program mainly in Scala and have a serious aversion to nulls.

There is this API on Row. https://www.javadoc.io/static/org.apache.poi/poi/5.2.4/org/apache/poi/ss/usermodel/Row.html#getCell-int-org.apache.poi.ss.usermodel.Row.MissingCellPolicy-

Have a look at https://www.javadoc.io/static/org.apache.poi/poi/5.2.4/org/apache/poi/ss/usermodel/Row.MissingCellPolicy.html

One policy allows nulls.

If you don't want null rows (when the rows are empty), see https://www.javadoc.io/static/org.apache.poi/poi/5.2.4/org/apache/poi/ss/usermodel/Sheet.html#getRow-int-

pjfanning avatar Oct 25 '23 10:10 pjfanning

Thanks PJ I'll try to implement the workaround based on column indexes

margintop15px avatar Oct 25 '23 11:10 margintop15px