excel-streaming-reader
excel-streaming-reader copied to clipboard
Blank cells are ignored by parser
Hello PJ. Thanks for pointing me to your fork I'm trying to parse the xlsx file which have some blank cells. My initial thought was that they have to be returned as null or empty strings but seems they are completely ignored by parser I'm using this library from Clojure but it shouldn't be an issue in theory
Here's my code
(ns xsls-test.core
(:require
[clojure.java.io :as io])
(:import
[com.github.pjfanning.xlsx StreamingReader]
[org.apache.poi.ss.usermodel Cell CellType]))
(defn cell-data [^Cell cell]
(let [cell-type (.getCellType cell)]
(cond
(= cell-type CellType/NUMERIC) (.getNumericCellValue cell)
(= cell-type CellType/BOOLEAN) (.getBooleanCellValue cell)
:otherwise (.getStringCellValue cell))))
(let [stream (io/input-stream (io/resource "trialling_sergey_change.xlsx"))
workbook (-> (StreamingReader/builder)
(.rowCacheSize 100)
(.bufferSize 4096)
(.open stream))
sheet (first workbook)]
(->> (seq sheet)
(mapv (fn [row]
(let [cells (seq row)]
(mapv cell-data cells))))))
And it gives my such result
[["person" "role'" "depart-ment"]
["harbs" "head" "devOps" "kkmjksnkja"]
["ridders" "product"]
["pascal" "head" "afsaaa" "product"]
["Dos" "legend" "engineering"]]
Actual file looks like
I'm using the latest version of the library [com.github.pjfanning/excel-streaming-reader "4.2.0"]
Is there a way of parsing empty cells as null's?
The code streams through the file and reports what it sees. The xlsx format files usually only add rows and cells to the XML stored in the file if it needs to.
Surely, you can just look at the cell address to work out if cells are skipped?
I don't want to change the iterators and lists to contain nulls. I program mainly in Scala and have a serious aversion to nulls.
There is this API on Row. https://www.javadoc.io/static/org.apache.poi/poi/5.2.4/org/apache/poi/ss/usermodel/Row.html#getCell-int-org.apache.poi.ss.usermodel.Row.MissingCellPolicy-
Have a look at https://www.javadoc.io/static/org.apache.poi/poi/5.2.4/org/apache/poi/ss/usermodel/Row.MissingCellPolicy.html
One policy allows nulls.
If you don't want null rows (when the rows are empty), see https://www.javadoc.io/static/org.apache.poi/poi/5.2.4/org/apache/poi/ss/usermodel/Sheet.html#getRow-int-
Thanks PJ I'll try to implement the workaround based on column indexes