lisp-xl icon indicating copy to clipboard operation
lisp-xl copied to clipboard

Empty Unique strings

Open lisp-is-the-future opened this issue 7 years ago • 10 comments

Dear Flavio,

searching for a way to read-in .xlsx files using common lisp, since I really want to try to use it in every day life (I am a Bioinformatics Postdoc in Germany), I encountered some problems.

I want to read-in the bigger file which was created using R's openxlsx package. And as a test, I created a dummy small file, which contains actually only the table

A | B | C 1 | 2 | 3 4 | 5 | 6

Reading my xlsx files (test an desired file) in lisp-xl both resulted in empty strings. The :xlsx package could handle the small test, but not the big, desired, file. I could also send you the xlsx files if required.

;; I tried the following using ;; SBCL 1.3.1.debian ;; in emacs/SLIME:

;; I created the small file using libreoffice ;; LibreOffice 5.1.6.2 10m0(Build:2) ;; in ubuntu 16.04

;; and the big file using R openxlsx package (the current one) ;; in ubuntu 16.04

;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;

;;;;;;;;;;;;;;;;;;;;;;;; ;; file paths ;;;;;;;;;;;;;;;;;;;;;;;;

(defparameter small-fpath #P"/home/josephus/test/xlsx-files/test.xlsx") (defparameter big-fpath #P"/home/josephus/test/xlsx-files/nrm-counts-avg--k2-vs-wtn.xlsx")

;;;;;;;;;;;;;;;;;;;;;;;; ;; try the small file ;;;;;;;;;;;;;;;;;;;;;;;;

(ql:quickload :xlsx) (xlsx:list-sheets small-fpath)

;; => ((1 "Sheet1" "worksheets/sheet1.xml"))

(xlsx:read-sheet small-fpath 1)

;; => (((:A . 1) . "A") ((:B . 1) . "B") ((:C . 1) . "C") ((:A . 2) . 1) ;; ((:B . 2) . 2) ((:C . 2) . 3) ((:A . 3) . 4) ((:B . 3) . 5) ((:C . 3) . 6))

(xlsx:as-matrix (xlsx:read-sheet small-fpath 1))

;; => #2A(("A" "B" "C") (1 2 3) (4 5 6)), (:A :B :C), (1 2 3)

;; the quicklisp package is that from Carlos Ungil ;; add from Akihide Nano's site: (defun as-alist (xlsx) "Creates an a-list from a list of cells of the form ((:A1 . 42) (:B1 . 21))" (mapcar #'(lambda (lst) (cons (intern (concatenate 'string (symbol-name (caar lst)) (write-to-string (cdar lst))) :keyword) (cdr lst))) xlsx))

(defun as-plist (xlsx) "Creates a p-list from a list of cells of the form (:A1 42 :B1 21)" (mapcan #'(lambda (lst) (cons (intern (concatenate 'string (symbol-name (caar lst)) (write-to-string (cdar lst))) :keyword) (list (cdr lst)))) xlsx))

(as-alist (xlsx:read-sheet small-fpath 1))

;; => ((:A1 . "A") (:B1 . "B") (:C1 . "C") (:A2 . 1) (:B2 . 2) (:C2 . 3) (:A3 . 4) ;; (:B3 . 5) (:C3 . 6))

(as-plist (xlsx:read-sheet small-fpath 1))

;; => (:A1 "A" :B1 "B" :C1 "C" :A2 1 :B2 2 :C2 3 :A3 4 :B3 5 :C3 6)

;;;;;;;;;;;;;;;;;;;;;;; ;; now try the big file ;;;;;;;;;;;;;;;;;;;;;;;

(xlsx:read-sheet big-fpath 1)

;; Node does not have a single string child: #S(XMLS:NODE ;; :NAME t ;; :NS http://schemas.openxmlformats.org/spreadsheetml/2006/main ;; :ATTRS ((space ;; preserve)) ;; :CHILDREN NIL) ;; [Condition of type SIMPLE-ERROR]

;; Restarts: ;; 0: [RETRY] Retry SLIME interactive evaluation request. ;; 1: [*ABORT] Return to SLIME's top level. ;; 2: [ABORT] abort thread (#<THREAD "worker" RUNNING {1006CA5C33}>)

;; Backtrace: ;; 0: (XMLS:XMLREP-STRING-CHILD #S(XMLS:NODE ;; :NAME "t" ;; :NS "http://schemas.openxmlformats.org/spreadsheetml/2006/main" ;; :ATTRS (("space" "preserve")) :CHILDREN NIL) :ERROR) ;; Locals: ;; IF-UNFOUND = :ERROR ;; TREENODE = #S(XMLS:NODE ;; :NAME "t" ;; :NS "http://schemas.openxmlformats.org/spreadsheetml/2006/main" ;; :ATTRS (("space" "preserve")) :CHILDREN NIL) ;; 1: (XLSX::GET-UNIQUE-STRINGS #S(ZIP:ZIPFILE ;; :STREAM #<SB-SYS:FD-STREAM for ;; "file /home/josephus/test/xlsx-files/nrm-counts-avg--k2-vs-wtn.xlsx" ;; {1003FBF2B3}> ;; :ENTRIES #<HASH-TABLE ;; :TEST EQUAL ;; :COUNT 12 {.. ;; Locals: ;; SB-DEBUG::ARG-0 = #S(ZIP:ZIPFILE ;; :STREAM #<SB-SYS:FD-STREAM for "file /home/josephus/test/xlsx-files/nrm-counts-avg--k2-vs-wtn.xlsx" ;; {1003FBF2B3}> ;; :ENTRIES #<HASH-TABLE ;; :TEST EQUAL ;; :COUNT 12 {10088E0403}>) ;; 2: (XLSX:READ-SHEET #P"/home/josephus/test/xlsx-files/nrm-counts-avg--k2-vs-wtn.xlsx" 1) ;; Locals: ;; FILE = #P"/home/josephus/test/xlsx-files/nrm-counts-avg--k2-vs-wtn.xlsx" ;; SHEET = 1 ;; 3: (SB-INT:SIMPLE-EVAL-IN-LEXENV (XLSX:READ-SHEET BIG-FPATH 1) #<NULL-LEXENV>) ;; Locals: ;; SB-DEBUG::ARG-0 = (XLSX:READ-SHEET BIG-FPATH 1) ;; SB-DEBUG::ARG-1 = #<NULL-LEXENV> ;; 4: (EVAL (XLSX:READ-SHEET BIG-FPATH 1)) ;; Locals: ;; SB-DEBUG::ARG-0 = (XLSX:READ-SHEET BIG-FPATH 1) ;; 5: ((LAMBDA NIL :IN SWANK:INTERACTIVE-EVAL)) ;; [No Locals] ;; --more--

;; so :xlsx package cannot handle the big file

;; so it was not possible to open the big file using the :xlsx package

(ql:quickload :lisp-xl) To load "lisp-xl": Load 1 ASDF system: lisp-xl ; Loading "lisp-xl" ;;; Checking for wide character support... WARNING: Lisp implementation doesn't use UTF-16, but accepts surrogate code points. yes, using code points. .. ;;; Checking for wide character support... WARNING: Lisp implementation doesn't use UTF-16, but accepts surrogate code points. yes, using code points. ;;; Building Closure with CHARACTER RUNES .....

(defparameter t1 (lisp-xl:read-sheet small-fpath 1))

;; Reading file /home/josephus/test/xlsx-files/test.xlsx ;; Loading metadata into RAM...3 unique strings found. ;; Uncompressing to File [/tmp/lisp-xl-tempGHU3ALSW.tmp] ...

(print t1)

;; #S(LISP-XL::SHEET ;; :UNIQUE-STRINGS #(" " " " " ") ;; the unique strings are empty ;; :NUMBER-FORMATS NIL ;; :DATE-FORMATS NIL ;; :FILE-NAME #P"/tmp/lisp-xl-tempGHU3ALSW.tmp" ;; :LAST-STREAM-POSITION NIL)

(defparameter t2 (lisp-xl:read-sheet big-fpath 1))

;; Reading file /home/josephus/test/xlsx-files/nrm-counts-avg--k2-vs-wtn.xlsx ;; Loading metadata into RAM...45814 unique strings found. ;; Uncompressing to File [/tmp/lisp-xl-tempAAURSO2.tmp] ...

(print t2)

;; #S(LISP-XL::SHEET ;; :UNIQUE-STRINGS #(" " " " " " " " " " " " " " " " " " " " " " " " " " " " ;; " " " " " " " " " " " " " " " " " " " " " " " " " " " "

;; [ ... may left out ...]

;; " " " " " " " " " " " " " " " " " " " " " " " " " " " " ;; " " " " " " " " " " " ") ;; :NUMBER-FORMATS NIL ;; :DATE-FORMATS NIL ;; :FILE-NAME #P"/tmp/lisp-xl-tempAAURSO2.tmp" ;; :LAST-STREAM-POSITION NIL)

(lisp-xl-csv:excel-to-csv small-fpath "/home/josephus/test/xlsx-files/test.csv" 1)

;; gives

, , ,, ,,

;; as output in test.csv ;; so strings are really empty

;; so somehow no strings are recognized ... ;; I created the small file using libreoffice ;; LibreOffice 5.1.6.2 10m0(Build:2) ;; in ubuntu 16.04

;; and the big file using R openxlsx package (the current one) ;; in ubuntu 16.04

;; would you have a clue, what the problem could be? ;; the desired big file has 45809 rows and 6 columns.

lisp-is-the-future avatar Sep 03 '18 18:09 lisp-is-the-future

Hi @lisp-is-the-future , do you think you could upload the XLSX file with the problem? or at least a partial version of that file?

I'm sorry not to answer before: i can't remember which of my personal mail account is used by GitHub is for notifying.

defunkydrummer avatar Oct 30 '18 19:10 defunkydrummer

Dear @defunkydrummer,

sorry, haven't checked this email address since a while.

Sure, I send you both files.

On Tue, Oct 30, 2018 at 8:25 PM defunkydrummer [email protected] wrote:

Hi @lisp-is-the-future https://github.com/lisp-is-the-future , do you think you could upload the XLSX file with the problem? or at least a partial version of that file?

I'm sorry not to answer before: i can't remember which of my personal mail account is used by GitHub is for notifying.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/defunkydrummer/lisp-xl/issues/2#issuecomment-434434987, or mute the thread https://github.com/notifications/unsubscribe-auth/AhaEnlIwQCLwxlIICPJYSWyn8njZ_eYZks5uqKebgaJpZM4WX8p- .

lisp-is-the-future avatar Nov 07 '18 11:11 lisp-is-the-future

@lisp-is-the-future can you send me the files again? I'm revisiting this.

If you want you can send to my hotmail account, the address is on the readme.

defunkydrummer avatar Jan 25 '19 21:01 defunkydrummer

@lisp-is-the-future i have commited a new version, btw, with a bug fix.

defunkydrummer avatar Feb 13 '19 20:02 defunkydrummer

@defunkydrummer sorry for long not answering. I tried it again. But this time I have a smilar but slightly different error:


(ql:quickload :lisp-xl)

(defparameter *f* "/home/josephus/Dropbox/amit_scripts/RawCounts.xlsx")

(defparameter *c* (lisp-xl:read-sheet *f* 1))

#|
The value
  #S(LISP-XL::SHEET
     :UNIQUE-STRINGS #(#1=" " #1# #1# #1# #1# #1# #1# #1# #1#
                       #1# #1# #1# #1# #1# #1# #1# #1# #1# #1#
                       #1# #1# #1# #1# #1# #1# #1# #1# #1# #1#
                       ...
                       #1# #1# #1# #1# #1# #1# #1#..
   [Condition of type TYPE-ERROR]

Restarts:
 0: [RETRY] Retry SLIME interactive evaluation request.
 1: [*ABORT] Return to SLIME's top level.
 2: [ABORT] abort thread (#<THREAD "worker" RUNNING {100EBCC2E3}>)

Backtrace:
  0: (LENGTH #S(LISP-XL::SHEET :UNIQUE-STRINGS #(#1=" " #1# #1# #1# #1# #1# ...) :NUMBER-FORMATS NIL :DATE-FORMATS NIL :FILE-NAME #P"/tmp/lisp-xl-tempSVDN4TBO.tmp" :LAST-STREAM-POSITION NIL))
  1: (SB-INT:SIMPLE-EVAL-IN-LEXENV (LENGTH *C*) #<NULL-LEXENV>)
  2: (EVAL (LENGTH *C*))
  3: ((LAMBDA NIL :IN SWANK:INTERACTIVE-EVAL))
  4: (SWANK::CALL-WITH-RETRY-RESTART "Retry SLIME interactive evaluation request." #<CLOSURE (LAMBDA NIL :IN SWANK:INTERACTIVE-EVAL) {100E84E44B}>)
  5: (SWANK::CALL-WITH-BUFFER-SYNTAX NIL #<CLOSURE (LAMBDA NIL :IN SWANK:INTERACTIVE-EVAL) {100E84E42B}>)
  6: (SB-INT:SIMPLE-EVAL-IN-LEXENV (SWANK:INTERACTIVE-EVAL "(length *c*)") #<NULL-LEXENV>)
  7: (EVAL (SWANK:INTERACTIVE-EVAL "(length *c*)"))
  8: (SWANK:EVAL-FOR-EMACS (SWANK:INTERACTIVE-EVAL "(length *c*)") "COMMON-LISP-USER" 205)
  9: ((LAMBDA NIL :IN SWANK::SPAWN-WORKER-THREAD))
 10: (SWANK/SBCL::CALL-WITH-BREAK-HOOK #<FUNCTION SWANK:SWANK-DEBUGGER-HOOK> #<FUNCTION (LAMBDA NIL :IN SWANK::SPAWN-WORKER-THREAD) {2274CD7B}>)
 11: ((FLET SWANK/BACKEND:CALL-WITH-DEBUGGER-HOOK :IN "/home/josephus/.emacs.d/elpa/slime-20180413.1720/swank/sbcl.lisp") #<FUNCTION SWANK:SWANK-DEBUGGER-HOOK> #<FUNCTION (LAMBDA NIL :IN SWANK::SPAWN-WORKE..
 12: (SWANK::CALL-WITH-BINDINGS ((*STANDARD-INPUT* . #1=#<SWANK/GRAY::SLIME-INPUT-STREAM {1008F6B5F3}>) (*STANDARD-OUTPUT* . #2=#<SWANK/GRAY::SLIME-OUTPUT-STREAM {1009053713}>) (*TRACE-OUTPUT* . #2#) (*ERR..
 13: ((LAMBDA NIL :IN SWANK::SPAWN-WORKER-THREAD))
 14: ((FLET SB-UNIX::BODY :IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE))
 15: ((FLET "WITHOUT-INTERRUPTS-BODY-4" :IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE))
 16: ((FLET SB-THREAD::WITH-MUTEX-THUNK :IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE))
 17: ((FLET "WITHOUT-INTERRUPTS-BODY-1" :IN SB-THREAD::CALL-WITH-MUTEX))
 18: (SB-THREAD::CALL-WITH-MUTEX #<CLOSURE (FLET SB-THREAD::WITH-MUTEX-THUNK :IN SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE) {7FFFE0F86D4B}> #<SB-THREAD:MUTEX "thread result lock" owner: #<SB-THREAD:THR..
 19: (SB-THREAD::INITIAL-THREAD-FUNCTION-TRAMPOLINE #<SB-THREAD:THREAD "worker" RUNNING {100EBCC2E3}> NIL #<CLOSURE (LAMBDA NIL :IN SWANK::SPAWN-WORKER-THREAD) {100EBCC28B}> NIL)
 20: ("foreign function: call_into_lisp")
 21: ("foreign function: new_thread_trampoline")
|#

lisp-is-the-future avatar Aug 13 '19 14:08 lisp-is-the-future

I think carlos ungil used :xmls - and for me it seemed that :xmls is the problem.

gwangjinkim avatar Aug 14 '19 09:08 gwangjinkim

based on :cxml which uses streams (then using klacks inside :cxml) I wrote in the recent days my own package for xlsx and ods file reading.

https://github.com/a1b10/cl-xlsx/blob/master/README.md

On Wed, Aug 14, 2019 at 11:25 AM Gwang-Jin Kim [email protected] wrote:

I think carlos ungil used :xmls - and for me it seemed that :xmls is the problem.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/defunkydrummer/lisp-xl/issues/2?email_source=notifications&email_token=AILIJHXT72BHE5NTS2TTOTLQEPFSDA5CNFSM4FS7ZJ7KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD4IHKKQ#issuecomment-521172266, or mute the thread https://github.com/notifications/unsubscribe-auth/AILIJHT2LC3BT5QNRYBQOHDQEPFSDANCNFSM4FS7ZJ7A .

lisp-is-the-future avatar Aug 24 '19 10:08 lisp-is-the-future

I need this for my bioinformatics stuff, because in bioinformatics a lot of xlsx table files are used - because the biologists doing the experiments rely a lot on MS-Office products. I want to use more lisp at workplace.

On Sat, Aug 24, 2019 at 12:16 PM J Kim [email protected] wrote:

based on :cxml which uses streams (then using klacks inside :cxml) I wrote in the recent days my own package for xlsx and ods file reading.

https://github.com/a1b10/cl-xlsx/blob/master/README.md

On Wed, Aug 14, 2019 at 11:25 AM Gwang-Jin Kim [email protected] wrote:

I think carlos ungil used :xmls - and for me it seemed that :xmls is the problem.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/defunkydrummer/lisp-xl/issues/2?email_source=notifications&email_token=AILIJHXT72BHE5NTS2TTOTLQEPFSDA5CNFSM4FS7ZJ7KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD4IHKKQ#issuecomment-521172266, or mute the thread https://github.com/notifications/unsubscribe-auth/AILIJHT2LC3BT5QNRYBQOHDQEPFSDANCNFSM4FS7ZJ7A .

lisp-is-the-future avatar Aug 24 '19 10:08 lisp-is-the-future

based on :cxml which uses streams (then using klacks inside :cxml) I wrote in the recent days my own package for xlsx and ods file reading. https://github.com/a1b10/cl-xlsx/blob/master/README.md On Wed, Aug 14, 2019 at 11:25 AM Gwang-Jin Kim @.***> wrote: I think carlos ungil used :xmls - and for me it seemed that :xmls is the problem. — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#2?email_source=notifications&email_token=AILIJHXT72BHE5NTS2TTOTLQEPFSDA5CNFSM4FS7ZJ7KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD4IHKKQ#issuecomment-521172266>, or mute the thread https://github.com/notifications/unsubscribe-auth/AILIJHT2LC3BT5QNRYBQOHDQEPFSDANCNFSM4FS7ZJ7A .

Hey, thanks for this!! This will benefit the Lisp ecosystem.

I see your lib loads the whole file into RAM which was what i tried to avoid, because we used very big files, often more than 100MB.

Be sure to add your lib to the CLIKI wiki website!

defunkydrummer avatar Oct 30 '20 14:10 defunkydrummer

Hey! Welcome! Yes, at the moment - but current version - doesn't use :cxml - thus no streams - thus loads everything into RAM. There was a bug and @slyrus fixed the bug by rewriting it in that way with some other dependencies (e.g. :FXML). I wanted however re-introduce the stream usage again. Thank you for your hint to add it to CLKI wiki website. I'll do it as soon as I can!

gwangjinkim avatar Nov 02 '20 10:11 gwangjinkim