gocsv
gocsv copied to clipboard
Convert xlsx file from some directory to CSV
Cannot convert file from stdin
Would you be able to provide more detail? I'm not entirely sure what you mean.
It is true that right now gocsv
does not accept XLSX files via stdin (https://github.com/aotimme/gocsv/blob/52ab92ecb66c17fd3c4aeec44730277013b4a1e7/README.md?plain=1#L676), although it might be that we could add this due to the library it depends upon having a way to open a XLSX file via the io.Reader
interface: OpenReader
.
But the title of the issue makes me think you might be running into a different issue? I'm not sure.
@npmichel, yes, please share more about the need to read an Excel file from stdin... what process is printing the Excel file to stdout? That seems odd, like wanting/needing to read a Photoshop file from stdin.
I am working on large data over, than 800000 lines xlsx file, I want to convert this file from xlsx to cvs and use this csv file to set to PySpark for data engineering. I have the excel file in my machine where I installed gocsv then when I use gocsv to convert xlsx file to csv, it is not working. Here is the output error: Cannot convert file from stdin . My OS is Centos 7
@npmichel, thank you. Looking at the help for xlsx subcommand we can see that the XLSX file must be specified by name, the FILE argument:
gocsv xlsx [--list-sheets] [--dirname DIRNAME] [--sheet SHEET] FILE
Will you please check your command line and make sure you have one XLSX file at the end? If you're still having trouble, please share the entire command line here.
Thank you for your response.
Now I excute the command by specifing only the xlsx file without options (--list-sheets,--sheet,--dirname),
Here my files's structure
I want to convert Global_V5.xlsx, it content two sheets, each sheet to csv file with custom name.
Here is the command I executed
I got folder with same name as my xlsx file, inside this folder I found my csv files like expected. Thank you. For options maybe it's my side, but they are not working as mentioned in documentation. When I add some option there are not any folder created, but the result is printed in my terminal.
I see. If you know there are exactly two sheets in the XLSX file, then you may want something like this:
gocsv xlsx --sheet 1 Global_V5.xlsx > custom-name-1.csv
gocsv xlsx --sheet 2 Global_V5.xlsx > custom-name-2.csv
If the issue is in the documentation, do you have a suggestion for what might help to make it more clear? I can imagine that it is a bit unexpected that specifying various flags changes the type of output (e.g. --list-sheets
provides a list of sheets, --sheet [SHEET]
extracts a single sheet as a CSV and prints to stdout, and no option ends up extracting all sheets as CSVs in a directory).
Otherwise, let me know if you think we can close the issue!
Okay, I think it was me not understanding the documentation.
In my comprehension when you execute this command gocsv xlsx --sheet 1 Global_V5.xlsx
you don't have to explicitly redirect the output by specifing > output_file.csv
. In my mind that will be automaticaly instead of print on terminal.
Now it is clear for me. Thank you very much for these answers and these guides.
For the improvement of the documentation, I suggest that you do the example of execution of the xlsx file conversion command, specifying each use case for all the options and the expected results, but the documentation in this part which was my need was not very intuitive. This could really help because the tool is really great and very fast compared to what I'm used to using (python tools like xlsx2csv
or csvkit
).
Testimony if you have time, you can read because it's a bit long.
I must even testify that I came across this tool because the others that I had no longer supported the size of my file and the existence of certain characters (it is a document written in french language). And this tool is easy to install even if I am on a VM but once the file is imported the installation is easy and the execution is very fast, without errors and handles all the accents. I will recommend to my team members.
You can close this issue.
@npmichel, I remember struggling a bit with the xlsx subcommand when I first tried it.
I don't think an exhaustive set of examples for all options will do. Still, I've submitted PR #59 to put the behavior surrounding saving sheets to a directory at the beginning of xlsx's documentation.
@npmichel, how does this look?
Convert sheets of an XLSX file to CSVs.
The command defaults to writing all converted sheets to a directory with the same name as the XLSX file (without the ".xlsx" extension).
Usage:
gocsv xlsx [--list-sheets | --dirname DIRNAME | --sheet SHEET] FILE
Arguments:
-
--list-sheets
(optional) List the sheets, by index and name, in the XLSX file. -
--sheet
(optional) Specify a single sheet, by index or name, to convert and write to stdout. -
--dirname
(optional) Specify the name of the directory for the converted sheets. The command defaults to the same name asFILE
, minus the extension.
Only one option can be used; multiple options cannot be combined.