gocsv icon indicating copy to clipboard operation
gocsv copied to clipboard

Convert xlsx file from some directory to CSV

Open npmichel opened this issue 11 months ago • 9 comments

Cannot convert file from stdin

npmichel avatar Mar 11 '24 10:03 npmichel

Would you be able to provide more detail? I'm not entirely sure what you mean.

It is true that right now gocsv does not accept XLSX files via stdin (https://github.com/aotimme/gocsv/blob/52ab92ecb66c17fd3c4aeec44730277013b4a1e7/README.md?plain=1#L676), although it might be that we could add this due to the library it depends upon having a way to open a XLSX file via the io.Reader interface: OpenReader.

But the title of the issue makes me think you might be running into a different issue? I'm not sure.

aotimme avatar Mar 14 '24 20:03 aotimme

@npmichel, yes, please share more about the need to read an Excel file from stdin... what process is printing the Excel file to stdout? That seems odd, like wanting/needing to read a Photoshop file from stdin.

zacharysyoung avatar Mar 18 '24 02:03 zacharysyoung

I am working on large data over, than 800000 lines xlsx file, I want to convert this file from xlsx to cvs and use this csv file to set to PySpark for data engineering. I have the excel file in my machine where I installed gocsv then when I use gocsv to convert xlsx file to csv, it is not working. Here is the output error: Cannot convert file from stdin . My OS is Centos 7

npmichel avatar Mar 18 '24 09:03 npmichel

@npmichel, thank you. Looking at the help for xlsx subcommand we can see that the XLSX file must be specified by name, the FILE argument:

gocsv xlsx [--list-sheets] [--dirname DIRNAME] [--sheet SHEET] FILE

Will you please check your command line and make sure you have one XLSX file at the end? If you're still having trouble, please share the entire command line here.

zacharysyoung avatar Mar 18 '24 15:03 zacharysyoung

Thank you for your response. Now I excute the command by specifing only the xlsx file without options (--list-sheets,--sheet,--dirname), Here my files's structure image

I want to convert Global_V5.xlsx, it content two sheets, each sheet to csv file with custom name. Here is the command I executed image

I got folder with same name as my xlsx file, inside this folder I found my csv files like expected. Thank you. For options maybe it's my side, but they are not working as mentioned in documentation. When I add some option there are not any folder created, but the result is printed in my terminal.

npmichel avatar Mar 19 '24 11:03 npmichel

I see. If you know there are exactly two sheets in the XLSX file, then you may want something like this:

gocsv xlsx --sheet 1 Global_V5.xlsx > custom-name-1.csv
gocsv xlsx --sheet 2 Global_V5.xlsx > custom-name-2.csv

If the issue is in the documentation, do you have a suggestion for what might help to make it more clear? I can imagine that it is a bit unexpected that specifying various flags changes the type of output (e.g. --list-sheets provides a list of sheets, --sheet [SHEET] extracts a single sheet as a CSV and prints to stdout, and no option ends up extracting all sheets as CSVs in a directory).

Otherwise, let me know if you think we can close the issue!

aotimme avatar Mar 19 '24 16:03 aotimme

Okay, I think it was me not understanding the documentation. In my comprehension when you execute this command gocsv xlsx --sheet 1 Global_V5.xlsx you don't have to explicitly redirect the output by specifing > output_file.csv. In my mind that will be automaticaly instead of print on terminal. Now it is clear for me. Thank you very much for these answers and these guides. For the improvement of the documentation, I suggest that you do the example of execution of the xlsx file conversion command, specifying each use case for all the options and the expected results, but the documentation in this part which was my need was not very intuitive. This could really help because the tool is really great and very fast compared to what I'm used to using (python tools like xlsx2csv or csvkit). Testimony if you have time, you can read because it's a bit long. I must even testify that I came across this tool because the others that I had no longer supported the size of my file and the existence of certain characters (it is a document written in french language). And this tool is easy to install even if I am on a VM but once the file is imported the installation is easy and the execution is very fast, without errors and handles all the accents. I will recommend to my team members. You can close this issue.

npmichel avatar Mar 20 '24 10:03 npmichel

@npmichel, I remember struggling a bit with the xlsx subcommand when I first tried it.

I don't think an exhaustive set of examples for all options will do. Still, I've submitted PR #59 to put the behavior surrounding saving sheets to a directory at the beginning of xlsx's documentation.

zacharysyoung avatar Mar 23 '24 02:03 zacharysyoung

@npmichel, how does this look?


Convert sheets of an XLSX file to CSVs.

The command defaults to writing all converted sheets to a directory with the same name as the XLSX file (without the ".xlsx" extension).

Usage:

gocsv xlsx [--list-sheets | --dirname DIRNAME | --sheet SHEET] FILE

Arguments:

  • --list-sheets (optional) List the sheets, by index and name, in the XLSX file.
  • --sheet (optional) Specify a single sheet, by index or name, to convert and write to stdout.
  • --dirname (optional) Specify the name of the directory for the converted sheets. The command defaults to the same name as FILE, minus the extension.

Only one option can be used; multiple options cannot be combined.

zacharysyoung avatar Mar 23 '24 23:03 zacharysyoung