
Add support for CSVZ extension in CSV driver

Open phidrho opened this issue 7 months ago • 4 comments

Feature description

Hi,

The documentation for the CSV driver says that the GeoCSV specification has been supported since GDAL 2.1.

Everything works as expected for reading and writing with the ".zip" extension, but the "special" extension for zipped CSV files, CSVZ, is not implemented.

From GeoCSV_file_format_specification:

Optional auxiliary files (with same base filename but different file extensions) are:

CSVT:
    Contains field type information (schema).
    File extension is .CSVT (or .csvt).
    See section below.
PRJ (to be clarified!):
    Contains Coordinate Reference System ([CRS](https://www.giswiki.ch/CRS)) information.
    File extension is .PRJ (or .prj).
    Default is EPSG:4326 (WGS84, lon/lat).
CSVZ:
    File extension is .CSVZ (or .csvz)
    The CSV file can be accompanied with following files, having the same file base name: .csvt and .prj.
    Archiving and compressing in format .ZIP (or .zip) is also possible and encouraged.
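
To make the intended layout concrete, here is a minimal sketch of what a CSVZ bundle could look like, assuming it is a plain ZIP container holding the CSV and whichever sidecars exist. The function name `pack_csvz` is hypothetical and this uses only the Python standard library, not GDAL. It looks for lowercase `.csvt` and `.prj` only; the uppercase variants the spec allows are not handled.

```python
import zipfile
from pathlib import Path


def pack_csvz(csv_path, out_path=None):
    """Bundle a CSV and its .csvt/.prj sidecars into one ZIP named *.csvz.

    Hypothetical helper: assumes CSVZ is an ordinary ZIP archive.
    """
    csv_path = Path(csv_path)
    out_path = Path(out_path) if out_path else csv_path.with_suffix(".csvz")

    # The CSV itself always goes in; sidecars are optional per the spec.
    members = [csv_path]
    for ext in (".csvt", ".prj"):
        sidecar = csv_path.with_suffix(ext)
        if sidecar.exists():
            members.append(sidecar)

    with zipfile.ZipFile(out_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for member in members:
            # Store files flat, keeping the shared base name.
            zf.write(member, arcname=member.name)
    return out_path
```

For example, `pack_csvz("cities.csv")` would write `cities.csvz` next to the input, containing `cities.csv` plus `cities.csvt` and `cities.prj` if they are present.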

We have a similar situation with SHP and SHZ: both work without needing to call /vsizip/. See the video showing examples:

https://github.com/user-attachments/assets/7f4bf00b-d0bb-448c-9052-2c7967cf81e2

Examples:

```shell
# CREATES ZIPPED output.shp.zip FILE:
ogr2ogr -f "ESRI Shapefile" output.shp.zip input.gpkg
# CREATES ZIPPED output.shz FILE:
ogr2ogr -f "ESRI Shapefile" output.shz input.gpkg
# CREATES output.zip FOLDER with files inside:
ogr2ogr -f CSV output.zip input.gpkg -lco CREATE_CSVT=YES -lco SEPARATOR=TAB -lco GEOMETRY=AS_WKT
# CREATES output.csvz FOLDER with files inside:
ogr2ogr -f CSV output.csvz input.gpkg -lco CREATE_CSVT=YES -lco SEPARATOR=TAB -lco GEOMETRY=AS_WKT
# CREATES ZIPPED output_vsi.zip FILE with layer name from GPKG:
ogr2ogr -f CSV /vsizip/output_vsi.zip input.gpkg -lco CREATE_CSVT=YES -lco SEPARATOR=TAB -lco GEOMETRY=AS_WKT
# CREATES ZIPPED output_vsi_custom.zip FILE with custom_layer_name:
ogr2ogr -f CSV /vsizip/output_vsi_custom.zip/custom_layer_name.csv input.gpkg -lco CREATE_CSVT=YES -lco SEPARATOR=TAB -lco GEOMETRY=AS_WKT
# THROWS AN ERROR:
ogr2ogr -f CSV /vsizip/output_vsi.csvz input.gpkg -lco CREATE_CSVT=YES -lco SEPARATOR=TAB -lco GEOMETRY=AS_WKT
```

Additional context

There is a workaround using the /vsizip/output.zip syntax, but it works only for the ZIP extension. It would be nice to see an implementation similar to the SHP driver's, one that automatically recognizes both the ZIP and CSVZ extensions when writing and supports CSVZ when reading.
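
For reading, one possible stopgap is the curly-brace form of /vsizip/ (`/vsizip/{archive}/member`), which GDAL's virtual file system documentation describes as a way to avoid depending on the .zip extension. The sketch below only builds such a path string. The helper name `vsizip_path` is hypothetical, and I have not verified that the brace form also works for writing CSV output.

```python
def vsizip_path(archive, member=None):
    """Build a /vsizip/ path for an archive with an arbitrary extension.

    Hypothetical helper: plain string handling, no GDAL call is made.
    """
    archive = str(archive)
    if archive.lower().endswith(".zip"):
        # .zip is recognized directly, so the plain form is enough.
        base = "/vsizip/" + archive
    else:
        # Braces delimit the archive name when the extension is not .zip.
        base = "/vsizip/{" + archive + "}"
    return base + "/" + member if member else base
```

With this, `vsizip_path("output.csvz", "layer.csv")` gives a string that could be passed to `ogrinfo` or `ogr2ogr` as the input dataset name, assuming the brace syntax behaves as documented.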

phidrho avatar May 08 '25 15:05 phidrho

Hi, I saw this feature request and I think it would be useful. I'm willing to contribute to developing this feature.

Before I start working on it, I’d like to confirm with the maintainer team @rouault that this feature would indeed benefit the gdal user community and that it aligns with the overall purpose of gdal.

Thanks!

desertstsung avatar Sep 09 '25 02:09 desertstsung

@sfkeller It is not clear to me whether a .csvz file is supposed to be a GZIP file or a ZIP file, or maybe both? (And in the latter case, would the sidecar files .prj and .csv be embedded in it?)

rouault avatar Sep 09 '25 10:09 rouault

Hi @rouault Nice to hear from you! Let me dig out my old notes and think about it.

P.S. I actually just recently thought about enhancing CSV with an accompanying schema file, like defined in the CSV Validator tool (but that deserves another issue).

sfkeller avatar Sep 12 '25 18:09 sfkeller

Ok. The idea was to have a single ZIP file that contains all the sidecar files embedded within it. So, I really meant 'ZIP', which is better suited to collections of files than GZIP. QGIS, for example, also uses ZIP when working with compressed Sha*files.

However, if you also want to support GZIP, I wouldn't object (wouldn't you then need to create a TAR file first?).
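
If both containers were ever accepted, telling them apart would not need the extension at all, since the two formats start with different signature bytes (`PK` for ZIP, `0x1f 0x8b` for GZIP). A minimal sketch, with a hypothetical function name and no claim about how the GDAL driver would actually do it:

```python
def sniff_container(path):
    """Guess whether a file is a ZIP or GZIP container from its first bytes.

    Hypothetical helper for illustration; not part of GDAL.
    """
    with open(path, "rb") as fh:
        head = fh.read(4)
    if head.startswith(b"\x1f\x8b"):
        # GZIP member header magic.
        return "gzip"
    if head in (b"PK\x03\x04", b"PK\x05\x06"):
        # Local file header, or end-of-central-directory for an empty archive.
        return "zip"
    return "unknown"
```

A GZIP stream holds a single file, so bundling the .csvt and .prj sidecars that way would indeed need something like TAR underneath.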

sfkeller avatar Sep 12 '25 18:09 sfkeller