
Track `pygmt/tests/data` using dvc?

Open maxrjones opened this issue 2 years ago • 2 comments

The grdtrack tests modified in https://github.com/GenericMappingTools/pygmt/pull/1762 require a new .csv file for input. Rather than adding more files to the GMT cache or generating a .csv file within those tests, I am wondering if we can start tracking the folder pygmt/tests/data using dvc? In this case, we could easily add a new file track.txt that contains the necessary points without including it in the git history. This path forward may require some modifications to the dvc-diff workflow and adding these files to the release assets (xref https://github.com/GenericMappingTools/pygmt/pull/1317).

maxrjones • Mar 01 '22 16:03

I had a similar idea at https://github.com/GenericMappingTools/pygmt/pull/1695#discussion_r814250521: storing the RidgeTest.shp file with dvc in pygmt/tests/data. In that case, some of the files were binary rather than plain-text (shp, shx, dbf), but considering that RidgeTest.shp could be used by GMT/GMT.jl, I think it made sense to have it in the https://github.com/GenericMappingTools/gmtserver-admin cache.

For this track.txt file, since it is plain-text, I think we can store it in git history as long as it isn't too long (maybe 10-20 lines). But I think it's worth discussing whether, in the future, we should look into storing certain binary files such as GeoTIFFs using dvc in pygmt/tests/data, which would require modifications to the dvc workflow as you said, or just always keep them in the GMT-wide cache.

weiji14 • Mar 01 '22 17:03

Sounds good, thanks for the input. I will track that file with git and will leave this issue open for now.

maxrjones • Mar 01 '22 22:03