TileDB icon indicating copy to clipboard operation
TileDB copied to clipboard

Single release CMake file that contains all required links/hashes

Open dudoslav opened this issue 1 year ago • 19 comments

This PR aims to autogenerate cmake file containing all required hash/url combinations so that other repositories can simply pull this file and forget about changing all url/hash occurences in their code. The output assets look like this now:

https://github.com/dudoslav/TileDB/releases/tag/t01

Please let me know if the produced CMake file is ok, or should be changed to a different format.


TYPE: IMPROVEMENT DESC: Look above

dudoslav avatar Jan 15 '24 11:01 dudoslav

Love this. Having the information in one place (by release) is very good. But may I raise my hand and ask for a (complementary) non-cmake format? Maybe JSON? Maybe key:value pairs? R does use cmake and while I could shimmy it (with help) a simpler approach would be good too.

eddelbuettel avatar Jan 15 '24 13:01 eddelbuettel

I have no problem also adding a json format, however, do you really need release build outside cmake?

I can read JSON from CMAKE without any issues since I can use string(JSON ...) functions

dudoslav avatar Jan 15 '24 16:01 dudoslav

Yes, good point @eddelbuettel

I have no problem also adding a json format, however, do you really need release build outside cmake?

Yes, these builds are used by TileDB-R outside of CMake.

ihnorton avatar Jan 15 '24 16:01 ihnorton

To be perfectly plain, I would prefer non-JSON and reading JSON happens to not be batteries included. I currently use the 'Debian Control File' format R itself uses in its DESCRIPTION file, for that we have a ready-made parser included (read.dcf() that is):

https://github.com/TileDB-Inc/TileDB-R/blob/e17563a554921d952ac473d5e49b65888a6a035a/tools/tiledbVersion.txt#L1-L2

Of course, other formats work but at the cost of added dependencies. So JSON, YAML, TOML, ... are all equally (in)convenient with the need for other library.

eddelbuettel avatar Jan 15 '24 16:01 eddelbuettel

do you really need release build outside cmake?

Yeah. Let's plan on a quick huddle this week and I show you a quick demo and more resources. In essence, R is driven by e.g. R CMD INSTALL nameOfArchve_x.y.z.tar.gz and invokes just make and other standard tools (via pre-made added Makefile snippets). One can farm out to cmake and other callable tools but it counts as extra dependency. We currently only depend on cmake if a build from (full, ie TileDB) source is requested else R and shell etc lead to the download of an artifact.

eddelbuettel avatar Jan 15 '24 16:01 eddelbuettel

Ok so my next plan is to:

  1. Create CSV file of all releases + source
  2. Change sha1 to sha256 if its ok with everyone (because it might create more problems downstream)
  3. Modify cmake to use this csv file
  4. Divide cmake function into download_prebuilt and download_source

dudoslav avatar Jan 16 '24 09:01 dudoslav

This might be a format that could be compatible with both R and CMake:

(base) dudoslav@beetle:~$ cat rellist.csv
os,arch,url,sha,noavx2
LINUX,X86_64,https://SOMEURL.com,da39a3ee5e6b4b0d3255bfef95601890afd80709,
LINUX,X86_64,https://SOMEURL.com,da39a3ee5e6b4b0d3255bfef95601890afd80709,NOAVX2
WINDOWS,X86_64,https://SOMEURL.com,da39a3ee5e6b4b0d3255bfef95601890afd80709,

dudoslav avatar Jan 16 '24 10:01 dudoslav

DESC: Look above

The item in the DESC field will be included in the release notes. You should set it to something meaningful like Add redistributable CMake file that downloads prebuilt binaries from GitHub Releases.

teo-tsirpanis avatar Jan 16 '24 11:01 teo-tsirpanis

Here's how I am imagining this. Each release will have a JSON (or another format like CSV but I would prefer JSON IMHO) file named say artifacts.json that will contain information about the available release artifacts. Here's an example for 2.19.0:

{
    // We could add this to help non-vcpkg builds from source.
    "git-ref": "fa30a88abc5504ed8387e25937744afcd3802232",
    "artifacts": {
        "x64-windows": {
            "filename": "tiledb-windows-x86_64-2.19.0-fa30a88a.zip",
            "sha512": "…"
        },
        "x64-linux": {
            "filename": "tiledb-linux-x86_64-2.19.0-fa30a88a.zip",
            "sha512": "…"
        },
        "x64-linux-noavx2": {
            "filename": "tiledb-linux-x86_64-noavx2-2.19.0-fa30a88a.zip",
            "sha512": "…"
        },
        "x64-macos": {
            "filename": "tiledb-macos-x86_64-2.19.0-fa30a88a.zip",
            "sha512": "…"
        },
        "arm64-macos": {
            "filename": "tiledb-macos-arm64-2.19.0-fa30a88a.zip",
            "sha512": "…"
        }
    }
}

By reading from the JSON file, the CMake script can be version-agnostic and be checked in once for every downstream repository. Here's how the script could be used:

include(DownloadPrebuiltTileDB)

# Downloads prebuilt binaries of TileDB from GitHub
# Releases and calls find_package to import them.
find_prebuilt_tiledb(
  # The version of TileDB to use. Required.
  VERSION 2.19.0
  # The hash of the artifacts.json file. Optional, will warn if not
  # specified, maybe suggesting to at least enable CMAKE_TLS_VERIFY.
  MANIFEST_SHA512 …
  # The name of the release artifact to download.
  # Optional, will try to autodetect if not specified.
  ARTIFACT_NAME x64-linux-noavx2
  # Flags that will also be passed to the find_package call.
  REQUIRED QUIET
)

target_link_libraries(my_project PRIVATE TileDB::tiledb)

Regarding building from source I am wondering if it would be a better idea to have this CMake script exclusively deal with downloading prebuilt binaries, and direct users that want more customizability to vcpkg (after we create a tiledb port, which can be done at any time). Scoping this out will significantly simplify the script.

teo-tsirpanis avatar Jan 16 '24 12:01 teo-tsirpanis

@teo-tsirpanis I really like your suggestion, but there is a requirement from the R team for a format that could be easily included without dependencies. So JSON is somewhat out of question. Other than that, the CMake script will contain two functions:

fetch_release_list that downloads and parses the release list (rellist.csv) download_source_tiledb that downloads and build source tiledb download_prebuilt_tiledb that downloads prebuilt tiledb

The question is that if we don't need download_source_tiledb then I could get rid of fetch_release_list function. However, are we really sure we don't need that source tiledb distribution?

dudoslav avatar Jan 16 '24 12:01 dudoslav

The R build would need to load another package just to parse JSON, there is no built-in reader. Csv and alike work fine.

Another idea, because we are not short those few bytes, would be to store both sha1 and sha256 checksums.

eddelbuettel avatar Jan 16 '24 13:01 eddelbuettel

Also, I am still not sure where to put this download logic, since it will not change between releases. Only the rellist changes. Maybe if I would hardcode the latest version into it it could be also distributed as release asset otherwise I am not sure what to do with it

dudoslav avatar Jan 16 '24 13:01 dudoslav

Ok, so what do you guys think about the latest changes and mostly about the DownloadPrebuiltTileDB.cmake file? I removed the source distribution logic as requested and add parameters to that CMake script.

Edit 1.: As always the produced CSV can be found here: https://github.com/dudoslav/TileDB/releases/download/t01/rellist.csv

Edit 2: Fixed link

Edit 2: Change link directly to CSV

dudoslav avatar Jan 16 '24 13:01 dudoslav

As always the produced CSV can be found here:

That's a cmake file, not a csv file.

Edit two weeks later: Not sure what I was up to in that comment. Hm.

eddelbuettel avatar Jan 16 '24 14:01 eddelbuettel

Revisiting my comment from two weeks ago: Hm. I must have been confused or in a rush or both. rellist.csv is very much a csv as discussed:

edd@rob:~/Downloads$ cat rellist.csv 
os,arch,url,sha,noavx2
LINUX,X86_64,https://api.github.com/repos/dudoslav/TileDB/releases/136678694/assets/tiledb-linux-x86_64-noavx2-t01-3ecfe85.tar.gz,298c65001b39629974613611572bf9033755431c,NOAVX2
LINUX,X86_64,https://api.github.com/repos/dudoslav/TileDB/releases/136678694/assets/tiledb-linux-x86_64-t01-3ecfe85.tar.gz,90b1a8c25998422a5b8ec1cea12c86d97ff14a07,
MACOS,ARM64,https://api.github.com/repos/dudoslav/TileDB/releases/136678694/assets/tiledb-macos-arm64-t01-3ecfe85.tar.gz,6813c0b469680630c09545359531b8140ede3b61,
MACOS,X86_64,https://api.github.com/repos/dudoslav/TileDB/releases/136678694/assets/tiledb-macos-x86_64-t01-3ecfe85.tar.gz,213fa27cbef480c528e7128928a61f5391d49fc3,
WINDOWS,X86_64,https://api.github.com/repos/dudoslav/TileDB/releases/136678694/assets/tiledb-windows-x86_64-t01-3ecfe85.zip,e6ba34bafa07738ccc0a3da14e106bf97e51a82d,
source,source,https://api.github.com/repos/dudoslav/TileDB/zipball/t01,577c693aa189569dab6ddc51f74a98d7bb459987,source
edd@rob:~/Downloads$ 

eddelbuettel avatar Feb 02 '24 14:02 eddelbuettel

os,arch,url,sha,noavx2

I think a CSV schema of id,url,sha where id can have values such windows-x64, linux-x64-noavx2 or source is more scalable.

And won't we use SHA256 or greater?

teo-tsirpanis avatar Feb 02 '24 14:02 teo-tsirpanis

I like column headers 'os', 'arch', 'url' and 'sha(1)'. We could fold the noavx2 into the arch column making it x86_64 and x86_64_noavx as a specialisation but it is .. arguably easier to have it split in an extra column even of the csv file is 'ragged':

> D <- fread("rellist.csv")    # from data.table which pretty-prints as below 
> D
        os   arch                                                                                                                  url                                      sha noavx2
    <char> <char>                                                                                                               <char>                                   <char> <char>
1:   LINUX X86_64 https://api.github.com/repos/dudoslav/TileDB/releases/136678694/assets/tiledb-linux-x86_64-noavx2-t01-3ecfe85.tar.gz 298c65001b39629974613611572bf9033755431c NOAVX2
2:   LINUX X86_64        https://api.github.com/repos/dudoslav/TileDB/releases/136678694/assets/tiledb-linux-x86_64-t01-3ecfe85.tar.gz 90b1a8c25998422a5b8ec1cea12c86d97ff14a07       
3:   MACOS  ARM64         https://api.github.com/repos/dudoslav/TileDB/releases/136678694/assets/tiledb-macos-arm64-t01-3ecfe85.tar.gz 6813c0b469680630c09545359531b8140ede3b61       
4:   MACOS X86_64        https://api.github.com/repos/dudoslav/TileDB/releases/136678694/assets/tiledb-macos-x86_64-t01-3ecfe85.tar.gz 213fa27cbef480c528e7128928a61f5391d49fc3       
5: WINDOWS X86_64         https://api.github.com/repos/dudoslav/TileDB/releases/136678694/assets/tiledb-windows-x86_64-t01-3ecfe85.zip e6ba34bafa07738ccc0a3da14e106bf97e51a82d       
6:  source source                                                             https://api.github.com/repos/dudoslav/TileDB/zipball/t01 577c693aa189569dab6ddc51f74a98d7bb459987 source
> 

Other columns (sha356, md5, ...) can easily be added.

PS Not sure about the last line though.

eddelbuettel avatar Feb 02 '24 14:02 eddelbuettel

UPDATE: I had to rebase this branch on top of another branch that add CPACK releases, since those ones actually generate sha256 hashes. This means that this PR should be merged after that change: https://github.com/TileDB-Inc/TileDB/pull/4694

Here are proposed release assets: https://github.com/dudoslav/TileDB/releases/tag/t03

dudoslav avatar Feb 05 '24 15:02 dudoslav

I changed download logic to use FetchContent and add ReleaseListGeneration into Release workflow. This is the resulting artifacts:

https://github.com/dudoslav/TileDB/releases/tag/t09

And this is the pipeline that produced it:

https://github.com/dudoslav/TileDB/actions/runs/8110756605

I also downloaded DownloadPrebuiltTileDB.cmake and tested in local dummy project.

dudoslav avatar Mar 01 '24 12:03 dudoslav