Improve packaging

Open mosabua opened this issue 1 year ago • 11 comments

The current binary packages provided by the Trino project suffer from a number of issues.

  • RPM packaging is barely used and tested, but creates significant work and has a negative impact on the build time.
  • The tarball and Docker container are large and contain all plugins; typical usage, however, only involves a small subset of the plugins.
  • The large artifacts result in longer download and installation times, which also affect deployments.
  • Occasionally plugins may contain dependencies with reported security issues, and therefore Trino might be flagged during security analysis by users (rightly or wrongly so).
  • Custom package creation is not really supported or documented.

These issues will get worse in the near future with more connectors being merged and also more native binaries for multiple operating systems and processor architectures being required and included.

This roadmap item collects a number of related work tasks that we want to work on. Numerous discussions took place prior to filing this issue: on Slack, in smaller conversations, and at the Trino Contributor Calls and Congregation.

The following sections detail tasks and ideas. Work on these can be done in parallel.

Pull out RPM ✅

The RPM is rarely used by now, and we agree on removing it from the core Trino repo. Since it is built from the tarball, however, it is possible to pull the RPM packaging aspects out of the core repo into a separate repository that users can use to build an RPM for any Trino version.

The repository https://github.com/simpligility/trino-packages is a first PoC implementation of this approach. The naming is generic since it can also be used for other package creation in separate modules.

Tasks to implement the removal are:

  • Update and move trino-packages repo into trinodb org
  • Update docs in repo to explain how to create RPM
  • Update Trino website to remove RPM download and just link to docs
  • Update docs to link to repo and explain that you must build the RPM yourself first before using it
  • In the longer run we can maybe even remove the rpm docs from the trino docs completely and just rely on docs in the trino-packages repo

Figure out plugins for different packages ✅

We need to determine what different packages we want to offer and what each package should contain. The following is a first idea:

  • Minimal
    • Only contains what is necessary to start Trino and use it
    • So no plugins .. or maybe just memory to allow some initial testing
  • Default
    • Only contains commonly used plugins
    • Definitely remove the following: atop, blackhole, example-http, exchange-filesystem, exchange-hdfs, geospatial, http-event-listener, http-server-event-listener, local-file, ml, mysql-event-listener, openlineage, raptor-legacy, teradata-functions, thrift
    • Probably add all lakehouse connectors, so hive, hudi, delta-lake, iceberg
    • Some others .. but which

Also create a docs page in installation for plugins that talks about adding and removing plugins.
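As a rough illustration of what "removing plugins" could look like in practice, here is a minimal Python sketch. It assumes an extracted tarball with a `plugin/` directory holding one subdirectory per plugin; the function name `prune_plugins` and the constant `REMOVE_FROM_DEFAULT` are hypothetical and not part of Trino or its tooling. The list itself is copied from the proposal above and is still under discussion.

```python
import shutil
from pathlib import Path

# Plugins proposed for removal from the default package in this issue.
# This list is a proposal, not a final decision.
REMOVE_FROM_DEFAULT = {
    "atop", "blackhole", "example-http", "exchange-filesystem",
    "exchange-hdfs", "geospatial", "http-event-listener",
    "http-server-event-listener", "local-file", "ml",
    "mysql-event-listener", "openlineage", "raptor-legacy",
    "teradata-functions", "thrift",
}


def prune_plugins(plugin_dir: Path, remove: set[str]) -> list[str]:
    """Delete the named plugin subdirectories; return the names that remain."""
    # Snapshot the listing first so deletions do not disturb iteration.
    for child in list(plugin_dir.iterdir()):
        if child.is_dir() and child.name in remove:
            shutil.rmtree(child)
    return sorted(p.name for p in plugin_dir.iterdir() if p.is_dir())
```

Calling `prune_plugins(Path("<extracted server dir>/plugin"), REMOVE_FROM_DEFAULT)` would then leave only the plugins not on the removal list. Adding a plugin would be the reverse: placing its directory into `plugin/`.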

Different tarballs - ✅

Once we figure out the desired plugins and archive variants, we should adjust the build and publishing process to publish these and update the docs as well.

We also then need to add docs on how to download and add additional plugins.

Different container images - ✅

Once we figure out the desired plugins and archive variants, we should adjust the build and publishing process to publish these and update the docs as well.

We also then need to add docs for:

  • How to download and add additional plugins.
  • How to add other packages (not package manager since that is removed...)
  • How to create a docker container with different base OS from scratch maybe (could be example in packaging repo)

Plugin loading

Over time it might be even better to be able to define a URL or similar pointer on a running system and then load that plugin onto the servers and run it. Of course, security concerns and other aspects need to be figured out. The API for these operations could (or maybe even should) be a SQL command similar to the dynamic catalog management features.

mosabua avatar Jul 05 '24 21:07 mosabua

cc @nineinchnick

martint avatar Aug 22 '24 09:08 martint

@mosabua,

Another devex thing we may want to at least track in an issue is to build a web app akin to this Spring Boot one, where you have a nice interface that hits a service to download a tar file with a custom list of plugins.

We could even make it a REST endpoint on a GitHub action that takes in a list of plugins and a Trino version and returns the tar file with just those plugins. Then we can just have minimal with in-memory (possibly some local filesystem if we want to replicate the DuckDB experience).

This isn't quite as dynamic as the first one you suggested, but it is a bit more secure, and if we make the GitHub app a repository, then a company could fork it into their own build system to stay behind a VPN.
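The core of such a service could be sketched roughly as follows in Python. This is only an illustration under assumptions: it expects an already extracted server directory with a `plugin/` folder containing one subdirectory per plugin, and the function name `build_custom_tarball` is hypothetical. Downloading a given Trino version and exposing this over HTTP are left out.

```python
import tarfile
from pathlib import Path


def build_custom_tarball(server_dir: Path, plugins: set[str], out_path: Path) -> None:
    """Archive server_dir, keeping only the requested plugin/<name> directories."""
    available = {p.name for p in (server_dir / "plugin").iterdir() if p.is_dir()}
    missing = plugins - available
    if missing:
        raise ValueError(f"unknown plugins: {sorted(missing)}")

    def keep(info):
        # info.name looks like "<root>/plugin/<name>/..." inside the archive.
        parts = Path(info.name).parts
        if len(parts) >= 3 and parts[1] == "plugin" and parts[2] not in plugins:
            return None  # returning None skips this entry and its children
        return info

    # out_path should live outside server_dir so the archive does not include itself.
    with tarfile.open(out_path, "w:gz") as tar:
        tar.add(server_dir, arcname=server_dir.name, filter=keep)
```

Rejecting unknown plugin names up front keeps a typo from silently producing a package without the requested connector.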

bitsondatadev avatar Sep 11 '24 03:09 bitsondatadev

@bitsondatadev that is totally also something that's nice to have once we've done the preparatory work in terms of creating a minimal package and other things as proposed in this ticket. I think it's probably within scope here

mosabua avatar Sep 11 '24 15:09 mosabua

We cleaned up a few connectors now

Atop #23550 Local file #23551 Raptor #23588

mosabua avatar Oct 25 '24 21:10 mosabua

One thing that came up in @23901 is that some connectors and other plugins add global functions to Trino. If we remove such a plugin as part of the packaging cleanup, those functions are no longer available in that packaged version of Trino. We therefore need to either change the implementation of the plugins, ensure they are always present, or update the docs to explain that the functions depend on a specific plugin.

mosabua avatar Oct 30 '24 14:10 mosabua

Plugins should not expose global functions going forward. That approach is legacy, since Trino supports function namespaces.

martint avatar Oct 30 '24 15:10 martint

Can we document using function namespaces in the developer guide?

nineinchnick avatar Oct 30 '24 15:10 nineinchnick

The trino-packages repo now exists and I have access. I will work on a PR for RPM package support in it shortly.

mosabua avatar Feb 11 '25 20:02 mosabua

#23923 shipped in 470

mosabua avatar Feb 14 '25 19:02 mosabua

RPM removal merged for Trino 471, so the Trino 470 release is the last RPM that is published to Maven Central. Website update will merge with the release.

The repo https://github.com/trinodb/trino-packages contains the setup to build a working RPM.

mosabua avatar Feb 19 '25 15:02 mosabua

Geospatial is basically built-in (including engine support for spatial joins), so I think it should be in the default package.

electrum avatar Mar 04 '25 01:03 electrum