Trino-LanceDB plugin

Open walterddr opened this issue 1 year ago • 21 comments

Description

Initial version of Trino-LanceDB plugin

walterddr avatar May 09 '24 01:05 walterddr

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to [email protected]. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

cla-bot[bot] avatar May 09 '24 01:05 cla-bot[bot]

We should add documentation and the logo as well and maybe take this out of draft mode for first feedback and testing.

mosabua avatar May 27 '24 22:05 mosabua

> We should add documentation and the logo as well and maybe take this out of draft mode for first feedback and testing.

yes. will do once we get a green CI, (also with documentation + logo)

walterddr avatar May 29 '24 03:05 walterddr

> We should add documentation and the logo as well and maybe take this out of draft mode for first feedback and testing.

> yes. will do once we get a green CI, (also with documentation + logo)

Upgraded to a test release version. PTAL @mosabua. I think Lance is still working on a cross-platform release, so we might still bump the version later, but this should work on the linux x86 platform. Also, I think I signed and emailed the CLA, but I'm not sure why the bot still says otherwise. Please kindly check.

walterddr avatar May 30 '24 22:05 walterddr

CLA processing is a bit behind .. no worries. We can still proceed with reviewing and such

mosabua avatar May 30 '24 23:05 mosabua

@mosabua Hi, curious what the ETA is for this? Thanks for adding this, we were looking for it.

bazooka720 avatar Jun 12 '24 00:06 bazooka720

@bazooka720 .. no ETA .. we have to start reviewing more and also figure out how to package this since it adds considerable size to the artifacts, but things are in progress.

mosabua avatar Jun 25 '24 16:06 mosabua

Sorry for the late follow-up. I've chatted with folks at Lance, and considering that the follow-up PR for predicate/project/filter pushdown is already in progress, we would like to move quicker on this PR.

Overall I think the approach achieves the basic functionality we discussed previously. The remaining work is mostly:

  1. Java side LanceDB/Core CI release and also getting it e2e working on a linux/x86 env
  2. getting it working with trino's CI framework

For step 1, I think @eddyxu has made great progress by merging https://github.com/lancedb/lance/pull/2382 and https://github.com/lancedb/lance/pull/2516. I will test this next to ensure it works E2E. For step 2, we still have some complications regarding the GHA image used, so any suggestions on how to get this sorted out would be highly appreciated.

Thanks @mosabua and other folks for the help and support. I think code-wise the PR is ready for review. I will keep folks updated on the CI and packaging side.

walterddr avatar Jul 14 '24 14:07 walterddr

I think we want to get this changed from draft to ready for review then @walterddr

In parallel we can work on CI and packaging.

Can you maybe let us know what the status is about building locally and what the specific issues are about CI and packaging.

mosabua avatar Jul 15 '24 19:07 mosabua

Also note this connector might end up being part of the work on refactoring packaging. Note that the roadmap issue is still in the works to get fleshed out and implementation to begin.

https://github.com/trinodb/trino/issues/22597

mosabua avatar Jul 15 '24 19:07 mosabua

> I think we want to get this changed from draft to ready for review then @walterddr

DONE

> In parallel we can work on CI and packaging.
>
> Can you maybe let us know what the status is about building locally and what the specific issues are about CI and packaging.

I can build locally on either my own linux-x86 machine or my Mac, but I think the first goal here is to make linux-x86 work. Previously, when I launched the CI job, the issue was that the ubuntu-22.04 image used in CI has a GLIBC version issue:

java.lang.UnsatisfiedLinkError: /tmp/liblancedb_jni15263364499459730028.so: /lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.38' not found (required by /tmp/liblancedb_jni15263364499459730028.so)

see link: https://github.com/trinodb/trino/actions/runs/9331728631/job/25686753922?pr=21880
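For anyone triaging similar failures, the required glibc version can be read straight out of the linker message. The following is only a minimal sketch: `GlibcRequirement` and `requiredVersion` are hypothetical names made up for illustration (they are not part of Trino or Lance), and the regular expression assumes the message keeps the `version `GLIBC_x.y' not found` shape seen above.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of Trino or Lance.
final class GlibcRequirement
{
    // Matches the "version `GLIBC_2.38' not found" fragment of the dynamic linker message.
    private static final Pattern REQUIRED =
            Pattern.compile("version `GLIBC_(\\d+(?:\\.\\d+)*)' not found");

    private GlibcRequirement() {}

    // Returns the glibc version the native library asks for, if the message names one.
    static Optional<String> requiredVersion(String linkErrorMessage)
    {
        Matcher matcher = REQUIRED.matcher(linkErrorMessage);
        return matcher.find() ? Optional.of(matcher.group(1)) : Optional.empty();
    }

    public static void main(String[] args)
    {
        // Message text copied from the UnsatisfiedLinkError reported in this thread.
        String message = "/tmp/liblancedb_jni15263364499459730028.so: "
                + "/lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.38' not found "
                + "(required by /tmp/liblancedb_jni15263364499459730028.so)";
        System.out.println(requiredVersion(message).orElse("unknown")); // prints 2.38
    }
}
```

In practice the message would come from `UnsatisfiedLinkError.getMessage()` caught around the native library load.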

walterddr avatar Jul 18 '24 00:07 walterddr

Do you know what base image would actually fix that?

mosabua avatar Jul 18 '24 22:07 mosabua

> Do you know what base image would actually fix that?

I am not 100% sure. According to the log shown in the run, the base image is:

Ubuntu 22.04 LTS

so it should come packaged with GLIBC version 2.35. The error message indicates that the lance-db JNI JAR is looking for

/lib/x86_64-linux-gnu/libm.so.6: version `GLIBC_2.38' not found

so I think there are two options:

  1. Upgrade GLIBC to 2.38 for this particular run using an extra step in the CI run.
  2. Downgrade the GLIBC version required by the lance-db release to 2.35.
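Both options come down to the same condition: the glibc on the image must be at least as new as the version the native library was linked against. A minimal sketch of that check follows, assuming plain dotted numeric versions; `GlibcVersions`, `compare`, and `satisfies` are hypothetical names used only for illustration.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not part of Trino or Lance.
final class GlibcVersions
{
    private GlibcVersions() {}

    // Compares dotted numeric versions such as "2.35" and "2.38" component by component.
    static int compare(String left, String right)
    {
        int[] a = parse(left);
        int[] b = parse(right);
        for (int i = 0; i < Math.max(a.length, b.length); i++) {
            int x = i < a.length ? a[i] : 0; // missing components count as 0
            int y = i < b.length ? b[i] : 0;
            if (x != y) {
                return Integer.compare(x, y);
            }
        }
        return 0;
    }

    // True when the glibc available on the host is new enough for the native library.
    static boolean satisfies(String available, String required)
    {
        return compare(available, required) >= 0;
    }

    private static int[] parse(String version)
    {
        return Arrays.stream(version.split("\\.")).mapToInt(Integer::parseInt).toArray();
    }

    public static void main(String[] args)
    {
        System.out.println(satisfies("2.35", "2.38")); // false: the situation reported above
        System.out.println(satisfies("2.35", "2.35")); // true: option 2, library built against 2.35
    }
}
```

The comparison is numeric per component rather than lexicographic, so that 2.35 is correctly treated as newer than 2.9.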

CC @LuQQiu and @eddyxu for suggestions from the lance side

walterddr avatar Jul 30 '24 14:07 walterddr

This pull request has gone a while without any activity. Tagging the Trino developer relations team: @bitsondatadev @colebow @mosabua

github-actions[bot] avatar Aug 21 '24 17:08 github-actions[bot]

@wendigo do you think we should upgrade our base image?

@walterddr is this glibc version a runtime requirement as well?

mosabua avatar Aug 21 '24 17:08 mosabua

@mosabua sorry for the delay. It looks like I was looking in the wrong place: the issue comes from the compiled lancedb-core module, not the lance-core module, so our previous workaround was always aimed at the wrong target.

After chatting with the Lance folks, we have a new version of lancedb-core that should conform with the current base image, so I don't think we need to bump it up.

and yes it is a runtime dependency.

walterddr avatar Aug 23 '24 01:08 walterddr

Is this feature already available?

Firstero avatar Oct 21 '24 09:10 Firstero

@walterddr Any blocker for this PR? Many users are asking for this integration

LuQQiu avatar Oct 21 '24 15:10 LuQQiu

Yes .. we need to review it all still and get it ready for merge with @walterddr

mosabua avatar Oct 21 '24 16:10 mosabua