
Serverside Rendering on Vercel fails; missing GLIBC_2.29

Open ItsMeBrianD opened this issue 2 years ago • 18 comments

What happens?

When attempting to deploy a JavaScript project that leverages SSR and DuckDB to Vercel, the build fails.

The error message presented by DuckDB is /lib64/libm.so.6: version 'GLIBC_2.29' not found (required by /vercel/path0/node_modules/duckdb/lib/binding/duckdb.node).

This has worked previously.

To Reproduce

This repo has a simple reproduction of the issue: https://github.com/ItsMeBrianD/duckdb-vercel-repro. Create a Vercel project based on it (or a fork), and the build will fail with the error message above.

OS:

Vercel

DuckDB Version:

0.7.1

DuckDB Client:

node

Full Name:

Brian Donald

Affiliation:

Evidence

Have you tried this on the latest master branch?

  • [X] I agree

Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?

  • [X] I agree

ItsMeBrianD avatar Apr 14 '23 15:04 ItsMeBrianD

@Mause we're using the NodeJS client - we're not sure, but perhaps this is new in 0.7.1?

archiewood avatar Apr 14 '23 15:04 archiewood

@Mause we're using the NodeJS client - we're not sure, but perhaps this is new in 0.7.1?

Which version does it work with? We can check for changes

Mause avatar Apr 15 '23 01:04 Mause

As Vercel is running on AWS Lambda as far as I know, I'm having a hard time imagining that this has worked before, as Lambda environments are currently based on Amazon Linux 2, which uses GLIBC 2.26. See https://repost.aws/questions/QUrXOioL46RcCnFGyELJWKLw/glibc-2-27-on-amazon-linux-2
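If you want to verify this for a given build environment, glibc-based systems report their version via `getconf` (assumes the environment actually uses glibc, not e.g. musl):

```shell
# Print the GLIBC version of the current environment (glibc systems only);
# on Amazon Linux 2 this reports 2.26, below the 2.29 DuckDB's binary needs.
getconf GNU_LIBC_VERSION
```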

I guess you could download my DuckDB for Lambda layer, and extract the build artifacts: https://github.com/tobilg/duckdb-nodejs-layer#arns

tobilg avatar Apr 17 '23 12:04 tobilg

Experiencing a similar error on Vercel with both Node 18.x and 16.x.

https://github.com/pgzmnk/openb

[screenshot of the error output]

pgzmnk avatar Jul 24 '23 23:07 pgzmnk

I therefore created https://www.npmjs.com/package/duckdb-lambda-x86, which should solve the actual issue.

tobilg avatar Sep 27 '23 05:09 tobilg

@Mause we're using the NodeJS client - we're not sure, but perhaps this is new in 0.7.1?

Which version does it work with? We can check for changes

@archiewood any updates?

Mause avatar Oct 17 '23 08:10 Mause

I've encountered the same problem as described. Specifically, I'm using [email protected].

Environment:

  • Operating System: Ubuntu 22.04 and Mac M1 Sonoma
  • Encountered inside a Docker container
  • Docker Base Image: node:14

Steps to Reproduce:

docker run --rm -it node:14 bash

Inside the node:14 container:

mkdir app && cd app
yarn init -y
yarn add [email protected]
cd node_modules/duckdb
npm test

Are there any necessary packages that I need to install?

Translated by ChatGPT.

Sorry, my English is not good. I hope there's no offense.

hanshino avatar Oct 21 '23 16:10 hanshino

@hanshino the default duckdb npm package will not work IMO due to GLIBC incompatibilities, as described above. For Lambda usage, I maintain the https://www.npmjs.com/package/duckdb-lambda-x86 package which should fix your issues.
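Since duckdb-lambda-x86 is published as a Lambda-compatible build of the duckdb package, one option is to select the package per environment. A minimal sketch (the helper name is mine, not from this thread; AWS_LAMBDA_FUNCTION_NAME is the env var Lambda sets in its runtimes):

```javascript
// Hypothetical helper: pick which package name to load, based on whether
// we appear to be running inside AWS Lambda (which sets AWS_LAMBDA_FUNCTION_NAME).
function pickDuckDbPackage(env) {
  return env.AWS_LAMBDA_FUNCTION_NAME ? "duckdb-lambda-x86" : "duckdb";
}

// Usage would then be something like:
//   const duckdb = require(pickDuckDbPackage(process.env));
console.log(pickDuckDbPackage({ AWS_LAMBDA_FUNCTION_NAME: "my-fn" })); // duckdb-lambda-x86
```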

tobilg avatar Oct 21 '23 16:10 tobilg

Here's a wrapper over duckdb-async and duckdb-lambda-x86 that I just wrote, which seems to work both on my M1 macbook (which requires duckdb-async) and on an EC2 instance where I was previously hitting the GLIBC_2.29 error (where duckdb-lambda-x86 works instead):

// lib/duckdb.ts
const _query: Promise<(query: string) => any> = import("duckdb-async")
    .then(duckdb => duckdb.Database)
    .then(Database => Database.create(":memory:"))
    .then((db: any) => ((query: string) => db.all(query)))
    .catch(async error => {
        console.log("duckdb init error:", error)
        let duckdb = await import("duckdb-lambda-x86");
        let Database: any = await duckdb.Database;
        const db = new Database(":memory:")
        const connection = db.connect()
        return (query: string) => {
            return new Promise((resolve, reject) => {
                connection.all(query, (err: any, res: any) => {
                    if (err) reject(err);
                    resolve(res);
                })
            })
        }
    })

export { _query }

Sample API endpoint that uses it:

// /api/query.ts
import { _query } from "@/lib/duckdb"
import { NextApiRequest, NextApiResponse } from "next";

// Convert BigInts to numbers
function replacer(key: string, value: any) {
    if (typeof value === 'bigint') {
        return Number(value)
    } else {
        return value;
    }
}

export default async function handler(
    req: NextApiRequest,
    res: NextApiResponse,
) {
    const { body: { path } } = req
    const query = await _query
    const rows = await query(`select * from read_parquet("${path}")`)  // 🚨 unsafe / SQLi 🚨
    res.status(200).send(JSON.stringify(rows, replacer))
}
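The `replacer` above can be exercised standalone; `JSON.stringify` passes every key/value pair through it, so BigInt values (which DuckDB returns for counts, and which `JSON.stringify` would otherwise throw on) come out as plain numbers:

```javascript
// Same BigInt-to-number replacer as in the handler above
function replacer(key, value) {
  return typeof value === "bigint" ? Number(value) : value;
}

// Without the replacer, JSON.stringify throws "Do not know how to serialize a BigInt"
console.log(JSON.stringify({ count: 42n }, replacer)); // {"count":42}
```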

ryan-williams avatar Nov 17 '23 03:11 ryan-williams

FYI for others who run into this. I ended up using @tobilg's duckdb-lambda-x86 to resolve this with Vercel. In my case I'm just replacing the default duckdb.node binary with the duckdb-lambda-x86 version in the CI build.
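One way to script that swap as a build step might look like the following. Both paths are assumptions based on the default duckdb-node package layout; verify them against your own node_modules before relying on this:

```shell
# Hedged sketch: overwrite the stock duckdb binding with the Lambda-compatible
# one from duckdb-lambda-x86. Paths are assumptions; check your actual layout.
SRC=node_modules/duckdb-lambda-x86/lib/binding/duckdb.node
DST=node_modules/duckdb/lib/binding/duckdb.node
if [ -f "$SRC" ]; then
  cp "$SRC" "$DST"
  echo "replaced duckdb.node"
fi
```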

michaelwallabi avatar Feb 19 '24 18:02 michaelwallabi

@michaelwallabi Thank you for the tip - replacing the binary at build/deploy time was by far the most ergonomic solution (and the only one I was able to get working for my project). I want to extend my sincere appreciation to @tobilg for the effort that enabled it in the first place as well.

Ideally, running DuckDB in a Lambda should be easy out of the box, as it is a great use case, so I look forward to future releases that don't require hacks/workarounds.

iku000888 avatar Aug 31 '24 05:08 iku000888

Even after replacing the binaries, I am getting the following issue on version 1.0.0 (I am on Vercel, Node.js 20):

Unhandled Rejection: [Error: IO Error: Can't find the home directory at '' Specify a home directory using the SET home_directory='/path/to/dir' option.] { errno: -1, code: 'DUCKDB_NODEJS_ERROR', errorType: 'IO' }

Setting a home directory also results in an error: Error: TypeError: Failed to set configuration option home_directory: Invalid Input Error: Could not set option "home_directory" as a global option at new Database (/var/task/node_modules/duckdb-async/dist/duckdb-async.js:226:19)

Can anyone help me please? Thank you!

Dev-rick avatar Sep 16 '24 22:09 Dev-rick

@Dev-rick this worked for me on AWS Lambda!

https://github.com/tobilg/serverless-duckdb/blob/87ad3c5d1bbbb8e03a80e6ad943da53c3a556a21/src/functions/query.ts#L73

iku000888 avatar Sep 16 '24 22:09 iku000888

Like @iku000888, I do the following when creating a DB, which seems to work:

    const db = await Database.create(":memory:");
    const tempDirectory = tmpdir() || '/tmp';
    await db.exec(`
        SET home_directory='${tempDirectory}';
        .... other settings here
        `);

michaelwallabi avatar Sep 16 '24 23:09 michaelwallabi

@iku000888 and @michaelwallabi Thanks for the input!

Unfortunately, I am now getting the following error (on Vercel); locally everything works fine with the same env variables.

Error: HTTP Error: HTTP GET error on 'https://XXX.s3.amazonaws.com/XXX.parquet' (HTTP 400)] { errno: -1, code: 'DUCKDB_NODEJS_ERROR', errorType: 'HTTP' }

My code is:

const S3_LAKE_BUCKET_NAME = process.env.S3_LAKE_BUCKET_NAME
const AWS_S3_ACCESS_KEY = process.env['AWS_S3_ACCESS_KEY']
const AWS_S3_SECRET_KEY = process.env['AWS_S3_SECRET_KEY']
const AWS_S3_REGION = process.env['AWS_S3_REGION']

const retrieveDataFromParquet = async ({
  key,
  sqlStatement,
  tableName,
}: {
  key: string
  sqlStatement: string
  tableName: string
}) => {
  try {
    // Create a new DuckDB database connection
    const db = await Database.create(':memory:')

    console.log('Setting home directory...')
    await db.all(`SET home_directory='/tmp';`)

    console.log('Installing and loading httpfs extension...')
    await db.all(`
      INSTALL httpfs;
      LOAD httpfs;
    `)

    console.log('Setting S3 credentials...')
    await db.all(`
      SET s3_region='${AWS_S3_REGION}';
      SET s3_access_key_id='${AWS_S3_ACCESS_KEY}';
      SET s3_secret_access_key='${AWS_S3_SECRET_KEY}';
    `)

    // Test S3 access
    console.log('Testing S3 access...')
    try {
      const testResult = await db.all(`
        SELECT * FROM parquet_metadata('s3://${S3_LAKE_BUCKET_NAME}/${key}');
      `)
      console.log('S3 access test succeeded:', testResult)
    } catch (s3Error) {
      console.error('Error testing S3 access:', s3Error)
      throw s3Error // Rethrow the error to stop execution
    }

    // Try to read file info without actually reading the file
    console.log('Checking file info...')
    try {
      const fileInfo = await db.all(`
        SELECT * FROM parquet_scan('s3://${S3_LAKE_BUCKET_NAME}/${key}') LIMIT 0;
      `)
      console.log('File info loaded')
    } catch (fileError) {
      console.error('Error checking file info:', fileError)
    }

    // If everything above works, try creating the table
    console.log('Creating table...')
    await db.all(
      `CREATE TABLE ${tableName} AS SELECT * FROM parquet_scan('s3://${S3_LAKE_BUCKET_NAME}/${key}');`,
    )

    console.log('Table created successfully')

    // Execute the query
    const result = await db.all(sqlStatement)

    // Close the database connection
    await db.close()

    // Send the result
    return result as unknown as { [k: string]: any }[]
  } catch (error) {
    console.error('Error:', error)
    return null
  }
}

Dev-rick avatar Sep 17 '24 06:09 Dev-rick

Have a look at my implementation at https://github.com/tobilg/serverless-duckdb/blob/main/src/lib/awsSecret.ts and triggering https://github.com/tobilg/serverless-duckdb/blob/main/src/functions/queryS3Express.ts#L95 before any access to S3.

Hint: IMO you also need to pass the SESSION_TOKEN, and possibly the ENDPOINT as well, if you're using S3 Express One Zone.
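For reference, the session token and endpoint map to DuckDB's `s3_session_token` and `s3_endpoint` httpfs settings. A small helper to assemble all the SET statements might look like this (the function name and shape are mine, not from the thread; the option names are DuckDB's legacy S3 config options):

```javascript
// Hypothetical helper: build the DuckDB httpfs S3 SET statements in one place.
// s3_session_token / s3_endpoint are real DuckDB options; the rest is a sketch.
function s3ConfigSql({ region, keyId, secret, sessionToken, endpoint }) {
  const stmts = [
    `SET s3_region='${region}';`,
    `SET s3_access_key_id='${keyId}';`,
    `SET s3_secret_access_key='${secret}';`,
  ];
  if (sessionToken) stmts.push(`SET s3_session_token='${sessionToken}';`);
  if (endpoint) stmts.push(`SET s3_endpoint='${endpoint}';`);
  return stmts.join("\n");
}
```

You would then run the result through a single `db.all(...)`/`db.exec(...)` call before touching S3.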

I'm wondering why you're seeing a 400 status (invalid request), and not a 403 status though.

tobilg avatar Sep 17 '24 08:09 tobilg

@michaelwallabi Thank you for the tip - replacing the binary at build/deploy time was by far the most ergonomic solution (and the only one that I was able to work for my project). I want to extend my sincere appreciation to @tobilg for the effort that enabled it in the first place as well.

Thank you, appreciate the feedback!

Ideally running duck db in a lambda should be easy out of the box as it is a great use case, so I look forward to future releases that don't require hacks/workarounds.

This is honestly not a "fault" of DuckDB's, but of AWS using very outdated GLIBC versions in all Node runtimes before Node 20 (see https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtimes.html#runtimes-supported). Node 20 now uses AL2023, which has an updated GLIBC that should work with the normal duckdb-node package as well, afaik.

tobilg avatar Sep 17 '24 09:09 tobilg

This is honestly not a "fault" of DuckDB's, but of AWS using very outdated GLIBC versions in all Node runtimes before Node 20 (see https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtimes.html#runtimes-supported). Node 20 now uses AL2023, which has an updated GLIBC that should work with the normal duckdb-node package as well, afaik.

Oh hm that is interesting. I thought I was running my lambdas on Node 20 and was getting ELF errors, so either AL 2023 still has issues or I'm not on Node 20 🤔

iku000888 avatar Sep 17 '24 23:09 iku000888