Server-side rendering on Vercel fails; missing GLIBC_2.29
What happens?
When deploying a JavaScript project that leverages SSR and DuckDB to Vercel, the build fails.
The error message presented by DuckDB is /lib64/libm.so.6: version 'GLIBC_2.29' not found (required by /vercel/path0/node_modules/duckdb/lib/binding/duckdb.node).
This has worked previously.
To Reproduce
This repo has a simple reproduction of the issue; create a Vercel project based on it (or a fork), and the build will fail with the error message above: https://github.com/ItsMeBrianD/duckdb-vercel-repro
OS:
Vercel
DuckDB Version:
0.7.1
DuckDB Client:
node
Full Name:
Brian Donald
Affiliation:
Evidence
Have you tried this on the latest master branch?
- [X] I agree
Have you tried the steps to reproduce? Do they include all relevant data and configuration? Does the issue you report still appear there?
- [X] I agree
@Mause we're using the NodeJS client - we're not sure, but perhaps this is new in 0.7.1?
Which version does it work with? We can check for changes
As far as I know, Vercel runs on AWS Lambda, so I'm having a hard time imagining that this has worked before: Lambda environments are currently based on Amazon Linux 2, which uses GLIBC 2.26. See https://repost.aws/questions/QUrXOioL46RcCnFGyELJWKLw/glibc-2-27-on-amazon-linux-2
I guess you could download my DuckDB for Lambda layer, and extract the build artifacts: https://github.com/tobilg/duckdb-nodejs-layer#arns
Experiencing a similar error on Vercel with both Node 18.x and 16.x.
https://github.com/pgzmnk/openb
I therefore created https://www.npmjs.com/package/duckdb-lambda-x86 which should solve the actual issue.
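A minimal usage sketch, assuming the package is a drop-in replacement for the duckdb package and exposes the same callback-based API:
```ts
// Minimal sketch, assuming duckdb-lambda-x86 mirrors the duckdb package's API.
import duckdb from "duckdb-lambda-x86";

const db = new duckdb.Database(":memory:");
db.all("SELECT 42 AS answer", (err: any, rows: any) => {
  if (err) throw err;
  console.log(rows); // [ { answer: 42 } ]
});
```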
@Mause we're using the NodeJS client - we're not sure, but perhaps this is new in 0.7.1?
Which version does it work with? We can check for changes
@archiewood any updates?
I've encountered the same problem as described. Specifically, I'm using [email protected].
Environment:
- Operating System: Ubuntu 22.04 and macOS Sonoma (M1)
- Encountered inside a Docker container
- Docker Base Image:
node:14
Steps to Reproduce:
```bash
docker run --rm -it node:14 bash

# inside the node:14 container
mkdir app && cd app
yarn init -y
yarn add [email protected]
cd node_modules/duckdb
npm test
```
Are there any necessary packages that I need to install?
Translated by ChatGPT.
Sorry, my English is not good. I hope there's no offense.
@hanshino the default duckdb npm package will not work IMO due to GLIBC incompatibilities, as described above. For Lambda usage, I maintain the https://www.npmjs.com/package/duckdb-lambda-x86 package which should fix your issues.
Here's a wrapper over duckdb-async and duckdb-lambda-x86 that I just wrote, which seems to work both on my M1 macbook (which requires duckdb-async) and on an EC2 instance where I was previously hitting the GLIBC_2.29 error (where duckdb-lambda-x86 works instead):
```ts
// lib/duckdb.ts
let _query: Promise<(query: string) => any>

_query = import("duckdb-async")
  .then(duckdb => duckdb.Database)
  .then(Database => Database.create(":memory:"))
  .then((db: any) => ((query: string) => db.all(query)))
  .catch(async error => {
    console.log("duckdb init error:", error)
    let duckdb = await import("duckdb-lambda-x86");
    let Database: any = await duckdb.Database;
    const db = new Database(":memory:")
    const connection = db.connect()
    return (query: string) => {
      return new Promise((resolve, reject) => {
        connection.all(query, (err: any, res: any) => {
          if (err) reject(err);
          resolve(res);
        })
      })
    }
  })

export { _query }
```
Sample API endpoint that uses it:
```ts
// /api/query.ts
import { _query } from "@/lib/duckdb"
import { NextApiRequest, NextApiResponse } from "next";

// Convert BigInts to numbers
function replacer(key: string, value: any) {
  if (typeof value === 'bigint') {
    return Number(value)
  } else {
    return value;
  }
}

export default async function handler(
  req: NextApiRequest,
  res: NextApiResponse,
) {
  const { body: { path } } = req
  const query = await _query
  const rows = await query(`select * from read_parquet("${path}")`) // 🚨 unsafe / SQLi 🚨
  res.status(200).send(JSON.stringify(rows, replacer))
}
```
FYI for others who run into this. I ended up using @tobilg's duckdb-lambda-x86 to resolve this with Vercel. In my case I'm just replacing the default duckdb.node binary with the duckdb-lambda-x86 version in the CI build.
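For anyone wanting to replicate that, here's a minimal sketch of such a swap as a small script run after install in CI. The source path inside duckdb-lambda-x86 is an assumption (mirroring how the upstream duckdb package ships its prebuilt binding at lib/binding/duckdb.node), so adjust it to wherever that package actually places the binary:
```ts
// scripts/swap-duckdb-binary.ts -- hypothetical CI build step
// Overwrites the default duckdb.node (linked against a newer GLIBC) with the
// Amazon-Linux-compatible binary shipped by duckdb-lambda-x86.
import { copyFileSync, existsSync } from "node:fs";
import { join } from "node:path";

// Assumption: duckdb-lambda-x86 uses the same layout as the upstream package.
const source = join("node_modules", "duckdb-lambda-x86", "lib", "binding", "duckdb.node");
const target = join("node_modules", "duckdb", "lib", "binding", "duckdb.node");

if (!existsSync(source)) {
  throw new Error(`Expected a prebuilt binary at ${source}; check the package layout.`);
}

copyFileSync(source, target);
console.log(`Replaced ${target} with ${source}`);
```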
@michaelwallabi Thank you for the tip - replacing the binary at build/deploy time was by far the most ergonomic solution (and the only one that I was able to get working for my project). I want to extend my sincere appreciation to @tobilg for the effort that enabled it in the first place as well.
Ideally, running DuckDB in a Lambda should be easy out of the box, as it is a great use case, so I look forward to future releases that don't require hacks/workarounds.
Even after replacing the binaries, I am getting the following issue on version 1.0.0 (I am on Vercel, Node.js 20):
Unhandled Rejection: [Error: IO Error: Can't find the home directory at '' Specify a home directory using the SET home_directory='/path/to/dir' option.] { errno: -1, code: 'DUCKDB_NODEJS_ERROR', errorType: 'IO' }
Setting a home directory also results in an error: Error: TypeError: Failed to set configuration option home_directory: Invalid Input Error: Could not set option "home_directory" as a global option at new Database (/var/task/node_modules/duckdb-async/dist/duckdb-async.js:226:19)
Can anyone help me please? Thank you!
@Dev-rick this worked for me on aws lambda!
https://github.com/tobilg/serverless-duckdb/blob/87ad3c5d1bbbb8e03a80e6ad943da53c3a556a21/src/functions/query.ts#L73
Like @iku000888, I do the following when creating a DB, which seems to work:
```ts
import { tmpdir } from "node:os";

const db = Database.create(":memory:");
let tempDirectory = tmpdir() || '/tmp';
await (await db).exec(`
  SET home_directory='${tempDirectory}';
  -- ... other settings here
`);
```
@iku000888 and @michaelwallabi Thanks for the input!
Unfortunately I am now getting the following error (on Vercel); locally everything works fine with the same env variables.
Error: HTTP Error: HTTP GET error on 'https://XXX.s3.amazonaws.com/XXX.parquet' (HTTP 400)] { errno: -1, code: 'DUCKDB_NODEJS_ERROR', errorType: 'HTTP' }
My code is:
```ts
import { Database } from "duckdb-async";

const S3_LAKE_BUCKET_NAME = process.env.S3_LAKE_BUCKET_NAME
const AWS_S3_ACCESS_KEY = process.env['AWS_S3_ACCESS_KEY']
const AWS_S3_SECRET_KEY = process.env['AWS_S3_SECRET_KEY']
const AWS_S3_REGION = process.env['AWS_S3_REGION']

const retrieveDataFromParquet = async ({
  key,
  sqlStatement,
  tableName,
}: {
  key: string
  sqlStatement: string
  tableName: string
}) => {
  try {
    // Create a new DuckDB database connection
    const db = await Database.create(':memory:')

    console.log('Setting home directory...')
    await db.all(`SET home_directory='/tmp';`)

    console.log('Installing and loading httpfs extension...')
    await db.all(`
      INSTALL httpfs;
      LOAD httpfs;
    `)

    console.log('Setting S3 credentials...')
    await db.all(`
      SET s3_region='${AWS_S3_REGION}';
      SET s3_access_key_id='${AWS_S3_ACCESS_KEY}';
      SET s3_secret_access_key='${AWS_S3_SECRET_KEY}';
    `)

    // Test S3 access
    console.log('Testing S3 access...')
    try {
      const testResult = await db.all(`
        SELECT * FROM parquet_metadata('s3://${S3_LAKE_BUCKET_NAME}/${key}');
      `)
      console.log('S3 access test result successfully loaded:')
    } catch (s3Error) {
      console.error('Error testing S3 access:', s3Error)
      throw s3Error // Rethrow the error to stop execution
    }

    // Try to read file info without actually reading the file
    console.log('Checking file info...')
    try {
      const fileInfo = await db.all(`
        SELECT * FROM parquet_scan('s3://${S3_LAKE_BUCKET_NAME}/${key}') LIMIT 0;
      `)
      console.log('File info loaded')
    } catch (fileError) {
      console.error('Error checking file info:', fileError)
    }

    // If everything above works, try creating the table
    console.log('Creating table...')
    await db.all(
      `CREATE TABLE ${tableName} AS SELECT * FROM parquet_scan('s3://${S3_LAKE_BUCKET_NAME}/${key}');`,
    )
    console.log('Table created successfully')

    // Execute the query (await before closing the connection)
    const result = await db.all(sqlStatement)

    // Close the database connection
    await db.close()

    // Send the result
    return result as { [k: string]: any }[]
  } catch (error) {
    console.error('Error:', error)
    return null
  }
}
```
Have a look at my implementation at https://github.com/tobilg/serverless-duckdb/blob/main/src/lib/awsSecret.ts, and how it's triggered at https://github.com/tobilg/serverless-duckdb/blob/main/src/functions/queryS3Express.ts#L95 before any access to S3.
Hint: IMO you also need to pass the SESSION_TOKEN, and possibly the ENDPOINT as well if you're using S3 Express One Zone.
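A minimal sketch of what that could look like with the duckdb-async setup from the snippet above; s3_session_token and s3_endpoint are the relevant DuckDB settings, and the AWS_S3_SESSION_TOKEN / AWS_S3_ENDPOINT environment variable names are placeholders:
```ts
// Sketch only: same duckdb-async setup as above, extended with a session
// token and a custom endpoint. The env var names here are hypothetical.
import { Database } from "duckdb-async";

const db = await Database.create(":memory:");
await db.all(`SET home_directory='/tmp';`);
await db.all(`INSTALL httpfs; LOAD httpfs;`);
await db.all(`
  SET s3_region='${process.env.AWS_S3_REGION}';
  SET s3_access_key_id='${process.env.AWS_S3_ACCESS_KEY}';
  SET s3_secret_access_key='${process.env.AWS_S3_SECRET_KEY}';
  SET s3_session_token='${process.env.AWS_S3_SESSION_TOKEN}';
  SET s3_endpoint='${process.env.AWS_S3_ENDPOINT}';
`);
```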
I'm wondering why you're seeing a 400 status (invalid request), and not a 403 status though.
@michaelwallabi Thank you for the tip - replacing the binary at build/deploy time was by far the most ergonomic solution (and the only one that I was able to get working for my project). I want to extend my sincere appreciation to @tobilg for the effort that enabled it in the first place as well.
Thank you, appreciate the feedback!
Ideally, running DuckDB in a Lambda should be easy out of the box, as it is a great use case, so I look forward to future releases that don't require hacks/workarounds.
This is honestly not a "fault" of DuckDB, but of AWS using very outdated GLIBC versions in all Node runtimes before Node 20 (see https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtimes.html#runtimes-supported), as Node 20 now uses AL2023, which has an updated GLIBC that should work with the normal duckdb-node package as well, afaik.
Oh hm that is interesting. I thought I was running my lambdas on Node 20 and was getting ELF errors, so either AL 2023 still has issues or I'm not on Node 20 🤔