NIF Panic when reading parquet files from S3
Code
Explorer.DataFrame.from_parquet("s3://path/to/file.parquet", config: %{FSS.S3.config_from_system_env() | region: "us-west-2"})
Expected
A working DataFrame
Actual
** (ErlangError) Erlang error: :nif_panicked
(explorer 0.10.1) Explorer.PolarsBackend.Native.lf_compute(%Explorer.PolarsBackend.LazyFrame{resource: #Reference<0.1199058728.3890348049.170513>})
(explorer 0.10.1) lib/explorer/polars_backend/data_frame.ex:286: Explorer.PolarsBackend.DataFrame.from_parquet/4
iex:1: (file)
Note: When I try to load the parquet file lazily, I get a more detailed stacktrace:
Explorer.DataFrame.from_parquet("s3://path/to/file.parquet", config: %{FSS.S3.config_from_system_env() | region: "us-west-2"}, lazy: true)
#Inspect.Error<
got ErlangError with message:
"""
Erlang error: :nif_panicked
"""
while inspecting:
%{
data: %Explorer.PolarsBackend.LazyFrame{
resource: #Reference<0.3078565455.1145176087.107417>
},
remote: nil,
names: ["id", "point", "rarity", "type"],
struct: Explorer.DataFrame,
groups: [],
dtypes: %{
"id" => :string,
"point" => :string,
"rarity" => :string,
"type" => :string
}
}
Stacktrace:
(explorer 0.10.1) Explorer.PolarsBackend.Native.lf_fetch(%Explorer.PolarsBackend.LazyFrame{resource: #Reference<0.3078565455.1145176087.107417>}, 50)
(explorer 0.10.1) lib/explorer/polars_backend/lazy_frame.ex:74: Explorer.PolarsBackend.LazyFrame.inspect/2
(explorer 0.10.1) lib/explorer/data_frame.ex:6379: Inspect.Explorer.DataFrame.inspect/2
(elixir 1.16.1) lib/inspect/algebra.ex:347: Inspect.Algebra.to_doc/2
(elixir 1.16.1) lib/kernel.ex:2351: Kernel.inspect/2
(iex 1.16.1) lib/iex/evaluator.ex:376: IEx.Evaluator.io_inspect/1
(iex 1.16.1) lib/iex/evaluator.ex:335: IEx.Evaluator.eval_and_inspect/3
(iex 1.16.1) lib/iex/evaluator.ex:306: IEx.Evaluator.eval_and_inspect_parsed/3
>
Context
This only happens when I deploy to staging or prod (Docker with the elixir:1.18.1 base image and Mix releases). It works perfectly when I develop locally on macOS.
Can you execute any other operation? If nothing works, the most likely cause is an incompatible gcc/musl version; see the README section on precompilation: https://github.com/elixir-explorer/explorer?tab=readme-ov-file#precompilation
I tested two operations: one succeeded and one hit the same NIF panic.
The first test was the example from #1011:
Mix.install([{:explorer, "~> 0.10.0"}])
name_dtype = {"names",
{:list,
{:struct,
[
{"language", :string},
{"name", :string},
{"transliteration", :category},
{"type", :category}
]}}}
[
%{names: []},
%{names: [%{name: "CABK", type: "acronym", language: nil, transliteration: "none"}]}
]
|> Explorer.DataFrame.new(dtypes: [name_dtype])
|> dbg
This resulted in a NIF panic:
[iex:6: (file)]
[
%{names: []},
%{names: [%{name: "CABK", type: "acronym", language: nil, transliteration: "none"}]}
] #=> [
%{names: []},
%{
names: [
%{name: "CABK", type: "acronym", language: nil, transliteration: "none"}
]
}
]
|> Explorer.DataFrame.new(dtypes: [name_dtype]) #=> #Inspect.Error<
got ErlangError with message:
"""
Erlang error: :nif_panicked
"""
while inspecting:
%{
data: %Explorer.PolarsBackend.DataFrame{
resource: #Reference<0.3723385053.1587150849.9786>
},
remote: nil,
names: ["names"],
__struct__: Explorer.DataFrame,
groups: [],
dtypes: %{
"names" => {:list,
{:struct,
[
{"language", :string},
{"name", :string},
{"transliteration", :category},
{"type", :category}
]}}
}
}
Stacktrace:
(explorer 0.10.1) Explorer.PolarsBackend.Native.s_to_list(#Explorer.PolarsBackend.Series<
#Reference<0.3723385053.1586364425.234579>
>)
(explorer 0.10.1) lib/explorer/polars_backend/shared.ex:24: Explorer.PolarsBackend.Shared.apply_series/3
(explorer 0.10.1) lib/explorer/backend/data_frame.ex:324: anonymous fn/3 in Explorer.Backend.DataFrame.build_cols_algebra/3
(elixir 1.18.1) lib/enum.ex:1714: Enum."-map/2-lists^map/1-1-"/2
(explorer 0.10.1) lib/explorer/backend/data_frame.ex:283: Explorer.Backend.DataFrame.inspect/5
(explorer 0.10.1) lib/explorer/data_frame.ex:6379: Inspect.Explorer.DataFrame.inspect/2
(elixir 1.18.1) lib/inspect/algebra.ex:348: Inspect.Algebra.to_doc/2
(elixir 1.18.1) lib/kernel.ex:2376: Kernel.inspect/2
>
The second operation, creating a simple dataframe, succeeded:
df = Explorer.DataFrame.new(%{
"id" => ["a", "b", "c"],
"type" => ["x", "y", "z"]
})
Output:
#Explorer.DataFrame<
Polars[3 x 2]
id string ["a", "b", "c"]
type string ["x", "y", "z"]
>
I'm deploying with Mix releases and Docker, using elixir:1.18.1 as the base image.
I also verified that the correct precompiled NIF is downloaded during the build:
[debug] Downloading NIF from https://github.com/elixir-nx/explorer/releases/download/v0.10.1/libexplorer-v0.10.1-nif-2.15-x86_64-unknown-linux-gnu.so.tar.gz
@rohfosho Thank you for the additional info. Could you share a dataframe that reproduces the panic you originally saw? #1011 is still an open issue, so a panic there is expected.
@billylanchantin No problem! I'm trying to read a Parquet file directly from S3 that has four columns:
%{
"id" => :string,
"point" => :string,
"rarity" => :string,
"type" => :string
}
Explorer is able to fetch the file and read its columns, but it doesn't get past that. I can send over a sample file if that's helpful!
Yeah that'd be great, thanks!
@billylanchantin GitHub won't let me upload Parquet files here. Is it OK if I DM you on the Elixir Slack?
Just sent it via Slack. If you'd prefer something else, I can upload it to Google Drive!
OK, I got the file and some more info via Slack.
from_parquet: works on their local machine but not in prod
load_parquet: works on their local machine and in prod
So they're technically unblocked right now since they can use load_parquet instead. But the bug is still there.
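A minimal sketch of the load_parquet workaround, assuming the Parquet bytes are already in memory (how the reporter fetches them from S3 is not stated in the thread, so that step is left out). To stay self-contained, it builds a tiny frame with the same four string columns and serializes it with dump_parquet! as a stand-in for the S3 object; the values are made up.

```elixir
Mix.install([{:explorer, "~> 0.10.1"}])

alias Explorer.DataFrame, as: DF

# Stand-in for the S3 object's bytes (hypothetical data): a tiny frame with
# the same four string columns, serialized to Parquet in memory.
source =
  DF.new(%{
    "id" => ["1", "2"],
    "point" => ["p1", "p2"],
    "rarity" => ["common", "rare"],
    "type" => ["a", "b"]
  })

parquet_bytes = DF.dump_parquet!(source)

# load_parquet!/2 parses Parquet from an in-memory binary rather than a
# path or URL; this is the code path reported to work in prod.
df = DF.load_parquet!(parquet_bytes)
DF.print(df)
```

In a real deployment, parquet_bytes would come from whatever S3 client the application already uses.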
As a sanity check, I ran our setup-localstack.sh and uploaded the file to a local amazon-ec2-metadata-mock container (like we do with our wine dataset). I ran a modified version of our S3 test:
@tag :cloud_integration
test "reads rohfosho's parquet file from S3" do
config = %FSS.S3.Config{
access_key_id: "test",
secret_access_key: "test",
endpoint: "http://localhost:4566",
region: "us-east-1"
}
assert {:ok, df} =
DF.from_parquet("s3://test-bucket/rohfosho.parquet", config: config)
df |> DF.print()
end
It passed, so the cause must be something more specific; I don't know what yet. @josevalim any ideas?
Hey, I suspect a system library may be missing inside the container. Can you share the output of the following command?
ldd -v /path/to/the/extracted/lib.so
This path is printed right after you install Explorer. There is a small bug, though: the printed path ends with .tar.gz, but the extracted library does not, so just drop that suffix and it will work.
@philss Sure, here you go:
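The suffix fix can be scripted. This is a hypothetical helper (NifCheck is not part of Explorer) that strips the trailing .tar.gz from the printed path and runs ldd -v on the result; it assumes a Linux/glibc environment with ldd on the PATH.

```elixir
defmodule NifCheck do
  @moduledoc """
  Hypothetical helper (not part of Explorer): maps the NIF path printed at
  install time to the extracted library and runs `ldd -v` on it.
  """

  # The printed path ends with ".tar.gz"; the extracted library does not.
  # Paths without that suffix are returned unchanged.
  def extracted_path(printed_path),
    do: String.replace_suffix(printed_path, ".tar.gz", "")

  # Returns {output, exit_status}; stderr is merged into the output so
  # "not found" lines from ldd are not lost.
  def ldd(printed_path) do
    System.cmd("ldd", ["-v", extracted_path(printed_path)], stderr_to_stdout: true)
  end
end
```

For example, {output, 0} = NifCheck.ldd(printed_path) followed by IO.puts(output) prints the same report as running the shell command by hand.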
ldd -v _build/prod/rel/oracle/lib/explorer-0.10.1/priv/native/libexplorer-v0.10.1-nif-2.15-x86_64-unknown-linux-gnu.so
linux-vdso.so.1 (0x00007ffd65942000)
libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007baebffe2000)
libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007baebffdd000)
libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007baebfefe000)
libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007baebfef9000)
libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007baebfd18000)
/lib64/ld-linux-x86-64.so.2 (0x00007baec3f42000)
Version information:
_build/prod/rel/oracle/lib/explorer-0.10.1/priv/native/libexplorer-v0.10.1-nif-2.15-x86_64-unknown-linux-gnu.so:
libgcc_s.so.1 (GCC_3.0) => /lib/x86_64-linux-gnu/libgcc_s.so.1
libgcc_s.so.1 (GCC_3.3) => /lib/x86_64-linux-gnu/libgcc_s.so.1
libgcc_s.so.1 (GCC_4.2.0) => /lib/x86_64-linux-gnu/libgcc_s.so.1
libpthread.so.0 (GLIBC_2.2.5) => /lib/x86_64-linux-gnu/libpthread.so.0
libpthread.so.0 (GLIBC_2.12) => /lib/x86_64-linux-gnu/libpthread.so.0
libm.so.6 (GLIBC_2.2.5) => /lib/x86_64-linux-gnu/libm.so.6
libm.so.6 (GLIBC_2.27) => /lib/x86_64-linux-gnu/libm.so.6
libm.so.6 (GLIBC_2.29) => /lib/x86_64-linux-gnu/libm.so.6
libdl.so.2 (GLIBC_2.2.5) => /lib/x86_64-linux-gnu/libdl.so.2
libc.so.6 (GLIBC_2.2.5) => /lib/x86_64-linux-gnu/libc.so.6
libc.so.6 (GLIBC_2.3) => /lib/x86_64-linux-gnu/libc.so.6
libc.so.6 (GLIBC_2.3.2) => /lib/x86_64-linux-gnu/libc.so.6
libc.so.6 (GLIBC_2.3.4) => /lib/x86_64-linux-gnu/libc.so.6
libc.so.6 (GLIBC_2.4) => /lib/x86_64-linux-gnu/libc.so.6
libc.so.6 (GLIBC_2.6) => /lib/x86_64-linux-gnu/libc.so.6
libc.so.6 (GLIBC_2.7) => /lib/x86_64-linux-gnu/libc.so.6
libc.so.6 (GLIBC_2.9) => /lib/x86_64-linux-gnu/libc.so.6
libc.so.6 (GLIBC_2.14) => /lib/x86_64-linux-gnu/libc.so.6
libc.so.6 (GLIBC_2.17) => /lib/x86_64-linux-gnu/libc.so.6
libc.so.6 (GLIBC_2.18) => /lib/x86_64-linux-gnu/libc.so.6
libc.so.6 (GLIBC_2.25) => /lib/x86_64-linux-gnu/libc.so.6
libc.so.6 (GLIBC_2.28) => /lib/x86_64-linux-gnu/libc.so.6
ld-linux-x86-64.so.2 (GLIBC_2.3) => /lib64/ld-linux-x86-64.so.2
/lib/x86_64-linux-gnu/libgcc_s.so.1:
libc.so.6 (GLIBC_2.35) => /lib/x86_64-linux-gnu/libc.so.6
libc.so.6 (GLIBC_2.14) => /lib/x86_64-linux-gnu/libc.so.6
libc.so.6 (GLIBC_2.34) => /lib/x86_64-linux-gnu/libc.so.6
libc.so.6 (GLIBC_2.2.5) => /lib/x86_64-linux-gnu/libc.so.6
/lib/x86_64-linux-gnu/libpthread.so.0:
libc.so.6 (GLIBC_ABI_DT_RELR) => /lib/x86_64-linux-gnu/libc.so.6
libc.so.6 (GLIBC_2.2.5) => /lib/x86_64-linux-gnu/libc.so.6
/lib/x86_64-linux-gnu/libm.so.6:
ld-linux-x86-64.so.2 (GLIBC_PRIVATE) => /lib64/ld-linux-x86-64.so.2
libc.so.6 (GLIBC_ABI_DT_RELR) => /lib/x86_64-linux-gnu/libc.so.6
libc.so.6 (GLIBC_2.4) => /lib/x86_64-linux-gnu/libc.so.6
libc.so.6 (GLIBC_2.2.5) => /lib/x86_64-linux-gnu/libc.so.6
libc.so.6 (GLIBC_PRIVATE) => /lib/x86_64-linux-gnu/libc.so.6
/lib/x86_64-linux-gnu/libdl.so.2:
libc.so.6 (GLIBC_ABI_DT_RELR) => /lib/x86_64-linux-gnu/libc.so.6
libc.so.6 (GLIBC_2.2.5) => /lib/x86_64-linux-gnu/libc.so.6
/lib/x86_64-linux-gnu/libc.so.6:
ld-linux-x86-64.so.2 (GLIBC_2.35) => /lib64/ld-linux-x86-64.so.2
ld-linux-x86-64.so.2 (GLIBC_2.2.5) => /lib64/ld-linux-x86-64.so.2
ld-linux-x86-64.so.2 (GLIBC_2.3) => /lib64/ld-linux-x86-64.so.2
ld-linux-x86-64.so.2 (GLIBC_PRIVATE) => /lib64/ld-linux-x86-64.so.2
@rohfosho thank you for the info! Based on what you sent, I don't think anything is wrong there. Would you mind sharing the Dockerfile, or the full tag of the base image you are using? That would make it easier to reproduce.
@philss for sure, here you go:
# syntax = docker/dockerfile:1.2
# Use the official Elixir image as the base image
FROM elixir:1.18.1 AS builder
# Set the working directory inside the container
WORKDIR /app
# Install required system dependencies
RUN apt-get update -y && \
apt-get install -y --no-install-recommends \
build-essential \
git \
nodejs \
npm \
postgresql-client \
python3
RUN mix local.hex --force && \
mix local.rebar --force
# Copy the mix files first for better docker build caching
COPY mix.exs ./
COPY mix.lock ./
RUN mix deps.get
# Compile the dependencies, set MIX_ENV beforehand
ARG MIX_ENV=prod
ENV MIX_ENV=${MIX_ENV}
RUN mix deps.compile
# Now copy the whole project to avoid rebuilding the deps when the source code changes
COPY . .
# Set permissions for the release script
RUN chmod +x release.sh
# Source Code Compilation Stage
FROM builder AS compiler
# Set the working directory
WORKDIR /app
# Execute release script
RUN --mount=type=secret,id=_env,dst=/etc/secrets/.env ./release.sh
# New stage for the runtime image to reduce the final size
FROM elixir:1.18.1
# Set the working directory
WORKDIR /app
# Install minimal dependencies
RUN apt-get update -y && \
apt-get install -y --no-install-recommends \
nodejs \
npm \
postgresql-client \
python3
COPY --from=compiler /app/ .
# Expose the port the application will run on
EXPOSE 4000
# Define the entrypoint for the application
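# Exec-form ENTRYPOINT does not expand variables, and MIX_ENV is not set in this stage, so the path is spelled out
ENTRYPOINT ["/app/_build/prod/rel/my_app/bin/my_app"]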
# Start the Phoenix application
CMD ["start"]
@rohfosho sorry for the delay. I built a container image and ran the code, but I couldn't reproduce the problem. I'm running in a Linux environment (Fedora 41 with Podman). If you don't mind, could you run your code with the EXPLORER_USE_LEGACY_ARTIFACTS env var set to "true"? The problem might be related to legacy CPUs.
Another option is to compile from source using our main branch and see if the problem persists. We updated Polars recently, so it may already be fixed.
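A sketch of the mix.exs change for building from the main branch. It assumes that Explorer's force_build switch is the EXPLORER_BUILD env var (set to "1" or "true" at compile time), that rustler must be added as a dependency because Explorer declares it optional, and that a Rust toolchain is installed in the builder image (the stock elixir:1.18.1 image does not ship one). Check these against the README's build-from-source instructions before relying on them.

```elixir
# mix.exs (fragment): compile Explorer from source instead of downloading
# the precompiled NIF. Assumes EXPLORER_BUILD=1 is exported before
# `mix deps.compile` and that Rust is available in the build stage.
defp deps do
  [
    {:explorer, github: "elixir-explorer/explorer", branch: "main"},
    # Assumption: needed because Explorer's rustler dependency is optional.
    {:rustler, ">= 0.0.0"}
  ]
end
```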
When reading from a Parquet file, passing rechunk: true fixed this error for me.
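A minimal sketch of that workaround. To run without S3, it writes a small local Parquet file first. Against S3, the call would keep the same config: option as in the original report and add rechunk: true. The file name and data are illustrative only.

```elixir
Mix.install([{:explorer, "~> 0.10.1"}])

alias Explorer.DataFrame, as: DF

# Write a small Parquet file locally so the example runs without S3.
path = Path.join(System.tmp_dir!(), "rechunk_example.parquet")

DF.new(%{"id" => ["a", "b", "c"], "type" => ["x", "y", "z"]})
|> DF.to_parquet!(path)

# rechunk: true asks Polars to make each column contiguous in memory after
# reading; in this thread it was reported to avoid the NIF panic.
df = DF.from_parquet!(path, rechunk: true)
DF.print(df)
```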