[Bug]: Native parquet source does not work

Open kousun12 opened this issue 1 year ago • 10 comments

Describe the bug

I'm trying to use the native parquet/csv source as described in the docs. My source dir looks like:

sources/worldbank
├── connection.options.yaml
├── connection.yaml
└── economies.parquet

connection.yaml looks like:

# This file was automatically generated
type: csv
name: worldbank
# Advanced Options, you probably don't want to change this
buildOptions:
  batchSize: 1000000

and trying a very minimal index.md:

---
title: Evidence test
---

```sql economies
select * from worldbank.economies
```

ends up in an error:

Data Table
Catalog Error: Table with name economies does not exist!
Did you mean "states"?
LINE 2: SELECT * FROM (select * from worldbank.economies
                                     ^

My npm run sources output:

  [Processing] worldbank
  economies ⚠ No results returned.
-----
  Evaluated sources, saving manifest
  ✅ Done!

Steps to Reproduce

Create a very simple project with a single source containing a single .parquet file, then run a query that uses it.

Logs


System Info


Severity

blocking all usage of Evidence

Additional Information, or Workarounds

No response

kousun12 avatar Jan 06 '25 19:01 kousun12

Are you able to share or point to your economies.parquet file?

archiewood avatar Jan 06 '25 19:01 archiewood

It doesn't seem to matter which parquet file I try. Here's another simple file that I tried:

https://huggingface.co/datasets/substrate-labs/owid/blob/main/data/owid_co2_data.parquet

kousun12 avatar Jan 06 '25 19:01 kousun12

Did you install the parquet connector and add it to your evidence.config.yaml? (Currently this is a third party connector)

npm install evidence-connector-parquet

We will be moving this to a first party connector very soon

archiewood avatar Jan 06 '25 21:01 archiewood

Still not working.

I started a fresh project using

npx degit evidence-dev/template frontend

then ran npm i evidence-connector-parquet and added "evidence-connector-parquet": { } to datasources in evidence.config.yaml
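
For reference, the datasources entry described above would sit in evidence.config.yaml roughly as sketched here. The nesting under a top-level plugins key is an assumption based on the default template layout and is not shown in this thread; only the evidence-connector-parquet line comes from the steps above.

```yaml
# evidence.config.yaml (sketch; the plugins/datasources nesting is assumed from the default template)
plugins:
  datasources:
    # third-party connector installed with: npm i evidence-connector-parquet
    "evidence-connector-parquet": { }
```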

then added a directory sources/simple with a connection.yaml and a foo.parquet:

# This file was automatically generated
name: simple
type: parquet
options: {}

npm run sources

> [email protected] sources
> evidence sources

✔ Loading plugins & sources
-----
  [Processing] mypar
[ ! ] Error connecting to datasource mypar: Invalid Input Error: Values were not provided for the following prepared statement parameters:  

kousun12 avatar Jan 06 '25 22:01 kousun12

Can you add the parquet source via the UI in localhost:3000/settings - it looks like you are missing config

archiewood avatar Jan 07 '25 00:01 archiewood

Tried that as well: adding a new parquet source foo from /settings generates more or less the same yaml files, and it still ends up with:

npm run sources

> [email protected] sources
> evidence sources

✔ Loading plugins & sources
-----
  [Processing] foo
[ ! ] Error connecting to datasource foo: Invalid Input Error: Values were not provided for the following prepared statement parameters: 
     

kousun12 avatar Jan 07 '25 00:01 kousun12

What's interesting is that the first time I try to add a new parquet source from /settings (naming it e.g. bar) and click "Confirm Changes", I see a small error banner at the bottom saying: Failed to process bar.null

kousun12 avatar Jan 07 '25 00:01 kousun12

That, I think, is a bug, but not the issue here.

archiewood avatar Jan 07 '25 01:01 archiewood

Are you able to repro? I've tried from a fresh project a few times now, with the latest version. Looking at the code, it seems identical to the csv plugin, which works fine for me.

kousun12 avatar Jan 07 '25 18:01 kousun12

Were you guys able to fix this?

I get a similar error when I use ClickHouse. The parquet files have the correct query results, but I get 'Catalog error' in the frontend.

shyamvib avatar Nov 25 '25 05:11 shyamvib