ibis
ibis copied to clipboard
bug(bigquery): different name of column in sql code and table.columns
What happened?
Here is the code could be used to reproduce the error:
import ibis
ibis.options.interactive = True
con = ibis.bigquery.connect(project_id="xxx", dataset_id="xxx")
t = ibis.memtable({
"x": ["a", "b"],
"y": ["0", "1"]
})
con.create_table(
"t", t.to_pyarrow(), overwrite=True
)
t = con.table("t")
t = t.mutate(
x1=ibis.literal("c"),
y=_.y.cast(dt.int32),
)
x = t.drop("y")
y = t.y.cast(dt.int64)
when I run
y.as_table().columns
I got ['Cast(y, int64)']
But the name in sql code is different in ibis.to_sql(y)
- name in sql code is Cast_y_ int64
SELECT
CAST(CAST(`t0`.`y` AS INT64) AS INT64) AS `Cast_y_ int64`
FROM `voltrondata-demo`.`nycflights13`.`t` AS `t0`
This will produce errors in ibisml, becuase ibisml use the table's column name to extract target - https://github.com/ibis-project/ibis-ml/blob/main/ibisml/core.py#L135
import ibisml as ml
step = ml.TargetEncode(["x"])
table, targets, index = ml.core.normalize_table(x, y)
print(targets)
metadata = ml.core.Metadata(targets=targets)
step.fit_table(table,metadata)
step.transform_table(x)
Errror
Invalid field name "Cast(y, int64)_f82688".
Updated the error message*******
400 POST https://bigquery.googleapis.com/upload/bigquery/v2/projects/voltrondata-demo/jobs?uploadType=multipart: Invalid field name "Cast(y, int64)_ef9a46". Fields must contain the allowed characters, and be at most 300 characters long. For allowed characters, please refer to https://cloud.google.com/bigquery/docs/schemas#column_names
Not very sure, we should solve this in ibis or ibisml.
What version of ibis are you using?
9
What backend(s) are you using, if any?
bigquery
Relevant log output
No response
Code of Conduct
- [X] I agree to follow this project's Code of Conduct
Rewrite the problem description with an easy example
Updated the error message*******
400 POST https://bigquery.googleapis.com/upload/bigquery/v2/projects/voltrondata-demo/jobs?uploadType=multipart: Invalid field name "Cast(y, int64)_ef9a46". Fields must contain the allowed characters, and be at most 300 characters long. For allowed characters, please refer to https://cloud.google.com/bigquery/docs/schemas#column_names
seems like Cast(y, int64)_ef9a46
is not a valid name for bigquery, because have letters after number.
are you running against main
? That looks like the bigquery issues I fixed yesterday in #9141 and #9149
oh, no. Let me pull it from main
are you running against main? That looks like the bigquery issues I fixed yesterday in https://github.com/ibis-project/ibis/pull/9141 and https://github.com/ibis-project/ibis/pull/9149
We probably need to see the ibis expression. Column names that look like that should effectively never make it into bigquery.
Can you give a reproducible example that only uses Ibis? We definitely need to see what Ibis APIs you're using to produce this error.
we convert an intermediate table into a memtable
in bis-ml, see link
The intermediate information in the expression tree got lost, unfortunately, the new name "Cast(y, float64)"
is not a valid bigquery column name.
I think it is not an ibis issue, we could fix this in ibisml.
Here is the simplified code to reproduce the error:
import ibis
ibis.options.interactive = True
con = ibis.bigquery.connect(project_id="xx", dataset_id="xx")
t = ibis.memtable({
"x": ["a", "b"],
"y": ["0", "1"]
})
con.create_table(
"t", t.to_pyarrow(), overwrite=True
)
t = con.table("t")
base_table = t.drop("y")
target_table = t.y.cast(dt.float64)
y_m = ibis.memtable(target_table.as_table().to_pyarrow())
y_m = y_m.mutate(x=ibis.literal("a"))
base_table.join(y_m, "x")
And the new name "Cast(y, float64)"
is not a valid name in bigquery.
ibis.to_sql(y_m)
sql
SELECT
"t0"."Cast(y, float64)",
'a' AS "x"
FROM "ibis_pyarrow_memtable_cimh6vdzlvhyxlddji3g6jvugq" AS "t0"
I think that between the whitespace fixes in #9160 and the improved to_sql
for memtables in #9228 that this may not be an issue anymore. I'm going to close it out, but ping if this shows up again!