Jiting Xu issues

Results: 32 issues of Jiting Xu

feat: sqlglot for basic check and format

Summary: in the text-to-SQL task, we should use SQLGlot to prettify the SQL and, with this, check for basic syntax errors. Resolves: https://github.com/ibis-project/ibis-birdbrain/issues/48

fix(utils): remove redundant column in train_test_split()

Remove the redundant column in train_test_split() and add tests for columns.

bug(steps): handle col with one unique value in Scale* step

Similar issue to #118. When scaling, we need to calculate the min and max for ScaleMinMax, or the standard deviation for ScaleStandard. If a column has only one unique value,...

bug
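
The failure mode described in this issue can be illustrated with a small pure-Python sketch. With a single unique value, `max - min` and the standard deviation are both zero, so the usual formulas divide by zero. The function names below are hypothetical, and mapping a constant column to `0.0` is an assumption made for illustration, not necessarily the handling adopted in the project.

```python
import statistics


def scale_min_max(values):
    """Min-max scale a list of numbers to [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Only one unique value: (v - lo) / (hi - lo) would divide by zero.
        # Assumption: map every entry to 0.0 instead of failing.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def scale_standard(values):
    """Standardize a list of numbers to zero mean and unit variance."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        # Constant column: dividing by std would fail.
        # Assumption: map every entry to 0.0 instead of failing.
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]
```

A stricter alternative would be to raise an error for constant columns so the user can drop them before scaling.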

[WIP] docs(examples): add kaggle example

Add example: Home Credit - Credit Risk Model Stability [link](https://www.kaggle.com/competitions/home-credit-credit-risk-model-stability/overview)

bug(steps): handle col with all nulls in impute

For all the impute steps, we need to handle columns with all nulls. We need to tell the user that the column is all nulls, either by failing the imputation or by raising a warning....
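
The two options mentioned above (fail, or warn) can be sketched in plain Python. This is only an illustration of the idea: `impute_mean` and its `on_all_null` parameter are hypothetical names, and the real impute steps operate on Ibis tables rather than Python lists.

```python
import warnings


def impute_mean(values, on_all_null="raise"):
    """Replace None entries with the mean of the non-null entries.

    on_all_null: "raise" to fail, "warn" to emit a warning and
    return the column unchanged (hypothetical parameter).
    """
    present = [v for v in values if v is not None]
    if not present:
        # No non-null values, so there is no fill value to compute.
        msg = "column is all nulls; cannot compute a fill value"
        if on_all_null == "raise":
            raise ValueError(msg)
        warnings.warn(msg)
        return list(values)
    fill = sum(present) / len(present)
    return [fill if v is None else v for v in values]
```

Failing by default makes the problem visible early, while the warning mode lets a pipeline continue with the column left untouched.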

feat: support read_parquet for backend with no native support

## Description of changes Support `read_parquet` for backends that do not have native support (like duckdb). This implementation leverages the PyArrow [read_table](https://arrow.apache.org/docs/python/generated/pyarrow.parquet.read_table.html) function. If a backend does not have its...

tests

feat: support read_csv for backends with no native support

## Description of changes For backends that lack native read_csv support, `pyarrow.csv.read_csv()` will be used. - Read a single file - Read all files in a directory, something like this:...

tests

bug: arrow type error when show data with 'UUID' object

### What happened? Issue 1: duckdb produces a different UUID for each row, but sqlite generates the same UUID; other backends may have the same issue. ```python import ibis...

bug

feat: optimize table info() and describe() for large column tables efficiently

### Is your feature request related to a problem? We have `table.info()` and `table.describe()` for Ibis tables. The function loops over each column and performs multiple aggregations and form the...

feature

performance

bug: cannot convert `y` to numpy on kaggle notebook in sklearn pipeline

In this [competition](https://www.kaggle.com/code/jiting/intro-ibisml), `y` column cannot be converted to numpy array. ~~I could run this on my local machine, but not on kaggle notebook.~~ ~~**I could reproduce this on my...