Han Wang
**Describe the bug** mmlspark.lightgbm._LightGBMClassifier does not exist **To Reproduce** I git-cloned the repo and appended the mmlspark Python path with sys.path.append. `import mmlspark` succeeds, but the classifier inside it can't...
**Is your feature request related to a problem? Please describe.** See the code [here](https://github.com/fugue-project/fugue/blob/838fdaa794c62e8bdc7f1474818d9491d5d39ed7/fugue_spark/execution_engine.py#L513). If we take just one row per group under our sorting, we may use `GROUP BY` and `FIRST` to solve...
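As a rough illustration of the idea (not Fugue's actual implementation), "take the first row per group after sorting" can be sketched in plain Python; the rows and column ordering below are made up for the example:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows of (group_key, value); not from the Fugue codebase.
rows = [("a", 3), ("b", 1), ("a", 1), ("b", 2)]

# Sort by group key, then by the desired within-group ordering (ascending
# value here), so the first row of each group is exactly the row that
# GROUP BY + FIRST would select under that ordering.
rows.sort(key=lambda r: (r[0], r[1]))

one_per_group = [next(g) for _, g in groupby(rows, key=itemgetter(0))]
# → [("a", 1), ("b", 1)]
```

A SQL engine without a row-number function can express the same thing with `GROUP BY a` plus `FIRST(b)` as long as the input is pre-sorted, which is the trick the linked code relies on.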
Triad already has the standard solution for plugin mode: https://github.com/fugue-project/triad/pull/85 We need to migrate all Fugue plugins to this standard approach. We also need to keep the old way working....
_Originally posted by @keiranmraine in https://github.com/fugue-project/fugue/issues/331#issuecomment-1160126031_

```console
File c:\Users\XXX\.venv\lib\site-packages\fugue\workflow\workflow.py:1518, in FugueWorkflow.run(self, *args, **kwargs)
   1516 if ctb is None:  # pragma: no cover
   1517     raise
-> 1518 raise ex.with_traceback(ctb)
   1519 self._computed...
```
Many CSV files contain column names with special characters. Fugue will raise exceptions because it has stricter [rules](https://github.com/fugue-project/triad/blob/00e395a33fb09b4bb5b1ce9dbf168c3a14b8b474/triad/utils/string.py#L13) for column names. So there should be an option, when reading...
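One way such an option could work is to normalize offending column names instead of raising. The helper below is a hypothetical sketch (it is not Triad's or Fugue's actual API) of replacing disallowed characters with underscores:

```python
import re

def normalize_col(name: str) -> str:
    # Hypothetical helper, not Triad's actual rules: replace any character
    # that is not a letter, digit, or underscore with an underscore, and
    # prefix names that start with a digit so they remain valid identifiers.
    name = re.sub(r"[^0-9a-zA-Z_]", "_", name)
    if re.match(r"^\d", name):
        name = "_" + name
    return name

print(normalize_col("total sales ($)"))  # total_sales____
print(normalize_col("2021 revenue"))     # _2021_revenue
```

The option would need to record the original-to-normalized mapping so users can still refer to the source column names after loading.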
Currently, each execution engine has its own separate implementation for loading files. This is messy, with a lot of duplication. We should create a unified IO engine, just like SQLEngine...
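A minimal sketch of what such an abstraction might look like, by analogy with SQLEngine; all names here are illustrative assumptions, not Fugue's actual design:

```python
from abc import ABC, abstractmethod
from typing import Any


class IOEngine(ABC):
    """Hypothetical unified IO interface each execution engine would
    implement, so load/save logic is no longer duplicated per engine."""

    @abstractmethod
    def load(self, path: str, fmt: str, **kwargs: Any) -> Any:
        """Load a file into the engine's native dataframe type."""

    @abstractmethod
    def save(self, df: Any, path: str, fmt: str, **kwargs: Any) -> None:
        """Persist a dataframe to storage."""


class DictIOEngine(IOEngine):
    # Toy in-memory implementation, just to show the contract.
    def __init__(self):
        self._store = {}

    def load(self, path, fmt, **kwargs):
        return self._store[(path, fmt)]

    def save(self, df, path, fmt, **kwargs):
        self._store[(path, fmt)] = df


engine = DictIOEngine()
engine.save([{"a": 1}], "/tmp/x", "parquet")
print(engine.load("/tmp/x", "parquet"))  # [{'a': 1}]
```

Each concrete engine (Spark, Dask, native pandas) would then provide one `IOEngine` subclass instead of scattering format-specific readers across the codebase.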
**Describe the solution you'd like**

```python
# schema: *,a:int
def t(df: pd.DataFrame) -> pd.DataFrame:
    # do something and return
    ...
```

Currently this can only be used on a transformer, but it should be...
These two SQL queries have different behavior on HAVING:

```sql
CREATE [[1, 2], [NULL, 2], [NULL, 1], [3, 4], [NULL, 4]] SCHEMA a:double,b:int
SELECT a, SUM(b) AS b GROUP BY a...
```
**Describe the solution you'd like** Change the documentation to the Furo theme.
**Describe the solution you'd like** Currently, for Avro IO, Dask still uses the local implementation; we should use [this](https://docs.dask.org/en/latest/_modules/dask/bag/avro.html) instead to take advantage of the distributed system.