Han Wang
**Describe the bug** mmlspark.lightgbm._LightGBMClassifier does not exist **To Reproduce** I git-cloned the repo and appended the mmlspark Python path with sys.path.append. `import mmlspark` succeeds, but the classifier inside it can't...
**Is your feature request related to a problem? Please describe.** See the code [here](https://github.com/fugue-project/fugue/blob/838fdaa794c62e8bdc7f1474818d9491d5d39ed7/fugue_spark/execution_engine.py#L513). If we take just one row per group under our sorting, we may use `GROUP BY` and `FIRST` to solve...
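As a rough illustration of the idea (not Fugue's actual implementation), "take the first row per group after sorting" can be sketched in plain Python; the rows and column ordering below are made up for the example:

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows of (group_key, value); not from the Fugue codebase.
rows = [("a", 3), ("b", 1), ("a", 1), ("b", 2)]

# Sort by group key, then by the desired within-group ordering (ascending
# value here), so the first row of each group is exactly the row that
# GROUP BY + FIRST would select under that ordering.
rows.sort(key=lambda r: (r[0], r[1]))

one_per_group = [next(g) for _, g in groupby(rows, key=itemgetter(0))]
# → [("a", 1), ("b", 1)]
```

A SQL engine without a row-number function can express the same thing with `GROUP BY a` plus `FIRST(b)` as long as the input is pre-sorted, which is the trick the linked code relies on.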
Triad already has the standard solution for plugin mode: https://github.com/fugue-project/triad/pull/85 We need to migrate all Fugue plugins to this standard approach. We also need to keep the old way working....
_Originally posted by @keiranmraine in https://github.com/fugue-project/fugue/issues/331#issuecomment-1160126031_

```console
File c:\Users\XXX\.venv\lib\site-packages\fugue\workflow\workflow.py:1518, in FugueWorkflow.run(self, *args, **kwargs)
   1516 if ctb is None:  # pragma: no cover
   1517     raise
-> 1518 raise ex.with_traceback(ctb)
   1519 self._computed...
```
Many CSV files contain column names with special characters. Fugue will raise exceptions because it has stricter [rules](https://github.com/fugue-project/triad/blob/00e395a33fb09b4bb5b1ce9dbf168c3a14b8b474/triad/utils/string.py#L13) for column names. So there should be an option, when reading...
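One way such an option could work is to normalize offending column names instead of raising. The helper below is a hypothetical sketch (it is not Triad's or Fugue's actual API) of replacing disallowed characters with underscores:

```python
import re

def normalize_col(name: str) -> str:
    # Hypothetical helper, not Triad's actual rules: replace any character
    # that is not a letter, digit, or underscore with an underscore, and
    # prefix names that start with a digit so they remain valid identifiers.
    name = re.sub(r"[^0-9a-zA-Z_]", "_", name)
    if re.match(r"^\d", name):
        name = "_" + name
    return name

print(normalize_col("total sales ($)"))  # total_sales____
print(normalize_col("2021 revenue"))     # _2021_revenue
```

The option would need to record the original-to-normalized mapping so users can still refer to the source column names after loading.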
Currently, each execution engine has its own separate implementation for loading files. This is messy, with a lot of duplication. We should create a unified IO engine, just like SQLEngine...
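A minimal sketch of what such an abstraction might look like, by analogy with SQLEngine; all names here are illustrative assumptions, not Fugue's actual design:

```python
from abc import ABC, abstractmethod
from typing import Any


class IOEngine(ABC):
    """Hypothetical unified IO interface each execution engine would
    implement, so load/save logic is no longer duplicated per engine."""

    @abstractmethod
    def load(self, path: str, fmt: str, **kwargs: Any) -> Any:
        """Load a file into the engine's native dataframe type."""

    @abstractmethod
    def save(self, df: Any, path: str, fmt: str, **kwargs: Any) -> None:
        """Persist a dataframe to storage."""


class DictIOEngine(IOEngine):
    # Toy in-memory implementation, just to show the contract.
    def __init__(self):
        self._store = {}

    def load(self, path, fmt, **kwargs):
        return self._store[(path, fmt)]

    def save(self, df, path, fmt, **kwargs):
        self._store[(path, fmt)] = df


engine = DictIOEngine()
engine.save([{"a": 1}], "/tmp/x", "parquet")
print(engine.load("/tmp/x", "parquet"))  # [{'a': 1}]
```

Each concrete engine (Spark, Dask, native pandas) would then provide one `IOEngine` subclass instead of scattering format-specific readers across the codebase.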
**Describe the solution you'd like**

```python
# schema: *,a:int
def t(df: pd.DataFrame) -> pd.DataFrame:
    # do something and return
    ...
```

Currently this can only be used on a transformer, but it should be...
These two SQL queries have different behavior on HAVING:

```sql
CREATE [[1, 2], [NULL, 2], [NULL, 1], [3, 4], [NULL, 4]] SCHEMA a:double,b:int
SELECT a, SUM(b) AS b GROUP BY a...
```
**Describe the solution you'd like** Change the documentation to the Furo theme.
**Describe the solution you'd like** Currently, for Avro IO, Dask still uses the local implementation; we should use [this](https://docs.dask.org/en/latest/_modules/dask/bag/avro.html) instead to take advantage of the distributed system.