mars
Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions.
**Describe the bug** A clear and concise description of what the bug is. **To Reproduce** To help us reproduce this bug, please provide the information below: 1. Your Python version 2....
**Describe the bug** When using the Mars web client, if we cancel a task and then execute a new task, we get `Connection refused.` **To Reproduce** To help us reproduce this bug,...
Mars storage data key error when autoscaling in workers: ``` 2022-02-24 17:04:33,625 ERROR autoscale.py:343 -- Exception occurred when try to auto scale Traceback (most recent call last): File "/home/admin/ray-pack/tmp/job/99000080/pyenv/lib/python3.7/site-packages/mars/services/scheduling/supervisor/autoscale.py", line...
Right now the Mars documentation does not list the DataFrame and Tensor APIs or compare them with the pandas DataFrame and NumPy APIs. It would be great if we could compare...
**Is your feature request related to a problem? Please describe.** Implement LogisticRegression, the logistic regression (aka logit, MaxEnt) classifier, with an API like https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html The parameters I need: * **penalty** {‘l1’, ‘l2’, ‘elasticnet’,...
Currently we implement GroupBy.nunique using GroupBy.transform; some optimizations could be applied to reduce the intermediate data size.
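One way such an optimization might look is a map-and-combine scheme: each chunk first reduces its rows to the distinct values per group, and only those small sets are merged. The sketch below is plain Python, not Mars's implementation; the function names (`partial_distinct`, `merge_distinct`, `grouped_nunique`) and the chunk layout are assumptions made for illustration.

```python
from collections import defaultdict

def partial_distinct(chunk):
    """Map step: collect the distinct values per key within one chunk."""
    seen = defaultdict(set)
    for key, value in chunk:
        values = seen[key]          # register the key even if its value is missing
        if value is not None:       # missing values are not counted
            values.add(value)
    return seen

def merge_distinct(partials):
    """Combine step: union the per-chunk sets for each key."""
    merged = defaultdict(set)
    for partial in partials:
        for key, values in partial.items():
            merged[key] |= values
    return merged

def grouped_nunique(chunks):
    """Number of distinct non-missing values per key across all chunks."""
    merged = merge_distinct(partial_distinct(chunk) for chunk in chunks)
    return {key: len(values) for key, values in merged.items()}

# Hypothetical data: two chunks of (key, value) rows.
chunks = [
    [("a", 1), ("a", 1), ("b", 2)],
    [("a", 3), ("b", 2), ("b", None)],
]
result = grouped_nunique(chunks)
print(result)
```

Only one set per group leaves each chunk, so the data moved between workers is bounded by the number of distinct values rather than the number of rows.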
# Reporting a bug ``` import pandas as pd import mars import numpy as np df = pd.DataFrame(np.random.rand(5,3)) sliced_df = df.loc[0:1] # Out[6]: sliced_df # 0 1 2 # 0...
We can add a doc about performance tuning for lazy evaluation. 1. Pay attention to common data: if it's not executed in advance, it may be executed multiple times....
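The point about common data can be illustrated with a toy model of lazy evaluation. The `LazyValue` class below is hypothetical and is not Mars's API; it only mimics the idea that a shared upstream value is recomputed by every downstream consumer unless it has been executed beforehand.

```python
class LazyValue:
    """Toy lazy node: computes on demand and caches only after execute()."""

    def __init__(self, fn, *deps):
        self.fn = fn
        self.deps = deps
        self.calls = 0            # how many times fn actually ran
        self._cached = None
        self._has_cache = False

    def _compute(self):
        if self._has_cache:
            return self._cached
        self.calls += 1
        args = [dep._compute() for dep in self.deps]
        return self.fn(*args)

    def execute(self):
        """Materialize the value once so later consumers reuse it."""
        if not self._has_cache:
            self._cached = self._compute()
            self._has_cache = True
        return self

    def fetch(self):
        """Return the value, recomputing it if it was never executed."""
        return self._compute()


def run(pre_execute):
    common = LazyValue(lambda: sum(range(1000)))
    if pre_execute:
        common.execute()
    left = LazyValue(lambda x: x + 1, common)
    right = LazyValue(lambda x: x * 2, common)
    values = (left.fetch(), right.fetch())
    return values, common.calls


values_lazy, calls_lazy = run(pre_execute=False)
values_eager, calls_eager = run(pre_execute=True)
print(calls_lazy, calls_eager)
```

Both runs return the same values, but the shared node runs twice when it is left lazy and once when it is executed in advance.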
**Is your feature request related to a problem? Please describe.** For now, I cannot use md.loc for setting values. For example, I tried df.loc[:, 'column_name'] = function(df). Then the "TypeError:...
The current implementation can be found in #2466. In fact, iteratively calling stochastic gradient descent is quite inefficient for distributed frameworks like Mars. Potential solutions could be: 1. Zhuang, Yong, et...