mars
[BUG] raydataset.to_ray_dataset has type error
Describe the bug Got a runtime error when using raydataset.to_ray_dataset
To Reproduce To help us reproduce this bug, please provide the information below:
- Your Python version
3.7.7 from ray 1.9 docker
- The version of Mars you use
0.8.1
- Versions of crucial packages, such as numpy, scipy and pandas
- Full stack of the error.
022-02-13 22:41:24,618 INFO services.py:1340 -- View the Ray dashboard at http://127.0.0.1:8265
Web service started at http://0.0.0.0:42881
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100.0/100 [00:00<00:00, 1044.45it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100.0/100 [00:00<00:00, 347.11it/s]
a 508.224206
b 493.014249
c 501.428825
d 474.742166
dtype: float64
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100.0/100 [00:05<00:00, 17.29it/s]
a b c d
count 1000.000000 1000.000000 1000.000000 1000.000000
mean 0.508224 0.493014 0.501429 0.474742
std 0.279655 0.286950 0.293181 0.288133
min 0.000215 0.000065 0.000778 0.001233
25% 0.271333 0.238045 0.249944 0.224812
50% 0.516350 0.498089 0.503308 0.459224
75% 0.747174 0.730087 0.750066 0.716232
max 0.999077 0.999674 0.999869 0.999647
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100.0/100 [00:00<00:00, 1155.09it/s]
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 100.0/100 [00:00<00:00, 712.39it/s]
Traceback (most recent call last):
File "distributed_mars.py", line 19, in
- Minimized code to reproduce the error.
import ray
ray.init()
import mars
import mars.tensor as mt
import mars.dataframe as md
from mars.dataframe.contrib.raydataset import to_ray_dataset
session = mars.new_ray_session(worker_num=2, worker_mem=1 * 1024 ** 3)
mt.random.RandomState(0).rand(1000, 5).sum().execute()
df = md.DataFrame(
mt.random.rand(1000, 4, chunk_size=500),
columns=list('abcd'))
df.extra_params.raw_chunk_size = 500
print(df.sum().execute())
print(df.describe().execute())
# Convert mars dataframe to ray dataset
df.execute()
ds = to_ray_dataset(df, num_shards=4)
print(ds.schema(), ds.count())
ds.filter(lambda row: row["a"] > 0.5).show(5)
# Convert ray dataset to mars dataframe
df2 = md.read_ray_dataset(ds)
print(df2.head(5).execute())
Expected behavior
A type error raised by to_ray_dataset: "TypeError: Calling 'put' on an ray.ObjectRef is not allowed (similarly, returning an ray.ObjectRef from a remote function is not allowed). If you really want to do this, you can wrap the ray.ObjectRef in a list and call 'put' on it (or return it)."
Additional context Add any other context about the problem here.
@jyizheng Which mars version are you using? I tested your script with mars master (df1492c4974bb027e7916ae116b12a340e3abdea) and it passed without error. My test setup is macos 2018, python 3.8, and ray 1.9.2.
I reproduced the error using pymars==0.8.1 and ray==1.9.2 on my mac. I will ping you once I have located the issue @jyizheng
@jyizheng It seems to be a bug in older versions of mars, which has been fixed in #2579. Could you try a newer mars version?
@jyizheng Could you try mars 0.9 via pip install --pre pymars? I believe it will work.
Once it's verified as fixed in the 0.9 branch, we can backport the PR.
Thanks for the reply. I updated mars to 0.9.0a2. The problem with to_ray_dataset is gone. A new problem is that read_ray_dataset does not exist any more. What is the new API for converting a ray dataset to a dataframe?
I read the code and it's read_raydataset. Can you confirm that, @chaokunyang?
IMO, it's indeed weird to see that to_ray_dataset and read_raydataset are not named in pairs.
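For anyone who needs a script to run across both spellings in the meantime, here is a minimal sketch of a lookup that picks whichever name the module exposes. `resolve_reader` and `fake_md` are hypothetical names made up for this illustration; `fake_md` only stands in for `mars.dataframe` so the snippet runs without mars installed.

```python
from types import SimpleNamespace


def resolve_reader(module, names=("read_ray_dataset", "read_raydataset")):
    """Return the first callable attribute in `names` that `module` provides."""
    for name in names:
        func = getattr(module, name, None)
        if callable(func):
            return func
    raise AttributeError(f"none of {names} found on {module!r}")


# Hypothetical stand-in for `mars.dataframe` where only the old spelling exists.
fake_md = SimpleNamespace(read_raydataset=lambda ds: ("dataframe from", ds))

reader = resolve_reader(fake_md)
result = reader("ds")
```

With a real install, passing `mars.dataframe` instead of `fake_md` should behave the same way, assuming one of the two names is present.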
@qinxuye I don't know why it was named read_raydataset instead of read_ray_dataset. I changed it to read_ray_dataset in #2705 yesterday.
OK, read_raydataset can be kept for compatibility, but with a deprecation warning.
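A rough sketch of what such a compatibility alias could look like, using only the standard library. This is not the code from #2705; `deprecated_alias` is a hypothetical helper, and the `read_ray_dataset` below is a placeholder that just returns its argument.

```python
import functools
import warnings


def read_ray_dataset(ds):
    """Placeholder for the real reader; returns its input unchanged here."""
    return ds


def deprecated_alias(new_func, old_name):
    """Wrap `new_func` so that calling it under `old_name` emits a warning."""

    @functools.wraps(new_func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            f"{old_name} is deprecated, use {new_func.__name__} instead",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not the wrapper
        )
        return new_func(*args, **kwargs)

    wrapper.__name__ = old_name
    return wrapper


# The old spelling keeps working but warns on every call.
read_raydataset = deprecated_alias(read_ray_dataset, "read_raydataset")
```

Existing scripts that call `read_raydataset(ds)` would keep running and see a `DeprecationWarning` that names the new function.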
I changed df2 = md.read_ray_dataset(ds) to df2 = md.read_raydataset(ds).
The code I wrote can run to completion. However, there was an exception at the end.
Exception ignored in: <function _TileableSession.init.
It looks like the job completed, but some coroutine was still being scheduled after the event loop had already been closed. Again, this does not affect the result.
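To illustrate the general mechanism (this is plain asyncio, not the Mars session cleanup code): once a loop is closed, anything scheduled on it afterwards is rejected, while results computed before the close are untouched. The names `work` and `late_error` are made up for the example.

```python
import asyncio


async def work():
    return "done"


loop = asyncio.new_event_loop()
result = loop.run_until_complete(work())  # the real work finishes normally
loop.close()

# Scheduling on the closed loop fails, but `result` is already computed.
try:
    loop.call_soon(print, "late callback")
    late_error = None
except RuntimeError as exc:
    late_error = str(exc)
```

When the late scheduling happens inside a finalizer such as `__del__`, Python cannot propagate the exception and prints it as "Exception ignored in: ..." instead, which would match a message appearing only at interpreter exit.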
Neither the original code nor the read_raydataset variant reproduces the problem on the current master branch.