Results 33 comments of Ivan Longin

@dmpetrov can you show me example rows of that `example.parquet` if possible? I'm currently unable to reproduce and it looks like it might be related to specific data in that...

@shcheklein you are right, that's exactly what was happening. I've already created a fix PR.

Some timings for pulling without instantiating from Studio production (team name: `demo-1`):
1. `ds://laion_wds_1m` (1M objects, 14 custom signals): ~6k rows/sec
2. `ds://laion_wds` (11.5M objects, 14 custom signals):...

@dmpetrov I assume you want to copy a local file to a cloud bucket, but this is not something we support in `cp` at the moment. It only allows copying files from...

> > dc.read_dataset() -> dc.datasets.read() > > Yes, but top level `dc.read_dataset()` is needed. > Yea, we would def need to leave current top level methods, at least for backward...

> A single checkpoint approach seems enough. Changing code and continuing from the previous checkpoint is the default behavior (until full rerun). Should UDF function be included in that checkpoint...

> Why does it fix the issue, could you explain it please?
>
> Specifically, I don't quite understand why would columns we have in `select` / `subquery` affect the...

> > Schema is derived directly from selected columns from built SQLAlchemy query.
>
> could you point me to it please?

1. Columns created out of query and sent...

> that's really weird, why aren't we using signal schema? it feels it can be tricky to preserve and guarantee order of columns in all these subqueries and selects ......

> > `namespace_name` and `project_name`
>
> Please combine these two to a single `namespace` that contains both.

I would avoid doing this atm. All around the code we have...