Results: 33 comments by Ivan Longin

We had a lot of discussions about this through the history of dvcx / datachain. Basically it boils down to this: 1. Preserving order by making sure inserted rows are...

We need to think about how to deal with additional tables that are created during indexing, like buckets or partials. So this is not just a normal UDF that has an output...

Partials are needed to be able to index part of a bucket and to avoid re-indexing subdirectories. I have a feeling though that this can all be done even without...

I can take over this one and make a plan / subtasks

> @ilongin should we actually rewrite source if source is True? 🤔 since the existing source is kinda wrong by now ... Makes sense, I've overwritten it now with new...

Another idea: maybe use `studio/mycats` instead of just `/mycats`? ... This is more verbose but clearer, and similar to git's branch naming convention, where we have `origin/mycats`. Having...

> [@ilongin](https://github.com/ilongin) that's a good idea, but we need to keep in mind that we will need to introduce org/team in the future like `myorg/[email protected]` and I'm thinking about empty org...

@shcheklein @dmpetrov Question about `datachain pull` -> currently we can set an optional **local** dataset name / version to which the studio dataset will be pulled. I'm wondering if we should remove...

@shcheklein I spoke with Dmitry and we decided that `diff` should be like this: https://github.com/iterative/datachain/issues/636 ... I will start working on it soon.

`DatasetQuery` is currently much more than just running DB queries so we would need to remove most of the logic there. My question is why do we even need 2...