datachain Remove `name` and `version` properties from `DataChain`

Currently the DataChain class has name and version properties, which are not needed because the dataset property already points to the underlying dataset if one exists. We should also think about namespace_name and project_name, which are properties too. We should probably remove those as well and think about exposing settings in general.

Aug 19 '25 11:08 ilongin

> namespace_name and project_name

Please combine these two to a single namespace that contains both.

Aug 19 '25 18:08 dmpetrov

> namespace_name and project_name
>
> Please combine these two to a single namespace that contains both.

I would avoid doing this at the moment. The code splits these into two everywhere, so if we want a single namespace that contains both, we should do one bigger refactoring to be consistent everywhere.

Aug 20 '25 14:08 ilongin

I think Dmitry's point was to do changes on the public API level. What would be the scope for this?

Aug 20 '25 15:08 shcheklein

> I think Dmitry's point was to do changes on the public API level. What would be the scope for this?

That should not be a problem; I will do it in this issue. I will also create a follow-up to use the namespace.project.dataset naming convention everywhere in our codebase where we use a dataset name (internal and external APIs), to avoid namespace_name and project_name arguments in a lot of functions.

Aug 21 '25 14:08 ilongin

Can you scope it please before you do it?

Aug 21 '25 14:08 shcheklein

> Can you scope it please before you do it?

From my short investigation, in datachain we need to change:

lib.dc.datasets.read_dataset()
lib.dc.datasets.delete_dataset()
lib.dc.datachain.DataChain.settings()

We need to check Studio usage of those and change them as well if needed.

Overall I think it's not a big task, but we do change the public API, and that's the biggest issue here.

Aug 22 '25 12:08 ilongin

How about the env variables that they use now? What else are we missing? Can we carefully grep and think about what exactly we'll break?

What will the new API look like exactly (env vars, settings, read_dataset)? Can you please describe it here?

Can we do it with a deprecation period, to give some time to migrate first and then drop support later?

Please scope it e2e with a proper plan and ETA.

Aug 22 '25 16:08 shcheklein

The new API will look the same for now, except that the namespace argument will now cover both namespace and project. In the future, after a deprecation period, it would go from

def read_dataset(
    name: str,
    namespace: Optional[str] = None,
    project: Optional[str] = None,
    ...
)

to

def read_dataset(
    name: str,
    namespace: Optional[str] = None,
    ...
)

These are the options to call this method:

dc.read_dataset("cats")  # default namespace / project is used
dc.read_dataset("cats", namespace="dev.animals")  # "dev" namespace and "animals" project is used

We need to decide if it makes sense to allow something like this:

dc.read_dataset("cats", namespace="dev")  # namespace "dev" and default project (?) - default project exists only in default namespace so this is not clear.
dc.read_dataset("cats", namespace=".animals")  # project "animals" in default namespace - this makes more sense than above example
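
To make the four call forms above concrete, here is a minimal sketch of how a combined namespace string could be split. It is only an illustration, not existing datachain code: split_namespace, DEFAULT_NAMESPACE and DEFAULT_PROJECT are hypothetical names, the default values are placeholders, and falling back to a default project for namespace="dev" is just one of the options still to be decided.

```python
from typing import Optional, Tuple

# Hypothetical placeholders: the real default names are not stated in this thread.
DEFAULT_NAMESPACE = "default_ns"
DEFAULT_PROJECT = "default_proj"


def split_namespace(value: Optional[str]) -> Tuple[str, str]:
    """Split a combined "namespace.project" string into (namespace, project).

    A missing part falls back to the corresponding default. Whether
    "dev" (no project) should be accepted at all is an open question;
    this sketch assumes it maps to the default project.
    """
    if not value:
        return DEFAULT_NAMESPACE, DEFAULT_PROJECT

    namespace, _, project = value.partition(".")
    if "." in project:
        # More than one dot cannot be mapped to namespace + project.
        raise ValueError(f"expected 'namespace.project', got {value!r}")

    return namespace or DEFAULT_NAMESPACE, project or DEFAULT_PROJECT
```

With these assumptions, "dev.animals" gives ("dev", "animals") and ".animals" gives the default namespace with the "animals" project. If we decide that namespace="dev" should be rejected, the last line would raise instead of filling in the default project.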

Regarding env variables, only DATACHAIN_NAMESPACE would be enough.

Everything being replaced should be deprecated first and remain backward compatible.
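
A possible shape for the backward-compatible step, as a sketch only: combine_namespace is a hypothetical helper (not part of datachain today) that functions such as read_dataset could call while they still accept the old project argument. It assumes the old argument is folded into the new combined form and that a DeprecationWarning is emitted when it is used.

```python
import warnings
from typing import Optional


def combine_namespace(
    namespace: Optional[str] = None,
    project: Optional[str] = None,
) -> Optional[str]:
    """Fold the deprecated `project` argument into a combined namespace string."""
    if project is None:
        # New-style call: namespace is already combined (or not set at all).
        return namespace

    warnings.warn(
        "`project` is deprecated; pass namespace='<namespace>.<project>' instead",
        DeprecationWarning,
        stacklevel=2,
    )
    if namespace and "." in namespace:
        # Mixing old and new styles is ambiguous, so refuse it.
        raise ValueError("pass either a combined namespace or `project`, not both")

    # An empty namespace part yields ".project", i.e. the default namespace.
    return f"{namespace or ''}.{project}"
```

Old calls like namespace="dev", project="animals" keep working and resolve to "dev.animals", while new calls pass through unchanged. The same pattern could cover the settings and delete_dataset entry points listed earlier.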

Aug 28 '25 13:08 ilongin