substrate icon indicating copy to clipboard operation
substrate copied to clipboard

Improvements to runtime api versioning

Open tdimitrov opened this issue 3 years ago • 7 comments

The problem

Runtime API versioning was designed and built around the concept of extending a stable runtime API with testing (aka staging) methods, which are selectively enabled in different implementations. It serves its purpose well but we hit a corner case with this PR. The discussion is long so I will resume it in the next paragraph.

We have got the following API. It is at version 2 and has got two methods.

   	#[api_version(2)]
   	pub trait Api {
		fn some_method(data: u64);
   		fn session_data();
	}

Next we want to modify the signature of session_data. One way to do it is like this:

   	#[api_version(3)]
   	pub trait Api {
		fn some_method(data: u64);
    
   		#changed_in(2)
   		fn session_data();

   		fn session_data() -> u64;
	}

This will work, we will have got v2 with the old signature and v3 with the new one. There is a catch though - we are changing the base version of the API to v3 and we no longer has got the option to implement v2 for some networks. This not acceptable if the function is considered testing/staging/etc.

Solution

variant A

It would be nice to have a syntax like this. Lets call this syntax A:

   	#[api_version(2)]
   	pub trait Api {
		fn some_method(data: u64);
   		fn session_data();

                #[api_version(3)]
   		fn session_data() -> u64;
	}

This way we will have the possibility to implement two version of the API. Version 2 which effectively is:

   	pub trait ApiV2 {
		fn some_method(data: u64);
   		fn session_data();
	}

And version 3:

   	pub trait ApiV3 {
		fn some_method(data: u64);
   		fn session_data() -> u64;
	}

And we will be able to implement both versions on different runtimes selectively.

variant B

An alternative is to have got syntax B:

   	#[api_version(2)]
   	pub trait Api {
		fn some_method(data: u64);

   		#[changed_in(3)
   		fn session_data();

                #[api_version(3)]
   		fn session_data() -> u64;
	}

It is more consistent with the first approach I outlined, but maybe a bit less intuitive.

Complications

new runtime old client

The main complication I see is handling new runtime with old client scenario. If a client knows only about version 2 but the runtime is at version 3 - the runtime call needs to be dispatched to the correct function.

The problem is further multiplied by having multiple versions. E.g.

   	#[api_version(2)]
   	pub trait Api {
		fn some_method(data: u64);

   		#[changed_in(3)
   		fn session_data();

                #[changed_in(4)
   		fn session_data()  -> u64;

                #[api_version(4)]
   		fn session_data() -> u128;
	}

The code above will have to generate session_data(), session_data__before_v3(), session_data__before_v4() and etc. Dispatching in such scenarios will be a small nightmare.

tdimitrov avatar Jan 13 '23 14:01 tdimitrov

If a client knows only about version 2 but the runtime is at version 3 - the runtime call needs to be dispatched to the correct function.

This does not work. Nodes always need to be updated before the runtime upgrade happens, otherwise old nodes are not working anymore. We don't have any dispatching logic in the runtime for this, this all happens on the node. The runtime always only has one version of the function defined.

Let me propose some changes to variant A, let me call it variant Basti

variant Basti

   	#[api_version(2)]
   	pub trait Api {
		fn some_method(data: u64);
                #[changed_in(3, session_data_oldy_old)]
   		fn session_data();

                #[api_version(3)]
   		fn session_data() -> u64;
	}

AFAIR we already generate multiple traits per version since your changes to the macro. I think we should continue this. So the code above generates the following:

mod for_runtime {
   	pub trait ApiV2 {
		fn some_method(data: u64);
   		fn session_data();
	}

   	pub trait ApiV3 {
		fn some_method(data: u64);
   		fn session_data() -> u64;
	}
}

mod for_node {
        pub trait Api {
		fn some_method(data: u64);
		fn session_data_oldy_old();
   		fn session_data() -> u64;
	}
}

As before we only implement one of the traits for the runtime, depending on the version.

With #[changed_in(3, session_data_oldy_old)] I would try to fix up my fault of doing the naming automatically. As users are never able to find this name when they search for it otherwise. However, someone could also argue that we should not rename any old method at all and instead give the new method a new name and put a deprecated over the old to inform people. While my last sentence probably is the most "sane" variant.

bkchr avatar Jan 13 '23 15:01 bkchr

This does not work. Nodes always need to be updated before the runtime upgrade happens, otherwise old nodes are not working anymore. We don't have any dispatching logic in the runtime for this, this all happens on the node. The runtime always only has one version of the function defined.

I didn't know that nodes must be updated before the runtime. And I haven't realised that the dispatching code is on the client, which explains this requirement. Thank you for that clarification.

'Variant Basti' looks good but is it really a good idea to let the user pick names for the old functions? One can put the same name for two completely different methods or who knows what else. And the benefit will be minimal. I think it's beneficial to have this autogenerated. Maybe we can make it a bit more easier to use by appending the version to the old methods? E.g.:

mod for_node {
        pub trait Api {
		fn some_method(data: u64);
		fn session_data_v2();
   		fn session_data() -> u64;
	}
}

tdimitrov avatar Jan 13 '23 18:01 tdimitrov

One can put the same name for two completely different methods or who knows what else.

But that is something that we can detect in the macro. I mean I know where this comes from and can live with it. I just thought that it would be easier for users, but I also don't care that much. If you tell me the current naming is okay, we can keep it :)

bkchr avatar Jan 13 '23 20:01 bkchr

Recalling how much time I spent figuring out where session_info_before_version_2 comes from, I'd say that the explicit naming is a good idea ;) At least it could be grepped in the code.

s0me0ne-unkn0wn avatar Jan 14 '23 11:01 s0me0ne-unkn0wn

There is a catch though - we are changing the base version of the API to v3 and we no longer has got the option to implement v2 for some networks.

If some networks still want V2, why not just make two different API traits? I know it is the least elegant, but it might actually be the most worthwhile?

kianenigma avatar Jan 15 '23 18:01 kianenigma

There is a catch though - we are changing the base version of the API to v3 and we no longer has got the option to implement v2 for some networks.

If some networks still want V2, why not just make two different API traits? I know it is the least elegant, but it might actually be the most worthwhile?

For the above mentioned PR we can safely switch to v3. It's highly unlikely to do any changes to v2 anymore.

However I want to fix the general case. The main motivation for the runtime api versioning was to be able to develop 'staging' methods and test them without too much hassle. Creating a separate API each time when a function signature needs to be changed is an effort. And more importantly it will make testing a bit harder. For example without the versioning testing runtime upgrades won't be possible.

tdimitrov avatar Jan 16 '23 09:01 tdimitrov

I'm also in favor of a clean solution than creating multiple api traits :P

bkchr avatar Jan 16 '23 14:01 bkchr