volo icon indicating copy to clipboard operation
volo copied to clipboard

Add an option to split generated pilota files

Open missingdays opened this issue 1 year ago • 2 comments

Fixes #454

missingdays avatar Oct 18 '24 09:10 missingdays

@Millione @PureWhiteWu Hi. I couldn't find tests for the related builders, could you please point me to one if they exist? Otherwise, I changed one example to use the new split_generated_files option. It compiles and runs. I also verified that the files are actually split and that the RustRover IDE still navigates to the symbols in the generated split files.

missingdays avatar Oct 18 '24 09:10 missingdays

Perhaps you can add one integration test in the tests folder for workspace mode?The builder is here https://github.com/cloudwego/volo/blob/main/volo-build/src/workspace.rs#L11

Millione avatar Oct 18 '24 09:10 Millione

CLA assistant check
All committers have signed the CLA.

CLAassistant avatar Oct 28 '24 12:10 CLAassistant

@missingdays Hi, are things going well here?

Millione avatar Oct 30 '24 11:10 Millione

@Millione Hi, I'm waiting for the reply here - https://github.com/cloudwego/volo/pull/510#discussion_r1819020976

missingdays avatar Oct 30 '24 11:10 missingdays

Sorry for that, I forgot to say that can you check out why the generated code fails to compile in split mode? It seems related to the split mode implementation in pilota-build

Millione avatar Oct 30 '24 11:10 Millione

@Millione As I mentioned, Windows cannot handle files that only differ by case. For testService and TestService, the generated files are considered the same, so the compilation fails.

missingdays avatar Oct 30 '24 11:10 missingdays

Maybe we can change the filename that only differ by case a little bit? For testService and TestService, it can be TestService1 and TestService2 (just for example)

Millione avatar Oct 30 '24 11:10 Millione

We can do that, I'll try to implement it

missingdays avatar Oct 30 '24 11:10 missingdays

@Millione Please see https://github.com/cloudwego/pilota/pull/280. I added a test as well

missingdays avatar Oct 30 '24 12:10 missingdays

@missingdays https://github.com/cloudwego/pilota/pull/280 gets merged, can you check out whether it works fine for here?

Here can we split it in files?

There is one more question left. There are some structs generated in one file because of volo-build, I think we should split it either.

Millione avatar Oct 31 '24 16:10 Millione

@Millione

gets merged, can you check out whether it works fine for here?

The tests pass now

There are some structs generated in one file because of volo-build

In impl pilota_build::CodegenBackend for VoloThriftBackend, there's no API that would allow splitting the stream into multiple files, since there's no access to the directory where the additional files should be placed. We could change the API in a couple of ways:

  1. Provide a base directory in which additional files can be placed. This will solve this specific problem, but it's a bit specific I think
  2. Instead of passing stream: &mut String, pass stream: PilotaStreamBuilder, which would have methods such as push_str, start_new_file_generation(name), finish_file_generation. So it could be used like
stream.start_new_file_generation("ServiceResponseRecv.rs");
stream.push_str(...); // push the impl of ServiceResponseRecv
stream.finish_file_generation();

This way, PilotaStreamBuilder can always be extended with new methods which will allow more precise generation without changing CodegenBackend each time.

  1. Provide the base directory in Context which is accessible from CodegenBackend

missingdays avatar Nov 04 '24 09:11 missingdays

@missingdays Great thoughts. Point 2 and 3 look good to me, I think you can go for it.

Millione avatar Nov 04 '24 09:11 Millione

@Millione After exploring the code a bit more, I realized that we could use Context#mode + Context#item_path to get all the required information to locate the files. But mode is pub(crate) and so cannot be accessed from volo. I sent an mr to make it pub - https://github.com/cloudwego/pilota/pull/286

missingdays avatar Nov 05 '24 15:11 missingdays

@Millione Hi. Splitting service_ImageService requires more work than I originally anticipated, basically re-writing all the logic from pilota again in volo. Considering all the other splitting, I'd think if it's not split, it will still not get too large to cause any problems, wdyt?

missingdays avatar Nov 07 '24 10:11 missingdays

@missingdays I think it will still be too large and leaving it unsplit looks inconsistent to the user. Is the workload heavy? Can you elaborate on what you're going to do? We can work on it together to find out the best solution.

Millione avatar Nov 08 '24 07:11 Millione

@Millione you are right, it would be inconsistent. Is the service name deduplication needed here as well? Can there be 2 services named articleService and ArticleService in the rpc_article/src/article module for example?

missingdays avatar Nov 09 '24 18:11 missingdays

@missingdays Yes, I think it is needed here either.

Millione avatar Nov 10 '24 10:11 Millione

@Millione I added the functionality to split the services generated by volo into several files. Let me know if this splitting is enough or if we should split it into more files.

As for the name deduplication, I didn't find a good way to do it. What would be the correct way to get all the needed service names so that we could check if the service name already exists?

missingdays avatar Nov 12 '24 13:11 missingdays

There are also so more changes to pilota required, so I'm using a custom branch fix/context-mode-pub from my repo https://github.com/missingdays/pilota. We can make all the changes there and then merge them all together before merging this MR, to avoid a bunch of small MRs to pilota.

missingdays avatar Nov 12 '24 13:11 missingdays

@missingdays

I added the functionality to split the services generated by volo into several files. Let me know if this splitting is enough or if we should split it into more files.

I think we can split it into more files, commented in the code specifically.

What would be the correct way to get all the needed service names so that we could check if the service name already exists?

As for this, you can refer to cx.names(def_id), if its name duplicates, it will return true.

Millione avatar Nov 13 '24 07:11 Millione

As for this, you can refer to cx.names(def_id), if its name duplicates, it will return true.

The deduplication has to check all the names, not just the current one. Because we need to transform all names to the lowercase and only then check for duplicates. For example, service ArticleService and service articleService will have different def_id, but their related directory names would clash.

missingdays avatar Nov 13 '24 09:11 missingdays

The deduplication has to check all the names, not just the current one. Because we need to transform all names to the lowercase and only then check for duplicates.

What's behind cx.names(def_id) is that we've already converted all the names to the rust style to check for duplicates, and stored all the corresponding def_ids in a HashSet. I think this will work as you describe?

but their related directory names would clash.

I haven't understood it well for how their related directory names affect this, can you explain it more? As far as I'm concerned, the cx.names(def_id) for service ArticleService and service articleService will both return true, and then we know we should make a change for the filename.

Millione avatar Nov 14 '24 04:11 Millione

Thanks, I see your point. However, we need to know how to deduplicate. If for both ArticleService and articleService we get true, how to know for which service to generate a postfix _1 and for which _2? If the ordered list of all names was available, we could check if ArticleService or articleService is first in that list and assign it the _1 postfix, and the other _2.

missingdays avatar Nov 14 '24 10:11 missingdays

Oh, I get it.

If for both ArticleService and articleService we get true, how to know for which service to generate a postfix _1 and for which _2?

I have thought about using a global AtomicU64, but maybe not a best way to solve this (the postfix maybe different every time generated and influences the IDE cache)

If the ordered list of all names was available, we could check if ArticleService or articleService is first in that list and assign it the _1 postfix, and the other _2

Great, I think you can put any information you need into cx, and the ordered list of all names information I think you can get from https://github.com/cloudwego/pilota/blob/main/pilota-build/src/middle/context.rs#L382.

Millione avatar Nov 14 '24 14:11 Millione

@missingdays Any help needed here? I can help implementing the rest if you don't have enough time.

Millione avatar Nov 18 '24 08:11 Millione

@Millione Yeah sorry, I've been busy most of the last week. I think only deduplication of the folders is left to be implemented? If you could implement it, that would be great. I added you as a collaborator to both pilota and volo forks, feel free to push any changes to the fix/context-mode-pub branch if necessary.

missingdays avatar Nov 18 '24 08:11 missingdays

@missingdays

I think only deduplication of the folders is left to be implemented?

Yes, I just implement it, please take a look if there's something I'm missing. Futhurmore, for the rest of job, I think there is only some detailed code style to change.

Millione avatar Nov 20 '24 14:11 Millione

@Millione Nice!

I think there is only some detailed code style to change.

I'm ready to fix them if you have any comments.

missingdays avatar Nov 20 '24 14:11 missingdays

@missingdays Well done! I think it's ready to merge. May you submit a PR to pilota first?

Millione avatar Nov 22 '24 06:11 Millione