datafusion icon indicating copy to clipboard operation
datafusion copied to clipboard

WIP: Generate docs from macros.

Open comphead opened this issue 1 year ago • 9 comments

Which issue does this PR close?

Closes #.

Rationale for this change

What changes are included in this PR?

Are these changes tested?

Are there any user-facing changes?

comphead avatar Oct 08 '24 21:10 comphead

@Omega359 @alamb I tried to play with custom attributes to wrap up the documentation on top of the what @Omega359 already built. I'm experimenting with just 2 fields(description and examples) and the attribute has to be placed on top of the struct.

comphead avatar Oct 08 '24 21:10 comphead

WDYT should we move in this direction?

comphead avatar Oct 08 '24 21:10 comphead

I find this intriguing and technically very interesting! I do have a few questions:

  • What are you goals with this? I am assuming cleaner looking/easier to create docs. If so I would wonder how it would help in cases were there is significant amounts of docs vs boilerplate such as in the to_date udf.
  • Can it be made to handle n arguments?
  • Is it possible to generate rustdoc with this approach?
  • I noticed that this approach creates a Documentation object on every call. It's obviously pretty minor but it would be nice if it could be static.

To me in general while I find this quite interesting I wonder if the benefit outweighs the added complexity of the code generation. To be clear I am absolutely not against this approach - I just have a few concerns.

Omega359 avatar Oct 09 '24 13:10 Omega359

I find this intriguing and technically very interesting! I do have a few questions:

  • What are you goals with this? I am assuming cleaner looking/easier to create docs. If so I would wonder how it would help in cases were there is significant amounts of docs vs boilerplate such as in the to_date udf.
  • Can it be made to handle n arguments?
  • Is it possible to generate rustdoc with this approach?
  • I noticed that this approach creates a Documentation object on every call. It's obviously pretty minor but it would be nice if it could be static.

To me in general while I find this quite interesting I wonder if the benefit outweighs the added complexity of the code generation. To be clear I am absolutely not against this approach - I just have a few concerns.

Thanks @Omega359 The idea is to reuse your current approach but make it on the level of custom Rust attribute instead of having the documentation directly in the code. So the steps can look like:

  • Move Documentation structure to pre-macros crate
  • generate the code from macros instead of having it statically
  • we can have it static everything will be the same as its now, the only difference is the code will be created through the macros.

I'm planning to play with to_date function as it covers also multiline comments as @alamb mentioned

comphead avatar Oct 09 '24 14:10 comphead

@alamb @Omega359 please have a look on real example to_date (I still need to include argements to be called with the builder). As you can see it is the same approach as before, the only difference the documentation is set by attributes rather than code, but it generates documentation builders as before.

Btw multiline seems to be working

comphead avatar Oct 10 '24 18:10 comphead

Thanks @alamb I'll consider the changes, @Omega359 is it good to have a documentation method on struct level instead of impl?

comphead avatar Oct 11 '24 16:10 comphead

@Omega359 is it good to have a documentation method on struct level instead of impl?

I can't think of a reason why not off the top of my head. I'm hoping to have time to look over your latest updates in this PR later tonight

Omega359 avatar Oct 11 '24 17:10 Omega359

Thanks @Omega359 there is bunch of things needed to be added, its not a final PR. I'll add some minor missing pieces and the PR will be ready for the review, most likely next week. Because I need to make a cleanup, remove test structures, polish crates and everything.

comphead avatar Oct 11 '24 19:10 comphead

I'll get back on it little later as need to finish some more urgent work with joins

comphead avatar Oct 18 '24 19:10 comphead

I'm closer to this

comphead avatar Nov 12 '24 01:11 comphead

We should also take care on handing things like https://github.com/apache/datafusion/pull/13367

comphead avatar Nov 12 '24 01:11 comphead

@alamb @Omega359 the PR is ready for review

comphead avatar Nov 18 '24 19:11 comphead

Done,

Maybe we could call the function something like macro_doc or derived_documentation to make it easier to find (aka so someone who runs across it can easily grep and find the macro definition)

Good idea, maybe we can name it as user_documentation. I'd prefer to leave it to the following PR though, because it will affect dozens of files

comphead avatar Nov 18 '24 22:11 comphead

Another thing comes to my mind, we can automate transforming current API documentation approach to attribute based one wherever possible. There are more complicated options with macros, we can leave them with API approach for now

comphead avatar Nov 18 '24 22:11 comphead

Let's wait to see what @Omega359 has to say too, but I would be inclined to merge it in (as it is an optional feature) once we have a bit more documentation

I'll hopefully be able to take a good look at this tonight 🤞

Omega359 avatar Nov 19 '24 15:11 Omega359

Sorry, I've been busy lately. I'm hoping to get to reviewing this PR this week

Omega359 avatar Nov 21 '24 17:11 Omega359

Thanks @Omega359 I will be adding the udfs and alternate syntax in follow up PR. I'll find the UDFs supporting both related_udfs and alternative_syntax and make these changes simultaneously along with some other things. Preferring the follow up PR as this one is already too large to review

comphead avatar Nov 23 '24 21:11 comphead

@Omega359 I'm planning to merge it today, getting more code conflicts, let me know if any objections

comphead avatar Nov 24 '24 17:11 comphead

I'm fine with that @comphead - we can file followup issues to add the alternative syntax, etc

Omega359 avatar Nov 24 '24 17:11 Omega359

Filed https://github.com/apache/datafusion/issues/13553 to add missing parts

comphead avatar Nov 24 '24 20:11 comphead