[EPIC] Automatically generate all function content from code
Is your feature request related to a problem or challenge?
Currently, when we add a new function to DataFusion library we have to remember to document that function in the documentation, but currently we have to remember this during code review.
This has a few downsides:
- we have forgotten to document some functions
- the documentation has drifted over time,
- help text for various functions can only be found on the DataFusion website, and not, for example within the function itself.
@Omega359 has created the basic framework for automatically generating content in https://github.com/apache/datafusion/pull/12668
This ticket tracks the work required to port the rest of the documentation to programatic form
Tasks
- [x] https://github.com/apache/datafusion/issues/12432
- [ ] https://github.com/apache/datafusion/pull/12743
- [x] https://github.com/apache/datafusion/pull/12745
- [ ] Migrate all existing UDF's to the new format (see below)
- [ ] https://github.com/apache/datafusion/issues/12741
- [ ] Add support for
DESCRIBE xyz; where xyz is a function. Ideally this would support not just the core functions in DF but also any added externally (for example those in https://github.com/datafusion-contrib/datafusion-functions-extra) and verify it works in the CLI - [ ] Look into generating the function sections in expressions.md (it's a bit out of date)
- [ ] Make generate_function_docs.sh error if there is a function without documentation
Aggregate Function Documentation Migration
- [x] #12827
- [x] Port / Add
Documentationfor BitwiseOperation - [x] Port / Add
Documentationfor VarianceSample https://github.com/apache/datafusion/pull/12742 - [x] Port / Add
Documentationfor VariancePopulation https://github.com/apache/datafusion/pull/12742
Scalar Function Documentation Migration
- [ ] https://github.com/apache/datafusion/issues/11172
- [x] https://github.com/apache/datafusion/pull/12775
- [x] https://github.com/apache/datafusion/issues/12927
- [x] #12828
- [x] #12859
- [x] #12858
- [x] #12867
- [x] #12868
- [x] #12934
- [x] #12935
Window Function Documentation Migration
- [x] Port / Add
DocumentationforLead - [x] Port / Add
DocumentationforRowNumber - [ ] #12936
Update is I have a few follow on PRs
- [x] https://github.com/apache/datafusion/issues/12741
- [x] https://github.com/apache/datafusion/issues/12432
Then I figure I will file one or two other tickets to see how easy it is to port other functions
And if all goes well I'll make a ticket storm to port the docs
FYI I've finished up the string expression migration locally - just waiting on the updated code from #12742 to land before pushing a PR.
FYI I've finished up the string expression migration locally - just waiting on the updated code from #12742 to land before pushing a PR.
Thanks, I'll merge those as soon as they get a review (I can't merge them until they get an approval from another committer)
No rush I just wanted to make sure no one was duplicating work needlessly
@alamb I think it should be mentioned here that ./dev/update_function_docs.sh should be ran, as it wasn't immediately obvious, and when I did my first migration I took a bit of time to find it.
@alamb I think it should be mentioned here that ./dev/update_function_docs.sh should be ran, as it wasn't immediately obvious, and when I did my first migration I took a bit of time to find it.
Thank you @jonathanc-n -- I added it to the description of https://github.com/apache/datafusion/issues/12859 and will add it on subsequent tickets
Thanks to @jonathanc-n @Omega359 and others we are cranking right along with this. I expect it to be done sometime later this week
I think we are pretty close here
After we merge these PRs from @jonathanc-n and myself
- https://github.com/apache/datafusion/pull/13069
- https://github.com/apache/datafusion/pull/13047
We then have:
- [x] https://github.com/apache/datafusion/issues/13021
Then we have just a few more functions to document (lead, lag) and we can turn on a CI check to ensure all new functions are documented
(venv) andrewlamb@Andrews-MacBook-Pro-2:~/Software/datafusion$ ./dev/update_function_docs.sh
/Users/andrewlamb/Software/datafusion
Inserting header
Running CLI and inserting aggregate function docs table
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.34s
Running `target/debug/print_functions_docs aggregate`
Running prettier
docs/source/user-guide/sql/aggregate_functions_new.md 59ms
'docs/source/user-guide/sql/aggregate_functions_new.md' successfully updated!
Inserting header
Running CLI and inserting scalar function docs table
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.09s
Running `target/debug/print_functions_docs scalar`
INFO: The following functions do not have documentation:
- map_values
- map_keys
- map
- map_extract
Running prettier
docs/source/user-guide/sql/scalar_functions_new.md 208ms
'docs/source/user-guide/sql/scalar_functions_new.md' successfully updated!
Inserting header
Running CLI and inserting window function docs table
Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.09s
Running `target/debug/print_functions_docs window`
INFO: The following functions do not have documentation:
- lag
- lead
Running prettier
docs/source/user-guide/sql/window_functions_new.md 26ms
'docs/source/user-guide/sql/window_functions_new.md' successfully updated!
lead/lag PR @ https://github.com/apache/datafusion/pull/13095
Other functions that still exist on scalar functions page - https://github.com/apache/datafusion/issues/13036
Here is a PR to make sure no new functions are added without documentation: https://github.com/apache/datafusion/pull/12938
🎉 I think all we need is https://github.com/apache/datafusion/issues/13171 and we can claim this project is complete. Thanks again @Omega359 @buraksenn and @jonathanc-n for pushing it along
🎉 epic