datafusion
Document how to test examples in user guide, add some more coverage
Which issue does this PR close?
Part of https://github.com/apache/datafusion/issues/11172 and https://github.com/apache/datafusion/issues/1813
Rationale for this change
I am trying to make the examples easier to find / navigate. Right now there are many files in datafusion-examples, but they are somewhat overwhelming. There are also some examples in the user guide, but many are not tested.
I tried to consolidate the examples in https://github.com/apache/datafusion/pull/11173 but @lewiszlw (rightly) pointed out that one giant .rs example file might be even harder to find / navigate
I think the ideal outcome from a user's perspective is for the examples to be inline in the user guide, so they are both discoverable and explained in context. However, it is critical that the examples be testable.
What changes are included in this PR?
- Document how to test examples in the user guide
- Add test coverage for expressions.rs and fix the example there.
Are these changes tested?
Yes,
Are there any user-facing changes?
Documentation on tests.
As a newcomer, I can say that more guidance on the examples is good! I think it might also be helpful:
- To perhaps provide more guidance on when an example should be included in the documentation?
- To include links to the appropriate examples in the example directory when appropriate? Especially for more advanced use cases?
I'll also note that for newcomers the hardest part of examples (and tests) is figuring out how to set up the appropriate pre-conditions. Maybe we could create some "best practices" around how to approach this. Or perhaps it's just a matter of reading enough existing examples and tests!
Thanks @efredine, this is great feedback
To perhaps provide more guidance on when an example should be included in the documentation?
Yes, I agree this would be helpful. I'll think about whether I can come up with a summary that matches current reality (rather than what I would ideally like).
To include links to the appropriate examples in the example directory when appropriate? Especially for more advanced use cases?
Yes, I agree -- I think this is the approach that @tshauck took with the user guide (e.g. https://datafusion.apache.org/library-user-guide/working-with-exprs.html has links to various examples)
I'll also note that for newcomers the hardest part of examples (and tests) is figuring out how to set up the appropriate pre-conditions. Maybe we could create some "best practices" around how to approach this. Or perhaps it's just a matter of reading enough existing examples and tests!
When you say "preconditions" what do you mean? Is it the correct data directories / data files checked out?
By pre-conditions I meant that in order to test or illustrate something like an option when reading a file, I first need to create a file and write data to it that is meaningful for the thing being tested.
🤔 Maybe we could add some examples of "how to create test data" (specifically, we can now use the COPY command thanks to @devinjdangelo, so it is significantly easier).
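To make the "pre-conditions" step concrete, here is a minimal sketch that uses only the Rust standard library rather than the COPY command mentioned above. It writes a small CSV fixture whose contents target a specific reader behavior (a header row plus a quoted field containing a comma), then checks the file before any code under test runs. The helper name `write_csv_fixture`, the file name, and the column values are hypothetical. In a real DataFusion test, the resulting path would then be registered with the engine (for example via `SessionContext::register_csv`), which this sketch does not do.

```rust
use std::fs;
use std::io::{self, Write};
use std::path::{Path, PathBuf};

/// Hypothetical helper: writes a small CSV fixture into `dir` and returns its path.
/// The rows are chosen to exercise a reader option under test: a header row
/// and a quoted field that contains a comma.
fn write_csv_fixture(dir: &Path) -> io::Result<PathBuf> {
    fs::create_dir_all(dir)?;
    let path = dir.join("example.csv");
    let mut file = fs::File::create(&path)?;
    writeln!(file, "id,name")?;
    writeln!(file, "1,\"Smith, Jane\"")?;
    writeln!(file, "2,Bob")?;
    Ok(path)
}

fn main() -> io::Result<()> {
    // Use a per-process directory under the system temp dir so that
    // concurrent runs do not overwrite each other's fixtures.
    let dir = std::env::temp_dir().join(format!("fixture-{}", std::process::id()));
    let path = write_csv_fixture(&dir)?;

    // Read the file back to confirm the pre-condition holds before the
    // code under test (e.g. a CSV reader option) is exercised.
    let contents = fs::read_to_string(&path)?;
    let lines: Vec<&str> = contents.lines().collect();
    assert_eq!(lines.len(), 3);
    assert_eq!(lines[0], "id,name");
    println!("wrote {} data rows to {}", lines.len() - 1, path.display());

    // Clean up so repeated runs start from an empty directory.
    fs::remove_dir_all(&dir)?;
    Ok(())
}
```

Keeping fixture creation in a small helper like this means each example or test states its data explicitly, which is the part newcomers tend to find hardest to reconstruct.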
Thank you for your review (as always, @comphead) 🙏