polars
Improve formatting of Python API reference
There are a few improvements I would like to see when it comes to the Python API reference page.
- [ ] Sort the menu on the left by relevancy instead of alphabetical order.
  - Helps me find what I need if I don't know exactly what it's called.
  - Some of these might not even need to be in the docs, like `Exceptions`.
- [ ] Remove/reduce duplicate entries for the same function.
  - As an example, under Expressions, there are `polars.max`, `polars.Expr.max`, and `polars.Expr/polars.Expr.max`.
  - There are already a lot of functions for a user to go through, so we should be careful to present the information concisely.
- [ ] Fix broken list formatting in a number of places.
  - For example, see the 'strict' parameter here.
- [ ] Improve consistency in capitalization
- ~Remove type specifications from function signatures.~ Not desirable right now, see discussion below.
I don't actually know how exactly the docs are built, but I wouldn't mind contributing to these improvements if you agree that this is the way to go.
I have some more suggestions:
- [ ] Document the functions based on the public API, i.e. we should not show `polars.internals.<fun>` anywhere, but `polars.<fun>` (or even just `<fun>`?). That holds for the menu on the left, but also for the type annotations.
- [ ] Make it clearer that `ExprDateTimeNameSpace` can be accessed as `<expr>.dt.<fun>`, `ExprStringNameSpace` as `<expr>.str.<fun>`, and `ExprListNameSpace` as `<expr>.arr.<fun>`. (If you are like me, you always type `.list` first rather than `.arr` after consulting the docs, because it is called a ListNameSpace, not an Arr(ay)NameSpace. That should be made consistent as well, but it is an API change.) Same for `cat` and `struct`.
> Remove/reduce duplicate entries for the same function
It would be a huge win if we could define, document, and test functions in a consistent way across the various interfaces (Expr, DataFrame, Series, stand-alone functions). I have tried to address some of this by adding test utils that make it easy to test the same function for both Series and Expr, but unfortunately it is hard to generalize. Still, for Series <-> Expr there should always be an equivalent, and shouldn't these functions also apply column-wise on DataFrame?
> Remove type specifications from function signatures.
I am not sure I fully agree. Yes, sometimes it gets unwieldy, which is probably what you are thinking of, but I find this very useful: https://pola-rs.github.io/polars/py-polars/html/reference/api/polars.DataFrame.write_csv.html. It tells me that all those input types for `file` are supported. It also tells me right away that the return value can be a string or None (which is not nice, by the way; I would prefer a separate `to_csv` method for the string case, but OK), which is something to keep in mind. The alternative would be to relegate that information to the docstring, which means duplicating a lot of information and making sure it stays in sync with the actual implementation and type annotations.
Fully agree.
I will strike through my suggestion for removing type specs from function signatures. Having the information in the docstring is a lot of work and prone to errors and to going out of date.
We could probably be better at using type aliases though, to improve readability of our type hints. But that's a story for another time 😄
> I have some more suggestions:
> - [ ] Document the functions based on the public API, i.e. we should not show `polars.internals.<fun>` anywhere, but `polars.<fun>` (or even just `<fun>`?). That holds for the menu on the left, but also for the type annotations.
> - [ ] Make it clearer that `ExprDateTimeNameSpace` can be accessed as `<expr>.dt.<fun>`, `ExprStringNameSpace` as `<expr>.str.<fun>`, and `ExprListNameSpace` as `<expr>.arr.<fun>`. (If you are like me, you always type `.list` first rather than `.arr` after consulting the docs, because it is called a ListNameSpace, not an Arr(ay)NameSpace. That should be made consistent as well, but it is an API change.) Same for `cat` and `struct`.
Done (99%) ... https://github.com/pola-rs/polars/pull/5376 :)
Amazing work @alexander-beedie !
> we should not show `polars.internals.<fun>` anywhere, but `polars.<fun>` (or even just `<fun>`?). That holds for the menu on the left, but also the type annotations.
Got most (annoyingly not all) of the type annotations under control now too... https://github.com/pola-rs/polars/pull/5388
Really nice work, @alexander-beedie ! I will update this issue soon.
I'm going to close this issue, as we are miles ahead of where we were when I opened it. Many thanks to @alexander-beedie and others for the nice improvements.
I realize we're not perfect yet. If you run into specific issues with the API reference formatting, please open a new issue.