pyjanitor
Remaining functions from R version
The following is a list of functions missing from the PyJanitor library that are implemented in the R version. I think the aggregation and adornment can be put in their own submodules later.
To be implemented
Main Functions
- [x] convert_to_NA.R
- [x] clean_names.R
- [x] excel_dates.R
- [x] get_dupes.R
- [x] remove_empties.R
- [x] row_to_names.R
- [x] use_first_valid_of.R
Aggregation
- [ ] as_and_untabyl.R
- [ ] print_tabyl.R
- [ ] tabyl.R
- [ ] top_levels.R
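janitor's tabyl() builds frequency tables. As a rough sketch of the one-way case only, assuming pandas and a hypothetical helper name (tabyl_1way is not an existing pyjanitor function), it boils down to counts plus each value's share of all rows. The R version also covers two- and three-way tables and extra columns such as valid_percent when missing values are present, which this sketch leaves out.

```python
import pandas as pd


def tabyl_1way(s: pd.Series) -> pd.DataFrame:
    """One-way frequency table in the spirit of janitor::tabyl (hypothetical helper).

    Returns one row per distinct value (missing values included) with the
    count `n` and the share of all rows `percent`.
    """
    # Count every distinct value, keeping NaN as its own group, ordered by value.
    counts = s.value_counts(dropna=False).sort_index()
    out = counts.rename("n").rename_axis(s.name).reset_index()
    out["percent"] = out["n"] / out["n"].sum()
    return out


s = pd.Series(["a", "b", "a", "c"], name="letter")
print(tabyl_1way(s))
```

Computing percent from the counts (rather than calling value_counts twice with normalize=True) keeps the two columns guaranteed to agree.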
Adornment
- [ ] adorn_crosstab.R
- [ ] adorn_ns.R
- [ ] adorn_pct_formatting.R
- [ ] adorn_percentages.R
- [ ] adorn_rounding.R
- [ ] adorn_title.R
- [ ] adorn_totals.R
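Most of the adorn_* functions post-process a table of counts. As a hedged sketch, assuming pandas and hypothetical helper names (add_totals_row and row_percentages are not pyjanitor API), the ideas behind adorn_totals() (append a totals row, its default) and adorn_percentages() (divide each cell by its row total, its default denominator) look roughly like this:

```python
import pandas as pd


def add_totals_row(table: pd.DataFrame, label: str = "Total") -> pd.DataFrame:
    """Append a row of column sums, roughly like janitor::adorn_totals() (hypothetical helper)."""
    totals = table.sum(numeric_only=True).rename(label)
    # Turn the sums into a one-row frame so it can be stacked under the table.
    return pd.concat([table, totals.to_frame().T])


def row_percentages(table: pd.DataFrame) -> pd.DataFrame:
    """Divide each cell by its row total, roughly like janitor::adorn_percentages() (hypothetical helper)."""
    return table.div(table.sum(axis=1), axis=0)


df = pd.DataFrame({"gear": [3, 3, 4, 4, 5], "am": [0, 0, 1, 0, 1]})
table = pd.crosstab(df["gear"], df["am"])  # counts of each gear/am combination
print(add_totals_row(table))
print(row_percentages(table))
```

Keeping each adornment as a separate table-in, table-out function mirrors how the R family is meant to be chained.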
Won't be implemented
- [x] round_half_up.R
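- Probably don't need to implement this; the main reason it exists is that round(2.5) in R is 2 (R rounds half to even), and this function makes round(2.5) == 3. Note, though, that Python 3's built-in round() also rounds half to even, so round(2.5) == 2 there as well; only Python 2 returned 3.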
- [x] make_clean_names.R
- Helper function for clean_names.R
- [x] get_level_groups.R
- Helper function for top_levels.R
- [x] crosstab.R
- Deprecated in favor of tabyl
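For reference, Python 3's built-in round() uses round-half-to-even, so round(2.5) evaluates to 2, the same as R. If half-up rounding were ever wanted, a minimal sketch using the standard-library decimal module could look like this; round_half_up here is a hypothetical helper, and it swaps in decimal arithmetic rather than copying the R implementation.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(x: float, digits: int = 0) -> float:
    """Round x to `digits` decimal places, with ties rounding away from zero (hypothetical helper)."""
    quantum = Decimal(1).scaleb(-digits)  # digits=2 -> Decimal("1E-2"), i.e. 0.01
    # str(x) uses the shortest repr of the float, so 2.675 is treated as exactly 2.675.
    return float(Decimal(str(x)).quantize(quantum, rounding=ROUND_HALF_UP))


print(round(2.5), round_half_up(2.5))              # built-in vs. half-up
print(round(2.675, 2), round_half_up(2.675, 2))
```

Going through str(x) is a deliberate choice: the float 2.675 is stored slightly below 2.675, which is why the built-in round(2.675, 2) gives 2.67.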
Thanks for taking the effort to document this, @szuckerman!
One little question: I had a bit of trouble with the "adorn" semantics (implementation is fine with me) - is there a more accessible English verb for non-AABCs (Americans/Aussies/Brits/Canadians)? Even as a Canadian whose first language is English, I didn't think the "adorn" verb was that informative. Thinking from an ESL perspective, that verb would be considered quite foreign.
That said, since pyjanitor is supposed to be the Python implementation of janitor, it would go against the spirit of the package to provide an alternative verb. I guess issues with the semantics would belong on the original R package issue tracker instead.
Perhaps “stylize” or “add_style”?
Or maybe “add_formatting”?
> That said, since pyjanitor is supposed to be the Python implementation of janitor, it would go against the spirit of the package to provide an alternative verb.
To that, in general, I'd personally say "meh". I think we have a good thing going with carefully choosing names for functions, and I'm not sure that keeping people who migrate from R to Python comfortable should outweigh sticking to the vision of the project.
That being said, in this particular case, since there is a whole family of adorn_ functions instead of just one, it might not be a big deal to keep that term. In the docs, we can explain the concept of what this class of functions is trying to accomplish: "make it look good" is a pretty simple concept that you just have to learn once, and then you know at a glance what that entire group is trying to do.
> In the docs, we can explain the concept of what this class of functions is trying to accomplish
Yes, yes! Docs. I really need to improve the docs; big thanks to both of you for helping so much with the documentation!
convert_to_NA.R has been deprecated in favor of dplyr::na_if(), and use_first_valid_of.R has been deprecated in favor of dplyr::coalesce().
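For reference, the behaviour use_first_valid_of.R (and dplyr::coalesce()) provides, taking the first non-missing value across several columns row by row, can be sketched in plain pandas. first_valid_of is a hypothetical name used for illustration; it is not pyjanitor's existing coalesce method.

```python
from typing import Sequence

import numpy as np
import pandas as pd


def first_valid_of(df: pd.DataFrame, columns: Sequence[str]) -> pd.Series:
    """Row-wise first non-null value across `columns` (hypothetical helper)."""
    # Back-filling along the columns pulls each row's first non-null value
    # into the leftmost column, which then holds the coalesced result.
    return df[list(columns)].bfill(axis=1).iloc[:, 0]


df = pd.DataFrame(
    {
        "a": [1.0, np.nan, np.nan],
        "b": [np.nan, 2.0, np.nan],
        "c": [9.0, 3.0, np.nan],
    }
)
print(first_valid_of(df, ["a", "b", "c"]))
```

Column order matters: earlier columns take priority, just as argument order does in dplyr::coalesce(). Rows where every column is missing stay missing.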
We already have a coalesce method, but do we have something similar to the dplyr::na_if() function? fill_empty?
fill_empty, I think, does something different from na_if()?
I finally got some time to look at it: na_if() replaces a value in a column with a null (NA) wherever that value equals a specified value.
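In pandas terms, this is the opposite direction from filling empties: known sentinel values become missing. A minimal sketch, assuming pandas and a hypothetical helper name (na_if here is illustrative, not an existing pyjanitor function):

```python
import pandas as pd


def na_if(s: pd.Series, value) -> pd.Series:
    """Replace entries equal to `value` with NaN (hypothetical helper)."""
    # Series.mask sets entries to NaN wherever the condition is True.
    return s.mask(s == value)


s = pd.Series([1, -99, 3, -99])
print(na_if(s, -99))  # integer input is upcast to float64 so it can hold NaN
```

Using mask keeps every other entry untouched and works the same way for string sentinels such as "".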
And to add to this thread: https://sfirke.github.io/janitor/news/index.html
Janitor 1.2.0 was released recently! 🎉 That means more functionality that we can try to port over.
I don't know if the adorn_ family naming has been settled or not, but I had some ideas, depending on what you all think:
- top_with_*, in the sense of garnishing a dish or drink with final touches (this word seemed to be connected to adorn); final_touch_* would be another option, but it may be too long
- coat_with_*, in the sense of polishing a floor
- tweak_*, since these are usually little, but important, changes (probably my favorite)
Those are great ideas, @robertmitchellv! Given enough time away from the issue, and the recent improvements to the docs, I have less of an issue with the name adorn_. Back when the issue was started, we didn't have a taxonomy/ontology for the functions on the docs page, so I was worried that the adorn functions' purpose would be unclear. But now that one has been contributed (I think at PyCon), and because the adorn functions form their own family of functions for “stylizing” a dataframe, I think they should totally come in!
Are you open to trying your hand at one of them via a PR?
Yeah, I think I can try!
I've been reviewing the source code a lot to get more familiar with how things are done in pyjanitor, and I wrote my first version of dplyr::count(), so hopefully I can figure out the best way to make one of the simpler adorn_* functions happen. I'm hoping for a good amount of back and forth to get it to the place where it's polished. Coming back to Python after maybe 5 years 😬
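For readers coming from dplyr, count() can be approximated in plain pandas with a grouped size. This is a hedged sketch with a hypothetical helper name (count_by), not the version written for this thread:

```python
import pandas as pd


def count_by(df: pd.DataFrame, *cols: str) -> pd.DataFrame:
    """Rows per combination of `cols`, similar to dplyr::count() (hypothetical helper)."""
    # dropna=False keeps groups whose key is missing, as dplyr::count() does.
    return df.groupby(list(cols), dropna=False).size().reset_index(name="n")


df = pd.DataFrame({"cyl": [4, 6, 4, 8, 4], "am": [1, 0, 1, 0, 0]})
print(count_by(df, "cyl"))
print(count_by(df, "cyl", "am"))
```

groupby sorts the keys by default, so the output is ordered by the grouping columns, much like count()'s default output.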
Welcome back!
To get started, the docs are the best place to go. They're intended to be quite beginner-friendly, so if you run into issues getting set up, don't hesitate to sneak in a few modifications to the docs too 😄! And feel free to ping us on the issue tracker if you run into any problems!