Remaining functions from R version

Open szuckerman opened this issue 6 years ago • 12 comments

The following is a list of functions missing from the PyJanitor library that are implemented in the R version. I think the aggregation and adornment can be put in their own submodules later.

To be implemented

Main Functions

Aggregation

Adornment

  • [ ] adorn_crosstab.R
  • [ ] adorn_ns.R
  • [ ] adorn_pct_formatting.R
  • [ ] adorn_percentages.R
  • [ ] adorn_rounding.R
  • [ ] adorn_title.R
  • [ ] adorn_totals.R
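For readers who have not used the R package: the adorn_ functions decorate an existing frequency table, for example by appending totals (adorn_totals) or converting counts to proportions (adorn_percentages). A rough pandas sketch of those two ideas follows. The function names `add_totals_row` and `to_row_percentages` and the sample data are made up for illustration; they are not existing pyjanitor API and may not match whatever gets implemented.

```python
import pandas as pd

# Hypothetical frequency table: rows are groups, columns are answer counts.
counts = pd.DataFrame(
    {"yes": [3, 1], "no": [1, 5]},
    index=pd.Index(["group_a", "group_b"], name="group"),
)


def add_totals_row(table):
    """Return a copy of `table` with a 'Total' row holding the column sums."""
    out = table.copy()
    out.loc["Total"] = table.sum(axis=0)
    return out


def to_row_percentages(table):
    """Return `table` with each row divided by its own row sum."""
    return table.div(table.sum(axis=1), axis=0)


with_totals = add_totals_row(counts)
row_shares = to_row_percentages(counts)
```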

Won't be implemented

  • [x] round_half_up.R
      • The main reason it exists is that round(2.5) in R is 2, and this function makes it 3. In Python 2, round(2.5) == 3, so this probably doesn't need to be implemented there; note that Python 3's built-in round() rounds halves to even, as R does, so round(2.5) == 2.
  • [x] make_clean_names.R
      • Helper function for clean_names.R
  • [x] get_level_groups.R
      • Helper function for top_levels.R
  • [x] crosstab.R
      • Deprecated in favor of tabyl
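
A note on the round_half_up.R entry: Python 3's built-in round() rounds ties to the nearest even value, so round(2.5) returns 2. If half-up rounding is ever wanted, a minimal sketch using the standard-library decimal module might look like the following. The Python function name `round_half_up` and its `digits` parameter are assumptions for illustration, not existing pyjanitor API.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(x, digits=0):
    """Round x to `digits` decimal places, sending ties away from zero."""
    # Decimal(1).scaleb(-2) is Decimal("0.01"); it sets the target precision.
    quantum = Decimal(1).scaleb(-digits)
    # Going through str(x) uses the short repr of the float (e.g. "2.5").
    return float(Decimal(str(x)).quantize(quantum, rounding=ROUND_HALF_UP))


builtin_result = round(2.5)          # ties go to the even neighbour
half_up_result = round_half_up(2.5)  # ties go away from zero
```

Converting through `str(x)` is a design choice: it rounds the decimal value the user sees rather than the exact binary value stored in the float.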

szuckerman avatar Dec 13 '18 20:12 szuckerman

Thanks for taking the effort to document this, @szuckerman!

One little question: I had a bit of trouble with the "adorn" semantics (implementation is fine with me) - is there a more accessible English verb for non-AABCs (Americans/Aussies/Brits/Canadians)? Even as a Canadian whose first language is English, I didn't think the "adorn" verb was that informative. Thinking from an ESL perspective, that verb would be considered quite foreign.

That said, since pyjanitor is supposed to be the Python implementation of janitor, it would go against the spirit of the package to provide an alternative verb. I guess issues with the semantics would belong on the original R package issue tracker instead.

ericmjl avatar Dec 16 '18 20:12 ericmjl

Perhaps “stylize” or “add_style”?

Or maybe “add_formatting”?

szuckerman avatar Dec 16 '18 22:12 szuckerman

> That said, since pyjanitor is supposed to be the Python implementation of janitor, it would go against the spirit of the package to provide an alternative verb.

To that, in general, I'd personally say "meh". I think we have a good thing going with carefully choosing names for functions, and I'm not sure the relative weight of keeping people migrating from R to Python comfortable vs. sticking to the vision of the project leans towards the former.

That being said, in this particular case, since there is a whole family of adorn_ functions instead of just one, it might not be a big deal to keep that term... In the docs, we can explain the concept of what this class of functions is trying to accomplish - "make it look good" is a pretty simple concept that you just have to learn once and then you know what that entire group is trying to accomplish at a glance.

zbarry avatar Dec 17 '18 15:12 zbarry

> In the docs, we can explain the concept of what this class of functions is trying to accomplish

Yes, yes! Docs. I really need to improve the docs, big thanks to both of you for helping so much with the documentation!

ericmjl avatar Dec 17 '18 16:12 ericmjl

convert_to_NA.R has been deprecated in favor of dplyr::na_if() and use_first_valid_of.R has been deprecated in favor of dplyr::coalesce().

We already have a coalesce method, but do we have something that is similar to the dplyr::na_if() function? fill_empty?
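
For context, dplyr::coalesce() returns, position by position, the first non-missing value among its inputs. A minimal plain-pandas sketch of that behaviour is below; it deliberately does not use pyjanitor's own coalesce method, and the column names and data are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "a": [1.0, np.nan, np.nan],
        "b": [np.nan, 2.0, np.nan],
        "c": [9.0, 9.0, 3.0],
    }
)

# Back-fill across columns so each row's first non-null value lands in
# the leftmost column, then keep that column.
first_valid = df[["a", "b", "c"]].bfill(axis=1).iloc[:, 0]
```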

szuckerman avatar Mar 15 '19 17:03 szuckerman

fill_empty, I think, does something different from the na_if()?

ericmjl avatar Mar 15 '19 19:03 ericmjl

I finally got some time to look at it: na_if() puts a null value in a column if a cell value is equal to some number.
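
A small pandas sketch of that behaviour, in case it helps the discussion. The helper name `na_if`, its parameters, and the sample data are assumptions for illustration, not an existing pyjanitor function.

```python
import pandas as pd


def na_if(df, column_name, value):
    """Return a copy of df where cells in column_name equal to value are null."""
    out = df.copy()
    # Series.mask replaces entries where the condition is True with NaN.
    out[column_name] = out[column_name].mask(out[column_name] == value)
    return out


df = pd.DataFrame({"score": [10, -99, 7], "name": ["a", "b", "c"]})
cleaned = na_if(df, "score", -99)
```

This is the reverse of filling: it turns a sentinel value into a null, whereas fill_empty (as its name suggests) goes from nulls to a value.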

ericmjl avatar Mar 17 '19 13:03 ericmjl

And to add to this thread: https://sfirke.github.io/janitor/news/index.html

Janitor 1.2.0 was released recently! :tada: That means more functionality that we can try targeting to port over.

ericmjl avatar Apr 22 '19 11:04 ericmjl

I don't know if the adorn_ family naming has been settled or not, but I had some ideas depending on what you all think:

  • top_with_*, in the sense of garnishing (this word seemed to be connected to adorn) a dish or drink with final touches (final_touch_* is another option, though it may be too long)
  • coat_with_*, in the sense of polishing a floor
  • tweak_*, since these are usually little, but important, changes (probably my favorite)

robertmitchellv avatar Aug 05 '20 05:08 robertmitchellv

Those are great ideas, @robertmitchellv! Given enough time away from the issue, and the recent improvements to the docs, I think I have less of an issue with the name adorn_. Back when the issue was started, we didn’t have a taxonomy/ontology for the functions on the docs page, so I was worried about the adorn functions’ purpose being unclear. But now that one has been contributed (I think at PyCon), and because the adorn functions form a family of their own for “stylizing” a dataframe, I think they should totally come in!

Are you open to trying your hand at one of them via a PR?

ericmjl avatar Aug 06 '20 14:08 ericmjl

Yeah, I think I can try!

I've been reviewing the source code a lot to get more familiar with how things are done in pyjanitor, and I wrote my first copy of dplyr::count(), so hopefully I can figure out the best way to make one of the simpler adorn_* functions happen. I'm hoping for a good amount of back and forth to get it to the place where it's polished. Coming back to Python after maybe 5 years 😬

robertmitchellv avatar Aug 06 '20 21:08 robertmitchellv

Welcome back!

To get started, the docs are the best place to go. They're intended to be quite beginner-friendly, so if you run into issues getting set up, don't hesitate to sneak in a few modifications to the docs too 😄! And feel free to ping around on the issue tracker if there are any issues you run into!

ericmjl avatar Aug 06 '20 22:08 ericmjl