janitor
janitor copied to clipboard
Breaking off tabyl and adorn_ into their own package?
There are a few open issues suggesting improvements to adorn_
functions like having them use the dplyr::select
helpers to specify columns. I think those should be addressed as part of a broader rethinking of the tabyl API. And once that's on the table, I've come to agree that tabyl and its friends belong in their own package. janitor can be purely for data cleaning and focus on new functions like #50.
Note: I have seen dreams discussed of a "grammar of tables" package, I hope someone builds that package but tabyl
will still just focus on counts and things you do to tables of counts.
Before making a new repo and going down that path: what do people think? What's most weighing on my mind is, if someone wants to load janitor and
It seems to me that having separate janitor
and tabyl
packages would be a good idea, as there seems to be two "families" of functions in the package. I'm not sure I understand what the namespaces conflicts would be : are there functions that would have to be in both packages ?
I agree. When I think of janitor
, I think of all the awesome cleaning functions. tabyl
is great, but it seems artificial to shoehorn them in together.
I also agree that they feel separate. The cleaning functions seem like they fit the name janitor
more than the tabyl
functions.
I am on board with this. @juba glad you asked as this is still my biggest mental obstacle: what to call the new package & the main function (currently tabyl
)? I would want to be careful with how to deprecate janitor::tabyl
.
What behavior should happen? I guess calls to janitor::tabyl
should print a warning saying "don't use janitor::tabyl
, use tidytable::ttable
instead" and then that would free up tidytable::ttable
to have whatever breaking changes are deemed appropriate?
I don't know that I like "tidytable" or "ttable", I'm just using those as examples. If janitor didn't exist I'd call the package and function tabyl
but I'm afraid that users will call tabyl()
and depending on what got loaded and in what order, different functions will be called (tabyl::tabyl
or janitor::tabyl
) and it would be a headache for them.
For the mechanics of splitting, I think that there are two good options:
Drop the functions from janitor
and have a message on load that the functions have moved to a new package.
Drop the functions from janitor
and import and reexport the new package’s functions within janitor
.
In my opinion, the best method is to do both of the above at the beginning. Then some time later, stop reexporting. Then some time later, drop the message.
I think I agree with @billdenney suggestion. With this solution you don't duplicate the tabyl
functions, you can call the new package tabyl
if you wish, and users should not be directly affected by the change. With an on load message you can alert them of the new package and by importing and reexporting the transition can be smoothed for a while.
@sfirke, I was thinking of generating the separate packages soon. The process I was thinking of is:
- Triage all current pull requests so that the split can be clean.
- Make two branches-- one that is janitor-only and one that is tabyl-only
- Push the new tabyl-only repository into a new repository (it looks like that would be simple, and you'd keep the full history: https://stackoverflow.com/questions/2227062/how-do-i-move-a-git-branch-out-into-its-own-repository)
- Push the new tabyl repository to CRAN ASAP so that the reimport would be straight-forward.
- Once the new tabyl package was on CRAN, push the new janitor package (2.0?) to CRAN.
The asks for you in this process would be:
- Could you please triage the PRs?
- Could you please create a new tabyl repository?
Edit with a suggestion: There are a few deprecated functions. I'd suggest that those be made defunct with this major change to the package.
Love it, will do those two things. Might put the triaged issues into a Google doc or something if needed.
Would you like to be the maintainer of the tabyl package from the beginning? You would have final say and handle CRAN submission. I can advise but I think the momentum sits with you and I'd feel good handing off the torch officially.
On Thu, Aug 8, 2019, 4:25 PM Bill Denney [email protected] wrote:
@sfirke https://github.com/sfirke, I was thinking of generating the separate packages soon. The process I was thinking of is:
- Triage all current pull requests so that the split can be clean.
- Make two branches-- one that is janitor-only and one that is tabyl-only
- Push the new tabyl-only repository into a new repository (it looks like that would be simple, and you'd keep the full history: https://stackoverflow.com/questions/2227062/how-do-i-move-a-git-branch-out-into-its-own-repository )
- Push the new tabyl repository to CRAN ASAP so that the reimport would be straight-forward.
- Once the new tabyl package was on CRAN, push the new janitor package (2.0?) to CRAN.
The asks for you in this process would be:
- Could you please triage the PRs?
- Could you please create a new tabyl repository?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/sfirke/janitor/issues/240?email_source=notifications&email_token=ABZYDEGFUGQEX2C62E2TJNDQDR6LFA5CNFSM4FWRX372YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD34ZUNY#issuecomment-519674423, or mute the thread https://github.com/notifications/unsubscribe-auth/ABZYDEAANI4GW65L5QTDJIDQDR6LFANCNFSM4FWRX37Q .
Also, if there would ever be a time to change the name of this function, this would be it. That might almost make the transition easier in terms of namespace clashes.
I have thought a lot about the function name, and asked others, and never found one I like better. But I still think this name "tabyl" is imperfect and something better is possible.
On Sat, Aug 10, 2019, 1:39 PM Sam Firke [email protected] wrote:
Love it, will do those two things. Might put the triaged issues into a Google doc or something if needed.
Would you like to be the maintainer of the tabyl package from the beginning? You would have final say and handle CRAN submission. I can advise but I think the momentum sits with you and I'd feel good handing off the torch officially.
On Thu, Aug 8, 2019, 4:25 PM Bill Denney [email protected] wrote:
@sfirke https://github.com/sfirke, I was thinking of generating the separate packages soon. The process I was thinking of is:
- Triage all current pull requests so that the split can be clean.
- Make two branches-- one that is janitor-only and one that is tabyl-only
- Push the new tabyl-only repository into a new repository (it looks like that would be simple, and you'd keep the full history: https://stackoverflow.com/questions/2227062/how-do-i-move-a-git-branch-out-into-its-own-repository )
- Push the new tabyl repository to CRAN ASAP so that the reimport would be straight-forward.
- Once the new tabyl package was on CRAN, push the new janitor package (2.0?) to CRAN.
The asks for you in this process would be:
- Could you please triage the PRs?
- Could you please create a new tabyl repository?
— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/sfirke/janitor/issues/240?email_source=notifications&email_token=ABZYDEGFUGQEX2C62E2TJNDQDR6LFA5CNFSM4FWRX372YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD34ZUNY#issuecomment-519674423, or mute the thread https://github.com/notifications/unsubscribe-auth/ABZYDEAANI4GW65L5QTDJIDQDR6LFANCNFSM4FWRX37Q .
There is lots to say here:
- I emailed you about being a maintainer. As a related note to that, does it make sense to keep
tabyl
or should the goal be to ask to incorporate it into another package? (My philosophy: When I write something new, usually my goal is to try to minimize the number of CRAN packages by making a contribution to another package where the functionality is a good fit, often as a subtle modification to what is already there. That's part of how I became a contributor tojanitor
.)- A quick search (https://dabblingwithdata.wordpress.com/2017/12/20/my-favourite-r-package-for-frequency-tables/) suggests that
summarytools
could be a good fit for the functionality, if we go this route.
- A quick search (https://dabblingwithdata.wordpress.com/2017/12/20/my-favourite-r-package-for-frequency-tables/) suggests that
- For function naming, that isn't my strength. I tend to prefer maintaining backward compatibility as much as is reasonable for packages. So, I lean toward keeping the name (or at most deprecating it for something else but not a full break).
- For issue triage, that is simpler because issues can be moved between repositories (https://help.github.com/en/articles/transferring-an-issue-to-another-repository). My main concern was pull requests since this would be a major change in the code base and I'd guess that almost none of the PRs would cleanly apply after the switch.
I'm a huge fan of {janitor} and would support the conscious uncoupling of tabyl()
and the adorn_()
family into a separate package. I've also seen a similar idea discussed in {skimr} but I'm not sure if that's in the works. I've also noticed that RStudio's GitHub-only {gt} is very much a "grammar of tables," however it does not seem to focus on pretty printing of tables in the R console.
I'd like to suggest the package name {tidytab} and function name tab()
for the new package — these are what I chose for my GitHub-only package of the same name, which I built as a hobby. I came to R from Stata where tabulate
(which can be shortened down to ta
or tab
) is key to console-driven exploratory data analysis, and my goal with {tidytab} was to port it to R with additional functionality such as:
- Use of tidyselect helpers which is already implemented in {tidytab};
- Pretty printing of tables in the console with block drawing characters and color (grey for block drawing characters, red for
NA
, although hopefully in the future the colors could be informed byrstudioapi::getThemeInfo()
); - Ports of the very helpful
tab1()
andtab2()
functions from Stata for rapid tabulation of a set of variables into either one- or two-way tables; - Tabulation of named and unnamed single vectors (a future plan is to also combine two vectors of the same length into 2x2 tables);
- Flat tables in
ftab()
for those that don't like 2x2 contingency tables, which also allows n-way tabulation of more than two variables. - Table wrapping in the R console.
I had previously contributed to {statar} where we also implemented pretty printing of tables in the R console, but I eventually built my own package since I think there's quite a bit of additional table-specific functionality that I would like to see.
I would be happy to help with the effort.