r-polars
More consistency in methods to convert Series/DataFrame to R
Currently this is what we have for DataFrame and Series:
library(polars)
options(polars.do_not_repeat_call = TRUE)
####### Series
series_vec = pl$Series(letters[1:3])
# output depends on datatype
series_vec$to_r()
#> [1] "a" "b" "c"
# identical
series_vec$to_r_vector()
#> [1] "a" "b" "c"
series_vec$to_vector()
#> [1] "a" "b" "c"
# not identical
series_vec$to_r_list()
#> [[1]]
#> [1] "a"
#>
#> [[2]]
#> [1] "b"
#>
#> [[3]]
#> [1] "c"
series_vec$to_list()
#> Error: Execution halted with the following contexts
#> 1: $ - syntax error: to_list is not a method/attribute of the class RPolarsSeries
####### DataFrame
df = pl$DataFrame(x = letters[1:3])
# output depends on datatype
df$to_r()
#> Error: Execution halted with the following contexts
#> 1: $ - syntax error: to_r is not a method/attribute of the class RPolarsDataFrame
# not identical
df$to_r_data_frame()
#> Error: Execution halted with the following contexts
#> 1: $ - syntax error: to_r_data_frame is not a method/attribute of the class RPolarsDataFrame
df$to_data_frame()
#> x
#> 1 a
#> 2 b
#> 3 c
# not identical
df$to_r_list()
#> Error: Execution halted with the following contexts
#> 1: $ - syntax error: to_r_list is not a method/attribute of the class RPolarsDataFrame
df$to_list()
#> $x
#> [1] "a" "b" "c"
I think we should have more consistency in the names of the methods:
- remove the _r_ in the names of Series methods, so that we have to_r(), to_list() and to_vector(). That would be more consistent with to_data_frame() and to_list() that we have for DataFrame.
- I'm even wondering if we should have to_r() for Series. When we use it we don't know the class of the output and I'm not sure we should allow that.
@eitsupi what do you think? Also, @grantmcdermott in case you want to participate (mostly about the second point)
Agree.
I don't see the benefit of having more than one, so I think we need to remove some of them and encourage users to use the S3 methods, e.g. as.vector() and as.data.frame(), instead of $to_vector() and $to_data_frame() (i.e. we should not use the latter in the documentation).
I really dislike not knowing at first glance whether the result of $to_data_frame() will be an RPolarsDataFrame or a data.frame, so I actually wouldn't even have a problem with all these methods being private.
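For illustration, a minimal sketch of what the S3-based conversion would look like from the user's side. It reuses the objects from the first post and assumes the installed polars version provides as.vector() and as.data.frame() methods for Series and DataFrame, as the comment above implies; the variable names are only for this example.

```r
library(polars)

# Same toy objects as in the first post
series_vec = pl$Series(letters[1:3])
df = pl$DataFrame(x = letters[1:3])

# Assumption: S3 methods for as.vector() / as.data.frame() are available
# for polars objects. The generic's name says what class comes back.
vec = as.vector(series_vec)
dat = as.data.frame(df)

class(vec)
class(dat)
```

The point is that the reader can tell from the function name alone that the result is a base R object, without checking the method's documentation.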
Another thing I wonder: does it make sense for <DataFrame>$to_list() to return a "list of vectors" instead of a "list of Series"?
> Another thing I wonder: does it make sense for <DataFrame>$to_list() to return a "list of vectors" instead of a "list of Series"?
I noticed that there is already a DataFrame method called get_columns() that returns a list of Series.
https://docs.rs/polars/0.38.3/polars/frame/struct.DataFrame.html#method.get_columns
The name to_list() may not be consistent...
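A short sketch of the difference being discussed, assuming get_columns() behaves as described above (a list of Series) and to_list() behaves as in the first post (a list of R vectors):

```r
library(polars)

df = pl$DataFrame(x = letters[1:3])

# List of polars Series, one per column (per the comment above)
cols = df$get_columns()

# List of plain R vectors, one per column (as shown in the first post)
vecs = df$to_list()

# Compare the element classes to see the two kinds of "list"
class(cols[[1]])
class(vecs[[1]])
```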