cpp11
cpp11 copied to clipboard
Ensure that `data_p_` is typed with the underlying type of `data_`
This PR attempts to force data_p_ to always be a pointer to the "underlying" type of SEXP data_.
Currently, this is not the case. data_p_ currently has type T*, which means that for various vectors it maps to:
- integers:
int* - doubles:
double* - logicals:
r_bool*(rather thanint*) - raws:
uint8_t*(rather thanRbyte*) - list: not used, but would be
SEXP* - strings: not used, but would be
r_string*(rather thanSEXP*) - complexes: not implemented (yet!, but it is built on top of this here https://github.com/DavisVaughan/cpp11/pull/2)
The logicals and raws cases make me a little nervous. When data_p_ is set for these classes, it is done so like:
data_p_ = reinterpret_cast<r_bool*>(LOGICAL(data))
I think we can avoid this by instead ensuring that data_p_ has type int* here, and is just set to LOGICAL(data). Then we rely on casts from r_bool to int and vice versa when setting and getting elements.
I think it definitely cleaned up logicals.hpp and raws.hpp nicely, as some intermediate static_cast() and reinterpret_cast() calls were removed.
I'm also not entirely sure how this reinterpret_cast() is currently working, but I assume it works because r_bool just has 1 variable member, int value_, which is the same size as an int that we'd get from the int*? Nevertheless, this new approach feels safer and easier to understand to me.
The general idea of this PR is to introduce a compile time "underlying" type that we can deduce from T. The default is just T, which works for int and double, but this is overridden for r_bool and uint8_t.
We then use this underlying type for the buffer (buf_) and for the pointer type.
This PR should have other benefits too. I am working on adding complexes, and I think I will use an r_complex intermediate type which will also define a custom underlying type, like:
namespace traits {
template <>
struct get_underlying_type<r_complex> { using type = Rcomplex; };
} // namespace traits
Additionally, for read-only lists and character vectors, we could set the data_p_ to DATAPTR(), like we do in vctrs https://github.com/r-lib/vctrs/blob/7260d31a31b87ece16c7bbda3521e91d1c0f242f/src/vctrs-core.h#L97. This could provide much faster access, since we don't have to go through VECTOR_ELT(), and should be safe for read-only access. That would require the get_underlying_type<r_string> I've added here (it currently goes unused).