cpp11 icon indicating copy to clipboard operation
cpp11 copied to clipboard

Ensure that `data_p_` is typed with the underlying type of `data_`

Open DavisVaughan opened this issue 3 years ago • 0 comments
trafficstars

This PR attempts to force data_p_ to always be a pointer to the "underlying" type of SEXP data_.

Currently, this is not the case. data_p_ currently has type T*, which means that for various vectors it maps to:

  • integers: int*
  • doubles: double*
  • logicals: r_bool* (rather than int*)
  • raws: uint8_t* (rather than Rbyte*)
  • list: not used, but would be SEXP*
  • strings: not used, but would be r_string* (rather than SEXP*)
  • complexes: not implemented (yet!, but it is built on top of this here https://github.com/DavisVaughan/cpp11/pull/2)

The logicals and raws cases make me a little nervous. When data_p_ is set for these classes, it is done so like:

data_p_ = reinterpret_cast<r_bool*>(LOGICAL(data))

I think we can avoid this by instead ensuring that data_p_ has type int* here, and is just set to LOGICAL(data). Then we rely on casts from r_bool to int and vice versa when setting and getting elements.

I think it definitely cleaned up logicals.hpp and raws.hpp nicely, as some intermediate static_cast() and reinterpret_cast() calls were removed.

I'm also not entirely sure how this reinterpret_cast() is currently working, but I assume it works because r_bool just has 1 variable member, int value_, which is the same size as an int that we'd get from the int*? Nevertheless, this new approach feels safer and easier to understand to me.


The general idea of this PR is to introduce a compile time "underlying" type that we can deduce from T. The default is just T, which works for int and double, but this is overridden for r_bool and uint8_t.

We then use this underlying type for the buffer (buf_) and for the pointer type.


This PR should have other benefits too. I am working on adding complexes, and I think I will use an r_complex intermediate type which will also define a custom underlying type, like:

namespace traits {
template <>
struct get_underlying_type<r_complex> { using type = Rcomplex; };
}  // namespace traits

Additionally, for read-only lists and character vectors, we could set the data_p_ to DATAPTR(), like we do in vctrs https://github.com/r-lib/vctrs/blob/7260d31a31b87ece16c7bbda3521e91d1c0f242f/src/vctrs-core.h#L97. This could provide much faster access, since we don't have to go through VECTOR_ELT(), and should be safe for read-only access. That would require the get_underlying_type<r_string> I've added here (it currently goes unused).

DavisVaughan avatar Mar 08 '22 16:03 DavisVaughan