bad CString conversion corrupt data
Hi,
It looks like inserting data using bind_by_name was broken by the series of patch labeled as CString conversion optimizations.
When inserting data, it looks the length of the string is incorrect and it writes subsequent memory to cassandra:
Value should be "Paul" but the following is inserted:
Paulxxxaasdf0asdfas8d9fyyyb123921381923921939assertion failed: `(left == right)`\n left: ``,\n right: ``examples/repro.rsassertion failed: found_one127.0.0.1/home/antoninamand/.rustup/toolchains/nightly-2020-09-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/string.rs | aasdf0asdfas8d9fyyyb123921381923921939assertion failed: `(left == right)`\n left: ``,\n right: ``examples/repro.rsassertion failed: found_one127.0.0.1/home/antoninamand/.rustup/toolchains/nightly-2020-09-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/string.rs | b123921381923921939assertion failed: `(left == right)`\n left: ``,\n right: ``examples/repro.rsassertion failed: found_one127.0.0.1/home/antoninamand/.rustup/toolchains/nightly-2020-09-15-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/alloc/src/string.rs
I bisect'ed with git with the following example (add to examples/repro.rs) : https://gist.github.com/gwik/2d793da01a88cd6a5280f4a152f61a44
git bisect run cargo run --example repro
~~The first failing commit is cd5f02c2728a541a819d23dd012891b30b520679. But it's already in version 0.15.1 and I the first occurence of this bug I found with 0.15.2-pre were fixed with 0.15.1 so I suppose that similar commits 8ca2f154ed0dcf5543ac9a6d398a0ffdd7d648da and f8793b6f4d1964c58f94c075012352738a9c266f have the same issue (introduced after 0.15.1).~~
I seems the bug is in the cpp driver itself:
https://github.com/datastax/cpp-driver/blob/dbea8c599c2726b77b96d0399c86ea3e836223ed/src/statement.cpp#L226-L230
They ignore the value_length parameter and try to deduct the length looking for a NULL byte str terminator.
Separately, the usage of the pointer for the name is not correct. The name reference should outlive the lifetime of the statement as the cpp driver stores the pointer. It's safe in most usages since people will use static strings, but things can go wrong when using dynamic column names in their code.
Oh no! Have you reported the CPP driver bug at https://datastax-oss.atlassian.net/jira/software/c/projects/CPP/issues/?filter=allissues&= ? I've found them moderately responsive, especially if you offer a patch.
Thanks for the workaround.
Please can you raise a separate issue for the usage problem?
Oh no! Have you reported the CPP driver bug at https://datastax-oss.atlassian.net/jira/software/c/projects/CPP/issues/?filter=allissues&= ? I've found them moderately responsive, especially if you offer a patch.
I made a PR and they merged it, but they haven't released it.
Please can you raise a separate issue for the usage problem?
I will.
Great! Leaving this issue open to track the Cassandra driver fix. Can you link the Cassandra issue?
Issue is here: https://github.com/datastax/cpp-driver/pull/484
Good catch on the bug. Should the code be rolled back to using a null terminated string until the cpp driver is fixed? Being correct is better than fast and broken.
Separately, the usage of the pointer for the name is not correct. The name reference should outlive the lifetime of the statement as the cpp driver stores the pointer. It's safe in most usages since people will use static strings, but things can go wrong when using dynamic column names in their code.
I think this is ok. The name is used to calculate an index and the index is what is stored. So the name only needs to live for the statement->set() call inside of cass_statement_bind_string_by_name_n(), which it does.
The name is used to calculate an index and the index is what is stored
You're absolutely right. Good, we're safe then.
I see there has still been no point release of the CPP driver :-(.
Our master code has the workaround https://github.com/Metaswitch/cassandra-rs/blob/c24346918b56fd73a8b6bee42bc85de8dd61be17/src/cassandra/statement.rs#L610
so cassandra-rs is safe - strings with NULs in will be truncated (unavoidably), but non-NUL-terminated strings are handled correctly.
This issue still remains open, to track DataStax releasing the fixed driver, and us (possibly) reverting the workaround.
I looked for a version in the cpp driver library but couldn't find one. Since we don't know the library version we are executing on I guess we're stuck with the workaround for a very long time. Or maybe someone has a clever idea ?
It's undocumented, but the C++ driver does expose a function const char* datastax::internal::driver_version() - see https://github.com/datastax/cpp-driver/blob/master/src/driver_info.hpp#L23 . (Prior to 2.13.0 this was just cass::driver_version, but then it was moved into datastax::internal.) I haven't looked into what support the Rust FFI has for optionally binding a symbol, but this might give us a mechanism.
This is fixed in 2.16.0 of the DataStax C/C++ driver - see https://github.com/datastax/cpp-driver/compare/2.15.3...2.16.0 . @gwik do you want to investigate version detection?
If I believe this: https://doc.rust-lang.org/nomicon/ffi.html it doesn't look good.
Rust is currently unable to call directly into a C++ library
Am I missing something ?
That's a bit of an overstatement. There's nothing magic about a C++ function. bindgen handles C++ to some degree: https://rust-lang.github.io/rust-bindgen/cpp.html and it should certainly cope with this - the only C++ thing about it is the name-mangling. The bigger problem is avoiding a link error if the symbol is absent; I'm not sure how to do that.