mojo
mojo copied to clipboard
[BUG]: Creating a String from a DynamicVector drops the last character in the vector
Bug description
Converting a DynamicVector
to a String
using the String
constructor results in a string that is missing the last element of the DynamicVecotor
used to create it.
Steps to reproduce
The following code reproduces the issue:
fn test_stringify() raises:
var example = DynamicVector[Int8]()
example.append(ord("e"))
example.append(ord("x"))
var container = DynamicVector[Int8]()
for i in range(len(example)):
container.append(example[i])
let stringifed = String(container)
assert_equal("ex", stringifed)
# Unhandled exception caught during execution: AssertionError: ex is not equal to e
System information
❯ mojo -v
mojo 0.7.0 (af002202)
❯ modular -v
modular 0.4.1 (2d8afe15)
On an M1 Mac.
I'm not sure if this is a doc bug or an API design issue--maybe a bit of both.
The first constructor for String()
takes a DynamicVector[Int8]
argument which it expects to be a null-terminated array of characters. While that is the internal format that String
uses, jit yields unexpected results in this case.
The constructors that take pointers also assume that the data is already null-terminated and that the provided len
includes the terminator.
There's a private _unsafe_from_bytes()
method that does exactly what @sstadick is looking to do here. (And a public asbytes()
method that does the reverse).
TL;DR: It seems kind of backwards to me that the default paths in/out of String assume that the buffer coming in is null-terminated. But if that's the case we should just make sure the API docs for any method that takes a DynamicVector
or Pointer
mentions whether the data in question is null-terminated.
So workaround is--use _unsafe_from_bytes()
or use container.append(0)
to null-terminate your string first.
Wow, this looks like we should require a keyword argument here, so String(nul_terminated=container)
, and it should check/abort if not nul terminated.
Thanks @arthurevans my question is answered, the behavior does at a minimum seem unexpected, feel free to close this if needed.
RE asbytes()
, I don't see that in the String docs? https://docs.modular.com/mojo/stdlib/builtin/string.html Is there another place for docs that I should be looking, or is that maybe in a yet-to-be-public version?
@arthurevans What I get from your response is that the internal representation of String
requires null termination. Is that accurate? Why go with that design? Maybe we can add those information to the doc?
@sstadick private methods (with a leading underscore) don't appear in the docs but do appear in code completion hints. In this case, I don't see it there, either, so it must be new. Sorry for the confusion. At any event, adding the null yourself is probably the better path for now. I've opened a PR to clarify the API docs as a short-term fix until the API can be updated.
@soraros yes, the internal buffer requires null termination. I believe this is not really for Mojo's sake, since Mojo tracks the length of the string. But it simplifies communicating with lower-level facilities that take C-style strings. (For example, if you open a file on Linux, at some point the Mojo std library will call into libc's open()
, which takes the pathname as C-style string.)
@arthurevans Thanks for the reply. That explanation is both foreseeable and unfortunate. However, why didn't Mojo go with a separate CString
type like Rust and Swift? Can we at least have the rationale documented somewhere?