
invalid UTF-8 bytes after "=" and "=="

[Open] Lemonononon opened this issue 1 year ago • 3 comments

Hi @powturbo, thanks for your great work! Recently I noticed that when I use this library to encode image files, some strange characters appear at the end of the output. Printing them shows 'NULL' or random byte patterns, for example "7mlbMjdKxLobZAOx6jFekoqMbHg==�#��+Z��8Z�s)��k_H���pd�?���Ծ" and "Px/wA7sn4uWWf/AAj/AA3/ALQooor0Yg==NULLNULLNULL".

My code:

std::ifstream ifs(file_path, std::ios::binary);
if (!ifs.is_open()) {
    std::cerr << "Unable to open file: " << file_path << std::endl;
}

ifs.seekg(0, std::ios::end);
auto size = ifs.tellg();
ifs.seekg(0, std::ios::beg);

// Read the file content into a char buffer
auto buf = new unsigned char[size];
ifs.read((char *) buf, size);

// Encode with Turbo-Base64
auto outsize = tb64enclen(size);
auto out = new uint8_t[outsize];

size_t num_enc = tb64enc(buf, size, out); // TODO: error handling

out[num_enc] = 0;

std::string str_encode(out, out + num_enc);

std::cout << str_encode << std::endl;

I'm confused. Shouldn't the length of the Base64 output be fixed? Why are unknown characters appearing?

Lemonononon, Nov 20 '23 01:11

The output size is fixed at ((input_size + 2)/3 * 4). If you put a 0 at the end of the buffer with out[num_enc] = 0, you must allocate one extra byte: auto out = new uint8_t[outsize + 1];

powturbo, Nov 21 '23 15:11
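The sizing rule from the reply above can be sketched in plain C++ (C++11 or later). `b64_encoded_len` implements ((input_size + 2)/3 * 4). `ref_b64enc` is a hypothetical scalar stand-in, not Turbo-Base64's code; it only mirrors the `tb64enc(in, inlen, out)` call shape used in the report. `encode_to_string` shows the one-extra-byte allocation that keeps `out[num_enc] = 0` in bounds.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Padded base64 length: every started group of 3 input bytes becomes
// 4 output characters, i.e. ((input_size + 2)/3 * 4) with integer division.
std::size_t b64_encoded_len(std::size_t n) { return (n + 2) / 3 * 4; }

// Hypothetical stand-in for tb64enc (NOT Turbo-Base64's implementation):
// a plain scalar encoder with the same (in, inlen, out) -> length shape.
std::size_t ref_b64enc(const std::uint8_t* in, std::size_t n, std::uint8_t* out) {
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    std::size_t o = 0, i = 0;
    for (; i + 3 <= n; i += 3) {  // full 3-byte groups -> 4 characters
        std::uint32_t v = (std::uint32_t(in[i]) << 16) |
                          (std::uint32_t(in[i + 1]) << 8) | in[i + 2];
        out[o++] = tbl[(v >> 18) & 63];
        out[o++] = tbl[(v >> 12) & 63];
        out[o++] = tbl[(v >> 6) & 63];
        out[o++] = tbl[v & 63];
    }
    if (i < n) {  // 1 or 2 leftover bytes -> pad the last group with '='
        std::uint32_t v = std::uint32_t(in[i]) << 16;
        if (i + 1 < n) v |= std::uint32_t(in[i + 1]) << 8;
        out[o++] = tbl[(v >> 18) & 63];
        out[o++] = tbl[(v >> 12) & 63];
        out[o++] = (i + 1 < n) ? tbl[(v >> 6) & 63] : '=';
        out[o++] = '=';
    }
    return o;
}

std::string encode_to_string(const std::vector<std::uint8_t>& data) {
    std::size_t outsize = b64_encoded_len(data.size());
    // outsize + 1: room for the terminating 0 written below.
    std::vector<std::uint8_t> out(outsize + 1);
    std::size_t num_enc = ref_b64enc(data.data(), data.size(), out.data());
    assert(num_enc == outsize);  // encoder must produce exactly outsize chars
    out[num_enc] = 0;            // in bounds only because of the +1 above
    return std::string(reinterpret_cast<const char*>(out.data()), num_enc);
}
```

Because the `std::string` here is built from an explicit length, the terminating 0 is not strictly needed; it is kept only to mirror the code in the report.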

@powturbo Thank you! I had already noticed that and tried allocating [output_size+1], but I still didn't get the expected result. What I meant is that the length of the entire string (including the non-UTF-8 bytes after the ==) equals ((input_size + 2)/3 * 4).

Afterwards, I added an identical .cpp file directly to the library source tree, compiled it, and the result was correct. Then I found that when linking against the locally installed static library (cmake .. && make install), only the results from tb64senc are correct, as shown in the following image. When I added 'set(BUILD_SHARED_LIBS ON)' to CMakeLists.txt to build a shared library instead, all the results were correct. (I reproduced this on two computers running Ubuntu.) My problem is resolved now, but I'm still confused, so I'll do my best to provide whatever information I have.

[Image not preserved]

Lemonononon, Nov 22 '23 01:11

There are no separate functions for static and dynamic linking, so I'm wondering why you get different sizes depending on the linking mode. Anyway, the correct size is ((input_size + 2)/3 * 4), and the base64 characters are all ASCII, which UTF-8 encodes as single bytes with the same code points. You can decode the base64 output and check it against the original input.

powturbo, Nov 22 '23 18:11
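The decode-and-compare check suggested in the last reply can be sketched with a small validating decoder (C++17, for `std::optional`). This is an illustrative reference implementation, not Turbo-Base64's decoder, and the names `b64_decode_strict` and `round_trips` are hypothetical. It rejects any byte outside the base64 alphabet and any `=` that is not one of the last two characters, so encoded output with extra bytes after the `==` padding, like the examples reported at the top of this issue, fails the check.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// 6-bit value of a standard base64 alphabet character, or -1 otherwise.
int b64_value(unsigned char c) {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

// Decoder for padded standard base64. Returns std::nullopt if the length is
// not a multiple of 4, a byte is outside the alphabet, or '=' appears anywhere
// other than the last one or two positions. (It does not check that unused
// bits in the final group are zero.)
std::optional<std::vector<std::uint8_t>> b64_decode_strict(const std::string& s) {
    if (s.size() % 4 != 0) return std::nullopt;
    std::vector<std::uint8_t> out;
    for (std::size_t i = 0; i < s.size(); i += 4) {
        bool last = (i + 4 == s.size());
        int v[4];
        int pad = 0;
        for (int k = 0; k < 4; ++k) {
            unsigned char c = s[i + k];
            if (c == '=' && last && k >= 2) { v[k] = 0; ++pad; continue; }
            if (pad > 0) return std::nullopt;   // data after '=' padding
            v[k] = b64_value(c);
            if (v[k] < 0) return std::nullopt;  // not a base64 character
        }
        std::uint32_t bits = (v[0] << 18) | (v[1] << 12) | (v[2] << 6) | v[3];
        out.push_back(static_cast<std::uint8_t>(bits >> 16));
        if (pad < 2) out.push_back(static_cast<std::uint8_t>(bits >> 8));
        if (pad < 1) out.push_back(static_cast<std::uint8_t>(bits));
    }
    return out;
}

// Round-trip check: decode the encoder's output and compare with the input.
bool round_trips(const std::vector<std::uint8_t>& original, const std::string& encoded) {
    auto decoded = b64_decode_strict(encoded);
    return decoded && *decoded == original;
}
```

In the scenario from this issue, one would pass the original file bytes and the string built from the encoder's output to `round_trips`; a `false` result flags a corrupted encoding without having to inspect the printed characters.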