graph-rs icon indicating copy to clipboard operation
graph-rs copied to clipboard

InvalidData: stream did not contain valid UTF-8

Open samvv opened this issue 1 year ago • 10 comments

Describe the bug

I think this is a bug though I'm not entirely sure.

I get:

Custom {
    kind: InvalidData,
    error: "stream did not contain valid UTF-8",
}

To Reproduce

Taken from the examples:

let res = self
    .graph
    .drive(DRIVE_ID)
    .item_by_path(PATH_TO_DRIVE_ITEM)
    .update_items_content(BodyRead::from_async_read(reader).await?)
    .send()

reader is a variable that implements tokio::io::AsyncRead and may be used to read anything, not just Unicode.

Expected behavior

No error. A binary file should successfully upload.

Desktop:

  • SDK version: v2.0.0-beta.0
  • OS: Windows
  • Rust: rustc 1.77.0-nightly
  • Editor: VSCode

Additional context

Still working on the same file synchronization service. It's almost finished.

samvv avatar Feb 06 '24 16:02 samvv

Describe the bug

I think this is a bug though I'm not entirely sure.

I get:

Custom {
    kind: InvalidData,
    error: "stream did not contain valid UTF-8",
}

To Reproduce

Taken from the examples:

let res = self
    .graph
    .drive(DRIVE_ID)
    .item_by_path(PATH_TO_DRIVE_ITEM)
    .update_items_content(BodyRead::from_async_read(reader).await?)
    .send()

reader is a variable that implements tokio::io::AsyncRead and may be used to read anything, not just Unicode.

Expected behavior

No error. A binary file should successfully upload.

Desktop:

* SDK version: v2.0.0-beta.0

* OS: Windows

* Rust: rustc 1.77.0-nightly

* Editor: VSCode

Additional context

Still working on the same file synchronization service. It's almost finished.

Im guessing this is because the Content-Type header is not set for either application/octet-stream or it may even work with text/plain.

Have you tried changing the Content-Type header? If you do try it can you let me know if this work? Not saying its something you should or shouldn't have to do but it would help debug exactly whats happening in your situation.

sreeise avatar Feb 20 '24 05:02 sreeise

I will try it, thanks.

samvv avatar Feb 20 '24 10:02 samvv

Small update: I've made the change to the source code, but I've yet to test it out. I'll keep you posted.

samvv avatar Mar 05 '24 15:03 samvv

Nope it keeps failing. Tried both text/plain and application/octet-stream.

samvv avatar Mar 29 '24 13:03 samvv

Nope it keeps failing. Tried both text/plain and application/octet-stream.

Ok, thanks for reporting. I'll keep this open as a bug.

sreeise avatar May 23 '24 07:05 sreeise

Came across the same issue today.

Investigating this I found that the most likely culprit is how the file is read (still new to rust)

Internally all BodyRead impls seem to read the whole file content into a string. As strings in rust are only allowed to contain UTF-8 content this will fail horribly if you try to upload a binary file. Changing the content of my file to a UTF-8 string allowed me to get a response from the server instead of a preflight error.

billykater avatar May 28 '24 13:05 billykater

Oh thanks for investigating this! I'm wondering if the binary data is then still valid? Uploading binary as UTF-8, even with the right header, does not seem valid.

samvv avatar May 28 '24 13:05 samvv

Sorry for being unclear in my response: I just tried to upload a file containing a test string "test" instead of my actual content to validate my suspicion. Of course you simply can't convert any random [u8] to a valid string (hence the error we are both experiencing).

I am still digging to see if there is an option to bypass the BodyRead struct to upload binary files.

billykater avatar May 28 '24 13:05 billykater

I am achieving something similar using an upload_session:

pub async fn upload_xlsx_async(access_token: &str, local_file_path: &str, remote_file_path: &str) -> GraphResult<()> {
    let client = Graph::new(access_token);
    println!("Uploading file {} to {}", local_file_path, remote_file_path);
    // Get site id
    let try_group = client.me().followed_sites().list_followed_sites().send().await?;
    let grp: serde_json::Value = try_group.json().await?;
    let site_id = get_site_id_by_name(grp, "automation".to_string()).unwrap();

    // JSON specifying the conflict behavior when uploading, more details:
    // https://learn.microsoft.com/en-us/graph/api/resources/driveitem?view=graph-rest-1.0
    let upload = serde_json::json!({
        "@microsoft.graph.conflictBehavior": Some("rename".to_string())
    });

    let response = client
        .site(&site_id)
        .drive()
        .item_by_path(format!(":{}:", remote_file_path))
        .create_upload_session(&upload)
        .send()
        .await?;

    let file = tokio::fs::File::open(local_file_path).await?;

    let mut iter = response.into_upload_session_async_read(file).await?;
    while let Some(result) = iter.next().await {
        let response = result?;
        println!("{response:#?}");  // Print the response for each chunk uploaded
    }

    Ok(())
}

And then called:

use anyhow::Result;
use graph_rs_sdk::Graph;
use lib_o365::utils::file_upload::upload_xlsx_async;

#[tokio::main]
async fn main() -> Result<()> {
    env_logger::init();

    let access_token = lib_o365::authentication::log_me_in().await.unwrap();
    let client = Graph::new(&access_token);
    
    let f_name = "tst.pdf";
    let remote_path = format!("/Data categorisation/Trading figures/{}", f_name);
    upload_xlsx_async(
        &access_token,
        format!(".temp_files/{}",f_name).as_str(),
        &remote_path)
        .await?;
    Ok(())
}

For more of an update approach, @microsoft.graph.conflictBehavior can be set to replace.

There might be differences in terms of efficiency but an async upload does the trick for my use case.

buhaytza2005 avatar May 28 '24 14:05 buhaytza2005

@samvv @buhaytza2005 @billykater Thanks yall for continuing to look into this.

For context, the upload sessions using multipart upload have to know the size of the content before hand because each request must specify what range of bytes is being sent out of the total size of the content which is why all of the data is read in.

Just off the top of my head I do wonder how would binary files be uploaded via an http api if they cannot be converted to valid UTF-8 because that is actually required for http requests. But there probably is some solution to this that i'm just not aware of or forgot about.

sreeise avatar May 31 '24 09:05 sreeise

Came across the same issue today.

Investigating this I found that the most likely culprit is how the file is read (still new to rust)

Internally all BodyRead impls seem to read the whole file content into a string. As strings in rust are only allowed to contain UTF-8 content this will fail horribly if you try to upload a binary file. Changing the content of my file to a UTF-8 string allowed me to get a response from the server instead of a preflight error.

I should have looked closer at this. You are right. https://github.com/sreeise/graph-rs-sdk/blob/7aa756af45fafec43156d798fff1c47c3fb04b9b/graph-http/src/core/body_read.rs#L30-L34

The impl needs to change to handle this properly.

I don't know why I was thinking http required to be UTF-8. It does not.

sreeise avatar Aug 01 '24 21:08 sreeise

PR in progress. Still need to do some testing. The PR changes what readers and other byte types are converted to and removes the string conversion of course. Each of these types will be read into a Vec<u8> and then passed to the reqwest::Body or reqwest::blocking::Body. Its a quick solution for now and we can optimize later.

@billykater @samvv @buhaytza2005 Apologies that I didn't read through this well enough to understand what the issue was originally. This is completely valid. Hopefully this will help.

If any of you get a chance to try things out and test a bit I would appreciate the feedback. Here is the branch: https://github.com/sreeise/graph-rs-sdk/tree/465-fix

sreeise avatar Aug 05 '24 03:08 sreeise

Thank you @billykater, @samvv, and @buhaytza2005 for continuing to report on this issue and for finding out what was causing it.

I have merged a fix for this issue and added examples as well as updated the README.

sreeise avatar Aug 11 '24 05:08 sreeise

Updated in crate version 2.0.1 https://crates.io/crates/graph-rs-sdk/2.0.1

sreeise avatar Aug 11 '24 06:08 sreeise