Bug: Response::body_json could not convert from a UTF-8 BOM source file
Target
The body_json function of the Response struct
Detail
The body_json function cannot parse a response whose body comes from a UTF-8 BOM source file.
Repro
- Prepare a UTF-8 BOM JSON file; the BOM bytes are 0xef 0xbb 0xbf.
- Deploy the JSON file and run an httpd.
- Call surf::get and then body_json => SerdeJsonError(Error("expected value", line: 1, column: 1))
Workaround
// This will be an error if the response body has a BOM.
// let my_struct = my_response.body_json::<MyStruct>();
- Raw implementation:
let my_response_body_string = smol::block_on(my_response.body_string()).unwrap();
let my_response_body_string = if my_response_body_string.starts_with("\u{feff}") { &my_response_body_string[3..] } else { &my_response_body_string[..] };
let my_struct: MyStruct = serde_json::from_str(my_response_body_string).unwrap();
- Or use the strip_bom crate:
use strip_bom::StripBom;
let my_response_body_string = smol::block_on(my_response.body_string()).unwrap();
let my_response_body_string = my_response_body_string.strip_bom();
let my_struct: MyStruct = serde_json::from_str(my_response_body_string).unwrap();
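For reuse, both workarounds can be folded into one small helper. This is only a sketch of mine, assuming the v2.0.0-alpha async API used in the repro code below; the name body_json_ignoring_bom is hypothetical and not part of surf:
use serde::de::DeserializeOwned;

// Hypothetical helper (not part of surf): read the body as a string,
// drop any leading UTF-8 BOM, then deserialize with serde_json.
async fn body_json_ignoring_bom<T: DeserializeOwned>(res: &mut surf::Response) -> Result<T, serde_json::Error> {
    let body = res.body_string().await.expect("failed to read response body");
    let body = body.trim_start_matches('\u{feff}');
    serde_json::from_str(body)
}

// Example call, mirroring the repro code below:
// let sushi: Sushi = smol::block_on(body_json_ignoring_bom(&mut r)).unwrap();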
Notes:
- This error is a SerdeJsonError, but the character-encoding problem is probably not an issue in the serde_json or serde crates: serde_json is designed to take only a UTF-8 stream (serde_json::from_reader for an IO stream, serde_json::from_str for a string), not a file directly and not a BOM. So I think handling the BOM is the serde_json user's responsibility, i.e. surf's. If the authors think this is not a surf issue but a serde_json or serde issue, I'll file it on the serde_json or serde repos instead. (See also the small stand-alone check after these notes.)
- It is probably not a std::string::String issue either; see https://github.com/rust-lang/rfcs/issues/2428.
- JSON (ECMA-404, RFC 8259) defines UTF-8 as the only character encoding, and the BOM is left undefined, so a JSON file with a UTF-8 BOM is not invalid per the specification. Of course, until today I thought nobody would put a UTF-8 BOM in a JSON file, but a Japanese government institution did. 🥺
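To separate surf from serde_json here, this small stand-alone check (my addition, not from the repro) shows that serde_json itself rejects a BOM-prefixed string and accepts the same JSON once the BOM is stripped:
fn main() {
    // The same JSON as the test files below, with and without a BOM.
    let with_bom = "\u{feff}{ \"sushi\": \"🍣\" }";

    // Fails with "expected value at line 1 column 1", like the surf error above.
    let err = serde_json::from_str::<serde_json::Value>(with_bom).unwrap_err();
    println!("{}", err);

    // Succeeds once the BOM is removed.
    let without_bom = with_bom.trim_start_matches('\u{feff}');
    let value: serde_json::Value = serde_json::from_str(without_bom).unwrap();
    println!("{}", value);
}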
Is this still true in the 2.0 alphas? 2.0.0-alpha.0 and later include support for non-UTF-8 encodings, but that code should also handle UTF-8 with a BOM.
@goto-bus-stop Unfortunately yes, this problem occurs with all of these versions:
surf = "1.0.3"
surf = {git = "https://github.com/http-rs/surf", tag = "v2.0.0-alpha.0"}
surf = {git = "https://github.com/http-rs/surf", tag = "v2.0.0-alpha.1"}
surf = {git = "https://github.com/http-rs/surf", tag = "v2.0.0-alpha.2"}
surf = {git = "https://github.com/http-rs/surf", tag = "v2.0.0-alpha.3"}
surf = {git = "https://github.com/http-rs/surf", tag = "v2.0.0-alpha.4"}
surf = {git = "https://github.com/http-rs/surf", tag = "v2.0.0-alpha.5"}
Example repro code
main.rs
use serde::{Deserialize, Serialize};
#[derive(Debug, Serialize, Deserialize)]
struct Sushi {
sushi: String,
}
fn main() {
// UTF-8N; `{ "sushi": "🍣" }`
let mut r = smol::block_on(surf::get("http://localhost:50000/n-sushi.txt")).unwrap();
let sushi = smol::block_on(r.body_json::<Sushi>()).unwrap();
println!("n{:?}", &sushi);
// UTF-8 BOM; `\u{feff}{ "sushi": "🍣" }`
let mut r = smol::block_on(surf::get("http://localhost:50000/bom-sushi.txt")).unwrap();
let sushi = smol::block_on(r.body_json::<Sushi>()).unwrap();
println!("b{:?}", &sushi);
}
[dependencies] section of Cargo.toml:
[dependencies]
serde = {version = "1.0.115", features = ["derive"]}
serde_json = "1.0.57"
smol = "1.0.0"
surf = {git = "https://github.com/http-rs/surf", tag = "v2.0.0-alpha.5"}
# surf = "1.0.3"
Example UTF-8N and UTF-8 BOM files
hexdump --canonical n-sushi.txt  # UTF-8N, i.e. UTF-8 without BOM (reference)
00000000 7b 20 22 73 75 73 68 69 22 3a 20 22 f0 9f 8d a3 |{ "sushi": "....|
00000010 22 20 7d |" }|
00000013
hexdump --canonical bom-sushi.txt  # UTF-8 with BOM (target of this test)
00000000 ef bb bf 7b 20 22 73 75 73 68 69 22 3a 20 22 f0 |...{ "sushi": ".|
00000010 9f 8d a3 22 20 7d |..." }|
00000016
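The two files can also be regenerated with a short sketch like this (my addition; the file names match the hexdumps above):
use std::fs;

fn main() -> std::io::Result<()> {
    let json = "{ \"sushi\": \"🍣\" }";
    // UTF-8N (no BOM), 0x13 bytes.
    fs::write("n-sushi.txt", json)?;
    // UTF-8 with BOM (0xef 0xbb 0xbf prefix), 0x16 bytes.
    fs::write("bom-sushi.txt", format!("\u{feff}{}", json))?;
    Ok(())
}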
Repro procedure
simple-http-server -p 50000
cargo run
And then I got the result:
// cargo process
Finished dev [unoptimized + debuginfo] target(s) in 0.41s
Running `target\debug\se.exe`
// The result part of the first UTF-8N reference
nSushi { sushi: "🍣" }
// The result part of the second UTF-8 BOM testing
thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: expected value at line 1 column 1', src\main.rs:15:56
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
error: process didn't exit successfully: `target\debug\se.exe` (exit code: 101)