encode segmentation problem

Open · qqdown opened this issue on Jun 17 '22 · 3 comments

When the input is large enough, the encoded result is segmented: jiffy:encode/1 returns a list of binaries instead of a single binary.

Small input:

jiffy:encode(lists:seq(1, 50)).

Result:

<<"[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,4"...>>

Large input:

jiffy:encode(lists:seq(1, 5000)).

Result:

[<<"[1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,"...>>,
 <<"532,533,534,535,536,537,538,539,540,541,542,543,544,545,546,547,548,549,550,551,552,553,554,555,556,557,558,"...>>,
 <<"1029,1030,1031,1032,1033,1034,1035,1036,1037,1038,1039,1040,1041,1042,1043,1044,1045,1046,1047,1048,1049"...>>,
 <<"1433,1434,1435,1436,1437,1438,1439,1440,1441,1442,1443,1444,1445,1446,1447,1448,1449,1450,1451,1452,"...>>,
 <<"1837,1838,1839,1840,1841,1842,1843,1844,1845,1846,1847,1848,1849,1850,1851,1852,1853,1854,1855,1"...>>,
 <<"2241,2242,2243,2244,2245,2246,2247,2248,2249,2250,2251,2252,2253,2254,2255,2256,2257,2258,22"...>>,
 <<"2645,2646,2647,2648,2649,2650,2651,2652,2653,2654,2655,2656,2657,2658,2659,2660,2661,266"...>>,
 <<"3049,3050,3051,3052,3053,3054,3055,3056,3057,3058,3059,3060,3061,3062,3063,3064,3065"...>>,
 <<"3453,3454,3455,3456,3457,3458,3459,3460,3461,3462,3463,3464,3465,3466,3467,3468,"...>>,
 <<"3857,3858,3859,3860,3861,3862,3863,3864,3865,3866,3867,3868,3869,3870,3871,3"...>>,
 <<"4261,4262,4263,4264,4265,4266,4267,4268,4269,4270,4271,4272,4273,4274,42"...>>,
 <<"4665,4666,4667,4668,4669,4670,4671,4672,4673,4674,4675,4676,4677,467"...>>]

The segmented list causes a problem when the encoded result is used as the POST body in httpc:request/4: the Content-Length is computed from the length of the list (the number of segments), not from the actual size of the data in bytes.
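To make the mismatch concrete, here is a minimal sketch using only built-in functions. The module name `http_body` and its two functions are hypothetical helpers for illustration, not part of jiffy or httpc. It shows that `length/1` on an iolist counts top-level segments, while `iolist_size/1` counts the bytes that would actually go on the wire.

```erlang
-module(http_body).
-export([content_length/1, to_body/1]).

%% Hypothetical helper: byte count of any iodata(), i.e. a binary or an
%% arbitrarily nested list of binaries, bytes (0..255) and further iolists.
%% By contrast, length/1 on the same iolist only counts its top-level
%% elements (the segments), which is the wrong value for Content-Length.
-spec content_length(iodata()) -> non_neg_integer().
content_length(IoData) ->
    iolist_size(IoData).

%% Hypothetical helper: flatten to a single binary, whose byte_size/1
%% is always equal to content_length/1 of the original iodata.
-spec to_body(iodata()) -> binary().
to_body(IoData) ->
    iolist_to_binary(IoData).
```

For example, `length([<<"[1,2,">>, <<"3,4]">>])` is 2, while `http_body:content_length([<<"[1,2,">>, <<"3,4]">>])` is 9, the real byte size of `<<"[1,2,3,4]">>`.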

Data = jiffy:encode(LargeData),
% this request fails due to wrong content-length
httpc:request(post, {Url, [], "application/json;charset=UTF-8", Data}, [{timeout, Timeout}], []).

I had to use iolist_to_binary/1 to work around it:

Data1 = iolist_to_binary(Data),
httpc:request(post, {Url, [], "application/json;charset=UTF-8", Data1}, [{timeout, Timeout}], []).

So why is the encode result segmented by default?

Should I use the encoded data in this way?

qqdown, Jun 17 '22 08:06

There are two reasons for this behavior. First, returning an iolist() from the encoder lets jiffy avoid copying memory between buffers when it expands its buffer while encoding large amounts of JSON. Second, an iolist() is a useful return type in Erlang because most IO routines use the writev() system call when sending data, so the segments can be handed to the OS without first being joined. Given that a large share of JSON encoding is immediately followed by writing the result to disk or sending it across the network, I chose this approach rather than forcing every user to pay for those needless memory copies.
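As an illustration of the "write it straight out" case, OTP IO functions such as `file:write_file/2` and `gen_tcp:send/2` accept `iodata()`, so an encoder's iolist can be passed to them unchanged. The sketch below uses a hypothetical module `iodata_write`; it only shows that no `iolist_to_binary/1` step is needed and makes no claim about how the runtime issues the underlying system calls.

```erlang
-module(iodata_write).
-export([save/2]).

%% Hypothetical helper: file:write_file/2 accepts iodata(), so a
%% segmented result (for example, the output of an encoder) can be
%% written as-is, without flattening it into one binary first.
-spec save(file:name_all(), iodata()) -> ok | {error, term()}.
save(Path, IoData) ->
    file:write_file(Path, IoData).
```

Reading the file back yields the joined bytes, e.g. saving `[<<"[1,2,">>, <<"3,4]">>]` produces a file containing `[1,2,3,4]`.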

If you need a single binary, the approach you've taken is correct: just wrap the encoder result in a call to iolist_to_binary/1. Most folks create a pair of utility functions that wrap jiffy so that every call uses the same encoder and decoder options, along with this iolist_to_binary/1 call.
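A pair of wrapper functions along those lines might look like the sketch below. The module name `my_json` and the chosen option lists are hypothetical examples (`return_maps` is a jiffy decode option; the empty encode option list is just a placeholder for whatever options a project standardizes on). It assumes jiffy is on the code path.

```erlang
-module(my_json).
-export([encode/1, decode/1]).

%% Hypothetical project-wide options, kept in one place so every
%% call site encodes and decodes the same way.
-define(ENCODE_OPTS, []).
-define(DECODE_OPTS, [return_maps]).

%% jiffy:encode/2 returns iodata(); iolist_to_binary/1 always yields a
%% single binary (a binary input is returned unchanged).
-spec encode(term()) -> binary().
encode(Term) ->
    iolist_to_binary(jiffy:encode(Term, ?ENCODE_OPTS)).

%% Decode with the same options everywhere (maps for JSON objects).
-spec decode(iodata()) -> term().
decode(Json) ->
    jiffy:decode(Json, ?DECODE_OPTS).
```

With this in place, `my_json:encode(lists:seq(1, 5000))` returns one binary that can be passed directly as the httpc request body.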

And in case you're about to ask whether I'll add an encoder option to return a binary: so far I've resisted that urge. My fear is that it would be an annoying source of bugs in production. If users run into the iolist return as early in development as possible, it can't sneak into production because someone forgot to pass that option.

davisp, Jun 24 '22 21:06

The following PR fixes this: https://github.com/erlang/otp/pull/6181

aboroska, Aug 01 '22 13:08

Got it! Thanks for the reply.

qqdown, Aug 08 '22 03:08