int (de)serialization wonky around size = 0, doing malloc(0) and memcpy with size 0
While working on the run-time reconfigurable malloc, I noticed that the int (de)serialization does a 0-byte malloc and a memcpy with size 0. Is this really correct? Feels quite bogus?
I don't know our int (bsdnt) format well enough to say when size is 0 but I looked at the deserialization debug output and we have these:
--- 41: class_id 2, blob_size: 1, blob: 0 .
class_id = 2, and blob[0] is 0, which is the size of the int. What does that mean? Is that valid? Isn't this like 0-counting, so size=0 means we only have one limb (we always have one limb, right?) and so there should really be an offset to the deserialization malloc of val.n, no? see comment inline below
B_int B_intD___deserialize__(B_int res,$Serial$state state) {
$ROW this = state->row;
state->row = this->next;
state->row_no++;
if (this->class_id < 0) {
return (B_int)B_dictD_get(state->done,(B_Hashable)B_HashableD_intG_witness,to$int((long)this->blob[0]),NULL);
} else {
if (!res)
res = malloc_int();
res->val.size = (long)this->blob[0];
res->val.alloc = labs(res->val.size);
res->val.n = acton_malloc(res->val.alloc*sizeof(long)); // When val.alloc is 0, we do a malloc(0) - what does that even mean?
memcpy(res->val.n,&this->blob[1],res->val.alloc*sizeof(long)); // why copy of size 0?
B_dictD_setitem(state->done,(B_Hashable)B_HashableD_intG_witness,to$int(state->row_no-1),res);
res->$class = &B_intG_methods;
return res;
}
}
In malloc_int we always allocate val.n to be sizeof(unsigned long):
B_int malloc_int() {
B_int res = acton_malloc(sizeof(struct B_int));
res->$class = &B_intG_methods;
res->val.n = acton_malloc_atomic(sizeof(unsigned long));
res->val.size = 0;
res->val.alloc = 1;
return res;
}
wouldn't that be the equivalent of adding +1 in the deserialize, like so:
res->val.n = acton_malloc((res->val.alloc+1)*sizeof(long));
@sydow please advice :)
A B_int is a pointer to a struct with three fields:
-
n, which again is a pointer, to a (heap-allocated) array of 64-bit unsigned “digits” or limbs.
-
size, which is a signed long and has two roles:
- it describes the sign of the number; if size=0, the number is 0 (regardless of limbs), if it is positive(negative) the same holds for the number.
- if size is non-zero, its absolute value is the number of significant limbs in n.
-
alloc, which is the number of allocated limbs, which may be greater than the significant number in size.
Consequently, if the number is zero, we could have n be a null pointer, but we always allocate one limb , for simplicity and for avoiding some checks.
A correct number must have the last (most) significant non-zero. Additional limbs are allowed for several reasons, among them that the library allows for mutable bigints(!). This can be space-saving in computations with huge numbers. But we don’t use it; to let
b += 1234567876543234567
change b is a bit too much…
Björn
On 12 Feb 2024, at 13:18, Kristian Larsson @.***> wrote:
While working on the run-time reconfigurable malloc, I noticed that the int (de)serialization does a 0-byte malloc and a memcpy with size 0. Is this really correct? Feels quite bogus? I don't know our int (bsdnt) format well enough to say when size is 0 but I looked at the deserialization debug output and we have these: --- 41: class_id 2, blob_size: 1, blob: 0 .
class_id = 2, and blob[0] is 0, which is the size of the int. What does that mean? Is that valid? Isn't this like 0-counting, so size=0 means we only have one limb (we always have one limb, right?) and so there should really be an offset to the deserialization malloc of val.n, no? see comment inline below B_int B_intD___deserialize__(B_int res,$Serial$state state) { $ROW this = state->row; state->row = this->next; state->row_no++; if (this->class_id < 0) { return (B_int)B_dictD_get(state->done,(B_Hashable)B_HashableD_intG_witness,to$int((long)this->blob[0]),NULL); } else { if (!res) res = malloc_int(); res->val.size = (long)this->blob[0]; res->val.alloc = labs(res->val.size); res->val.n = acton_malloc(res->val.allocsizeof(long)); // When val.alloc is 0, we do a malloc(0) - what does that even mean? memcpy(res->val.n,&this->blob[1],res->val.allocsizeof(long)); // why copy of size 0? B_dictD_setitem(state->done,(B_Hashable)B_HashableD_intG_witness,to$int(state->row_no-1),res); res->$class = &B_intG_methods; return res; } } In malloc_int we always allocate val.n to be sizeof(unsigned long): B_int malloc_int() { B_int res = acton_malloc(sizeof(struct B_int)); res->$class = &B_intG_methods; res->val.n = acton_malloc_atomic(sizeof(unsigned long)); res->val.size = 0; res->val.alloc = 1; return res; } wouldn't that be the equivalent of adding +1 in the deserialize, like so: res->val.n = acton_malloc((res->val.alloc+1)sizeof(long)); @sydow please advice :) — Reply to this email directly, view it on GitHub, or unsubscribe. You are receiving this because you were mentioned.Message ID: @.**>
Right, so for simplicity we always allocate a limb and I see that is what we do in malloc_int() but that is not what we do in B_intD___deserialize__(). When size is 0 we will do malloc(0)! See this line:
res->val.n = acton_malloc(res->val.alloc*sizeof(long));
I reckon it should be something like this:
res->val.n = acton_malloc(max(1, res->val.alloc*sizeof(long)));
right?
That way we avoid malloc(0) which is UB. malloc on Linux accepts size 0 but that is a proprietary behavior. The man page and information on the Internet says that malloc(0) should be avoided because it is not standardized behavior.
@sydow assigning you, I think this is a simple fix, would just be nice if you, who understand this much better than I, can check the code and do the right thing :)