v icon indicating copy to clipboard operation
v copied to clipboard

Lengths and indexes should use isize or usize

Open jrfondren opened this issue 2 years ago • 2 comments

V full version: V 0.3.0 b01f71d.8c33a40 Processor: 8 cpus, 64bit, little endian, Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz

What did you do?

Try to serve a 1.9G file in #15376

What did you expect to see?

A successfully downloaded file, or, at least, an error.

What did you see instead?

Each request caused a thread to spin at 100% CPU forever, never returning the file.

This was traced to a string interpolation of the response body, but that of course shouldn't fail. In investigating that, with help from #help, I came up with this test case:

import math

fn main() {
	unsafe {
		mut p := malloc(math.max_u32)
		vmemset(p, 1, math.max_u32 - 1)
		p[math.max_u32 - 1] = 0
		println('before crash:')
		s := cstring_to_vstring(&p[0])
		println('after crash')
		println(s.len)
	}
}

Which crashes at the cstring_to_vstring, with this output:

before crash:
V panic: malloc_noscan(-1 <= 0)
v hash: 8c33a40
/tmp/v_1000/strint.9715218836963737045.tmp.c:6967: at _v_panic: Backtrace
/tmp/v_1000/strint.9715218836963737045.tmp.c:7203: by malloc_noscan
/tmp/v_1000/strint.9715218836963737045.tmp.c:9892: by string_clone
/tmp/v_1000/strint.9715218836963737045.tmp.c:9782: by cstring_to_vstring
/tmp/v_1000/strint.9715218836963737045.tmp.c:14882: by main__main
/tmp/v_1000/strint.9715218836963737045.tmp.c:15353: by main

The problem seems to be that vstrlen(), tos(), the string type itself, and many other places use 32-bit int as the length of a string, and math.max_u32 is the same number as int(-1). 2GB also is 0x8000_0000, which is equal to math.min_i32. I imagine that the actual bug in the string interpolation case was due to the 1.9GB file's size being close enough to the limit it flipped signs in an expected place.

32-bit platforms don't have the address space for a 2GB string, but lots of libc and POSIX functions deal in C's size_t type, which is 64-bits on a 64-bit platform.

jrfondren avatar Aug 08 '22 01:08 jrfondren

I'd say usize if a variable sized type is needed. Unless someone has a very good reason why you should be able to allocate negative amounts of memory...

JalonSolov avatar Aug 08 '22 02:08 JalonSolov

Unsigned seems to be the most popular, but in go the type is int (where it's variable-sized, with u32 et al. for specific sizes), which is signed. I'm not aware of a justification but I imagine it's

  1. surely nobody will ever need the 64th bit of an unsigned number. It's pretty big, at 18446744073709551615, or just under 16 exobytes
  2. that it's signed saves users from easy unsigned math errors, like this infinite loop:
import math
fn main() {
	mut i := u16(math.max_u16)
	for i >= 0 {
		println(i)
		i--
	}
}

Some discussion: https://stackoverflow.com/questions/39088945/why-does-len-returned-a-signed-value

jrfondren avatar Aug 08 '22 03:08 jrfondren

JFYI, Go changed its default integer size on 64-bit platforms from 32 to 64 bits back in 2013 one year after its official 1.0 release...

https://go.dev/doc/go1.1

markus-oberhumer-forks avatar Aug 10 '22 13:08 markus-oberhumer-forks

https://stackoverflow.com/questions/39088945/why-does-len-returned-a-signed-value

medvednikov avatar Aug 11 '22 13:08 medvednikov

@medvednikov the problem isn't that it returns a signed value. That's what I prefer. The problem is that it's 32-bit so V cannot support slices of 2G or larger.

jrfondren avatar Aug 11 '22 23:08 jrfondren

I see, and in Go it's solved by int being an i64 on 64 bit systems.

medvednikov avatar Aug 12 '22 14:08 medvednikov

@medvednikov the problem isn't that it returns a signed value. That's what I prefer. The problem is that it's 32-bit so V cannot support slices of 2G or larger.

I'm not sure you should have the 1.9G file in memory?

emily33901 avatar Aug 13 '22 11:08 emily33901

@emily33901 the particular example is obviously avoidable.

jrfondren avatar Aug 13 '22 16:08 jrfondren

Until V gets a way to stream files to disk, there's no real way to avoid that 1.9G file in memory.

Besides, 1.9G file in memory is no big deal for those of us with plenty of RAM.

JalonSolov avatar Aug 13 '22 18:08 JalonSolov

struct KB { bytes [1024]u8 }
big := []KB{len: 5 * 1024 * 1024} // big should be now 5GB
dump(u64(big.len) * u64(big.element_size))

produces:

[s.v:6] u64(big.len) * u64(big.element_size): 5368709120

i.e. imho the problem is more of convenience of operation, and that many APIs in vlib expect []u8, and that has .element_size of 1 .

spytheman avatar Aug 17 '22 05:08 spytheman

int => i64 on 64 bit, i32 on 32 bit

Looks like there's movement towards fixing this with 0.4.3, but the example still crashes as the string type is still using C's int which is 32-bit on x86_64. Also, println(sizeof(int)) still prints 4 for me.

jrfondren avatar Nov 12 '23 22:11 jrfondren

Yeah I did the first part. I still need to finish the second part.

Such big transitions have to be done gradually.

medvednikov avatar Nov 12 '23 22:11 medvednikov