
Why ‘int’ instead of ‘size_t’

Open cesss opened this issue 6 years ago • 5 comments

I know neatvi aims to be small, but why did you choose ‘int’ variables for buffer sizes and cursor positions, instead of ‘size_t’ ? Even if neatvi is small, using ‘size_t’ variables would make it possible to open huge files.

BTW, can cursor positions (or sizes) become negative? If they can never be negative, maybe it’s straightforward to just replace the proper ‘int’ variables with ‘size_t’ ones.

cesss avatar May 29 '19 21:05 cesss

cesss [email protected] wrote:

I know neatvi aims to be small, but why did you choose ‘int’ variables for buffer sizes and cursor positions, instead of ‘size_t’ ? Even if neatvi is small, using ‘size_t’ variables would make it possible to open huge files.

That is for simplicity, indeed. There are many places in Neatvi in which size_t could be used: the number of characters in a line, the number of lines, buffers passed to functions for syntax highlighting or rendering, indices for referencing them, column and row numbers, even command prefixes, and so on. Using size_t properly, and converting between size_t and int, requires some care and patience, and would make modifying the code more difficult. Therefore, at least in Neatvi, using size_t does not seem worth the effort.

About the size of the files that Neatvi can open, the size of int on most architectures seems large enough and, admittedly, Neatvi is not the best choice for editing files beyond a few megabytes. When that is a major concern, I think using long is easier than using size_t.

BTW, can cursor positions (or sizes) become negative? If they can never be negative, maybe it’s straightforward to just replace the proper ‘int’ variables by ‘size_t’ ones.

I think the main advantage of types like size_t or the const keyword is for documentation in an API. Using size_t in Neatvi introduces new concerns about when and how to use size_t properly and how to mix it with plain ints; too much work, it seems.

Best wishes, Ali

aligrudi avatar May 31 '19 08:05 aligrudi

I believe the real point here is not 'size_t' vs 'int', but 'unsigned' vs 'signed'. For example, if all your variables were unsigned, then using 'size_t' instead of 'int' is just a matter of a typedef, because 'size_t' is unsigned. Of course, there's also a signed version of 'size_t', called 'ptrdiff_t' IIRC.

Most of the time you edit small to moderate files. But there's always one day when you face a file over 4GB. And then, that day you realise you don't have a proper open-source editor for it, and the only option is to purchase a commercial editor (it happened to me).

So, I find it unfortunate that neatvi is not ready for 64-bit address spaces.

Do you know of any other vi implementation ready for >4GB files?

cesss avatar May 31 '19 13:05 cesss

cesss [email protected] wrote:

I believe the real point here is not 'size_t' vs 'int', but 'unsigned' vs 'signed'. For example, if all your variables were unsigned, then using 'size_t' instead of 'int' is just a matter of a

Note that, when using size_t, you do not get negative indices or values where you expect positive ones, but the danger of integer overflow still exists.

typedef, because 'size_t' is unsigned. Of course, there's also a signed version of 'size_t', called 'ptrdiff_t' IIRC.

There exists ssize_t too.

Most of the times you edit small to moderate files. But there's always one day when you face a file over 4GB. And then, that day you realise you don't have a proper open-source editor for it, and the only option is to purchase a commercial editor (it happened to me).

So, I find it unfortunate that neatvi is not ready for 64-bit address spaces.

As a hack, I suggest putting `#define int long` and `#define short` in vi.h and fixing the reported warnings, to obtain a version which works for files larger than 2^31 bytes.

Do you know of any other vi implementation ready for >4GB files?

I never looked for one, but in an editor for large files, the whole contents of a file should not be brought into memory. Implementing this is more difficult, but not by much.

Best wishes, Ali

aligrudi avatar May 31 '19 14:05 aligrudi

I think that rather than redefining 'int' as 'long', it makes more sense to use standard sized integers, because the meaning of 'short', 'long', 'long long', etc. can be totally different from platform to platform. If the platform you are compiling for doesn't have the (now standard) header for sized integers, you can always provide a fallback path in the Makefile for manually defining them for your platform.

For example, I think it would be really useful to be able to build neatvi with the buffer size in bits chosen at build time (e.g. choose to build neatvi for 16-bit, 32-bit, or 64-bit sized buffers).

Regarding not bringing the whole file into memory when it's large, I agree if it's for avoiding an unnecessary write of several GBs to the disk when you have modified only a small fraction of the file. For example, SQLite doesn't overwrite the whole database when you save it, but just the bytes you changed. But if the reason is avoiding pushing the RAM usage on your machine, I don't see it as an important reason (yes, it can be handy to be able to limit the RAM usage, but I see it more as a "bonus feature" than a required thing, while avoiding unnecessary disk writes is a requirement IMHO).

Note that you should be able to avoid unnecessary disk writes while having the whole file in RAM at the same time: if the editor has some means of tagging as "dirty" the parts of the file you modified, it should be able to rewrite just those parts and not the whole file.

cesss avatar Jun 01 '19 09:06 cesss

cesss [email protected] wrote:

I think that rather than redefining 'int' as 'long', it makes more sense to use standard sized integers, because the meaning of 'short', 'long', 'long long', etc. can be totally different from platform to platform. If the platform you are compiling for doesn't have the (now standard) header for sized integers, you can always provide a fallback path in the Makefile for manually defining them for your platform.

That was just a hack (I wrote a small C library to bootstrap neat*, you know).

For example, I think it would be really useful to be able to build neatvi being able to choose the buffer size in bits at build time (eg: choose to build neatvi for 16bit, 32bit, or 64bit sized buffers).

The types defined in stdint.h are for that. However, for Neatvi that does not seem useful to many of its users.

Regarding not bringing the whole file into memory when it's large, I agree if it's for avoiding an unnecessary write of several GBs to the disk when you have modified only a small fraction of the file. For example, SQLite doesn't overwrite the whole database when you save it, but just the bytes you changed. But if the reason is avoiding pushing the RAM usage on your machine, I don't see it as an important reason (yes, it can be handy to be able to limit the RAM usage, but I see it more as a "bonus feature" than a required thing, while avoiding unnecessary disk writes is a requirement IMHO).

Note that you should be able to avoid unnecessary disk writes while having the whole file in RAM at the same time: if the editor has some means of tagging as "dirty" the parts of the file you modified, it should be able to rewrite just those parts and not the whole file.

That helps too.

Ali

aligrudi avatar Jun 01 '19 11:06 aligrudi