
Bug - When creating an empty disk in the UI, the emulator seems to silently fail when the disk size is above 2046 MB

Open · FunnyCorgi opened this issue 6 months ago • 6 comments

Hi, through some testing with the custom setup UI on copy.sh/v86 I found that when the size of the disk is set to 2047 MB or higher, the emulator silently fails to start. Checking ide.js I found: const buffer = new Uint8Array(byte_count); I think (though I can't be certain) that the problem is the size limit of the Uint8Array: it apparently can't be 2047 * 1024 * 1024 items long, while 2046 * 1024 * 1024 works just fine. One fix I think might work would be to split the disk into a JSON object or array with each block stored as up to 2046 MB, so we could have disk sizes of nearly any amount. I would be happy to contribute this in a PR if you would like, though I might not be able to work on one for about 1–2 weeks; after that I will have more time for major contributions, but I'd still love to help out when I can.

My environment: Chrome v133, v86 version 0eec2965

FunnyCorgi avatar Jun 18 '25 18:06 FunnyCorgi

Hey, I was thinking about this too. Here are the ideas I've collected so far; maybe they can help you with what you plan.

The IDE controller doesn't use Uint8Array directly; there are already a number of v86 buffer types defined for that purpose to allow for more flexibility. In buffer.js you find five different buffer types:

The first type, SyncBuffer, is the simplest one; it just wraps a single Uint8Array. The second one uses a network connection to load chunks of the image buffer on demand (as they're requested by the guest OS). I'm not sure about the third one (9p perhaps?); the fourth and fifth are like the first two, but operate on an underlying File object.

It's not defined explicitly in v86 (it's duck typed), but this is the shared set of methods that all of the buffer types must implement (you might call this the buffer type interface or abstract base class):

  • load(): void
    initialize this buffer
  • get(start: integer, len: integer, fn: function(Uint8Array)): void
    read len bytes starting at start from this buffer and pass the data to function fn when complete
  • set(start: integer, slice: Uint8Array, fn: function()): void
    write all bytes from slice to this buffer at offset start and call fn when complete
  • get_buffer(fn: function(Uint8Array)): void
    get the whole internal buffer (if applicable)
  • get_state(): Array
    set_state(Array state): void
    store and restore internal state for v86 state snapshot

See class SyncBuffer for a trivial implementation. This is just a very rough outline; I very much recommend studying the code in buffer.js to understand the purpose of the functions listed above.

So I think your focus should be to define yet another buffer type, let's call it SlicedBuffer, which manages an arbitrarily sized buffer using an internal list of Uint8Array slices. Your main concern is how to implement get() and set(); all you need to take care of is the case where the guest OS accesses a block of data that crosses the boundary of two (or more) adjacent slices. get_buffer() is not applicable in your case, and get_state()/set_state() should be simple.
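To make this more concrete, here is a very rough sketch of what such a SlicedBuffer could look like (the slice size and all names are made up for illustration, none of this is existing v86 code); the interesting part is how get() and set() walk across slice boundaries:

    // Illustrative sketch only: an arbitrarily large buffer backed by
    // fixed-size Uint8Array slices. Only load(), get() and set() are shown.
    const SLICE_SIZE = 256 * 1024 * 1024;   // example slice size, 256 MiB

    function SlicedBuffer(byte_count)
    {
        this.byte_length = byte_count;
        this.slices = [];
    }

    SlicedBuffer.prototype.load = function()
    {
        // allocate all slices upfront (a sparse variant could allocate on demand)
        for(let remaining = this.byte_length; remaining > 0; remaining -= SLICE_SIZE)
        {
            this.slices.push(new Uint8Array(Math.min(SLICE_SIZE, remaining)));
        }
        this.onload && this.onload({});   // signal completion
    };

    SlicedBuffer.prototype.get = function(start, len, fn)
    {
        // gather the requested range, possibly crossing one or more slice boundaries
        const result = new Uint8Array(len);
        let offset = 0;
        while(offset < len)
        {
            const pos = start + offset;
            const slice = this.slices[Math.floor(pos / SLICE_SIZE)];
            const begin = pos % SLICE_SIZE;
            const count = Math.min(len - offset, SLICE_SIZE - begin);
            result.set(slice.subarray(begin, begin + count), offset);
            offset += count;
        }
        fn(result);
    };

    SlicedBuffer.prototype.set = function(start, data, fn)
    {
        // scatter the written range over the affected slices
        let offset = 0;
        while(offset < data.length)
        {
            const pos = start + offset;
            const slice = this.slices[Math.floor(pos / SLICE_SIZE)];
            const begin = pos % SLICE_SIZE;
            const count = Math.min(data.length - offset, SLICE_SIZE - begin);
            slice.set(data.subarray(offset, offset + count), begin);
            offset += count;
        }
        fn();
    };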

Note that there are no rules of any kind when allocating a Uint8Array; allocation limits depend entirely on the operating system and browser. From my local experiments on Windows 10 64-bit, Firefox limits to 2048M and Chrome to 4096M per Uint8Array. I have no idea about the limits on my smartphone but expect them to be lower, or about Node.js, where they could be larger. So either you pick a small, fixed Uint8Array size for the slices, or you measure upfront how much the browser is willing to allocate per Uint8Array.
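If you go the measure-upfront route, a trivial probe could just try a few candidate sizes and pick the largest one the engine accepts (purely illustrative, the candidate list and fallback value are arbitrary):

    // Try candidate sizes from large to small and return the first one that
    // the engine is willing to allocate. The probe array is thrown away
    // immediately, so the memory cost is only temporary.
    function probe_max_slice_size()
    {
        const candidates = [4096, 2048, 1024, 512, 256].map(mb => mb * 1024 * 1024);
        for(const size of candidates)
        {
            try
            {
                new Uint8Array(size);
                return size;
            }
            catch(e)
            {
                // allocation failed (RangeError), try the next smaller size
            }
        }
        return 128 * 1024 * 1024;   // conservative fallback
    }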

Another issue is exporting and importing a SlicedBuffer: you want to export into/import from a single, large file.

EDIT: On second thought, I'd recommend extending the code in SyncBuffer to become what I defined as SlicedBuffer. SyncBuffer is just the special case of SlicedBuffer with a single slice. The number of slices depends on the image buffer size and the slice size, which are both known. Details for v86 state snapshots and import/export still need to be fleshed out here, but going about it this way would integrate much better.

chschnell avatar Jun 19 '25 16:06 chschnell

Another issue is exporting and importing a SlicedBuffer: you want to export into/import from a single, large file.

Yeah, that is a problem; we have to store all the chunks/slices of the buffer in a single file. When I read this I was thinking we could store it in some kind of object (probably JSON or an array) holding each slice, and then, if we're saving a state file, try to compress it at least a little (to avoid an extremely large file size).

We also need to ensure the hard disk image can be properly exported. This should be relatively simple to implement: since we already have the export logic, we just need to expand it so it can easily create the file for a buffer with multiple slices.
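For the browser side of the export, one option (just a sketch under the assumption that the image is kept as an array of Uint8Array slices, names made up) would be to hand all slices to a single Blob and let the browser assemble the download, so we never have to build one giant Uint8Array ourselves:

    // Concatenate all slices into one downloadable file via a Blob.
    function export_slices_as_file(slices, filename)
    {
        const blob = new Blob(slices, { type: "application/octet-stream" });
        const a = document.createElement("a");
        a.href = URL.createObjectURL(blob);
        a.download = filename;
        a.click();
        // release the object URL a little later so the download can start safely
        setTimeout(() => URL.revokeObjectURL(a.href), 10000);
    }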

EDIT: On second thought, I'd recommend extending the code in SyncBuffer to become what I defined as SlicedBuffer. SyncBuffer is just the special case of SlicedBuffer with a single slice. The number of slices depends on the image buffer size and the slice size, which are both known.

This makes sense: we already have SyncBuffer, and by expanding it we keep its functionality without rewriting or changing the existing buffer initialization code. We just need to make sure we handle all the logic for multiple slices properly, ideally with minimal changes outside the buffer implementation.

FunnyCorgi avatar Jun 29 '25 14:06 FunnyCorgi

I now think that "large files" (larger than the max. buffer size) are harder than I thought in the beginning.

In order to import a large file from a local file or a remote URL, further modifications to v86 will be needed. I believe it could work asynchronously, in which case you receive the file content in chunks which you could then fit into your own memory chunks.
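For the local-file case that part is probably not too bad; the File API already lets you read a Blob piecewise, roughly like this (sketch with made-up names, chunk size chosen arbitrarily):

    // Read a local File object chunk by chunk, so no single allocation has to
    // hold the whole image; on_chunk could copy each piece into its memory slice.
    async function import_file_in_chunks(file, chunk_size, on_chunk)
    {
        for(let offset = 0; offset < file.size; offset += chunk_size)
        {
            const end = Math.min(offset + chunk_size, file.size);
            const data = new Uint8Array(await file.slice(offset, end).arrayBuffer());
            on_chunk(offset, data);
        }
    }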

In order to export memory chunks into a large file I guess I'd try a similar strategy, but I really have no idea how to export asynchronously at the moment.

Local file I/O should perform ok with large files, but downloading an entire large image from a remote URL upfront can be very inefficient.

It may well be that there are more things to consider here.

I don't want to discourage you, but I'm honestly not sure if I'd put in all the work this amounts to.

chschnell avatar Jun 30 '25 09:06 chschnell

Did you try setting async: true? AsyncXHRBuffer already represents the disk image as an array of chunks, and shouldn't be affected by this problem.

copy avatar Aug 12 '25 21:08 copy

@copy I'm not quite sure how to do that. Do I just set it up like this:

    hda: {
        async: true,
        size: 1024 * 1024 * 2048,
    },

Or like this, based on the culprit code quoted further below (I'm not quite sure about this because the culprit code uses the URL query parameters, but it might be close):

    hda: {
        async: true,
        empty: 1024 * 1024 * 2048,
    },

EDIT: Looking into buffer.js, I don't see an implementation in AsyncXHRBuffer for the on-demand creation of an empty hard disk; I think it currently only works for local files and URLs. If you think I should extend this for the creation of on-demand images at runtime in my PR, just let me know.

Also I think I found the culprit code for this bug:

    else if(query_args.has("hda.empty"))
            {
                const empty_size = parseInt(query_args.get("hda.empty"), 10);
                if(empty_size > 0)
                {
                    settings.hda = { buffer: new ArrayBuffer(empty_size) };
                }
            }

            ...(hdb url setup)
            else if(query_args.has("hdb.empty"))
            {
                const empty_size = parseInt(query_args.get("hdb.empty"), 10);
                if(empty_size > 0)
                {
                    settings.hdb = { buffer: new ArrayBuffer(empty_size) };
                }
            }

Here, for both hard disks, the empty-disk case is handled by allocating one plain ArrayBuffer. I think a fix for this issue would be to create a proper buffer type instead: preferably extending SyncBuffer to act like a SlicedBuffer, as @chschnell said earlier, or extending AsyncXHRBuffer to support creating large runtime disks that are composed of one or more chunks:

EDIT: On second thought, I'd recommend extending the code in SyncBuffer to become what I defined as SlicedBuffer. SyncBuffer is just the special case of SlicedBuffer with a single slice. The number of slices depends on the image buffer size and the slice size, which are both known. Details for v86 state snapshots and import/export still need to be fleshed out here, but going about it this way would integrate much better.

If that’s the right direction, I can add support in buffer.js so that large runtime-created empty disks use a sliced/async buffer instead of allocating a giant ArrayBuffer.
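For the query-arg handling itself the change could then stay small, something like this (SlicedBuffer being the hypothetical new type discussed above; how settings.hda would have to be wired up to accept such a buffer object still needs to be worked out):

    else if(query_args.has("hda.empty"))
    {
        const empty_size = parseInt(query_args.get("hda.empty"), 10);
        if(empty_size > 0)
        {
            // hypothetical: back the empty disk by a sliced/sparse buffer
            // instead of one giant ArrayBuffer
            settings.hda = { buffer: new SlicedBuffer(empty_size) };
        }
    }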

FunnyCorgi avatar Aug 15 '25 18:08 FunnyCorgi

Right, AsyncXHRBuffer works for large hosted files, but not for new files.

In order to export memory chunks into a large file I guess I'd try a similar strategy, but I really have no idea how to export asynchronously at the moment.

This is implemented here for local files: https://github.com/copy/v86/blob/51bf5a63dcc34f806cdc3c182c383d770e76e0cf/src/buffer.js#L650 For large remote files, I don't plan to implement full downloads.

If you think I should extend this for the creation of on-demand images at runtime in my PR, just let me know.

No, this doesn't really belong in AsyncXHRBuffer.

If that’s the right direction, I can add support in buffer.js so that large runtime-created empty disks use a sliced/async buffer instead of allocating a giant ArrayBuffer.

SlicedBuffer sounds reasonable, but I would first implement a sparse version and see if that works for you: basically like AsyncFileBuffer, but without the file. Use the existing get_from_cache, set, handle_read, etc. If the non-cached part of get is hit, create a Uint8Array of zeroes on the fly. To implement get_as_file, see above, but again create the empty parts of the file on demand.
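To illustrate the sparse idea with a standalone sketch (block size and names made up here, this is not the real get_from_cache/handle_read plumbing): only blocks that were actually written are kept, and reads of untouched blocks yield zeroes on the fly.

    const BLOCK_SIZE = 256;   // example block size in bytes

    function SparseBuffer(byte_length)
    {
        this.byte_length = byte_length;
        this.blocks = new Map();   // block index -> Uint8Array, allocated on first write
    }

    SparseBuffer.prototype.get = function(start, len, fn)
    {
        // the result is zero-initialised, so untouched blocks need no work at all
        const result = new Uint8Array(len);
        for(let offset = 0; offset < len; )
        {
            const pos = start + offset;
            const block = this.blocks.get(Math.floor(pos / BLOCK_SIZE));
            const begin = pos % BLOCK_SIZE;
            const count = Math.min(len - offset, BLOCK_SIZE - begin);
            if(block)
            {
                result.set(block.subarray(begin, begin + count), offset);
            }
            offset += count;
        }
        fn(result);
    };

    // set() would look like the SlicedBuffer.set() sketch above, except that a
    // missing block is allocated with new Uint8Array(BLOCK_SIZE) before writing
    // into it, and get_as_file() would emit zero-filled parts for blocks that
    // are not present in the Map.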

copy avatar Aug 15 '25 19:08 copy