studio icon indicating copy to clipboard operation
studio copied to clipboard

Total and free size of filesystem reported in sites doesn't match actual values

Open fluiddot opened this issue 1 year ago • 8 comments

Quick summary

The total and free size of filesystem reported in sites doesn't match the ones from the machine. This can lead to issues in plugins that rely on these values.

Steps to reproduce

  1. Open the app.
  2. Create/start a site.
  3. Edit the wp-config-php file of the site and add the following code at the bottom:
define( 'WP_DEBUG', true );
define( 'WP_DEBUG_LOG', true );
error_log("Disk Free Space: " . print_r(disk_free_space('/'), true));
error_log("Disk Total Space: " . print_r(disk_total_space('/'), true));
  1. Navigate to the site.
  2. Open the logs file located in <SITE_PATH>/wp-content/debug.log.
  3. Observe that disk space values don't match the ones from the machine. E.g.:
    • Disk Free Space: 2048000000 (2GB)
    • Disk Total Space: 4096000000 (4GB)

What you expected to happen

The size of filesystem matches the values of the machine.

What actually happened

The size of the filesystem doesn't match the values of the machine.

Impact

All

Available workarounds?

No but the app is still usable

Platform

Mac Silicon

Logs or notes

No response

fluiddot avatar Jul 04 '24 15:07 fluiddot

I explored the Playground code and noticed that the free and total values are fixed values. Let me briefly explain the references I found while debugging this:

Free space

  • PHP's function disk_free_space calculates the value using the filesystem function statfs:
    • https://github.com/php/php-src/blob/c69dedabb23e7c3c212f31a00251536da0ab61bf/ext/standard/filestat.c#L192
    • https://github.com/php/php-src/blob/c69dedabb23e7c3c212f31a00251536da0ab61bf/ext/standard/filestat.c#L234-L238
  • The calculation isf_bsize * f_bavail, which values represent Optimal transfer block size and Free blocks available to unprivileged user (documentation).

Total space

  • PHP's function disk_total_space calculates the value using the filesystem function statfs:
    • https://github.com/php/php-src/blob/c69dedabb23e7c3c212f31a00251536da0ab61bf/ext/standard/filestat.c#L107
    • https://github.com/php/php-src/blob/c69dedabb23e7c3c212f31a00251536da0ab61bf/ext/standard/filestat.c#L152-L156
  • The calculation isf_bsize * f_blocks, which values represent Optimal transfer block size and Total data blocks in filesystem (documentation).

Filesystem values

  • The values used in the above calculations are obtained from NODEFS.statfs function which returns value is represented with the class statfs.
  • The implementation of this function can be seen here for the PHP version 8.1:
HEAP32[(((buf) + (4)) >> 2)] = 4096;
HEAP32[(((buf) + (40)) >> 2)] = 4096;   // f_bsize - Block size
HEAP32[(((buf) + (8)) >> 2)] = 1e6;     // f_blocks - Total blocks
HEAP32[(((buf) + (12)) >> 2)] = 5e5;    // f_bfree - Free blocks (not used in calculation)
HEAP32[(((buf) + (16)) >> 2)] = 5e5;    // f_bavail - Available blocks
HEAP32[(((buf) + (20)) >> 2)] = FS.nextInode;
HEAP32[(((buf) + (24)) >> 2)] = 1e6;
HEAP32[(((buf) + (28)) >> 2)] = 42;
HEAP32[(((buf) + (44)) >> 2)] = 2;
HEAP32[(((buf) + (36)) >> 2)] = 255;
  • As you can see, the values are fixed so as outlined in the issue, they don't reflect the actual total and free size of the file system. Here are the results:
    • Total available size: 4096 * 1^6 = 4.096.000.000 (4GB).
    • Total free size: 4096 * 5^5 = 2.048.000.000 (2GB).

~~As a next step, we'd need to explore if this is set by Playground when compiling PHP source. Or if on the other hand, this is set by Emscripten.~~

UPDATE: As shared in https://github.com/Automattic/studio/issues/341#issuecomment-2303555457, the values are set by Emscripten here.

fluiddot avatar Jul 04 '24 17:07 fluiddot

CC @adamziel

wojtekn avatar Jul 08 '24 08:07 wojtekn

@fluiddot I went a bit deeper and it looks like these values are set in emscripten:

https://github.com/emscripten-core/emscripten/blob/7f8a05dd4e37cbd7ffde6d624f91fd545f7b52e3/src/library_syscall.js#L804 https://github.com/emscripten-core/emscripten/blob/7f8a05dd4e37cbd7ffde6d624f91fd545f7b52e3/system/lib/wasmfs/syscalls.cpp#L1530

With the various types of available VFS's it could be complicated to give a meaningful output of statfs which is probably why they settled on a stub with, as they call it, sane values.

@adamziel Do you have any ideas of how we can proceed with this? I'm happy to have a deeper look but I'm out of my depth here so some pointers would be very welcome :-)

jeroenpf avatar Aug 22 '24 02:08 jeroenpf

It seems like it's reporting the heap size, not the disk size. The 2GB comes up on this Emscripten doc page but not directly tied with this problem.

The original commit introducing these constants in Emscripten says:

// None of the constants here are true. We're just returning safe and // sane values.

It sounds like this could be solved with an Emscripten PR that delegates the statfs call to the FS object which in turn delegates it to the underlying FS backend like NODEFS and falls back to the hardcoded constants if that backend doesn't support the statfs call. Opening an Emscripten issue first would be useful to validate the approach with the team.

adamziel avatar Sep 11 '24 12:09 adamziel

This approach could work. Some filesystems might not be relevant because they only work in the browsers. For NODEFS this will work and I was able to test that with some custom code:

  function ___syscall_statfs64(path, size, buf) {

      let defaults = {
      	bsize: 4096,
      	frsize: 4096,
      	blocks: 1e6,
      	bfree: 5e5,
      	bavail: 5e5,
      	files: FS.nextInode,
      	ffree: 1e6,
      	fsid: 42,
      	flags: 2,
      	namelen: 255
      }

    try {
	    path = SYSCALLS.getStr(path);

	    if (ENVIRONMENT_IS_NODE ) { // need to check here if FS is nodefs too.
		    const node = FS.lookupPath(SYSCALLS.getStr(path), { follow: true }).node;
		    const stats = fs.statfsSync(node.mount.opts.root);
		    defaults.bsize = stats.bsize;
		    defaults.frsize = stats.bsize;
		    defaults.blocks = stats.blocks;
		    defaults.bfree = stats.bfree;
		    defaults.bavail = stats.bavail;
		    defaults.files = stats.files;
		    defaults.ffree = stats.ffree;
		    defaults.fsid = stats.type;
	    }

      HEAP32[buf + 4 >> 2] = defaults.bsize;
      HEAP32[buf + 40 >> 2] = defaults.frsize;
      HEAP32[buf + 8 >> 2] = defaults.blocks;
      HEAP32[buf + 12 >> 2] = defaults.bfree;
      HEAP32[buf + 16 >> 2] = defaults.bavail;
      HEAP32[buf + 20 >> 2] = defaults.files;
      HEAP32[buf + 24 >> 2] = defaults.ffree;
      HEAP32[buf + 28 >> 2] = defaults.fsid;
      HEAP32[buf + 44 >> 2] = defaults.flags;
      HEAP32[buf + 36 >> 2] = defaults.namelen;
      return 0;
    } catch (e) {
      if (typeof FS == "undefined" || !(e.name === "ErrnoError"))
        throw e;
      return -e.errno;
    }
  }

As far as i can see the values are correctly added to the shared memory and read by the systemcall inside WASM.

I will create an issue on emscripten. There's probably a bit more that we need to consider and each filesystem might need different defaults or ways to determine more accurate numbers.

jeroenpf avatar Sep 20 '24 08:09 jeroenpf

I created an issue here: https://github.com/emscripten-core/emscripten/issues/22607 - if this approach has any merit, we could work on it. At least for NODEFS it seems to be fairly easy to add support for it.

jeroenpf avatar Sep 23 '24 11:09 jeroenpf

@fluiddot @wojtekn I created a first concept of how we could solve it. Things like error handling are still missing.

The idea is to use the FS instance the same way it is used throughout the code. It will try to call the node_ops.statfs function on the relevant filesystem or fall back on the defaults that existed before.

Here is a branch that i am working in: https://github.com/emscripten-core/emscripten/compare/main...jeroenpf:emscripten:fix-22607 - once it is closer to being finished I'll create a PR.

To run this I created a small test program:

test.c

#include <stdio.h>
#include <sys/vfs.h>
#include <errno.h>
#include <string.h>
#include <dirent.h>
#include <emscripten.h>
#include <inttypes.h>  // For PRIu32/64 macros

void print_statfs_info(const char* path) {
    struct statfs s;
    if (statfs(path, &s) != 0) {
        printf("statfs failed for %s: %s\n", path, strerror(errno));
        return;
    }
    printf("Filesystem information for %s:\n", path);
    printf("Block size: %lu\n", s.f_bsize);
    printf("Fragment size: %lu\n", s.f_frsize);
    printf("Total blocks: %" PRIu32 "\n", s.f_blocks);
    printf("Free blocks: %" PRIu32 "\n", s.f_bfree);
    printf("Available blocks: %" PRIu32 "\n", s.f_bavail);
    printf("Total inodes: %" PRIu32 "\n", s.f_files);
    printf("Free inodes: %" PRIu32 "\n", s.f_ffree);
}


int main() {

#ifndef NODERAWFS
  // Run some js that mounts the current directory at /working mounpoint
  EM_ASM(
    FS.mkdir('/working');
    FS.mount(NODEFS, { root: '.' }, '/working');
  );
#endif
  // Verify that the directory is mounted
  DIR *dir;
	dir = opendir("/working");
  if (dir == NULL) {
      printf("Failed to open directory: %s\n", strerror(errno));
      return 1;
  }

  // Print the statfs information for the /working directory
	print_statfs_info("/working/README.md");
  return 0;
}

To compile wasm: ./emcc test.c -o test.js -s ENVIRONMENT=node -s EXPORTED_RUNTIME_METHODS='["ccall"]' -lnodefs.js -s ASYNCIFY=1 -s ENVIRONMENT=node

To run: node -e "require('./test.js');"

jeroenpf avatar Sep 25 '24 13:09 jeroenpf

PR here: https://github.com/emscripten-core/emscripten/pull/22631/files

jeroenpf avatar Sep 26 '24 14:09 jeroenpf

Was this resolved?

adamziel avatar Dec 18 '24 13:12 adamziel

@adamziel no, we need to compile PHP-WASM in the Playground repository and then upgrade it in Studio.

wojtekn avatar Dec 18 '24 13:12 wojtekn

@brandonpayton is working on updating Emscripten to 3.1.74 as part of the PHP 8.4 work.

bgrgicak avatar Jan 08 '25 09:01 bgrgicak

Here is an in-progress PR for that upgrade: https://github.com/WordPress/wordpress-playground/pull/2116

I am AFK until the end of the week but plan to continue the PR afterward unless someone else finishes it before then.

brandonpayton avatar Jan 08 '25 23:01 brandonpayton

We update Emscripten in Playground. To resolve this issue, Studio will need to update Playground from our self-hosted packages.

Also, the Playground still doesn't support browser cookies, so if Studio want's to update Playground, it will need to apply the cookie patch.

bgrgicak avatar Feb 17 '25 10:02 bgrgicak

The Emscripten update didn't fix this issue. Related thread https://github.com/Automattic/studio/pull/959#issuecomment-2681211542

bgrgicak avatar Mar 03 '25 09:03 bgrgicak

@wojtekn is there a specific example that we could use to test our fix?

bgrgicak avatar Mar 25 '25 11:03 bgrgicak

@bgrgicak I don't know any specific example other than steps shown in the issue description.

wojtekn avatar Mar 25 '25 12:03 wojtekn