humanify Better error handling when `humanify local` ran on empty file (Error: Failed to stringify code)

⇒ touch foo.js

⇒ npx humanifyjs local --disableGpu foo.js
(node:98609) [DEP0040] DeprecationWarning: The `punycode` module is deprecated. Please use a userland alternative instead.
(Use `node --trace-deprecation ...` to show where the warning was created)
[node-llama-cpp] Using this model ("~/.humanifyjs/models/Phi-3.1-mini-4k-instruct-Q4_K_M.gguf") to tokenize text with special tokens and then detokenize it resulted in a different text. There might be an issue with the model or the tokenizer implementation. Using this model may not work as intended
Processing: 100%
file:///Users/devalias/dev/foohumanify/node_modules/humanifyjs/dist/index.mjs:56611
    throw new Error("Failed to stringify code");
          ^

Error: Failed to stringify code
    at visitAllIdentifiers (file:///Users/devalias/dev/foohumanify/node_modules/humanifyjs/dist/index.mjs:56611:11)
    at async file:///Users/devalias/dev/foohumanify/node_modules/humanifyjs/dist/index.mjs:56687:12
    at async unminify (file:///Users/devalias/dev/foohumanify/node_modules/humanifyjs/dist/index.mjs:196:27)
    at async Command.<anonymous> (file:///Users/devalias/dev/foohumanify/node_modules/humanifyjs/dist/index.mjs:56707:3)

Node.js v22.6.0

Aug 23 '24 03:08 0xdevalias

Turns out that this issue is the same root cause:

https://github.com/jehna/humanify/issues/111

Which I debugged and identified the root cause + suggested potential places to fix it here:

Looking at your screenshot from https://github.com/jehna/humanify/issues/111#issuecomment-2365764329, it looks like the error is occuring on file 18:

Looking at the code where this happens, it seems to be within unminify, just after webcrack has run to extract the files, and in the main loop where those files are being processed:

https://github.com/jehna/humanify/blob/c2e0be679dcab43ec684668202d709f41452c0e9/src/unminify.ts#L6-L30

Since even with --verbose enabled, humanify doesn't output the filenames it's processing, it makes it a bit harder to debug; but running it through a debugger we can see what 'file 18' of extractedFiles corresponds with anyway (assuming webcrack consistently unpacks the code the same way); which since the output log shows i + 1, that makes it actually entry 17 in the array (because it's 0-indexed); which seems to be the extracted file 1.js:

Looking at the contents of output/1.js, we can see that it's an empty file:
⇒ cat output/1.js | wc -m
       0
As part of the main unminify loop, the code then calls plugins.reduce on the plugins passed in to unminify:

https://github.com/jehna/humanify/blob/c2e0be679dcab43ec684668202d709f41452c0e9/src/commands/openai.ts#L26-L30

When run as humanify openai, these plugins are:

babel (Ref)

`openaiRename({ apiKey, model: opts.model }) (Ref)

prettier (Ref)

The Failed to stringify code error you're seeing comes from the visitAllIdentifiers function, just after calling babel's transformFromAstAsync:

https://github.com/jehna/humanify/blob/c2e0be679dcab43ec684668202d709f41452c0e9/src/plugins/local-llm-rename/visit-all-identifiers.ts#L57-L61

https://babeljs.io/docs/babel-core#transformfromastasync

Which is called by the openaiRename plugin that is passed to unminify:

https://github.com/jehna/humanify/blob/c2e0be679dcab43ec684668202d709f41452c0e9/src/plugins/openai/openai-rename.ts#L6-L38

transformFromAstAsync accepts a parsed ast, and returns an object with code, map, and ast.

Presumably, with an empty file as input, that would result in an empty ast, which when passed to transformFromAstAsync, would result in an empty string returned in the code field.

Given the check that ends up throwing the error you're seeing is looking for a falsy value in stringified.code; the empty string would definitely trigger that:

https://github.com/jehna/humanify/blob/c2e0be679dcab43ec684668202d709f41452c0e9/src/plugins/local-llm-rename/visit-all-identifiers.ts#L57-L62

So the above seems to be the debugging as to the 'why' of what is going on; though I'm not familiar enough with the codebase to suggest where the best point of implementing the fix is.

If I was to suggest the most minimal/localised fix, it would be to the within if (!stringified?.code) check within visitAllIdentifiers (Ref)

Though an argument could also be made that filtering out empty/similar files at the unminify level could have it's merits (Ref); though that would then make it less generic/flexible, as theoretically a plugin in the chain could make it a non-empty file before the openaiRename actually needs to process it.

So I still think the best fix is probably within visitAllIdentifiers.

In fact, in looking through related issues, this just seems to be a variation of the same root cause that I raised in this issue:

https://github.com/jehna/humanify/issues/54

I also think the general CLI output should be improved to explain what is happening better. eg. a message before running webcrack, outputting the actual filenames (not just the file number), etc.

These issues also seem to tangentially relate to some aspect of this latter suggestion:

https://github.com/jehna/humanify/issues/92

https://github.com/jehna/humanify/issues/58

Originally posted by @0xdevalias in https://github.com/jehna/humanify/issues/111#issuecomment-2375785482

Sep 26 '24 03:09 0xdevalias

Hacky workaround:

I haven't tried it, but a quick/dirty hack fix could just be to add an if check to skip empty files within unminify:

Something like:

  export async function unminify(
    filename: string,
    outputDir: string,
    plugins: ((code: string) => Promise<string>)[] = []
  ) {
    ensureFileExists(filename);
    const bundledCode = await fs.readFile(filename, "utf-8");
    const extractedFiles = await webcrack(bundledCode, outputDir);
  
    for (let i = 0; i < extractedFiles.length; i++) {
      console.log(`Processing file ${i + 1}/${extractedFiles.length}`);
  
      const file = extractedFiles[i];
      const code = await fs.readFile(file.path, "utf-8");
+     if (code) {
      const formattedCode = await plugins.reduce(
        (p, next) => p.then(next),
        Promise.resolve(code)
      );
  
      verbose.log("Input: ", code);
      verbose.log("Output: ", formattedCode);
  
      await fs.writeFile(file.path, formattedCode);
+     }
+     else {
+       verbose.log(`Skipping empty file ${i + 1} (${file.path})`);
+     }
    }
  }

While this wouldn't really be suited for a proper fix, it should hopefully be enough to do what you need to work around this bug in the meantime.

Originally posted by @0xdevalias in https://github.com/jehna/humanify/issues/111#issuecomment-2378108249

Sep 26 '24 23:09 0xdevalias

I believe this PR by @j4k0xb will also technically resolve this issue; though I wonder if it would be better UX to detect the empty file earlier on at the command level, and provide a more 'human focussed' error/warning at that stage as well:

https://github.com/jehna/humanify/pull/134

The nuanced difference between this issue and the original bug in #111 is that this issue is about starting with an empty file, whereas #111 is about webcrack unbundling creating empty files that then break humanify later on when it tries to run over them; which is what #134 specifically fixes/works around.

Sep 29 '24 01:09 0xdevalias

An empty file is valid JavaScript so there shouldn't be any warning/error imo. That's what most CLIs (formatters, bundlers, etc) do.

Sep 29 '24 01:09 j4k0xb

An empty file is valid JavaScript so there shouldn't be any warning/error imo.

@j4k0xb I'm specifically talking about the human focussed error/warning when running this against an empty starting file, and providing better feedback to the user around that. If needs be you could add an escape hatch of --allow-empty-files or similar if there was really a need to; but I don't think showing an extra UX focussed note in the log output is a big issue.

And then similarly for empty files when they have been unbundled, providing that feedback in the logs (or perhaps only verbose logs) can be useful.

Like, these tiny UX focussed tweaks I made to what was displayed in the output logs (ignoring the 'skip to file' part) made a MASSIVE difference to helping track down and identify where/why a bug occurred, or why the program seemed to be hanging so long/similar:

I made a few patches to my locally checked out version of humanify to help provide better output info + skip through to the relevant files that seem to be causing errors:

src/unminify.ts:

diff --git a/src/unminify.ts b/src/unminify.ts
index 948b4df..ff95ae0 100644
--- a/src/unminify.ts
+++ b/src/unminify.ts
@@ -9,22 +9,42 @@ export async function unminify(
   plugins: ((code: string) => Promise<string>)[] = []
 ) {
   ensureFileExists(filename);
+
+  console.log(`Reading input file: ${filename}`);
   const bundledCode = await fs.readFile(filename, "utf-8");
+
+  console.log('Running webcrack to extract/decrypt bundled code...');
   const extractedFiles = await webcrack(bundledCode, outputDir);
 
+  console.log(`Unbundled ${extractedFiles.length} files from ${filename} into ${outputDir}`);
+
   for (let i = 0; i < extractedFiles.length; i++) {
-    console.log(`Processing file ${i + 1}/${extractedFiles.length}`);
-
     const file = extractedFiles[i];
+
+    console.log(`Processing file ${i + 1}/${extractedFiles.length} (${file.path})`);
+
+    // Skip extracted files till we find the one we actually want to try and process
+    if (file.path !== `${outputDir}/6blF.js`) {
+      console.log('  [hack] Skipping files that are not the one we want to process');
+      continue;
+    }
+
     const code = await fs.readFile(file.path, "utf-8");
-    const formattedCode = await plugins.reduce(
-      (p, next) => p.then(next),
-      Promise.resolve(code)
-    );
+    if (code) {
+      const formattedCode = await plugins.reduce(
+        (p, next) => p.then(next),
+        Promise.resolve(code)
+      );
 
-    verbose.log("Input: ", code);
-    verbose.log("Output: ", formattedCode);
+      verbose.log("  Input: ", code);
+      verbose.log("  Output: ", formattedCode);
 
-    await fs.writeFile(file.path, formattedCode);
+      await fs.writeFile(file.path, formattedCode);
+    }
+    else {
+      verbose.log(`  Skipping empty file ${i + 1} (${file.path})`);
+    }
   }
+
+  console.log(`Unminified (or skipped) ${extractedFiles.length} files from ${filename} into ${outputDir}`);
 }

Originally posted by @0xdevalias in https://github.com/jehna/humanify/issues/117#issuecomment-2378447998

Sep 29 '24 02:09 0xdevalias

True, more verbose code and progress logging seems useful But still not a fan of skipping files...

What if the code uses require("./empty.js") which get's removed because of this?
After that error got fixed, what information do you gain from seeing which ones are empty (input/output also shows it)?

Sep 29 '24 02:09 j4k0xb

But still not a fan of skipping files...

@j4k0xb I guess that really depends where 'skipping' is implemented. The hacky workaround in unminify (Ref) was never suggested as a proper fix to this, it was only a hacky workaround, as I said at the bottom of it:

While this wouldn't really be suited for a proper fix, it should hopefully be enough to do what you need to work around this bug in the meantime.

But for example, because unminify reduces over the passed in plugins, and feeds the code into them; the openaiRename plugin:

https://github.com/jehna/humanify/blob/64a1b9511d2e42b705df0858674eab1f5e83dd0d/src/plugins/openai/openai-rename.ts#L6-L20

Or even visitAllIdentifiers itself:

https://github.com/jehna/humanify/blob/64a1b9511d2e42b705df0858674eab1f5e83dd0d/src/plugins/local-llm-rename/visit-all-identifiers.ts#L15-L25

Could have an 'early guard statement' when code is empty, that provides some feedback and then returns before even needing to attempt to parse the AST.

My suggestion is about identifying the appropriate place to check that the inputs are relevant, and react appropriately if not. There's no reason to parse the AST and have all the other code in that file run just to turn it back into an empty string if we can just handle that case specifically at the beginning. While it's potentially a negligible performance gain, at the very least it makes the handling explicit, and makes the code easier to reason about/less chances for bugs to occur because of it.

Like without thinking too deeply about it, I would probably implement something like this:

   export async function visitAllIdentifiers( 
     code: string, 
     visitor: Visitor, 
     onProgress?: (percentageDone: number) => void 
   ) {
+    if (code === "") return ""
+
     const ast = await parseAsync(code); 
     const visited = new Set<string>(); 
     const renames = new Set<string>(); 
     if (!ast) { 
       throw new Error("Failed to parse code"); 
     }

And because visitAllIdentifiers doesn't seem to currently have any patterns around logging, to keep with that, I would also make a change in openaiRename (and other similar plugins) like this:

  export function openaiRename({ 
     apiKey, 
     model 
   }: { 
     apiKey: string; 
     model: string; 
   }) { 
     const client = new OpenAI({ apiKey }); 
    
     return async (code: string): Promise<string> => { 
+    if (code === "") {
+      verbose.log("Skipping empty file");
+      return code
+    }
+
       return await visitAllIdentifiers( 
         code, 
         async (name, surroundingCode) => { 
           verbose.log(`Renaming ${name}`); 
           verbose.log("Context: ", surroundingCode);

Sep 29 '24 02:09 0xdevalias

I'm specifically talking about the human focussed error/warning when running this against an empty starting file, and providing better feedback to the user around that.

In addition to the suggestions in my last comment, I was also going to suggest adding a warning/error at the command level, but in looking at the code again now, I don't think that makes sense:

https://github.com/jehna/humanify/blob/64a1b9511d2e42b705df0858674eab1f5e83dd0d/src/commands/openai.ts#L20-L30

Basically, the openai command is just calling unminify, and it's within unminify that the file is first actually read, before being passed to webcrack:

https://github.com/jehna/humanify/blob/64a1b9511d2e42b705df0858674eab1f5e83dd0d/src/unminify.ts#L6-L13

So the only way I think it would make sense to implement anything new at the command level is if unminify was refactored to instead accept code rather than filename; but at this stage, I'm not really sure if there is any added benefit to doing that; unless it becomes useful to do so as part of the implementation for issues like:

https://github.com/jehna/humanify/issues/129
https://github.com/jehna/humanify/issues/3

So I think just the 2 minimal suggestions I made in https://github.com/jehna/humanify/issues/54#issuecomment-2381073944 + the fix in https://github.com/jehna/humanify/pull/134 would probably satisfy my requests on this issue.

Sep 29 '24 02:09 0xdevalias

The hacky workaround in unminify (Ref) was never suggested as a proper fix to this, it was only a hacky workaround, as I said at the bottom of it:

While this wouldn't really be suited for a proper fix, it should hopefully be enough to do what you need to work around this bug in the meantime.

Looks like that hacky workaround (rather than the better full solution mentioned above) was what @jehna ended up implementing in #134 in the end though.

https://github.com/jehna/humanify/pull/134/files#r1800294136

https://github.com/j4k0xb/humanify/blob/96c093bc9965ca02fa1c269db1a0d445a7d6bacf/src/unminify.ts#L21-L24

Oct 15 '24 01:10 0xdevalias

This should be fixed now

Oct 18 '24 21:10 jehna

This should be fixed now

@jehna Fixed technically, but see notes above in https://github.com/jehna/humanify/issues/54#issuecomment-2412626831 + https://github.com/jehna/humanify/issues/54#issuecomment-2381073944 for why the way it was implemented may not have been the ideal place to do so; and what a more flexible option would have been.

Oct 20 '24 01:10 0xdevalias