
File size limitations are worse than Rhino

Open moayman opened this issue 6 years ago • 12 comments

So, my tool was JDK 7 based and upon moving to JDK 8 we faced a capacity limitation. Nashorn abides by ByteCode limitations of 64 KB.

Working with Rhino we were able to work with 350MB JS files. Nashorn was a big no for us. So we started looking into GraalVM, but we found a hardcoded limit of 268435455 bytes (0xFFFFFFF) in https://github.com/graalvm/graaljs/blob/master/graal-js/src/com.oracle.js.parser/src/com/oracle/js/parser/Token.java

Why is that? Can we add an option to override the limit value? Are there any plans to increase this limit?

moayman avatar Sep 16 '18 15:09 moayman

Hi moayman,

thanks for your request. We inherited that limitation from Nashorn as we are reusing their parser. Lifting this limit might be possible with some effort, however, it is unclear to me what other implications such a huge application would bring.

I assume this is some form of generated code that you try to execute?

An easy workaround might be to split your JS source file into separate smaller files and use load(filename) to parse and access them from the main file. See https://github.com/graalvm/graaljs/blob/master/docs/user/JavaScriptCompatibility.md#loadsource

Best, Christian

wirthi avatar Sep 17 '18 08:09 wirthi

Hi @wirthi,

Thanks a lot for your reply.

I have more questions now. Umm, Is this the same Nashorn limit? I thought that Nashorn was limited to the Java ByteCode 64KB limitation and that it wasn't limited by file size.

Yes, you are 100% correct. I have another generator that runs and produces a huge JS file.

Thanks for letting me know about the load method. Umm, I have a question, though. The JS file I am trying to use is basically one huge function that I need to execute at once. How can I achieve that using load after splitting the function?

I have another, slightly off-topic question, if I may. I used to have the function with a name and used to execute it by its name. For example:

JS:

 function myFunc(p1, p2, p3) { /* code */ }

JAVA:

 ScriptEngineManager mgr = new ScriptEngineManager();
 Engine = mgr.getEngineByName("js");
 reader = new FileReader(new File(path));
 Engine.eval(reader);
 Object o = ((Invocable) Engine).invokeFunction("myFunc", p1, p2, p3);

Now that I have tried GraalJS, I can't execute a function selectively by providing its name. The Value returned from the eval method has canExecute() set to false unless the whole file I am evaluating is a single function that has no name. If, for example, I set a var before the function, canExecute() is false.

The code I am using with GraalJS is the following:

 context = Context.newBuilder("js").allowAllAccess(true).build();
 Source source = Source.newBuilder("js", new File(path)).build();
 Value myFunc = context.eval(source);
 if (myFunc.canExecute()) {
     Value o = myFunc.execute(p1, p2, p3);
 }

The question is: how can I achieve the old behavior? Say I have a file containing two functions, func1 and func2. I eval the file. How can I execute func1 or func2 selectively?

Thanks a lot.

moayman avatar Sep 17 '18 13:09 moayman

Umm, Is this the same Nashorn limit?

@moayman Yes, that is the file size limit for parsing in both engines. Also note that Nashorn automatically splits large JS functions in order to stay below the bytecode limit, but it's still possible to exceed the limit anyway.

woess avatar Sep 17 '18 14:09 woess

@moayman If you eval e.g.:

 function func1() { /* code */ }
 function func2() { /* code */ }

You should be able to access the functions via context.getBindings("js"): Value func1 = context.getBindings("js").getMember("func1").

woess avatar Sep 17 '18 14:09 woess

Hi moayman,

Thanks for letting me know about the load method. Umm, I have a question, though. The JS file I am trying to use is basically one huge function that I need to execute at once. How can I achieve that using load after splitting the function?

main.js:

load("child.js");
childFunction();

child.js:

function childFunction() {
    print("Hello childFunction");
}

Of course, this assumes that you have split your huge function into two separate ones, and can call one from the other.
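For generated code, the splitting step could in principle be automated. The following is only a rough sketch, assuming the generator can hand over its output as a list of independent top-level snippets (for example, one function declaration each), which may not hold for every generator. The class name `ScriptChunkerSketch`, the chunk file names, and the size limit passed in are all hypothetical; only `load(...)` comes from the example above.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of GraalJS or Nashorn.
final class ScriptChunkerSketch {

    /** Greedily groups top-level snippets into chunks of at most maxBytes (UTF-8) each. */
    static List<String> chunk(List<String> snippets, int maxBytes) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int currentBytes = 0;
        for (String snippet : snippets) {
            String unit = snippet + "\n";
            int size = unit.getBytes(StandardCharsets.UTF_8).length;
            if (size > maxBytes) {
                // A single snippet that is too large cannot be placed anywhere.
                throw new IllegalArgumentException("single snippet exceeds chunk limit");
            }
            if (currentBytes > 0 && currentBytes + size > maxBytes) {
                // Current chunk is full: close it and start a new one.
                chunks.add(current.toString());
                current = new StringBuilder();
                currentBytes = 0;
            }
            current.append(unit);
            currentBytes += size;
        }
        if (currentBytes > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }

    /** Builds a main script that load()s chunk0.js, chunk1.js, ... in order. */
    static String mainScript(int chunkCount) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < chunkCount; i++) {
            sb.append("load(\"chunk").append(i).append(".js\");\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> snippets = List.of("function a() {}", "function b() {}", "function c() {}");
        List<String> chunks = chunk(snippets, 32);
        System.out.println(chunks.size() + " chunks");
        System.out.print(mainScript(chunks.size()));
    }
}
```

Each chunk would then be written to its own file, and the generated main script would end with the call to the entry function, as in main.js above.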

Best, Christian

wirthi avatar Sep 17 '18 14:09 wirthi

@woess Thanks a lot for this info, this will help me a bunch.

@wirthi Thanks for the info. Ummm, I don't really know if it is possible for the generated code to be split easily enough. But, I will look into it.

moayman avatar Sep 17 '18 14:09 moayman

@moayman 350 MB of JS is huge. Is it minified? That would be a simple first step.

nhoughto avatar Sep 22 '18 03:09 nhoughto

@nhoughto No it is not minified and it also contains comments, and thank you for pointing it out because it didn't come to my mind to minify it. I know it is huge. It's generated though.

The whole point is that Rhino worked just fine. So, I believe that it is a needed enhancement.

moayman avatar Sep 23 '18 10:09 moayman

So umm, I was able to get Rhino Jars from Maven repo and I tried using it and found that Rhino has the same capacity limitation.

What used to make it work however was setting optimization level to -1 which basically stops using CodeGen (compilation to ByteCode) and interprets the JS file.

I believe the ByteCode usage is done to enhance performance, am I right?

Is there an option to disable it either in Nashorn or GraalJS?

moayman avatar Sep 24 '18 16:09 moayman

How do you end up with 350MB of JS? Is it really all code or does it include a large amount of data? Either way, I'd suggest splitting it up into multiple files.

I believe the ByteCode usage is done to enhance performance, am I right? Is there an option to disable it either in Nashorn or GraalJS?

Nashorn does not have an interpreter, so it's not possible to disable it there. GraalJS does not generate any bytecode, it's a parser limitation (same as Nashorn). We don't have plans to support files larger than 256MB at the moment.

woess avatar Sep 26 '18 23:09 woess

@woess I believe that Nashorn used to issue a "method code too large" error, which I believe is related to the ByteCode limitation. I am not 100% sure about this, but that's what people told me on Stack Overflow.

I told you that the 350MB JS file is a generated file.

I really hope that the parser limitation can be increased soon because I was looking forward to using GraalJS.

Thinking about it, I believe that letting the user control this parser limit would be a good option to fit the user's use case. What's your opinion, guys?

moayman avatar Sep 30 '18 11:09 moayman

@moayman It is definitely possible to lift the limitation. This requires changing how tokens are stored in the parser. We are happy to receive contributions that improve the parser in this respect.
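To illustrate why a limit like 0xFFFFFFF can fall out of how tokens are stored, here is a minimal sketch that packs a token into a single 64-bit value. The bit layout is an assumption made for illustration (8 bits for the type, 28 bits each for length and position); it is not taken from Token.java, and the class name `PackedTokenSketch` is hypothetical. With 28 bits, the largest storable position is 2^28 - 1 = 268435455.

```java
// Hypothetical token packing; NOT the actual com.oracle.js.parser.Token layout.
final class PackedTokenSketch {
    // Assumed field widths: 8 + 28 + 28 = 64 bits.
    static final int TYPE_BITS = 8;
    static final int FIELD_BITS = 28;
    static final long TYPE_MASK = (1L << TYPE_BITS) - 1;
    static final long FIELD_MASK = (1L << FIELD_BITS) - 1; // 0xFFFFFFF = 268435455

    /** Packs type, source position and length into one long, rejecting values that do not fit. */
    static long pack(int type, int position, int length) {
        if (type < 0 || type > TYPE_MASK) {
            throw new IllegalArgumentException("type out of range");
        }
        if (position < 0 || position > FIELD_MASK || length < 0 || length > FIELD_MASK) {
            throw new IllegalArgumentException("position/length exceeds " + FIELD_MASK);
        }
        return ((long) position << (TYPE_BITS + FIELD_BITS))
                | ((long) length << TYPE_BITS)
                | type;
    }

    static int type(long token) {
        return (int) (token & TYPE_MASK);
    }

    static int length(long token) {
        return (int) ((token >>> TYPE_BITS) & FIELD_MASK);
    }

    static int position(long token) {
        // Unsigned shift, because a large position sets the sign bit of the long.
        return (int) ((token >>> (TYPE_BITS + FIELD_BITS)) & FIELD_MASK);
    }

    public static void main(String[] args) {
        long token = pack(7, 268435455, 12);
        System.out.println(type(token) + " " + position(token) + " " + length(token));
    }
}
```

Under this assumed layout, raising the limit means either widening the position field at the expense of the other fields or moving away from a single packed long, which is why it touches the whole parser.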

For us, this work is not very high on our priority list, as a source code file that large represents a very exceptional use case. A reasonable workaround is to split the source, or to extract data into a separate file, as described above.

wirthi avatar Oct 01 '18 10:10 wirthi