zenbot

Simulator hangs or exits abruptly (again)

Open · mmdiego opened this issue 4 years ago · 8 comments

I've tracked several issues related to this problem in the past, along with many attempts to fix it, but the fixes don't seem to work for everybody (including me).

There was a critical change in commit 0ef671ef003c227c74fcfa0d308fe189ca4d8d9f to fix this issue, but it seems to have fixed it for some people and broken it for others.

Before that change, I found these related issues: #1922, #1971, #1976, #1977. After it, I found these: #1983, #2487, #2315, #2412.

I'm tagging some of the people involved in this issue: @jorisw @dlasher @Wheaties466 @tenaciousd93. It would be great to find a solution that works for everybody.

Right now I'm using Node 10.23, MongoDB 3.2.11 and the latest version of Zenbot's unstable branch. The command I'm using to test is:

    node zenbot.js sim binance.BTC-USDT --strategy=noop --period=1m --days=7

For me, rolling back sim.js to the version before that commit fixes the problem.
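To compare behaviour before and after rolling back sim.js (or across Node versions), it can help to run the same sim many times and count how each run ends. This is a hypothetical sketch, not part of Zenbot: `classifyRun` and `tally` are illustrative names, and treating a run that outlives a fixed timeout as a "hang" is an assumption.

```javascript
const { spawn } = require('child_process');

// Hypothetical helper: run one command and classify how it ended.
//   'ok'         -> exited with code 0
//   'exit:<n>'   -> exited abruptly with non-zero code n
//   'signal:<s>' -> killed by a signal we did not send
//   'hang'       -> still running after timeoutMs, so we killed it
function classifyRun(cmd, args, timeoutMs) {
  return new Promise((resolve, reject) => {
    const proc = spawn(cmd, args, { stdio: 'ignore' });
    let timedOut = false;
    const timer = setTimeout(() => {
      timedOut = true;
      proc.kill('SIGKILL');
    }, timeoutMs);
    proc.on('error', (err) => {
      clearTimeout(timer);
      reject(err); // e.g. the command could not be spawned at all
    });
    proc.on('exit', (code, signal) => {
      clearTimeout(timer);
      if (timedOut) resolve('hang');
      else if (code === 0) resolve('ok');
      else if (code !== null) resolve(`exit:${code}`);
      else resolve(`signal:${signal}`);
    });
  });
}

// Run the same command `runs` times, one after another, and count outcomes.
async function tally(cmd, args, runs, timeoutMs) {
  const counts = {};
  for (let i = 0; i < runs; i++) {
    const result = await classifyRun(cmd, args, timeoutMs);
    counts[result] = (counts[result] || 0) + 1;
  }
  return counts;
}

// Example (assumes zenbot.js is in the current directory; the 10-minute
// ceiling per sim is an arbitrary assumption):
// tally('node', ['zenbot.js', 'sim', 'binance.BTC-USDT', '--strategy=noop',
//   '--period=1m', '--days=7'], 20, 10 * 60 * 1000).then(console.log);

module.exports = { classifyRun, tally };
```

Running sims sequentially keeps each run isolated; reports further down suggest hangs show up more under heavy parallel load, so a parallel variant may be needed to reproduce those.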

How can we proceed?

mmdiego · Feb 07 '21 01:02

Try Node 8 or 14; in my case, Node 10 always makes the process hang.

kennylbj · Feb 08 '21 00:02

I've tried different versions and they behave differently, but none of them gave me consistent behaviour, so I don't think that's a fix for this problem. My idea is to find a way to fix this issue so that it works on any supported version, as stated in Zenbot's requirements.

mmdiego · Feb 08 '21 01:02

I welcome anyone to try Zenbot paper or live mode, first against Binance, then against GDAX, and debug the code that was altered in the mentioned commit. The change was necessary for me to keep Zenbot running against Binance; unfortunately, it seems to have broken continuous operation against GDAX. I for one don't have time to debug it, but I am confident that the code can be made to work for all exchanges.

jorisw · Feb 09 '21 20:02

But the problem I'm reporting has nothing to do with paper or live trading. It's in simulation, and the modification was inside sim.js, specifically in the getNext() function. Also, I'm using Binance too, so I don't know whether it's related to the exchange. I've been debugging it and I've seen the same behavior described in #2487.

mmdiego · Feb 09 '21 22:02

I think this is harder than expected. I've been tracking other recent related changes and found these issues:

- #2412: simulation not working
- #2425: fixed something related to lolex and mongodb, which fixed simulation
- #2600: reverted that change because sim results were incorrect

I think this is closely related to the simulator problem. Tracing back to when lolex was added (#1456), I found it was introduced to make simulation results match live trading.

I agree the problem is caused by lolex and the recent incompatibility with newer versions of MongoDB, but it seems the fix proposed in #2425 wasn't fully correct, or something else still needs to be done.

mmdiego · Feb 09 '21 23:02

I think the "recent versions of mongo" issue should be addressed by pinning MongoDB to a fixed version that is known to work. This should be done in the docker-compose files. For example, change:

mongodb:
    image: mongo:latest

to:

mongodb:
    image: mongo:4.4.2

LuisAlejandro · Mar 13 '21 14:03

I'm also (still) getting this issue. It's definitely something to do with lolex <-> Mongo, but I'm having trouble narrowing it down. I can only get it to appear when I'm running a lot of sims at once (e.g. using the genetic backtester), which makes it difficult to get a debugger involved.

So far I've managed to narrow down exactly which settings are required for lolex to keep sims consistent:

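// engine.js withOnPeriod function
    // Install the fake clock once, and only for sims (not live or paper trading)
    if (!clock && so.mode !== 'live' && so.mode !== 'paper') {
      clock = lolex.install({
        shouldAdvanceTime: false,       // fake time moves only when the clock is ticked
        now: trade.time,                // start the fake clock at the current trade's timestamp
        toFake: ['setTimeout', 'Date']  // replace only these globals
      });
    }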

Not faking 'Date' was what broke sims in the previous PR (#2425). Unfortunately, the above doesn't actually fix the issue.
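One possible mechanism, offered as an assumption rather than something confirmed in this thread: with `toFake: ['setTimeout', 'Date']` and `shouldAdvanceTime: false`, any code that schedules work through the global `setTimeout` (possibly including internals of the mongodb driver, which is not verified here) only runs when something advances the fake clock, e.g. lolex's `clock.tick()`. The sketch below uses a tiny hand-rolled fake clock, not lolex's implementation, and does not replace any globals; it only shows how a promise waiting on a faked timer stalls until the clock is ticked.

```javascript
// Hypothetical minimal fake clock (NOT lolex): callbacks registered through
// clock.setTimeout are queued against a virtual "now" and run only on tick().
function createFakeClock(start) {
  let now = start;
  let seq = 0; // tie-breaker so timers due at the same time run in order
  const timers = [];
  return {
    now: () => now,
    setTimeout(fn, ms) {
      timers.push({ at: now + ms, seq: seq++, fn });
    },
    tick(ms) {
      now += ms;
      // Run every timer that is now due, earliest first; callbacks may add more.
      for (;;) {
        timers.sort((a, b) => a.at - b.at || a.seq - b.seq);
        if (timers.length === 0 || timers[0].at > now) break;
        timers.shift().fn();
      }
    },
  };
}

// Stand-in for library code that waits on a timer before continuing
// (e.g. a retry delay); with a fake clock it settles only if someone ticks.
function waitViaTimer(clock, ms) {
  return new Promise((resolve) => clock.setTimeout(() => resolve('done'), ms));
}

// Race a promise against a REAL timer to detect that it never settled.
function settlesWithin(promise, realMs) {
  const timeout = new Promise((resolve) => setTimeout(() => resolve('stalled'), realMs));
  return Promise.race([promise, timeout]);
}

// const clock = createFakeClock(Date.now());
// settlesWithin(waitViaTimer(clock, 100), 50).then(console.log); // 'stalled': nobody ticked
```

If this is what happens in the sims, a waiting operation would resume only when the next trade advances the clock far enough, and never if no further tick comes.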

Switching to a lower MongoDB version and changing Node versions didn't seem to help me, and neither did dropping the mongodb library down to 3.6.1. I even tried 3.5.0, but that slowed sims down to the point of being unusable.

I've currently got a proof of concept replacing Mongo with Redis, so I'll see if this issue crops up there too.

Makeshift · Dec 12 '21 06:12

After a whole night of messing with it, I'm no closer to fixing it, but I did implement a workaround that kills sims that seem to be stuck:

diff --git a/lib/backtester.js b/lib/backtester.js
index f225d30f..d1860983 100644
--- a/lib/backtester.js
+++ b/lib/backtester.js
@@ -228,6 +228,9 @@ let ensureDirectoryExistence = function (filePath) {
   fs.mkdirSync(dirname)
 }

+let simPercent = {};
+let simProcs = {};
+
 let monitor = {
   periodDurations: [],
   phenotypes: [],
@@ -366,6 +369,13 @@ let monitor = {
           slowestP = p
           slowestEta = eta
         }
+        if (!simPercent[c.iteration]) simPercent[c.iteration] = [];
+        simPercent[c.iteration].push(percentage)
+        let lastPercents = simPercent[c.iteration].slice(Math.max(simPercent[c.iteration].length - 30, 0));
+        if (lastPercents.length === 30 && new Set(lastPercents).size == 1 && simProcs[c.iteration]) {
+          console.log(`${c.iteration} is stuck at ${lastPercents[0]}, killing it off!`);
+          simProcs[c.iteration].kill('SIGKILL')
+        }

         if (homeStretchMode)
           inProgressStr.push(`${(c.iteration + ':').gray} ${(percentage * 100).toFixed(1)}% ETA: ${monitor.distanceOfTimeInWords(eta, now)}`)
@@ -543,6 +553,7 @@ module.exports = {
     var cmdArgs = command.commandString.split(' ')
     var cmdName = cmdArgs.shift()
     const proc = spawn(cmdName, cmdArgs)
+    simProcs[command.iteration] = proc;
     var endData = ''

     proc.on('exit', () => {

It's absolutely not ideal, but it only seems to happen to about 2% of my sims, so I'll take it for now.

Edit: Obviously this isn't useful if individual sims hang for you; it only helps when a subset of sims hangs during backtesting.
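The stall check in the diff can be pulled out into a small, testable piece. This is a hypothetical refactor, not code from Zenbot: `isStuck` mirrors the diff's rule (the last 30 progress samples are all identical), and `StallTracker` is an illustrative name that also bounds each sim's history to the window, whereas `simPercent` in the diff keeps every sample for the lifetime of the run.

```javascript
// True when the last `window` progress samples are all identical,
// mirroring the check in the backtester.js diff (window = 30 there).
function isStuck(samples, window = 30) {
  if (samples.length < window) return false;
  const recent = samples.slice(samples.length - window);
  return new Set(recent).size === 1;
}

// Hypothetical tracker: record one progress sample per monitor refresh,
// keyed by sim iteration, and report whether that iteration looks stuck.
class StallTracker {
  constructor(window = 30) {
    this.window = window;
    this.history = new Map();
  }

  record(iteration, percentage) {
    if (!this.history.has(iteration)) this.history.set(iteration, []);
    const samples = this.history.get(iteration);
    samples.push(percentage);
    // Keep only as many samples as the check needs.
    if (samples.length > this.window) samples.shift();
    return isStuck(samples, this.window);
  }
}

module.exports = { isStuck, StallTracker };
```

When `record()` returns true, the process could be killed the same way the diff does it, e.g. `simProcs[c.iteration].kill('SIGKILL')`.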

Makeshift · Dec 12 '21 06:12