phpspy icon indicating copy to clipboard operation
phpspy copied to clipboard

Skip frames where zfunc is null

Open TysonAndre opened this issue 3 years ago • 5 comments

Previously, it would print the message copy_proc_mem: Not copying zfunc; raddr is NULL even if subsequent frames from execute_data could have been read (e.g. Redis::evalsha and other redis methods were affected by this when instrumenting a use of redis where the server was 100 milliseconds away)

Before, it would not print any of the subsequent frames making the stack trace less representative - the try macro would immediately return. After this PR, skip frames where the function is null.

TysonAndre avatar Oct 09 '20 19:10 TysonAndre

Hi @TysonAndre, thanks for the patch.

Do I understand correctly that zend_execute_data.func is sometimes NULL in the middle of a call stack after VM optimizations are applied? For my own understanding can you provide an example of userland code that produces that?

adsr avatar Oct 09 '20 21:10 adsr

The code in question is like this, with https://github.com/phpredis/phpredis/ , php 7.2, and opcache enabled, and xdebug was disabled

I'm not sure if it's related to internal method calls, to phpredis, to opcache, or something else

    private function _invokeRedisFunction(string $function, array $args) {
        switch (strtolower($function)) {
            case 'eval':
            case 'evaluate':
                $luaScript = $args[0];
                $sha1 = $this->_luaScripts[$luaScript] ?? null;
                $contents = null;
                if (is_null($sha1)) {
                    $contents = $this->_getLuaScriptContents($luaScript);
                    $sha1 = sha1($contents);
                    $this->_luaScripts[$luaScript] = $sha1;
                    $this->updateCache();
                }
                $result = $this->_redis->evalsha($sha1, $args[1], $args[2]);
                if ($result === false && strpos($this->_redis->getLastError(), 'NOSCRIPT') === 0) {
                    $this->_redis->clearLastError();
                    $fileContents = $contents ?: $this->_getLuaScriptContents($luaScript);
                    return $this->_redis->eval(
                        $fileContents,
                        $args[1],
                        $args[2]
                    );
                }
                return $result;
            default:
                // All other commands go through here:
                return $this->_redis->$function(...$args);
        }
    }

TysonAndre avatar Oct 09 '20 22:10 TysonAndre

Other notes:

  • The project was using runkit7 - that clears the function's run time cache, but it didn't crash, so I can't see how call->func would have been unset. I still see this in other projects even without using runkit7, so it might be something else such as the call being out of date by the time it's read.
  • The call to redis may have been timing out due to issues with the proxy but that seems unrelated
  • Code was compiled with -O3
  • Trying to reproduce this in a simpler test case runs into issues

If the last frame gets skipped (i.e. <main>, *depth - 1), then it's probably better to just return PHPSPY_ERR

TysonAndre avatar Oct 09 '20 23:10 TysonAndre

I'm not quite sure why that was happening

If the last frame gets skipped (i.e. <main>, *depth - 1), then it's probably better to just return PHPSPY_ERR

Whatever I was seeing with Redis::hget, Redis::evalsha, etc. in the original stack frames is probably an outlier. This was probably using __call() in the parent stack frames, which may or may not be related

  • this extension handles call properly elsewhere.
  • I was seeing the problem on frame 0 all of the time in redis when running phpunit, not on other frames, but other methods were properly reported

I also see this in an application (phan) that doesn't even use redis, a few times but a negligible fraction of the overall time, and rarely ever for the 0th frame. I see that magic spl_autoload_call may be underrepresented because there's no func, or just complete really quickly.

The current output format is missing numbers, and renumbering would probably work better with stackcollapse scripts and flame graphs

Printed frames 2 skipped 1
0 Phan\Language\UnionType::hasCommonType /path/to/phan/src/Phan/Language/UnionType.php:1278
...
Printed frames 41 skipped 3
0 Composer\Autoload\includeFile /path/to/phan/vendor/composer/ClassLoader.php:442
1 Composer\Autoload\ClassLoader::loadClass /path/to/phan/vendor/composer/ClassLoader.php:319
2 spl_autoload_call <internal>:-1
4 Phan\Language\Element\Method::Phan\Language\Element\{closure} /path/to/phan/src/Phan/Language/Element/FunctionTrait.php:1929
5 Phan\Language\Element\Method::memoize /path/to/phan/src/Phan/Memoize.php:40
6 Phan\Language\Element\Method::getVariableTypeFallbackMap /path/to/phan/src/Phan/Language/Element/FunctionTrait.php:1927
7 Phan\Analysis\ConditionVisitor::getTypesFallback /path/to/phan/src/Phan/Analysis/ConditionVisitorUtil.php:173
8 Phan\Analysis\ConditionVisitor::Phan\Analysis\{closure} /path/to/phan/src/Phan/Analysis/ConditionVisitorUtil.php:218
9 Phan\Analysis\ConditionVisitor::updateVariableWithConditionalFilter /path/to/phan/src/Phan/Analysis/ConditionVisitorUtil.php:611
10 Phan\Analysis\ConditionVisitor::removeFalseyFromVariable /path/to/phan/src/Phan/Analysis/ConditionVisitorUtil.php:197
11 Phan\Analysis\ConditionVisitor::visitVar /path/to/phan/src/Phan/Analysis/ConditionVisitor.php:554
12 Phan\AST\Visitor\KindVisitorImplementation::__invoke /path/to/phan/src/Phan/AST/Visitor/KindVisitorImplementation.php:32
13 Phan\Analysis\NegatedConditionVisitor::visitUnaryOp /path/to/phan/src/Phan/Analysis/NegatedConditionVisitor.php:241
14 Phan\AST\Visitor\KindVisitorImplementation::__invoke /path/to/phan/src/Phan/AST/Visitor/KindVisitorImplementation.php:32
15 Phan\BlockAnalysisVisitor::visitIf /path/to/phan/src/Phan/BlockAnalysisVisitor.php:2029
16 Phan\BlockAnalysisVisitor::analyzeAndGetUpdatedContext /path/to/phan/src/Phan/BlockAnalysisVisitor.php:573
17 Phan\BlockAnalysisVisitor::visitStmtList /path/to/phan/src/Phan/BlockAnalysisVisitor.php:217
18 Phan\BlockAnalysisVisitor::visitIf /path/to/phan/src/Phan/BlockAnalysisVisitor.php:2029
19 Phan\BlockAnalysisVisitor::analyzeAndGetUpdatedContext /path/to/phan/src/Phan/BlockAnalysisVisitor.php:573
20 Phan\BlockAnalysisVisitor::visitStmtList /path/to/phan/src/Phan/BlockAnalysisVisitor.php:217
21 Phan\BlockAnalysisVisitor::analyzeAndGetUpdatedContext /path/to/phan/src/Phan/BlockAnalysisVisitor.php:573
22 Phan\BlockAnalysisVisitor::visitForeach /path/to/phan/src/Phan/BlockAnalysisVisitor.php:934
23 Phan\BlockAnalysisVisitor::analyzeAndGetUpdatedContext /path/to/phan/src/Phan/BlockAnalysisVisitor.php:573
24 Phan\BlockAnalysisVisitor::visitStmtList /path/to/phan/src/Phan/BlockAnalysisVisitor.php:217
25 Phan\BlockAnalysisVisitor::analyzeAndGetUpdatedContext /path/to/phan/src/Phan/BlockAnalysisVisitor.php:573
26 Phan\BlockAnalysisVisitor::visitMethod /path/to/phan/src/Phan/BlockAnalysisVisitor.php:3018
27 Phan\BlockAnalysisVisitor::analyzeAndGetUpdatedContext /path/to/phan/src/Phan/BlockAnalysisVisitor.php:573
28 Phan\BlockAnalysisVisitor::visitStmtList /path/to/phan/src/Phan/BlockAnalysisVisitor.php:217
29 Phan\BlockAnalysisVisitor::analyzeAndGetUpdatedContext /path/to/phan/src/Phan/BlockAnalysisVisitor.php:573
30 Phan\BlockAnalysisVisitor::visitClosedContext /path/to/phan/src/Phan/BlockAnalysisVisitor.php:1451
31 Phan\BlockAnalysisVisitor::visitClass /path/to/phan/src/Phan/BlockAnalysisVisitor.php:3004
32 Phan\BlockAnalysisVisitor::analyzeAndGetUpdatedContext /path/to/phan/src/Phan/BlockAnalysisVisitor.php:573
33 Phan\BlockAnalysisVisitor::visitStmtList /path/to/phan/src/Phan/BlockAnalysisVisitor.php:217
34 Phan\AST\Visitor\KindVisitorImplementation::__invoke /path/to/phan/src/Phan/AST/Visitor/KindVisitorImplementation.php:32
35 Phan\Analysis::analyzeFile /path/to/phan/src/Phan/Analysis.php:512
36 Phan\Phan::Phan\{closure} /path/to/phan/src/Phan/Phan.php:554
37 Phan\Phan::finishAnalyzingRemainingStatements /path/to/phan/src/Phan/Phan.php:469
38 Phan\Phan::analyzeFileList /path/to/phan/src/Phan/Phan.php:199
39 <main> /path/to/phan/src/phan.php:1
40 <main> /path/to/phan/phan:1
--- a/phpspy_trace.c
+++ b/phpspy_trace.c
@@ -97,6 +97,7 @@ static int trace_stack(trace_context_t *context, zend_execute_data *remote_execu
     target = &context->target;
     frame = &context->event.frame;
     *depth = 0;
+    int skipped_frame = -1;
 
     while (remote_execute_data && *depth != opt_max_stack_depth) { /* TODO make options struct */
         memset(&execute_data, 0, sizeof(execute_data));
@@ -137,10 +138,15 @@ static int trace_stack(trace_context_t *context, zend_execute_data *remote_execu
             }
             frame->depth = *depth;
             try(rv, context->event_handler(context, PHPSPY_TRACE_EVENT_FRAME));
+        } else {
+            skipped_frame = *depth;
         }
         remote_execute_data = execute_data.prev_execute_data;
         *depth += 1;
     }
+    if (skipped_frame >= 0) {
+        fprintf(stderr, "Printed frames %d skipped %d\n", (int)*depth, skipped_frame);
+    }
 
     return PHPSPY_OK;
 }

TysonAndre avatar Oct 10 '20 00:10 TysonAndre

Hi @TysonAndre I haven't been able to reproduce this yet. On my setup whenever phpspy outputs copy_proc_mem: Not copying zfunc; raddr is NULL, then I always see execute_data.prev_execute_data == NULL, but in your case it seems to be a valid memory address sometimes.

What do you think about extending the behavior of --continue-on-error to keep trying to copy the call stack if any errors are encountered while trying to read a call frame. So anything in this range:

https://github.com/adsr/phpspy/blob/9cc911f33b758b27ced14e8be5c574a05ea69100/phpspy_trace.c#L110-L137

In other words, as long as we have a non-NULL memory address for the next call frame, phpspy will try to read it even if the current frame failed to read properly.

adsr avatar Oct 31 '20 03:10 adsr

I'm guessing this is just inevitable with the application running while I'm benchmarking it if I forgot to set --pause-process - when you read the memory of a running application, the function may have already returned and the heap space used for the function could be quickly reclaimed by something else. The extra frames I do see aren't very useful

I was using the default arguments. I should have used --pause-process instead for benchmarking a CLI application(phan) - it fits that use case

  -S, --pause-process                Pause process while reading stacktrace
                                       (unsafe for production!)

I was confused about how all the macros and abstractions worked. I added some printing statements.

Observed: When copy_proc_mem fails for the first function on the stack trace, it sleeps for --sleep-ms seconds Expected: There should be an option to aggressively retry printing the stack from the first step immediately without sleeping, on failure, so that the phpspy output reflects a running application's stack trace accurately. (Optionally suppress the repeated warnings until the time elapses or the first success) Alternative: Print a suggestion to use --pause-process when copy_proc_mem fails if that option is not set, mentioning that it is not safe in production

1666575804.808640 rv=0
copy_proc_mem: Failed to copy zfunc; err=Bad address raddr=0x7f7e00000004 size=156
1666575804.818741 rv=0
0 ast\parse_code <internal>:-1
1 Phan\AST\Parser::parseCodeHandlingDeprecation /path/to/phan/src/Phan/AST/Parser.php:126
2 Phan\AST\Parser::parseCode /path/to/phan/src/Phan/AST/Parser.php:102
3 Phan\Analysis::parseFile /path/to/phan/src/Phan/Analysis.php:73
4 Phan\Phan::analyzeFileList /path/to/phan/src/Phan/Phan.php:196
5 <main> /path/to/phan/src/phan.php:1
6 <main> /path/to/phan/phan:1

1666575804.829005 rv=0
copy_proc_mem: Failed to copy zfunc; err=Bad address raddr=0x7f7e00000308 size=156
1666575804.839143 rv=0
1666575804.849331 Printed frames 2 skipped 1
0 Phan\Language\Type::make /path/to/phan/src/Phan/Language/Type.php:427
0 Phan\Language\Type::make /path/to/phan/src/Phan/Language/Type.php:427

1666575804.849365 rv=0
copy_proc_mem: Failed to copy zfunc; err=Bad address raddr=0x7f7e00000007 size=156
1666575804.859566 rv=0
copy_proc_mem: Failed to copy zfunc; err=Bad address raddr=0x307 size=156
1666575804.869782 rv=0
0 file_get_contents <internal>:-1
1 Phan\Library\FileCache::getOrReadEntry /path/to/phan/src/Phan/Library/FileCache.php:91
2 Phan\Analysis::parseFile /path/to/phan/src/Phan/Analysis.php:73
3 Phan\Phan::analyzeFileList /path/to/phan/src/Phan/Phan.php:196
4 <main> /path/to/phan/src/phan.php:1
5 <main> /path/to/phan/phan:1

1666575804.879968 rv=0
copy_proc_mem: Failed to copy zfunc; err=Bad address raddr=0x7 size=156
1666575804.890128 rv=0
copy_proc_mem: Failed to copy class_name; err=Bad address raddr=0x8b4808ec83485355 size=25
1666575804.900356 rv=0
copy_proc_mem: Failed to copy zfunc; err=Bad address raddr=0x100000307 size=156
1666575804.910597 rv=0

TysonAndre avatar Oct 24 '22 01:10 TysonAndre