gno icon indicating copy to clipboard operation
gno copied to clipboard

test(bug): txtar to reproduce the VM hanging during Render

Open jefft0 opened this issue 11 months ago • 7 comments

This PR adds cpu-cycle-overrun1.txtar in gno/gno.land/cmd/gnoland/testdata . To run it, in gno/gno.land/cmd/gnoland enter:

go test -v . --run TestTestdata/cpu-cycle-overrun1

The call to Render hangs. You can see that the Render function is not very complicated, but some combination of avl.Tree and pointers seems to hang the GnoVM. We originally observed the problem in GnoSocial, but this code has been simplified as much as I can.

As explained in the file, if you comment out the final call to AddData, then Render returns quickly. So it is not a problem with too much data in memory. My wild guess is that there is indeed a CPU cycle overrun, but it hangs instead of quitting and returning the "CPU cycle overrun" status.

jefft0 avatar Mar 05 '24 14:03 jefft0

I'm still investigating, but my preliminary findings is that this is actually occurring while attempting to do Machine.String(). Generating a string value for an AVL node this big takes a truckload of time, up to where it looks like it's stuck (it might spit out something after a couple of hours). Though nothing to do with the actual VM cycles. Will provide more details as soon as I have them...

thehowl avatar Mar 08 '24 15:03 thehowl

Okay, as I was mentioning in the last comment this has to do with printing "very recursive" values like the avl trees you're working with.

If we apply this patch, which essentially forces a maximum depth of 8 when printing values, we get to see the "true" error:

diff --git a/gnovm/pkg/gnolang/values_string.go b/gnovm/pkg/gnolang/values_string.go
index 34187e32..86274b19 100644
--- a/gnovm/pkg/gnolang/values_string.go
+++ b/gnovm/pkg/gnolang/values_string.go
@@ -11,19 +11,22 @@ type protectedStringer interface {
 	ProtectedString(*seenValues) string
 }
 
-// This indicates the maximum ancticipated depth of the stack when printing a Value type.
-const defaultSeenValuesSize = 32
+const seenValuesSize = 8
 
 type seenValues struct {
-	values []Value
+	values [seenValuesSize]Value
+	len    int
 }
 
+func (sv *seenValues) Full() bool { return sv.len == seenValuesSize }
+
 func (sv *seenValues) Put(v Value) {
-	sv.values = append(sv.values, v)
+	sv.values[sv.len] = v
+	sv.len++
 }
 
 func (sv *seenValues) Contains(v Value) bool {
-	for _, vv := range sv.values {
+	for _, vv := range sv.values[:sv.len] {
 		if vv == v {
 			return true
 		}
@@ -41,11 +44,11 @@ func (sv *seenValues) Contains(v Value) bool {
 //     being printed if the value has already been print
 //   - this is NOT recursion and SHOULD be printed
 func (sv *seenValues) Pop() {
-	sv.values = sv.values[:len(sv.values)-1]
+	sv.len--
 }
 
 func newSeenValues() *seenValues {
-	return &seenValues{values: make([]Value, 0, defaultSeenValuesSize)}
+	return &seenValues{}
 }
 
 func (v StringValue) String() string {
@@ -72,6 +75,9 @@ func (av *ArrayValue) ProtectedString(seen *seenValues) string {
 	if seen.Contains(av) {
 		return fmt.Sprintf("%p", av)
 	}
+	if seen.Full() {
+		return "<trimmed>"
+	}
 
 	seen.Put(av)
 	defer seen.Pop()
@@ -104,6 +110,9 @@ func (sv *SliceValue) ProtectedString(seen *seenValues) string {
 	if seen.Contains(sv) {
 		return fmt.Sprintf("%p", sv)
 	}
+	if seen.Full() {
+		return "<trimmed>"
+	}
 
 	if ref, ok := sv.Base.(RefValue); ok {
 		return fmt.Sprintf("slice[%v]", ref)
@@ -134,6 +143,9 @@ func (pv PointerValue) ProtectedString(seen *seenValues) string {
 	if seen.Contains(pv) {
 		return fmt.Sprintf("%p", &pv)
 	}
+	if seen.Full() {
+		return "<trimmed>"
+	}
 
 	seen.Put(pv)
 	defer seen.Pop()
@@ -154,6 +166,9 @@ func (sv *StructValue) ProtectedString(seen *seenValues) string {
 	if seen.Contains(sv) {
 		return fmt.Sprintf("%p", sv)
 	}
+	if seen.Full() {
+		return "<trimmed>"
+	}
 
 	seen.Put(sv)
 	defer seen.Pop()
@@ -204,6 +219,9 @@ func (mv *MapValue) ProtectedString(seen *seenValues) string {
 	if seen.Contains(mv) {
 		return fmt.Sprintf("%p", mv)
 	}
+	if seen.Full() {
+		return "<trimmed>"
+	}
 
 	seen.Put(mv)
 	defer seen.Pop()
@@ -286,6 +304,9 @@ func (tv *TypedValue) ProtectedSprint(seen *seenValues, considerDeclaredType boo
 	if seen.Contains(tv.V) {
 		return fmt.Sprintf("%p", tv)
 	}
+	if seen.Full() {
+		return "<trimmed>"
+	}
 
 	// print declared type
 	if _, ok := tv.T.(*DeclaredType); ok && considerDeclaredType {

The "true" error:

        [stderr]
        "gnokey" error: allocation limit exceeded

        FAIL: testdata/cpu-cycle-overrun1.txtar:22: unexpected "gnokey" command failure: allocation limit exceeded

(The VM encounters this error, then attempts to dump the machine state for debugging into the error message, and thus "hangs" while trying to print the state).

As for the solution, I'm not terribly sure. What I've thought of:

  • This approach works for now, but is naive. Ultimately the underlying problem is that we're getting too much data to print out, both in the stringer but also because we're dumping each block, often containing redundant data, (ie. already contained and printed in other blocks), or contained in multiple variables, which together with the recursiveness blows up the sheer amount of data printed in Machine.String(), and as a consequence also allocated just to return the error.
  • Also, reaching a depth of 8, while not terribly common, is still possible even with a data structure that is not recursive like the avl.Tree (think of an AST tree)
  • So a better approach seems to have a limit of how many values can be printed when stringing a VM value (say 1000)
  • ... but in order for it to maintain effectiveness of dumped values as debugging tools, they should be printed in a "breadth-first" manner. (ie. print all items at depth=0, then depth=1, depth=2... until you reach the limit of items).

I'm not sure how/if the last part can be implemented in a nice and efficient way. cc/ @deelawn @mvertes @ajnavarro as you probably have ideas.

thehowl avatar Mar 08 '24 16:03 thehowl

@thehowl IMHO, Machine.String() output as it is right now, should not be used as a log output, even adding depth limitations.

If we really want to do what Machinge.String() is doing, we have to change the method to return a reader, avoiding using huge amounts of memory.

ajnavarro avatar Mar 14 '24 10:03 ajnavarro

Yeah.

What would be best is to implement a way to print an actual stacktrace (of the Gno code alone) so we can use that to debug in place of the machine state dump :/

thehowl avatar Mar 14 '24 14:03 thehowl

I reverted commit https://github.com/gnolang/gno/pull/1736/commits/2f78ae258c1315746f9a0c1d1bb05a8f18cd93bd and merged from master which now has the same fix from PR https://github.com/gnolang/gno/pull/1745 .

jefft0 avatar Mar 21 '24 09:03 jefft0

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 55.00%. Comparing base (1180def) to head (ff00c55).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1736      +/-   ##
==========================================
- Coverage   55.01%   55.00%   -0.01%     
==========================================
  Files         595      595              
  Lines       79727    79727              
==========================================
- Hits        43858    43856       -2     
- Misses      32550    32551       +1     
- Partials     3319     3320       +1     
Flag Coverage Δ
contribs/gnodev 26.00% <ø> (ø)
contribs/gnofaucet 14.46% <ø> (-0.86%) :arrow_down:
contribs/gnokeykc 0.00% <ø> (ø)
contribs/gnomd 0.00% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov[bot] avatar Mar 21 '24 09:03 codecov[bot]

From the review meeting: the agreed solution is to switch from doing Machine.String() to printing a stacktrace. (This is likely to be more useful and parseable information by the programmer anyway... :) )

thehowl avatar Mar 21 '24 18:03 thehowl