syzkaller
syzkaller copied to clipboard
prog: `panic: no result` during serialization
For quite a while now my syz-manager instances stopped running for long. Very soon they end up with an error and terminate (see below).
Let me know what kind of information I can provide to debug this. (go version: go1.22.2)
panic: no result
goroutine 519346 [running]:
github.com/google/syzkaller/prog.(*ResultArg).serialize(0xc0338054d0, 0xc02d7700f0)
/home/cpaasch/syzkaller/prog/encoding.go:224 +0x2a5
github.com/google/syzkaller/prog.(*serializer).arg(0x19a07e0?, {0x19adaa0?, 0xc0338054d0?})
/home/cpaasch/syzkaller/prog/encoding.go:121 +0x29
github.com/google/syzkaller/prog.(*serializer).call(0xc02d7700f0, 0xc0325d0af0)
/home/cpaasch/syzkaller/prog/encoding.go:82 +0x211
github.com/google/syzkaller/prog.(*Prog).serialize(0xc005e511c0, 0x0)
/home/cpaasch/syzkaller/prog/encoding.go:46 +0xd8
github.com/google/syzkaller/prog.(*Prog).Serialize(...)
/home/cpaasch/syzkaller/prog/encoding.go:30
main.(*Runner).logProgram(0xc01deef810, 0x4, 0xc005e511c0)
/home/cpaasch/syzkaller/syz-manager/rpc.go:455 +0x52
main.(*RPCServer).StartExecuting(0xc0004fa000, 0xc02d737ec0, 0x0?)
/home/cpaasch/syzkaller/syz-manager/rpc.go:251 +0x1c5
reflect.Value.call({0xc000586780?, 0xc002302dd0?, 0xc013d32c40?}, {0x14deee0, 0x4}, {0xc00aa3aef8, 0x3, 0xc0134ddce0?})
/usr/local/go/src/reflect/value.go:596 +0xca6
reflect.Value.Call({0xc000586780?, 0xc002302dd0?, 0xc034a4b320?}, {0xc00aa3aef8?, 0x0?, 0x0?})
/usr/local/go/src/reflect/value.go:380 +0xb9
net/rpc.(*service).call(0xc00270c940, 0xc00270c900, 0xc0288912b8, 0xc0288912d0, 0xc0003a6c80, 0xc024106120, {0x11123a0?, 0xc02d737ec0?, 0xc00af75230?}, {0x1114a20, ...}, ...)
/usr/local/go/src/net/rpc/server.go:381 +0x20e
created by net/rpc.(*Server).ServeCodec in goroutine 422942
/usr/local/go/src/net/rpc/server.go:478 +0x3d7
Hi Christoph,
How frequently/soon does it crash? I don't remember seeing anything similar on syzbot nor locally. Do you have any local modifications? What syzkaller commit are you using? Recently this piece of logic was moved from the VM to the syz-manager, so potentially previously your VMs were crashing now and then with this error, but now syz-manager process started crashing.
Hi @dvyukov !
How frequently/soon does it crash
I would say within less than 24 hours.
Do you have any local modifications
No modifications.
What syzkaller commit are you using
I am on 8bdc0f220628c9347b3581fead4c026272439799 right now.
We do this trick to dump more info when decoding programs: https://github.com/google/syzkaller/blob/master/prog/encoding.go#L252-L257
But it won't work for encoding, since if encoding panics we cannot encode to dump it :)
Maybe if you flip this var to true, it may give some useful info earlier:
prog/validation.go:var debug = false // enabled in tests and fuzzers
With this flag program manipulation will do more checks every now and then.
Been running with that suggested change for a while now and I'm no longer seeing the crash reported above. What I AM seeing instead is a lot of this:
2024/05/02 18:27:28 vm-12007: crash: panic: call #NUM mmap: ptr vma has bad address ADDR/NUM/NUM
I'll keep this running over night to see if that is really the difference in behavior or just another artifact of the debug variable being set.
What's in the crash log files for these? They should contain unmasked values for these NUM/ADDR + maybe some additional info (stack, programs).
Hi @dvyukov !
How frequently/soon does it crash
I would say within less than 24 hours.
Do you have any local modifications
No modifications.
What syzkaller commit are you using
I am on 8bdc0f2 right now.
It's really interesting that we don't see this happen on syzbot and when we run syzkaller locally. If it's easily reproducible on your side, I wonder what we are doing differently.
I think it would be great to find some minimized setup (Linux kernel revision + config, syzkaller revision + config, host and target arches) that would let us reproduce the problem too.
It also may be some bad programs in the corpus.
╰─○ cat log0 Warning: Permanently added '[localhost]:39534' (ED25519) to the list of known hosts. 2024/05/02 19:03:36 fuzzer started 2024/05/02 19:03:36 dialing manager at localhost:40649 2024/05/02 19:03:36 checking machine... 2024/05/02 19:03:36 testing simple program... syzkaller login: [ 17.819672] cgroup: Unknown subsys name 'net' [ 17.977755] cgroup: Unknown subsys name 'rlimit' [ 18.172610] Adding 124996k swap on ./swap-file. Priority:0 extents:1 across:124996k FS panic: call #0 mmap: ptr vma has bad address 18446744073709547520/4096/4096 goroutine 1 [running]: github.com/google/syzkaller/prog.(*Prog).debugValidate(...) /home/ubuntu/src/syzkaller/prog/validation.go:19 github.com/google/syzkaller/prog.(*Prog).SerializeForExec(0xc00110f8c0) /home/ubuntu/src/syzkaller/prog/encodingexec.go:69 +0x3d5 github.com/google/syzkaller/pkg/ipc.(*Env).Exec(0xc000560000?, 0x0?, 0xdc0798?) /home/ubuntu/src/syzkaller/pkg/ipc/ipc.go:339 +0x25 main.checkSimpleProgram(0xc00110feb8, 0x0?) /home/ubuntu/src/syzkaller/syz-fuzzer/testing.go:258 +0x21e main.checkMachine(0xc00110feb8) /home/ubuntu/src/syzkaller/syz-fuzzer/testing.go:148 +0x265 main.main() /home/ubuntu/src/syzkaller/syz-fuzzer/fuzzer.go:156 +0xa28
It's a separate problem from the one that @cpaasch described and I remember seeing it too when I ran syz-manger with prog validation enabled. We use negative vma addresses for our test program to emulate protected memory areas around the data buffer, and the validator is not particularly happy about this.
Sorry. Should I move this to a separate issue or do we have a fix?
Yes, let's create a separate issue for that. To not let it stop debugging of the original problem, does it work if you remove the following line and recompile syzkaller?
https://github.com/google/syzkaller/blob/ddfc15a1555eaea52bb3186b2c53245cc91ac5d2/sys/targets/common.go#L43
Moving my posts to https://github.com/google/syzkaller/issues/4750
Yes, let's create a separate issue for that. To not let it stop debugging of the original problem, does it work if you remove the following line and recompile syzkaller?
https://github.com/google/syzkaller/blob/ddfc15a1555eaea52bb3186b2c53245cc91ac5d2/sys/targets/common.go#L43
Strange this wasn't caught in tests. Tests should enable debug=true. Would be good to check and fix as well.
I think I got an instrumented backtrace:
panic: bad group arg size 32, should be <= 24 for &prog.GroupArg{ArgCommon:prog.ArgCommon{ref:0x2bb, dir:0x0}, Inner:[]prog.Arg{(*prog.UnionArg)(0xc094edf530)}} type "array"
goroutine 12512734 [running]:
github.com/google/syzkaller/prog.foreachArgImpl({0x1971880?, 0xc001fd5160?}, 0xc04b547310, 0xc07cbf7380)
/home/ubuntu/src/syzkaller/prog/analysis.go:192 +0x774
github.com/google/syzkaller/prog.foreachArgImpl({0x1971880?, 0xc001fd5120?}, 0xc04b547310, 0xc07cbf7380)
/home/ubuntu/src/syzkaller/prog/analysis.go:181 +0x4ce
github.com/google/syzkaller/prog.foreachArgImpl({0x19716c0?, 0xc094edf440?}, 0xc04b547310, 0xc07cbf7380)
/home/ubuntu/src/syzkaller/prog/analysis.go:200 +0x3d8
github.com/google/syzkaller/prog.foreachArgImpl({0x1971880?, 0xc001fd50e0?}, 0xc04b547310, 0xc07cbf7380)
/home/ubuntu/src/syzkaller/prog/analysis.go:181 +0x4ce
github.com/google/syzkaller/prog.foreachArgImpl({0x19716c0?, 0xc094edf410?}, 0xc04b547310, 0xc07cbf7380)
/home/ubuntu/src/syzkaller/prog/analysis.go:200 +0x3d8
github.com/google/syzkaller/prog.foreachArgImpl({0x1971880?, 0xc001fd50c0?}, 0xc04b547310, 0xc07cbf7380)
/home/ubuntu/src/syzkaller/prog/analysis.go:181 +0x4ce
github.com/google/syzkaller/prog.foreachArgImpl({0x19716c0?, 0xc094edf3b0?}, 0xc04b547310, 0xc07cbf7380)
/home/ubuntu/src/syzkaller/prog/analysis.go:200 +0x3d8
github.com/google/syzkaller/prog.foreachSubArgWithStack(...)
/home/ubuntu/src/syzkaller/prog/analysis.go:139
github.com/google/syzkaller/prog.forEachStaleUnion(0xc0005bf4a0, 0xc0b7a2df90, 0xc07cbf73d8)
/home/ubuntu/src/syzkaller/prog/expr.go:105 +0x56
github.com/google/syzkaller/prog.(*Call).checkConditions(0xc06ccc6140?, 0x1971680?, 0x50?)
/home/ubuntu/src/syzkaller/prog/expr.go:178 +0x58
github.com/google/syzkaller/prog.(*validCtx).validateCall(0xc06ccc6140, 0xc0b7a2df90)
/home/ubuntu/src/syzkaller/prog/validation.go:79 +0x185
github.com/google/syzkaller/prog.(*Prog).validateWithOpts(0xc07d4293c0, {0x10?, 0x37?})
/home/ubuntu/src/syzkaller/prog/validation.go:51 +0x115
github.com/google/syzkaller/prog.(*Prog).validate(...)
/home/ubuntu/src/syzkaller/prog/validation.go:25
github.com/google/syzkaller/prog.(*Prog).debugValidate(...)
/home/ubuntu/src/syzkaller/prog/validation.go:18
github.com/google/syzkaller/prog.(*Prog).serialize(0xc07d4293c0, 0x0)
/home/ubuntu/src/syzkaller/prog/encoding.go:38 +0x2e
github.com/google/syzkaller/prog.(*Prog).Serialize(...)
/home/ubuntu/src/syzkaller/prog/encoding.go:30
main.(*Runner).logProgram(0xc0aa19fea0, 0xc068dddec0?, 0x5e7?)
/home/ubuntu/src/syzkaller/syz-manager/rpc.go:552 +0x49
main.(*RPCServer).StartExecuting(0xc0005b6000, 0xc05dcaa660, 0x1?)
/home/ubuntu/src/syzkaller/syz-manager/rpc.go:309 +0x1c5
reflect.Value.call({0xc0001927e0?, 0xc00013c8b0?, 0x0?}, {0x14ad9a6, 0x4}, {0xc087339ef8, 0x3, 0x0?})
/home/ubuntu/src/go/src/reflect/value.go:596 +0xce7
reflect.Value.Call({0xc0001927e0?, 0xc00013c8b0?, 0x2?}, {0xc087339ef8?, 0x196c968?, 0x0?})
/home/ubuntu/src/go/src/reflect/value.go:380 +0xb9
net/rpc.(*service).call(0xc0005e6b40, 0xc02ac16001?, 0xc087339fd0?, 0xc01f1592a0, 0xc0006d2080, 0xc00518c1e0?, {0x10e4a40?, 0xc05dcaa660?, 0xc0442e4580?}, {0x10e7000, ...}, ...)
/home/ubuntu/src/go/src/net/rpc/server.go:382 +0x214
created by net/rpc.(*Server).ServeCodec in goroutine 12431003
/home/ubuntu/src/go/src/net/rpc/server.go:479 +0x410
Then again, 6 minutes later I get the backtrace from the top of this issue, though I've been running with the debug config as suggested. What would we expect to see with debugging on?
Then again, 6 minutes later I get the backtrace from the top of this issue, though I've been running with the
debugconfig as suggested. What would we expect to see with debugging on?
No concrete expectation, it was just a simple way to try to get something more useful.
I've tried to run prog package test for tens of CPU hours in a loop, but no success so far. Trying to run syz-manager with debug enabled.
No progress with --debug so far. I see it's only running a single VM despite my config asking for more. That might as well be a reason to not hit the problem. Anyways, I'll keep running for some more.
I've run syz-manager compiled with -race for a while and it has found the problem -- logProgram() races with the code that has been waiting for the execution to have been finished.
The problem is fixed in https://github.com/google/syzkaller/pull/4802.
Forgot to get back to this, but just wanted to thank you as my syzkaller instances were running fine again last week!