dettrace
dettrace copied to clipboard
timer_create: DT fails to write virtualized timerid due to glibc wrapper
Timer_create support was added in this pr.
Currently, the simple timer_create_default.c sample program exhibits non-portable behavior between different linux environments. For example, producing the following two outputs:
< created timerid 0x555555757260
---
> created timerid 0x602260
The former on local linux boxes (dockerized or not) and VMs, and the latter on Azure Devops. In both cases the numbers are deterministic from run-to-run but not portable.
This is not a time measurement, but the (process) unique ID of the timer itself.
We keep an unordered map in the state, timerCreateTimers,
timerCreateTimers = std::make_shared<unordered_map<timerID_t, timerInfo>>();
and read it here:
timerID_t timerid = s.timerCreateTimers.get()->size();
s.timerCreateTimers.get()->insert({timerid, ti});
It looks like we are intending to use monotonically increasing timerids: 0, 1, 2 ...
Ok, if I'm interpreting this right there may be a bug here in terms of WHERE we write the timerid. I added an extra print from the tracer to show which address we're poking, and then from the tracee:
[4]INFO 1fc writing timerid to 0x7fffffffeb1c
...
created timerid 0x555555757260, residing at address 0x7fffffffeba8, size 8
Well, those two 0x7ffff stack addresses are 140 bytes apart. We are doing a 64 bit mutation of the tracee in the timerCreate prehook... but this makes it seem like we're missing somehow?
But then in the later message printed from the signal handler, the timerid is correctly set to a small number (1). @devietti, it's been 18 months, but do you remember writing this code?
I looked over the code and nothing stands out as clearly wrong. It seems like the libc interface is the same as the syscall interface (timer_t* passed in 3rd argument).
Is the code with the updated printouts (which shows the address discrepancy) available somewhere? I can take a look.
I believe the issue is that glibc creates some kind of thin wrapper around certain syscalls, if you do syscall(SYS_timer_create, ...) there won’t be any issue as I tested. Same applies for syscalls like sigaction
Ah, very interesting @wangbj!
| Is the code with the updated printouts (which shows the address discrepancy) available somewhere? I can take a look.
@devietti - to view the discrepancy in addresses a --debug 5 run should be sufficient, i.e to print the message INFO 1fc writing timerid line which shows the discrepancy with the address printed by the sampleProgram itself.
I don't fully understand the implications of @wangbj's discovery. Is this right?
- the tracer is writing the timerid in the wrong place, and it is clobbering guest memory on the stack, but not in a way that is visible?
- but why is the resulting timerid (1) deterministic, (2) a big/garbage number
0x602260
If we were completely failing to set the timerid, then wouldn't the underlying, nondeterministic one show up? So it seems we are setting it, but we are setting it to (deterministic) garbage?
I don't know that this test would expose nondeterminism in the timer_id. It seems like the id might just be an address in some table (maybe in the kernel?). Maybe if you have threads creating timers concurrently then you'd see nondeterminism in the ids, but not with this simple test.
Well, running it outside of dettrace generates pretty consistently different/random outputs:
created timerid 0x55cf57738260
created timerid 0x55e3688a0260
created timerid 0x55ed31723260
However... I think you're right in that dettrace is probably failing to write this ID, but it happens to be deterministic simply because of disabling ASAN...
yeah, those seem like heap addresses so maybe that's where libc puts this table.
Always so many layers of renaming... but is this the right source in glibc?
int
__timer_create_old (clockid_t clock_id, struct sigevent *evp, int *timerid)
{
timer_t newp;
int res = __timer_create_new (clock_id, evp, &newp);
if (res == 0)
{
int i;
for (i = 0; i < OLD_TIMER_MAX; ++i)
if (__compat_timer_list[i] == NULL
&& ! atomic_compare_and_exchange_bool_acq (&__compat_timer_list[i],
newp, NULL))
{
*timerid = i;
break;
}
if (__glibc_unlikely (i == OLD_TIMER_MAX))
{
/* No free slot. */
(void) __timer_delete_new (newp);
__set_errno (EINVAL);
res = -1;
}
}
return res;
}
It looks like some compatibility bridge between old and new and it's consistent with what we're seeing. It's got that extra copy, newp on the stack....
But I don't see the real problem... if we're correctly emulating the syscall, this userspace code should just be any old random userspace code from our perspective...