Reduce timing overhead
Addressing #270.
This PR takes a couple of approaches.
If CLOCK_MONOTONIC_COARSE is available on the system and the requested profiling interval is long enough for that clock's coarser resolution, that clock is used. It has very low overhead: reading it just accesses a variable in user space and never makes a syscall.
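
To illustrate the selection logic described above, here is a rough Python sketch (not the code in this PR). It assumes Linux, where the clock id for CLOCK_MONOTONIC_COARSE is 6; Python's `time` module does not export that constant, so it is hard-coded. The names `choose_timer` and `safety_factor`, and the threshold of 4x the clock resolution, are made up for the example.

```python
import sys
import time

# Linux-specific clock id from <linux/time.h>; hard-coded because the
# `time` module does not export CLOCK_MONOTONIC_COARSE.
LINUX_CLOCK_MONOTONIC_COARSE = 6


def coarse_clock_resolution():
    """Return the coarse clock's resolution in seconds, or None if unavailable."""
    if not sys.platform.startswith("linux"):
        return None
    try:
        return time.clock_getres(LINUX_CLOCK_MONOTONIC_COARSE)
    except (AttributeError, OSError):
        return None


def choose_timer(interval, safety_factor=4.0):
    """Pick a timer function for the given sampling interval (in seconds).

    `safety_factor` is a hypothetical threshold: the coarse clock is only
    used when the interval is several times larger than its resolution.
    """
    resolution = coarse_clock_resolution()
    if resolution is not None and interval >= resolution * safety_factor:
        return lambda: time.clock_gettime(LINUX_CLOCK_MONOTONIC_COARSE)
    # Fall back to the regular monotonic clock.
    return time.monotonic
```

Calling `choose_timer(0.1)()` returns a timestamp from whichever clock was selected; on systems without the coarse clock it falls back to `time.monotonic`.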
Alternatively, if the user requests it, the profiler runs a timing thread: a C thread that runs outside the GIL and updates a variable with the current time every <interval> seconds.
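
The shape of the timing thread can be sketched in Python as below. This only illustrates the idea: the thread in this PR is written in C and runs without the GIL, whereas a Python thread like this one still contends for it. The class name `TimingThread` and its attributes are invented for the example.

```python
import threading
import time


class TimingThread:
    """Background thread that refreshes `current_time` every `interval` seconds."""

    def __init__(self, interval):
        self.interval = interval
        self.current_time = time.monotonic()
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        # Event.wait doubles as an interruptible sleep: it returns False on
        # timeout (keep looping) and True once stop() has been called.
        while not self._stop.wait(self.interval):
            self.current_time = time.monotonic()
```

The profiler's hot path would then read `current_time` instead of calling a clock function, trading timestamp precision (bounded by the interval) for a cheaper read.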
I think I've figured out the ctypes issue: loading the .so file by its path seems to work on both Linux and macOS.
Also potentially addressing https://github.com/joerick/pyinstrument/issues/83