semian icon indicating copy to clipboard operation
semian copied to clipboard

OS X support

Open haileys opened this issue 10 years ago • 20 comments

This pull request adds OS X support to Semian.

The biggest potentially controversial change here is that I've removed the ticket acquisition timeout entirely. This is because OS X doesn't have semtimedop. I asked @Sirupsen about this and he backgrounded me on the original motivation for this timeout and why it isn't strictly necessary anymore.

Because we no longer block on semaphore operations, I was also able to remove the GVL release code.

cc @Sirupsen @csfrancis @byroot

haileys avatar Sep 17 '15 05:09 haileys

this timeout and why it isn't strictly necessary anymore.

Yes. We recently removed all usage of it. So it's fine. We'll have to do a major version bump though.

Flagging this for later review even though @csfrancis is the authoritative person here.

Also you have a CI failure that seems legit.

byroot avatar Sep 17 '15 05:09 byroot

The other thing I discussed with @Sirupsen is the use of SHA1 to generate Semaphore keys from strings. There's a slight potential for this to generate the same key for multiple resources, or even between totally different applications that both happen to use SysV semaphores.

Is there any reason why Semian uses SHA1 for this and not ftok? ftok generates a key from a file path and guarantees uniqueness this way, rather than SHA1's probabilistic uniqueness.

haileys avatar Sep 17 '15 05:09 haileys

@byroot Pushed up f7372fb which fixes that failure. Turns out that test was implicitly relying on timeouts (as it tried to acquire the resource immediately after sending SIGKILL to the child process that had already acquired the resource). I've added a call to Process.waitpid after Process.kill so we only try to acquire the resource after the child process has died.

haileys avatar Sep 17 '15 05:09 haileys

The argument I have for not requiring a ticket count I tried to explain to Charlie last night is that the equation for a ticket count's effectiveness effectively looks like this: p^t = q, where p is probability of sampling a worker talking to the resource that has ticket count t, an q is the accepted false positive rate for sampling t workers talking to the resource. When t grows beyond 6-8 because of this exponential relationship a timeout has a very questionable value, and makes it harder to argue about an effective ticket count. A t of, say, 8 is as appropriate for 15 workers as it is for 80 workers (this means that it's better, for Semian, to have more workers on the same box as you reduce capacity less during a failure in that case). The reason we introduced the timeout was that we ran SysV namespaces with only 4 workers, which meant that we couldn't go beyond a ticket count of 2 and thus needed another mechanism to tune. However, we no longer do this and I don't anticipate us doing this again—and this is why I think it's absolutely fine to remove semtimedop in favour of never having a timeout.

sirupsen avatar Sep 17 '15 13:09 sirupsen

I've put it on my list to test this PR in production. I'll attempt to do that next week. Nothing really jumps out, thanks @charliesome :)

sirupsen avatar Sep 17 '15 17:09 sirupsen

I tested this branch with shopify in dev and test. Seems all green. We can try it out on production servers once the last concerns are settled.

byroot avatar Sep 18 '15 03:09 byroot

Since this pull request will already require a major version bump, I'd like to discuss another breaking change that we could make while we're at it.

IMO we should use ftok rather than SHA1 to generate semaphore keys. ftok returns a key which is generated from a file's identity on disk rather than just being a simple digest. This guarantees that there won't be clashes between two semaphores. While clashes are fairly unlikely with SHA1, (1 in 4 billion with a 32 bit key_t), a guarantee that a clash won't happen is even better :grin:

Using ftok requires that we have a file on disk per resource that is visible to all workers. We could just use /tmp/semian.#{resource_name}.sem or something, but this feels a little sketchy and makes more assumptions about environments that Semian will be used in than I'm comfortable making - it assumes that /tmp exists and is writable (in this brave new world of containerising everything, this may not be a reasonable assumption). It also assumes that the all processes sharing an IPC namespace also share the same root directories and mount namespaces - again, not necessarily a reasonable assumption these days.

I think the ideal solution here is to add a parameter to Semian::Resource (and expose that up through Semian.register) that allows the user to customise the filename used for ftok. That could be something in /tmp by default, or it could be a required parameter.

haileys avatar Sep 18 '15 04:09 haileys

Using ftok requires that we have a file on disk per resource that is visible to all workers.

This (unless I'm missing something) is I'm afraid impossible for us since we're running our processes in multiple docker containers. We do share the IPC namespace between containers, but not the filesytem.

byroot avatar Sep 18 '15 05:09 byroot

Yeah, @byroot has a valid point for our deployment. I really don't want to do a shared mount between all the containers just for this :/

sirupsen avatar Sep 22 '15 00:09 sirupsen

While clashes are fairly unlikely with SHA1, (1 in 4 billion with a 32 bit key_t), a guarantee that a clash won't happen is even better

ftok isn't bulletproof either, unfortunately (from the BSD docs on OSX):

BUGS The returned key is computed based on the device minor number and inode of the specified path, in combination with the lower 8 bits of the given id. Thus, it is quite possible for the routine to return duplicate keys.

csfrancis avatar Sep 22 '15 00:09 csfrancis

Perhaps we could do something like store a mapping of all active resource_id -> sem_ids in a hash to detect collisions? It seems unlikely that we would encounter one, but if we did, the user would definitely want to know about it.

csfrancis avatar Sep 22 '15 00:09 csfrancis

I think you are going to far guys. We're talking about hashing human identifiers. Even MD5 would do. The risk of accidental collisions is extremely low.

byroot avatar Sep 22 '15 00:09 byroot

ftok isn't bulletproof either, unfortunately

Ugh, I missed that. That's a shame. I guess SHA1 is good enough in that case then

haileys avatar Sep 22 '15 03:09 haileys

I'd question again why we even need this. Why can't we just disable this in test and dev? @charliesome

sirupsen avatar Nov 11 '15 16:11 sirupsen

@Sirupsen because then you can't run your integration tests on OSX. :sad_panda:

byroot avatar Nov 11 '15 16:11 byroot

Yeah, but those don't fork etc., so you should be able to use the in-memory adapter, no?

Unless you mean the Semian test suite.

sirupsen avatar Nov 11 '15 16:11 sirupsen

Oh. Well, if we end up having an in-memory semaphore, then yes.

byroot avatar Nov 11 '15 16:11 byroot

any updates on this by chance? this came up to test certain semian details in development mode in osx

Soleone avatar May 24 '16 19:05 Soleone

@sirupsen @charliesome This looks like an interesting PR but is about a year and a half old... any reason it's still open? looks like it's targeted for only Linux in prod, so it could be closed with 'make sure your dev environment is linux; we don't expect support for BSD/OSX'?

bf4 avatar May 28 '17 02:05 bf4

@bf4 we're still interested in this work upstream, someone just needs to own it :)

sirupsen avatar May 28 '17 12:05 sirupsen