George Gensure

166 comments by George Gensure

@80degreeswest, is this something you can take a look at?

Buildfarm is currently intolerant of Redis configuration changes. When a Redis cluster changes IPs/names, the change is not reflected in server/worker grouped communication. It's something I've been meaning to experiment with tolerating, but...

I think I see - you're questioning the discrepancy between 'No available workers' and the threefold service availability on the workers. The problem needs to be looked at on the...

A 200 on the Prometheus endpoint doesn't necessarily indicate 'fine', but I agree that no public interface presents any difference when a worker can't talk to the backplane if...

> I would love that worker would report that it is not healthy if it can not talk to backplane, in that case kubernetes would restart it. Without the below...
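A minimal sketch of the requested behavior, assuming a hypothetical `ping_backplane` callable (not a Buildfarm API): the health handler maps a failed backplane check to a non-2xx status. A Kubernetes HTTP liveness probe counts only status codes from 200 to 399 as success, so a 503 here would fail the probe and, after `failureThreshold` consecutive failures, the container would be restarted.

```python
from http import HTTPStatus
from typing import Callable, Tuple


def health_status(ping_backplane: Callable[[], bool]) -> Tuple[int, str]:
    """Return (HTTP status, body) for a liveness endpoint.

    ping_backplane is a hypothetical callable returning True when the worker
    can reach the backplane; any exception is treated as unreachable.
    """
    try:
        reachable = ping_backplane()
    except Exception:
        reachable = False
    if reachable:
        return HTTPStatus.OK.value, "ok"
    # 503 falls outside 200-399, so an HTTP liveness probe reports failure.
    return HTTPStatus.SERVICE_UNAVAILABLE.value, "backplane unreachable"


def probe_succeeds(status: int) -> bool:
    """Mirror the Kubernetes HTTP probe rule: 200 <= status < 400 succeeds."""
    return 200 <= status < 400
```

Treating exceptions as "unreachable" keeps a hung or erroring backplane client from masquerading as healthy.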

I'm not sure if we've exposed enough of the runtime metrics for those caches to make decisions from any externally observable measurements. Pathologically, you want that cache to be large...

I say, yes, reduce the size (of the caches). I bet that a good portion of them decay quickly (but again, I would need to have/add stats to say that...
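The stats needed to judge a size reduction could start as simple hit/miss/eviction counters. A minimal sketch, assuming a generic in-memory LRU cache (the `StatsLRUCache` class is hypothetical and is not Buildfarm's cache implementation):

```python
from collections import OrderedDict
from typing import Generic, Hashable, Optional, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")


class StatsLRUCache(Generic[K, V]):
    """Bounded LRU cache that records hits, misses, and evictions."""

    def __init__(self, max_entries: int) -> None:
        if max_entries <= 0:
            raise ValueError("max_entries must be positive")
        self.max_entries = max_entries
        self._data: "OrderedDict[K, V]" = OrderedDict()
        self.hits = 0
        self.misses = 0
        self.evictions = 0

    def get(self, key: K) -> Optional[V]:
        if key in self._data:
            self._data.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._data[key]
        self.misses += 1
        return None

    def put(self, key: K, value: V) -> None:
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self.max_entries:
            self._data.popitem(last=False)  # evict least recently used
            self.evictions += 1

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

If the hit rate barely moves as `max_entries` shrinks, most entries were decaying before reuse, which is the condition under which reducing the size is cheap.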

Two reasons: link count, enumerated here: https://bazelbuild.github.io/bazel-buildfarm/docs/architecture/content_addressable_storage/#casfilecache; and the creation of lost+found directories at mount roots on ext4 filesystems, on which a non-privileged worker will fail to start due to attempts...
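The link-count constraint can be observed with the standard library: every hard link to a file adds one to its inode's link count (`st_nlink`), and filesystems cap that count (ext4 allows 65000 links per inode; beyond that, `link(2)` fails with `EMLINK`). A minimal sketch, assuming a POSIX filesystem with hard-link support; it illustrates the counter only and does not reproduce CASFileCache logic:

```python
import os
import tempfile


def link_count_after(n_links: int) -> int:
    """Create one file plus n_links hard links to it and return st_nlink."""
    with tempfile.TemporaryDirectory() as root:
        original = os.path.join(root, "blob")
        with open(original, "wb") as f:
            f.write(b"content")
        for i in range(n_links):
            # Each hard link is another directory entry for the same inode.
            os.link(original, os.path.join(root, f"link-{i}"))
        return os.stat(original).st_nlink
```

A cache that hard-links one blob into many execution directories consumes one unit of this count per link, which is why a hot blob can hit the ceiling.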

You should find that this works with a rebase.

There are no timeouts for this per se: these essentially represent operations in flight that are waiting for appropriate workers to pull them off the mentioned queue. I'll turn...
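A minimal model of that queueing behavior, assuming operations carry required platform properties and workers advertise theirs (`Operation`, `OperationQueue`, and `poll` are hypothetical names, not Buildfarm's): an operation has no expiry and stays queued until a worker whose properties satisfy it polls.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, Dict, Optional


@dataclass
class Operation:
    name: str
    requirements: Dict[str, str] = field(default_factory=dict)


class OperationQueue:
    """Operations wait, with no timeout, until a matching worker takes them."""

    def __init__(self) -> None:
        self._pending: Deque[Operation] = deque()

    def enqueue(self, op: Operation) -> None:
        self._pending.append(op)

    def poll(self, worker_properties: Dict[str, str]) -> Optional[Operation]:
        # Hand out the oldest operation this worker satisfies; others stay queued.
        for i, op in enumerate(self._pending):
            if all(worker_properties.get(k) == v for k, v in op.requirements.items()):
                del self._pending[i]
                return op
        return None

    def __len__(self) -> int:
        return len(self._pending)
```

In this model, an operation whose requirements no live worker satisfies simply remains in the queue, which matches the "in flight, waiting" state described above rather than a failure.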