Jack Gallagher
usually with cosine-sim models I'd train with learned per-head scales for the attention logits; I guess I can get this by multiplying `q` & `k` by `sqrt(scales)` before the...
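A quick NumPy sanity check of the identity behind that idea: scaling both `q` and `k` by `sqrt(scales)` gives the same logits as scaling the dot products by `scales`. The shapes, the `l2_normalize` helper, and the scale values are made up for illustration, and the sketch assumes the attention call adds no scaling of its own (for example a built-in `1/sqrt(d)`) and that the scales are positive.

```python
import numpy as np

rng = np.random.default_rng(0)
heads, lq, lkv, d = 2, 3, 4, 8  # illustrative sizes only


def l2_normalize(x, eps=1e-6):
    # Unit-normalize along the feature axis so q.k is a cosine similarity.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


q = l2_normalize(rng.normal(size=(heads, lq, d)))
k = l2_normalize(rng.normal(size=(heads, lkv, d)))

# Hypothetical learned per-head scales; must be positive for sqrt to be real.
scales = np.array([5.0, 12.0])

# Reference: apply the per-head scale directly to the logits.
logits_ref = scales[:, None, None] * np.einsum("hqd,hkd->hqk", q, k)

# Folded: multiply q and k by sqrt(scale) each, then take plain dot products.
s = np.sqrt(scales)[:, None, None]
logits_folded = np.einsum("hqd,hkd->hqk", q * s, k * s)

same = np.allclose(logits_ref, logits_folded)
```

If the kernel does apply its own scaling factor, it would need to be divided back out of `scales` first.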
the general case of attention is (using annotations from [jaxtyping](https://github.com/google/jaxtyping))

```py
q: Float["lq d"]
k: Float["lkv d"]
v: Float["lkv o"]
mask: Bool["lq lkv"]
returns: Float["lq o"]
```

but it looks...
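For reference, a minimal single-head NumPy sketch matching those shapes. The `1/sqrt(d)` scaling, the `-1e30` fill value, and the function name `attention` are assumptions for illustration, not taken from any particular library.

```python
import numpy as np


def attention(q, k, v, mask):
    """q: [lq, d], k: [lkv, d], v: [lkv, o], mask: bool [lq, lkv] -> [lq, o]."""
    # Assumed scaling: the usual 1/sqrt(d) on the dot products.
    logits = q @ k.T / np.sqrt(q.shape[-1])
    # Masked-out positions get a large negative finite value, so a fully
    # masked row degrades to uniform weights instead of NaN.
    logits = np.where(mask, logits, -1e30)
    # Numerically stable softmax over the key axis.
    logits = logits - logits.max(axis=-1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


rng = np.random.default_rng(0)
lq, lkv, d, o = 3, 3, 4, 5
q = rng.normal(size=(lq, d))
k = rng.normal(size=(lkv, d))
v = rng.normal(size=(lkv, o))

# Causal mask as one example of an arbitrary [lq, lkv] boolean mask.
mask = np.tril(np.ones((lq, lkv), dtype=bool))
out = attention(q, k, v, mask)
```

Note that `lq` and `lkv` are independent here, as are `d` and `o`; the causal mask is just one choice that happens to need `lq == lkv`.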
it's pretty verbose to write out `--cluster $CLUSTER --workload $WORKLOAD` every time. I'd like to just write `xpk workload delete $CLUSTER $WORKLOAD` or even just `xpk delete $WORKLOAD`
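To make the request concrete, here is a hypothetical `argparse` sketch of the positional form. It is not xpk's actual parser; the program name `xpk-sketch` and the argument names are placeholders.

```python
import argparse


def build_parser():
    # Hypothetical layout: `<prog> workload delete CLUSTER WORKLOAD`.
    parser = argparse.ArgumentParser(prog="xpk-sketch")
    commands = parser.add_subparsers(dest="command", required=True)

    workload = commands.add_parser("workload")
    actions = workload.add_subparsers(dest="action", required=True)

    delete = actions.add_parser("delete")
    # Positional arguments instead of --cluster / --workload flags.
    delete.add_argument("cluster")
    delete.add_argument("workload")
    return parser


args = build_parser().parse_args(
    ["workload", "delete", "my-cluster", "my-workload"]
)
```

The shorter `delete $WORKLOAD` form would additionally need some default source for the cluster, which this sketch leaves out.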
Mobile adds desktop:

```
desktop: generate keypair, render public key to QR code
mobile: read QR code, sign public key, send signature to server
```

Desktop adds mobile: ``` desktop:...
Should make it essentially impossible for Eve to infer metadata without breaking TLS.
for example:

- limit number of kdf iterations for out of order message delivery
- limit max message size
- patch `serde_cbor` to limit maximum memory allocation

I'm not sure...
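A rough sketch of the first two limits, written in Python for brevity rather than in the project's own language. It assumes a hash-chain style ratchet where catching up to an out-of-order message means iterating a KDF once per skipped index. The constants, the `LimitExceeded` type, and the HMAC-based step are placeholders, not the project's actual scheme.

```python
import hashlib
import hmac

MAX_SKIP = 1000                 # hypothetical cap on KDF steps per message
MAX_MESSAGE_BYTES = 64 * 1024   # hypothetical cap on raw message size


class LimitExceeded(Exception):
    """Raised when a peer asks for more work than we are willing to do."""


def advance_chain(chain_key: bytes, current_index: int, target_index: int) -> bytes:
    # Refuse to go backwards or to skip further ahead than MAX_SKIP,
    # so a forged counter cannot force unbounded KDF work.
    steps = target_index - current_index
    if steps < 0:
        raise ValueError("target index is behind the current index")
    if steps > MAX_SKIP:
        raise LimitExceeded(f"would need {steps} KDF steps, limit is {MAX_SKIP}")
    for _ in range(steps):
        chain_key = hmac.new(chain_key, b"chain", hashlib.sha256).digest()
    return chain_key


def check_message_size(raw: bytes) -> bytes:
    # Reject oversized input before it ever reaches the deserializer.
    if len(raw) > MAX_MESSAGE_BYTES:
        raise LimitExceeded(f"message is {len(raw)} bytes, limit is {MAX_MESSAGE_BYTES}")
    return raw


key = advance_chain(b"\x00" * 32, current_index=0, target_index=3)
```

Checking the size before parsing also bounds how much the decoder can be asked to allocate from a single message, though it does not replace a limit inside the decoder itself.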
This would reduce both the reliability requirements and the bandwidth load on the server. The GC strategy for blocks can be slower than for keys, so that'll help as well.
what queries can the user make? some obvious ones:

- `FetchKey(of: UserId, dev: DeviceId)`
- `ValidKeys(of: UserId)`
- `AllKeys(of: UserId)`

Responses to these are fairly obvious, but I want to...
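One way to pin down the request side is a small tagged union, sketched here in Python. `UserId` and `DeviceId` are aliased to `str` as a placeholder, and `describe` is a made-up helper that only shows the dispatch; response types are deliberately left out.

```python
from dataclasses import dataclass
from typing import Union

# Placeholder aliases; the real identifier types are not specified here.
UserId = str
DeviceId = str


@dataclass(frozen=True)
class FetchKey:
    of: UserId
    dev: DeviceId


@dataclass(frozen=True)
class ValidKeys:
    of: UserId


@dataclass(frozen=True)
class AllKeys:
    of: UserId


Query = Union[FetchKey, ValidKeys, AllKeys]


def describe(query: Query) -> str:
    # Dispatch on the query variant; a server would route to handlers here.
    if isinstance(query, FetchKey):
        return f"FetchKey(of={query.of}, dev={query.dev})"
    if isinstance(query, ValidKeys):
        return f"ValidKeys(of={query.of})"
    if isinstance(query, AllKeys):
        return f"AllKeys(of={query.of})"
    raise TypeError(f"unknown query: {query!r}")


summary = describe(FetchKey(of="alice", dev="laptop"))
```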