Edward Capriolo
Take vLLM: it has a shared KV cache. Users are encouraged to set a cache_salt if they want to ensure people can't "guess prompts" by looking at the timings...
Take a look at this: "cacheSalt". The idea here is that in multi-user environments I can "guess" other prompts by looking at the timings of the responses. https://github.com/edwardcapriolo/deliverance/pull/6 We all share...
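A minimal sketch of the salting idea, assuming a prefix cache keyed by a hash of the prompt's token ids. `SaltedPrefixKey` and `key` are hypothetical names, not vLLM's or deliverance's API. Mixing a per-user salt into the key means identical prompts from different users land in different cache entries, so a fast response no longer tells me that someone else already sent that prefix.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper: derives a prefix-cache key from an optional salt plus token ids.
final class SaltedPrefixKey {
    private SaltedPrefixKey() {}

    static String key(String cacheSalt, int[] tokenPrefix) {
        MessageDigest sha;
        try {
            sha = MessageDigest.getInstance("SHA-256");
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
        // A null salt leaves the key in the shared (unsalted) namespace.
        if (cacheSalt != null) {
            byte[] salt = cacheSalt.getBytes(StandardCharsets.UTF_8);
            // Length-prefix the salt so salt bytes and token bytes cannot run together.
            sha.update(ByteBuffer.allocate(4).putInt(salt.length).array());
            sha.update(salt);
        }
        ByteBuffer tokens = ByteBuffer.allocate(4 * tokenPrefix.length);
        for (int t : tokenPrefix) {
            tokens.putInt(t);
        }
        sha.update(tokens.array());
        StringBuilder hex = new StringBuilder();
        for (byte b : sha.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        int[] prompt = {101, 2054, 2003};
        System.out.println(key("tenant-a", prompt).equals(key("tenant-a", prompt))); // true
        System.out.println(key("tenant-a", prompt).equals(key("tenant-b", prompt))); // false
    }
}
```

Requests that pass the same salt still get cache hits from each other, so the salt decides who is allowed to share, and a null salt keeps the old fully shared behavior.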
A further improvement, a dedicated KV cache: https://github.com/edwardcapriolo/deliverance/pull/new/dedicated_kv
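Where salting partitions one shared cache, a dedicated cache gives each user a separate cache object. A rough sketch, assuming a cache can be created per tenant through a factory; `DedicatedCaches`, `forTenant` and `release` are hypothetical names, not taken from the linked branch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical registry: every tenant gets its own cache instance, so nothing is shared.
final class DedicatedCaches<C> {
    private final ConcurrentHashMap<String, C> byTenant = new ConcurrentHashMap<>();
    private final Supplier<C> factory;

    DedicatedCaches(Supplier<C> factory) {
        this.factory = factory;
    }

    // Returns the tenant's cache, creating it on first use (atomic on ConcurrentHashMap).
    C forTenant(String tenantId) {
        return byTenant.computeIfAbsent(tenantId, id -> factory.get());
    }

    // Drops a tenant's cache, e.g. when the session ends, so its memory can be reclaimed.
    void release(String tenantId) {
        byTenant.remove(tenantId);
    }

    int size() {
        return byTenant.size();
    }

    public static void main(String[] args) {
        // Stand-in cache type for the demo; a real one would hold KV tensors.
        DedicatedCaches<Map<String, float[]>> caches = new DedicatedCaches<>(HashMap::new);
        System.out.println(caches.forTenant("a") == caches.forTenant("a")); // true
        System.out.println(caches.forTenant("a") == caches.forTenant("b")); // false
    }
}
```

The trade-off is memory: nothing is shared, so a common system prompt is computed and stored once per tenant, and `release` (or some eviction policy) is needed to keep the map from growing without bound.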
I looked this over quickly and I have a hunch: the KV cache and the batch forwarding might be making the result nondeterministic.
```
public float[] embed(String input, PoolingType poolingType) {...
```
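One way to test the hunch is to embed the same input several times and measure the drift between runs. This is a sketch with a hypothetical `DeterminismCheck` harness that takes any `Function<String, float[]>`; in practice the function would wrap the `embed(input, poolingType)` call, which is assumed here rather than shown. The `leaky` lambda in `main` is a toy that imitates state carried over between calls; it is not jlama's code.

```java
import java.util.function.Function;

// Hypothetical harness: calls an embedding function repeatedly on the same input
// and reports the largest element-wise difference from the first run.
final class DeterminismCheck {
    private DeterminismCheck() {}

    static float maxAbsDiff(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("length mismatch: " + a.length + " vs " + b.length);
        }
        float max = 0f;
        for (int i = 0; i < a.length; i++) {
            max = Math.max(max, Math.abs(a[i] - b[i]));
        }
        return max;
    }

    // Worst drift seen across `runs` calls, measured against the first call.
    static float worstDrift(Function<String, float[]> embed, String input, int runs) {
        float[] first = embed.apply(input);
        float worst = 0f;
        for (int r = 1; r < runs; r++) {
            worst = Math.max(worst, maxAbsDiff(first, embed.apply(input)));
        }
        return worst;
    }

    public static void main(String[] args) {
        // Toy with leaked state: each call bumps a shared counter, mimicking
        // a cache that is not reset between calls.
        int[] calls = {0};
        Function<String, float[]> leaky = s -> new float[] {s.length() + calls[0]++};
        Function<String, float[]> pure = s -> new float[] {s.length()};
        System.out.println(worstDrift(pure, "hello", 5));  // 0.0
        System.out.println(worstDrift(leaky, "hello", 5)); // 4.0
    }
}
```

A drift of exactly zero points away from cache state. A drift that grows with each call suggests state leaking between calls, such as a KV cache that is not reset. A tiny nonzero drift in the last few bits can also come from parallel reductions summing in a different order, since float addition is not associative.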
https://github.com/edwardcapriolo/deliverance/pull/13/files I had not gotten around to adding embedding to my fork, so I started on it. One thing I noticed:
```
try (AbstractTensor r = batchForward(encoded, 0, kvMem)){
    if (poolingType...
```
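Since the branch on poolingType decides how the per-token states become one vector, here is a sketch of the two common choices: first-token ([CLS]) pooling and mean pooling over the real (non-padding) tokens. `TokenPooling` and its `Mode` enum are hypothetical and are not jlama's `PoolingType`; the hidden states are assumed to be a plain `[seqLen][dim]` float array rather than an `AbstractTensor`.

```java
// Hypothetical pooling helper over per-token hidden states laid out as [seqLen][dim].
final class TokenPooling {
    enum Mode { FIRST_TOKEN, MEAN }

    private TokenPooling() {}

    static float[] pool(float[][] hidden, int seqLen, Mode mode) {
        if (seqLen <= 0 || seqLen > hidden.length) {
            throw new IllegalArgumentException("seqLen out of range: " + seqLen);
        }
        int dim = hidden[0].length;
        float[] out = new float[dim];
        switch (mode) {
            case FIRST_TOKEN:
                // BERT-style: take the hidden state of the first ([CLS]) token.
                System.arraycopy(hidden[0], 0, out, 0, dim);
                break;
            case MEAN:
                // Average only the first seqLen rows so padding positions are ignored.
                for (int t = 0; t < seqLen; t++) {
                    for (int d = 0; d < dim; d++) {
                        out[d] += hidden[t][d];
                    }
                }
                for (int d = 0; d < dim; d++) {
                    out[d] /= seqLen;
                }
                break;
            default:
                throw new IllegalArgumentException("unknown mode: " + mode);
        }
        return out;
    }
}
```

Mean pooling is what many sentence-embedding models expect, while first-token pooling follows BERT's [CLS] convention; picking the wrong one changes the vectors even when every layer is computed correctly.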
Also, this seems dubious:
```
VectorMath.pfor(0, config.embeddingLength, i -> {
    // BERT seems to use tanh for pooling rather than gelu
    outputEmbedding[i] = ActivationFunction.eval(ActivationFunction.Type.TANH, pooled.get(0, i));
});
return outputEmbedding;
```
...
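For reference, the original BERT pooler (for example `BertPooler` in Hugging Face Transformers) takes the first token's hidden state, applies a dense layer (weight plus bias), and only then applies tanh. The excerpt doesn't show whether `pooled` already includes that dense projection, so this is only a sketch for comparison; `BertPoolerSketch`, `weight` (laid out `[outDim][inDim]` like a PyTorch `Linear`) and `bias` are hypothetical names.

```java
// Sketch of a BERT-style pooler: tanh(W * h_cls + b).
// Names are hypothetical; weight follows the PyTorch Linear layout [outDim][inDim].
final class BertPoolerSketch {
    private BertPoolerSketch() {}

    static float[] pool(float[] clsHidden, float[][] weight, float[] bias) {
        int outDim = weight.length;
        if (bias.length != outDim) {
            throw new IllegalArgumentException("bias length " + bias.length + " != " + outDim);
        }
        float[] out = new float[outDim];
        for (int i = 0; i < outDim; i++) {
            if (weight[i].length != clsHidden.length) {
                throw new IllegalArgumentException("weight row " + i + " has wrong width");
            }
            // Dense projection: dot product of row i with the [CLS] state, plus bias.
            double acc = bias[i];
            for (int j = 0; j < clsHidden.length; j++) {
                acc += (double) weight[i][j] * clsHidden[j];
            }
            // Activation: tanh, as in the original BERT pooler (not GELU).
            out[i] = (float) Math.tanh(acc);
        }
        return out;
    }
}
```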
I wanted to catch you up on what I have been working on. I started by adding some fractional logging all over the pipeline. This is still a bit trick...
@udaychandra I could use your help here, as my understanding of the fundamentals is a bit weak. https://github.com/edwardcapriolo/deliverance/pull/new/layer I decided to see if I could assert that the Java LayerNorm is close...
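A sketch of the kind of assertion described, assuming the reference values come from another implementation such as PyTorch's `torch.nn.LayerNorm` (which normalizes with the biased variance and defaults to eps = 1e-5). `LayerNormCheck`, `layerNorm` and `assertClose` are hypothetical names; the reference is computed in double precision so that the tolerance measures the float implementation's error rather than the reference's.

```java
// Sketch: a double-precision reference LayerNorm plus an allclose-style check
// for comparing a float implementation against it. Names are hypothetical.
final class LayerNormCheck {
    private LayerNormCheck() {}

    // y_i = (x_i - mean) / sqrt(var + eps) * gamma_i + beta_i, biased variance (divide by n).
    static double[] layerNorm(float[] x, float[] gamma, float[] beta, double eps) {
        int n = x.length;
        double mean = 0;
        for (float v : x) {
            mean += v;
        }
        mean /= n;
        double var = 0;
        for (float v : x) {
            var += (v - mean) * (v - mean);
        }
        var /= n;
        double inv = 1.0 / Math.sqrt(var + eps);
        double[] y = new double[n];
        for (int i = 0; i < n; i++) {
            y[i] = (x[i] - mean) * inv * gamma[i] + beta[i];
        }
        return y;
    }

    // Same rule as NumPy's allclose: |actual - expected| <= atol + rtol * |expected|.
    static void assertClose(float[] actual, double[] expected, double rtol, double atol) {
        if (actual.length != expected.length) {
            throw new AssertionError("length " + actual.length + " != " + expected.length);
        }
        for (int i = 0; i < actual.length; i++) {
            double diff = Math.abs(actual[i] - expected[i]);
            if (diff > atol + rtol * Math.abs(expected[i])) {
                throw new AssertionError("index " + i + ": " + actual[i] + " vs " + expected[i]);
            }
        }
    }
}
```

Using a relative plus absolute tolerance avoids demanding bit-exact floats while still catching a wrong formula, such as dividing the variance by n - 1 or leaving eps out.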
@tjake I can't imagine the failed tests have anything to do with the PR.
```
2025-07-28T10:33:32.703+05:30 WARN 17160 --- [excel] [ main] c.g.t.j.t.o.TensorOperationsProvider : Native operations not available. Consider adding 'com.github.tjake:jlama-native' to the classpath
```
If you are seeing this, something is not right. The shared...