Edward Capriolo

Results: 51 comments by Edward Capriolo

Take vLLM: it has a shared KV cache. Users are encouraged to set a cache_salt if they want to ensure people can't "guess prompts" by looking at the timings...

Take a look at this: "cacheSalt". The idea here is that in multi-user environments I can "guess" other users' prompts by looking at the timings of the responses. https://github.com/edwardcapriolo/deliverance/pull/6 We all share...
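To make the salting idea concrete, here is a minimal sketch of how a salt could be mixed into the lookup key for a cached prompt prefix. It is not the code from the linked PR or from vLLM: the class `SaltedPrefixKey`, the method `keyFor`, and the choice of SHA-256 over the token ids are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;

/** Hypothetical helper: derives a lookup key for a cached prompt prefix. */
final class SaltedPrefixKey {

    /**
     * Hashes the optional salt followed by the token ids.
     * A null or empty salt puts the request in the shared key space.
     */
    static String keyFor(String cacheSalt, List<Integer> tokenIds) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            if (cacheSalt != null && !cacheSalt.isEmpty()) {
                digest.update(cacheSalt.getBytes(StandardCharsets.UTF_8));
                digest.update((byte) 0); // separator between salt and tokens
            }
            for (int id : tokenIds) {
                // Each token id contributes its four big-endian bytes.
                digest.update(ByteBuffer.allocate(4).putInt(id).array());
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : digest.digest()) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is mandated for every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }
}
```

With a key like this, two requests carrying the same prompt but different salts map to different cache entries, so one user's cache hit says nothing about another user's prompt. The trade-off is that salted users no longer benefit from each other's cached prefixes.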

A further improvement, a dedicated KV cache: https://github.com/edwardcapriolo/deliverance/pull/new/dedicated_kv

I looked this over quickly and I have a hunch. The KV cache and the batch forwarding might make the result non-deterministic. ``` public float[] embed(String input, PoolingType poolingType) {...

https://github.com/edwardcapriolo/deliverance/pull/13/files I had not gotten around to adding embedding to my fork, so I started on it. One thing I noticed: ``` try (AbstractTensor r = batchForward(encoded, 0, kvMem)){ if (poolingType...

Also, this seems dubious:
```
VectorMath.pfor(0, config.embeddingLength, i -> {
    // BERT seems to use tanh for pooling rather than gelu
    outputEmbedding[i] = ActivationFunction.eval(ActivationFunction.Type.TANH, pooled.get(0, i));
});
return outputEmbedding;
```
...

I wanted to catch you up on what I have been working on. I started by adding some fractional logging all over the pipeline. This is still a bit trick...

@udaychandra I could use your help here, as my understanding of the fundamentals is a bit weak. https://github.com/edwardcapriolo/deliverance/pull/new/layer I decided to see if I could assert that the Java LayerNorm is close...
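As a rough illustration of that kind of closeness check, the sketch below pairs a plain-Java reference LayerNorm with an element-wise tolerance comparison. It is not the code on the linked branch: the class `LayerNormCheck`, the methods `layerNorm` and `allClose`, and the tolerance scheme are hypothetical, and the expected values would have to come from whatever reference implementation is being compared against.

```java
/** Hypothetical reference LayerNorm plus a tolerance check; not the branch's code. */
final class LayerNormCheck {

    /** Normalizes x to zero mean and unit (biased) variance, then applies gamma and beta. */
    static float[] layerNorm(float[] x, float[] gamma, float[] beta, float eps) {
        double mean = 0.0;
        for (float v : x) {
            mean += v;
        }
        mean /= x.length;

        double variance = 0.0;
        for (float v : x) {
            variance += (v - mean) * (v - mean);
        }
        variance /= x.length; // biased estimator: divide by n, not n - 1

        double invStd = 1.0 / Math.sqrt(variance + eps);
        float[] out = new float[x.length];
        for (int i = 0; i < x.length; i++) {
            out[i] = (float) ((x[i] - mean) * invStd * gamma[i] + beta[i]);
        }
        return out;
    }

    /** True when every element satisfies |actual - expected| <= atol + rtol * |expected|. */
    static boolean allClose(float[] actual, float[] expected, float atol, float rtol) {
        if (actual.length != expected.length) {
            return false;
        }
        for (int i = 0; i < actual.length; i++) {
            float limit = atol + rtol * Math.abs(expected[i]);
            if (Math.abs(actual[i] - expected[i]) > limit) {
                return false;
            }
        }
        return true;
    }
}
```

Accumulating in `double` keeps the reference itself from adding float rounding error, so any mismatch is more likely to come from the implementation under test. A combined absolute and relative tolerance avoids spurious failures on values near zero.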

@tjake I can't imagine the failed tests have anything to do with the PR.

2025-07-28T10:33:32.703+05:30 WARN 17160 --- [excel] [ main] c.g.t.j.t.o.TensorOperationsProvider : Native operations not available. Consider adding 'com.github.tjake:jlama-native' to the classpath
If you are seeing this, something is not right. The shared...