singularity

27 comments by singularity

> Some explanation: draft_time is the time for one draft model's forward pass. target_time is the time for one draft model's forward pass corresponding to the valid budget. Is there...

Hi, I'm not entirely sure how GQA or other implementations affect GPU memory usage. Could you please elaborate? Generally, the formula is `max_total_token_num = (total_free_gpu_memory - model_parameter_size) *...

From my understanding of the paper mentioned above, GQA reduces the `kv_cache_size` by a factor of `num_attention_heads / num_key_value_heads`. These values are available from `config.json` so the value of `kv_cache_size` can always...
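As a rough illustration of the reduction described above, the sketch below computes a per-token KV cache size from fields that commonly appear in a Hugging Face style `config.json`. It is not the formula from the PR (which is truncated here). It assumes the usual accounting of one key and one value vector per KV head per layer, a head dimension of `hidden_size / num_attention_heads`, and fp16 storage. The helper names and the example config values are hypothetical.

```python
def kv_cache_bytes_per_token(config: dict, dtype_bytes: int = 2) -> int:
    """Estimate KV cache bytes needed per token (assumed accounting, fp16 by default)."""
    num_heads = config["num_attention_heads"]
    # Models without GQA may omit num_key_value_heads; fall back to plain MHA.
    num_kv_heads = config.get("num_key_value_heads", num_heads)
    head_dim = config["hidden_size"] // num_heads
    num_layers = config["num_hidden_layers"]
    # Factor 2 covers one key and one value vector per KV head per layer.
    return 2 * num_layers * num_kv_heads * head_dim * dtype_bytes


def gqa_reduction_factor(config: dict) -> float:
    """Ratio by which GQA shrinks the KV cache relative to full multi-head attention."""
    num_heads = config["num_attention_heads"]
    return num_heads / config.get("num_key_value_heads", num_heads)


# Illustrative values only; read the real ones from the model's config.json.
example_config = {
    "hidden_size": 8192,
    "num_attention_heads": 64,
    "num_key_value_heads": 8,
    "num_hidden_layers": 80,
}

gqa_bytes = kv_cache_bytes_per_token(example_config)
mha_config = {k: v for k, v in example_config.items() if k != "num_key_value_heads"}
mha_bytes = kv_cache_bytes_per_token(mha_config)
print(gqa_bytes, mha_bytes, gqa_reduction_factor(example_config))
```

With these example values the GQA cache is one eighth the size of the multi-head equivalent, which matches the `num_attention_heads / num_key_value_heads` ratio.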

This PR has been updated with changes to how `kv_cache_size` is calculated. Please review.

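Since the server does not return a specific error message, it is hard to determine the cause of this problem. I noticed that the mini program now includes a `text_` field when it sends a reservation request, corresponding to the `rsaText` field of each reservation item, so I suspect some form of encryption was recently added to counter scripts.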

If the encryption is done on the front end, Selenium could be considered, but that would mean rewriting the existing code from scratch.

I filed this as an issue in https://github.com/flutter/flutter/issues/120763