Mark Winter
How is the performance if you remove `torch.cuda.empty_cache()`? You can also try using the batcher to increase GPU utilisation and RPS https://kserve.github.io/website/0.8/modelserving/batcher/batcher/
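For reference, the batcher linked above is enabled on the `InferenceService` spec. A minimal sketch (the model name, `storageUri`, and the specific limit values are placeholders, not taken from this thread):

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: my-model          # hypothetical name
spec:
  predictor:
    # Batcher groups concurrent requests before they hit the model,
    # trading a small latency budget for better GPU utilisation.
    batcher:
      maxBatchSize: 32    # example value; tune against your model
      maxLatency: 5000    # max ms to wait while filling a batch
    pytorch:
      storageUri: gs://my-bucket/my-model   # placeholder URI
```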
For now I have just created a `VirtualService`, but I think a configurable Grafana endpoint seems more correct?
It would also be helpful if the dashboards were configurable too, e.g. `db/knative-serving-revision-cpu-and-memory-usage` and `db/knative-serving-revision-http-requests`
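For context, the `VirtualService` mentioned above could look roughly like this Istio route to Grafana. This is a sketch only: the gateway, namespace, host, and port are assumptions, not values from this thread.

```yaml
apiVersion: networking.istio.io/v1alpha3
kind: VirtualService
metadata:
  name: grafana                      # hypothetical name
  namespace: knative-monitoring     # assumed namespace
spec:
  hosts:
  - "*"
  gateways:
  - kubeflow/kubeflow-gateway        # assumed gateway
  http:
  - match:
    - uri:
        prefix: /grafana/
    route:
    - destination:
        # Assumed in-cluster Grafana service and default port
        host: grafana.knative-monitoring.svc.cluster.local
        port:
          number: 3000
```

A configurable endpoint would replace the hardcoded `host`/`prefix` here with values read from the app's config.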
@amybachir That looks like the same issue as you said. It occurs when you have a custom predictor. It should be fixed when this PR is merged: https://github.com/kserve/models-web-app/pull/7
@kimwnasptd Can you review this please?
@kimwnasptd Sorry for the long delay, this is now up to date with the latest master
@juliusvonkohout Looks like there have been a few changes, so I will get to rebasing this at some point soon.
/retest
@sriharan16 I will work on this soon! I think there will probably be two separate features: 1. Edit a model server - This is to edit things like autoscaling settings,...
FYI I am working on this now. I will add an Edit button that loads the existing YAML which you can edit to change the `storageUri` and add the `canaryTrafficPercent`...
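To illustrate the two fields mentioned above, editing an `InferenceService`'s YAML for a canary rollout might look like this (model name, framework, and URIs are placeholders; `canaryTrafficPercent` and `storageUri` are the real v1beta1 fields):

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: my-model                    # hypothetical name
spec:
  predictor:
    # Route 10% of traffic to the new revision created by
    # changing storageUri; the rest stays on the previous one.
    canaryTrafficPercent: 10
    sklearn:                        # assumed framework
      storageUri: gs://my-bucket/model-v2   # updated model artifact
```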