Kepler is very slow to start
Describe the bug: Before PR #225, Kepler took 1.18ms to start. Now it takes 40.17s to start.
Although it might be related to the model server, the default deployment should have the model server disabled...
To Reproduce: Steps to reproduce the behavior: PR #307 introduces a log message that reports the elapsed start time.
Expected behavior: Kepler should start faster, especially if the model server is disabled....
/cc @sunya-ch
In the current implementation, there is no flag to skip getting the initial weights, whether by connecting to the model server or by loading the model weights. We can add such a flag to the config and use it as a condition to skip the function below. https://github.com/sustainable-computing-io/kepler/blob/605dc9cf79d1e7e600ef6e0a468d964b73be6a72/pkg/collector/metrics.go#L108
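A minimal sketch of that idea, assuming a boolean config flag read from an environment variable. The names `InitModelEnabled`, `INIT_MODEL_ENABLED`, `loadConfig`, and `maybeInitModel` are hypothetical and are not existing Kepler settings or functions; the callback stands in for the weight-loading function linked above.

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// Config holds startup options.
// InitModelEnabled is a hypothetical flag, not an existing Kepler setting.
type Config struct {
	InitModelEnabled bool
}

// loadConfig reads the hypothetical INIT_MODEL_ENABLED variable through getenv.
// An unset or unparsable value falls back to false, so a default deployment
// skips the initial model step.
func loadConfig(getenv func(string) string) Config {
	enabled, err := strconv.ParseBool(getenv("INIT_MODEL_ENABLED"))
	if err != nil {
		enabled = false
	}
	return Config{InitModelEnabled: enabled}
}

// maybeInitModel calls initFn only when the flag is set.
// It reports whether initFn ran, plus any error initFn returned.
func maybeInitModel(cfg Config, initFn func() error) (bool, error) {
	if !cfg.InitModelEnabled {
		return false, nil
	}
	return true, initFn()
}

func main() {
	cfg := loadConfig(os.Getenv)
	ran, err := maybeInitModel(cfg, func() error {
		// Placeholder for the real weight-loading call.
		return nil
	})
	fmt.Println("initial model loaded:", ran, "error:", err)
}
```

Defaulting the flag to false matches the expectation in the report that a default deployment should not pay the model-loading cost at startup.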
I tried to submit a PR that comments out this line, but got the following panic (it can be checked and updated somewhere). So what's the impact of removing this function? For example, does the estimate function not work at all?
panic: runtime error: index out of range [0] with length 0
goroutine 66 [running]:
github.com/sustainable-computing-io/kepler/pkg/collector.(*Collector).reader.func1()
/opt/app-root/src/github.com/sustainable-computing-io/kepler/pkg/collector/reader.go:310 +0x113e
created by github.com/sustainable-computing-io/kepler/pkg/collector.(*Collector).reader
/opt/app-root/src/github.com/sustainable-computing-io/kepler/pkg/collector/reader.go:256 +0x85
The impact is that no initial model will be downloaded from the model server and no estimate model will be applied. If neither node nor RAPL power is measured, it could cause that error. I will push another PR to fix that case by at least returning zeros.
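As an illustration of the "at least return zeros" idea, here is a minimal sketch of guarding a slice access so that an empty set of readings yields zero instead of an index-out-of-range panic. The helper `firstOrZero` is hypothetical and does not reproduce the actual code in `reader.go`.

```go
package main

import "fmt"

// firstOrZero returns the first power reading, or 0 when nothing was measured.
// Hypothetical helper: checking the length first avoids indexing an empty slice.
func firstOrZero(readings []float64) float64 {
	if len(readings) == 0 {
		return 0
	}
	return readings[0]
}

func main() {
	var none []float64 // no node/RAPL power measured
	fmt.Println(firstOrZero(none))
	fmt.Println(firstOrZero([]float64{12.5, 3.0}))
}
```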
The impact is that no initial model will be downloaded from the model server and no estimate model will be applied.
So for educational purposes, not having such a model doesn't impact the main metrics collection function, so users can start Kepler faster?
Yes. It should not affect the main metrics collection function.
Closing because of PR #316.