optimus
optimus copied to clipboard
Revamp Implementation to Use Proper Pointer
TL;DR
This issue proposes to revamp the current implementation to use proper pointer whever possible instead of value to address possible issues caused by side effect and memory consumption.
Background
The current implementations mostly use values as one way to pass information from one place to another rather than pointers. Using value provides its own advantages, one of which is the less worry of unnecessary side effects. However, in the context of the current implementations, there could be at least two possible issues arising, side effects and memory consumption.
Side Effect
Using value as a way to pass information from one place to another could reduce the possibility of unnecessary side effects. However, that only works if all the values we pass are indeed a "value" and indeed is treated as value. The story will be different if we didn't realize that a member of our variable is not a value, but a pointer instead.
Simple Example
Let's say we have a code like the following:
...
type Job struct {
ID string
Name string
Description string
}
func doSideEffectOnJob(job Job) {
job.Description = "Updated description by side effect"
}
func main() {
job := Job{
ID: "e3",
Name: "user_job",
Description: "A job that queries to fill table user",
}
fmt.Println(job)
doSideEffectOnJob(job)
fmt.Println(job)
}
If we run the above example, we can have results like the following:
{e3 user_job A job that queries to fill table user}
{e3 user_job A job that queries to fill table user}
Now, if we add a pointer member to the Job
type and pass it as value and do some side effect to that member, we can get a result with side effect.
For Job
type:
type Job struct {
...
Labels map[string]string
}
For side effect function:
func doSideEffectOnJob(job Job) {
...
job.Labels["side_effect"] = "affected"
}
For main function:
...
func main() {
job := Job{
...
Labels: map[string]string{
"orchestrator": "optimus",
},
}
fmt.Println(job)
doSideEffectOnJob(job)
fmt.Println(job)
}
If we run it, we will get a side effect even if we pass it as value.
{e3 user_job A job that queries to fill table user map[orchestrator:optimus]}
{e3 user_job A job that queries to fill table user map[orchestrator:optimus side_effect:affected]}
Real Example
If we look at the current implementation (this commit), we can see that we have a method for Resolve
:
func (*dependencyResolver) Resolve(..., jobSpec models.JobSpec, ...) (models.JobSpec, error) {
...
}
One point to note on this method is models.JobSpec
, whic is used as parameter and return. If we read Resolve
method that accepts a models.JobSpec
and returns also a models.JobSpec
, we might assume that the models.JobSpec
returned is the resolved spec withouth having to worry models.JobSpec
that we sent as parameter, as it is being sent as value.
However, the tricky part here is, we have to worry about models.JobSpec
that we sent as well even though we sent it as value. The reason is, Resolve
method, in fact, also do some side effects to the models.JobSpec
in the parameter, as illustrated by the example above. This is quite dangerous, as the caller might miss it and ignore.
Memory Consumption
In the context of memory consumption, sending variable as value can be problematic if it happens in large number of frequency and volume. In Optimus, we have a lot of asynchronous operations. If we have a lot of jobs and/or resources to be processed at the same time, memory consumption needs to be taken care of. To illustrate, the below is examples on this.
Single Job
Let's say we have the following code:
...
func main() {
job := Job{
ID: "e3",
}
fmt.Printf("from main: %p\n", &job)
checkSinglekMemory(job)
fmt.Printf("from main: %p\n", &job)
}
func checkSinglekMemory(job Job) {
fmt.Printf("inside call: %p\n", &job)
}
If we run the above code, we can ge results similar to the following:
from main: 0xc000126040
inside call: 0xc000126080
from main: 0xc000126040
From here, when we call the checkSinglekMemory
, the job we have in main
is still there and inside checkSinglekMemory
is a new copy of that job. The job in main
is still there, even though we are currently inside checkSinglekMemory
. Here, the data is duplicated and if there's asynchronous operation where we sent each job from a slice to a function, those data will also be duplicated. Data duplication means more memory consumption.
Multi Job
Let's say we have a code like the following:
...
func main() {
jobs := []Job{
{
ID: "job1",
},
{
ID: "job2",
},
}
fmt.Printf("from main: %p\n\n", &jobs)
checkMultiFlatMemory(jobs)
fmt.Println()
checkMultiNestedMemory(jobs)
}
func checkMultiFlatMemory(jobs []Job) {
fmt.Printf("from flat: %p\n", &jobs)
for i, job := range jobs {
fmt.Printf("inside loop i-%d: %p\n", i, &job)
}
}
func checkMultiNestedMemory(jobs []Job) {
fmt.Printf("from nested: %p\n", &jobs)
for i := 0; i < 2; i++ {
for j, job := range jobs {
fmt.Printf("inside i-%d loop j-%d: %p\n", i, j, &job)
}
}
}
If we run the code above, we might get similar result as the following:
from main: 0xc0000a4030
from flat: 0xc0000a4048
inside loop i-0: 0xc0000b0040
inside loop i-1: 0xc0000b0040
from nested: 0xc0000a4060
inside i-0 loop j-0: 0xc0000b0080
inside i-0 loop j-1: 0xc0000b0080
inside i-1 loop j-0: 0xc0000b00c0
inside i-1 loop j-1: 0xc0000b00c0
Such data duplication also arise even inside a loop and/or nested loop.
Proposal
To address the possible concerns mentioned above, this issue proposes to use proper pointer whenever possible instead of value. In order to apply this, we need to keep in mind considerations about the concerns previously mentioned.
Considertion: Side Effect
Using pointer has more probability to side effect. However, if the code is clean enough, such probability could be minimal to none.
For example, if we have a function:
func validateJob(*models.JobSpec) error {}
the caller can be at ease that this function won't do anything to cause side effect. But, there might be a question, what if that function indeed cause side effect? Then, the problem is in the function itself. Why would the function cause side effect? Does it reflect the name of the function? If the name of the function doesn't imply any side effect, then why would it need be?
It will be different if the function name is:
func enrichJob(*models.JobSpec) error {}
In the second function, we can be sure that there will be a modification to the job and the caller should be prepared for it.
In relation to the current implementation, rather than:
func (*dependencyResolver) Resolve(..., jobSpec models.JobSpec, ...) (models.JobSpec, error) {
...
}
it would be clearer and cleaner if we can change it to:
func (*dependencyResolver) Resolve(..., jobSpec *models.JobSpec, ...) error {
...
// do some side effect
}
or
func (*dependencyResolver) GetResolvedJob(..., jobSpec *models.JobSpec, ...) (*models.JobSpec, error) {
...
// making sure no side effect exist, while returning a new job spec
}
Consideration: Memory Consumption
When we use pointer to pass around data, we can be sure that what we pass is not a copy of what we sent, but instead an address to that data. So, even if there are a lot of asynchronous operations, what we actually sent around is the memory address, not a copy. With this, we can minimize the memory consumption.
As part of this issue we can focus on using JobSpec by pointer.