optimus icon indicating copy to clipboard operation
optimus copied to clipboard

Revamp Implementation to Use Proper Pointer

Open irainia opened this issue 2 years ago • 1 comments

TL;DR

This issue proposes to revamp the current implementation to use proper pointer whever possible instead of value to address possible issues caused by side effect and memory consumption.

Background

The current implementations mostly use values as one way to pass information from one place to another rather than pointers. Using value provides its own advantages, one of which is the less worry of unnecessary side effects. However, in the context of the current implementations, there could be at least two possible issues arising, side effects and memory consumption.

Side Effect

Using value as a way to pass information from one place to another could reduce the possibility of unnecessary side effects. However, that only works if all the values we pass are indeed a "value" and indeed is treated as value. The story will be different if we didn't realize that a member of our variable is not a value, but a pointer instead.

Simple Example

Let's say we have a code like the following:

...
type Job struct {
	ID          string
	Name        string
	Description string
}

func doSideEffectOnJob(job Job) {
	job.Description = "Updated description by side effect"
}

func main() {
	job := Job{
		ID:          "e3",
		Name:        "user_job",
		Description: "A job that queries to fill table user",
	}
	fmt.Println(job)
	doSideEffectOnJob(job)
	fmt.Println(job)
}

If we run the above example, we can have results like the following:

{e3 user_job A job that queries to fill table user}
{e3 user_job A job that queries to fill table user}

Now, if we add a pointer member to the Job type and pass it as value and do some side effect to that member, we can get a result with side effect.

For Job type:

type Job struct {
	...
        Labels      map[string]string
}

For side effect function:

func doSideEffectOnJob(job Job) {
        ...
	job.Labels["side_effect"] = "affected"
}

For main function:

...
func main() {
	job := Job{
                 ...
		Labels: map[string]string{
			"orchestrator": "optimus",
		},
	}
	fmt.Println(job)
	doSideEffectOnJob(job)
	fmt.Println(job)
}

If we run it, we will get a side effect even if we pass it as value.

{e3 user_job A job that queries to fill table user map[orchestrator:optimus]}
{e3 user_job A job that queries to fill table user map[orchestrator:optimus side_effect:affected]}

Real Example

If we look at the current implementation (this commit), we can see that we have a method for Resolve:

func (*dependencyResolver) Resolve(..., jobSpec models.JobSpec, ...) (models.JobSpec, error) {
    ...
}

One point to note on this method is models.JobSpec, whic is used as parameter and return. If we read Resolve method that accepts a models.JobSpec and returns also a models.JobSpec, we might assume that the models.JobSpec returned is the resolved spec withouth having to worry models.JobSpec that we sent as parameter, as it is being sent as value.

However, the tricky part here is, we have to worry about models.JobSpec that we sent as well even though we sent it as value. The reason is, Resolve method, in fact, also do some side effects to the models.JobSpec in the parameter, as illustrated by the example above. This is quite dangerous, as the caller might miss it and ignore.

Memory Consumption

In the context of memory consumption, sending variable as value can be problematic if it happens in large number of frequency and volume. In Optimus, we have a lot of asynchronous operations. If we have a lot of jobs and/or resources to be processed at the same time, memory consumption needs to be taken care of. To illustrate, the below is examples on this.

Single Job

Let's say we have the following code:

...
func main() {
    job := Job{
		ID: "e3",
	}
	fmt.Printf("from main: %p\n", &job)
	checkSinglekMemory(job)
	fmt.Printf("from main: %p\n", &job)
}

func checkSinglekMemory(job Job) {
	fmt.Printf("inside call: %p\n", &job)
}

If we run the above code, we can ge results similar to the following:

from main: 0xc000126040
inside call: 0xc000126080
from main: 0xc000126040

From here, when we call the checkSinglekMemory, the job we have in main is still there and inside checkSinglekMemory is a new copy of that job. The job in main is still there, even though we are currently inside checkSinglekMemory. Here, the data is duplicated and if there's asynchronous operation where we sent each job from a slice to a function, those data will also be duplicated. Data duplication means more memory consumption.

Multi Job

Let's say we have a code like the following:

...
func main() {
    jobs := []Job{
		{
			ID: "job1",
		},
		{
			ID: "job2",
		},
	}
	fmt.Printf("from main: %p\n\n", &jobs)
	checkMultiFlatMemory(jobs)
	fmt.Println()
	checkMultiNestedMemory(jobs)
}

func checkMultiFlatMemory(jobs []Job) {
	fmt.Printf("from flat: %p\n", &jobs)
	for i, job := range jobs {
		fmt.Printf("inside loop i-%d: %p\n", i, &job)
	}
}

func checkMultiNestedMemory(jobs []Job) {
	fmt.Printf("from nested: %p\n", &jobs)
	for i := 0; i < 2; i++ {
		for j, job := range jobs {
			fmt.Printf("inside i-%d loop j-%d: %p\n", i, j, &job)
		}
	}
}

If we run the code above, we might get similar result as the following:

from main: 0xc0000a4030

from flat: 0xc0000a4048
inside loop i-0: 0xc0000b0040
inside loop i-1: 0xc0000b0040

from nested: 0xc0000a4060
inside i-0 loop j-0: 0xc0000b0080
inside i-0 loop j-1: 0xc0000b0080
inside i-1 loop j-0: 0xc0000b00c0
inside i-1 loop j-1: 0xc0000b00c0

Such data duplication also arise even inside a loop and/or nested loop.

Proposal

To address the possible concerns mentioned above, this issue proposes to use proper pointer whenever possible instead of value. In order to apply this, we need to keep in mind considerations about the concerns previously mentioned.

Considertion: Side Effect

Using pointer has more probability to side effect. However, if the code is clean enough, such probability could be minimal to none.

For example, if we have a function:

func validateJob(*models.JobSpec) error {}

the caller can be at ease that this function won't do anything to cause side effect. But, there might be a question, what if that function indeed cause side effect? Then, the problem is in the function itself. Why would the function cause side effect? Does it reflect the name of the function? If the name of the function doesn't imply any side effect, then why would it need be?

It will be different if the function name is:

func enrichJob(*models.JobSpec) error {}

In the second function, we can be sure that there will be a modification to the job and the caller should be prepared for it.

In relation to the current implementation, rather than:

func (*dependencyResolver) Resolve(..., jobSpec models.JobSpec, ...) (models.JobSpec, error) {
    ...
}

it would be clearer and cleaner if we can change it to:

func (*dependencyResolver) Resolve(..., jobSpec *models.JobSpec, ...)  error {
    ...
    // do some side effect
}

or

func (*dependencyResolver) GetResolvedJob(..., jobSpec *models.JobSpec, ...) (*models.JobSpec, error) {
    ...
    // making sure no side effect exist, while returning a new job spec
}

Consideration: Memory Consumption

When we use pointer to pass around data, we can be sure that what we pass is not a copy of what we sent, but instead an address to that data. So, even if there are a lot of asynchronous operations, what we actually sent around is the memory address, not a copy. With this, we can minimize the memory consumption.

irainia avatar Jun 20 '22 06:06 irainia

As part of this issue we can focus on using JobSpec by pointer.

sbchaos avatar Jun 23 '22 10:06 sbchaos