SIGSEGV: segmentation violation

Open caspervk opened this issue 2 years ago • 8 comments

2022-06-01T15:12:54.107621723Z panic: runtime error: invalid memory address or nil pointer dereference                                                                                                                                                                                                                      
2022-06-01T15:12:54.107661697Z [signal SIGSEGV: segmentation violation code=0x1 addr=0x18 pc=0x15c8039]                                                                                                                                                                                                                     
2022-06-01T15:12:54.107666863Z                                                                                                                                                                                                                                                                                              
2022-06-01T15:12:54.107671687Z goroutine 232 [running]:                                                                                                                                                                                                                                                                     
2022-06-01T15:12:54.114675389Z github.com/weaveworks/tf-controller/controllers.(*TerraformReconciler).finalize(_, {_, _}, {{{0x1641244, 0x9}, {0xc000485ba0, 0x20}}, {{0xc000570660, 0x1c}, {0x0, ...}, ...}, ...}, ...)                                                                                                    
2022-06-01T15:12:54.114703099Z     /workspace/controllers/terraform_controller.go:1455 +0x199                                                                                                                                                                                                                               
2022-06-01T15:12:54.114709530Z github.com/weaveworks/tf-controller/controllers.(*TerraformReconciler).Reconcile(0xc000528000, {0x1c113a8, 0xc00063f140}, {{{0xc000641d58, 0x18d25c0}, {0xc000570660, 0x30}}})                                                                                                               
2022-06-01T15:12:54.114714035Z     /workspace/controllers/terraform_controller.go:204 +0x7e6                                                                                                                                                                                                                                
2022-06-01T15:12:54.115162811Z sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0xc0001dc000, {0x1c113a8, 0xc00063f0b0}, {{{0xc000641d58, 0x18d25c0}, {0xc000570660, 0x413c54}}})                                                                                                             
2022-06-01T15:12:54.115181858Z     /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:114 +0x26f                                                                                                                                                                                      
2022-06-01T15:12:54.115187264Z sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0001dc000, {0x1c11300, 0xc000304200}, {0x17f2020, 0xc0002a8080})                                                                                                                                    
2022-06-01T15:12:54.115191724Z     /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:311 +0x33e                                                                                                                                                                                      
2022-06-01T15:12:54.115206882Z sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0001dc000, {0x1c11300, 0xc000304200})                                                                                                                                                            
2022-06-01T15:12:54.115212605Z     /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:266 +0x205                                                                                                                                                                                      
2022-06-01T15:12:54.115218826Z sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()                                                                                                                                                                                                         
2022-06-01T15:12:54.115225109Z     /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:227 +0x85                                                                                                                                                                                       
2022-06-01T15:12:54.115230928Z created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2                                                                                                                                                                                                  
2022-06-01T15:12:54.115236355Z     /go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:223 +0x357                                                                                                                                                                                      
2022-06-01T15:12:55.084459176Z Stream closed EOF for flux-system/tf-controller-6d7b4d7c55-cfrgz (tf-controller)                                                                                                                                                                                                             

This should be the latest version:

Containers:                                                                                                                                                                                                                                                                                                                 
  tf-controller:                                                                                                                                                                                                                                                                                                            
    Image:         ghcr.io/weaveworks/tf-controller:v0.9.5                                                                                                                                                                                                                                                                  
    Image ID:      ghcr.io/weaveworks/tf-controller@sha256:64c4683034ead58d3cd592d11011ff103cd95356e1952c6841233b57886c418e                                                                                                                                                                                                 

caspervk avatar Jun 01 '22 15:06 caspervk

It seems your GitRepository object got deleted before the finalization process kicked in, is that correct? This is an interesting case, and we need to think carefully about how to handle it.

Thank you so much @caspervk

chanwit avatar Jun 01 '22 15:06 chanwit

Yes, I am deploying a GitRepository (and tf-runner RoleBinding/ServiceAccount) as part of a helm chart. This issue occurs on helm uninstall. Relatedly, I am also having issues with helm uninstalling the RoleBinding and ServiceAccount in my namespace before the tf-controller can start a pod to destroy the Terraform resources, causing the Terraform resources to never be deleted. I'm not sure this is an issue with the tf-controller exactly, as the problem would be solved by setting a custom (un)install order for the helm chart, but this is unfortunately not possible. For now I am solving the issue by deploying the RBAC resources manually (well, through flux, of course) instead of as part of my application's chart.

caspervk avatar Jun 01 '22 15:06 caspervk

In my particular case, the problem with the GitRepository would probably be solved by a solution to our discussion https://github.com/weaveworks/tf-controller/discussions/238, as I would no longer need to deploy a GitRepository.

caspervk avatar Jun 01 '22 15:06 caspervk

Actually, @chanwit, a good solution would maybe be to add some kind of finalizer to the GitRepository (and perhaps the ServiceAccount/RoleBinding)? I envision something not unlike PersistentVolume, from the docs:

A common example of a finalizer is kubernetes.io/pv-protection, which prevents accidental deletion of PersistentVolume objects. When a PersistentVolume object is in use by a Pod, Kubernetes adds the pv-protection finalizer. If you try to delete the PersistentVolume, it enters a Terminating status, but the controller can't delete it because the finalizer exists. When the Pod stops using the PersistentVolume, Kubernetes clears the pv-protection finalizer, and the controller deletes the volume.

Something similar seems reasonable for the relationship between the Terraform resources and the GitRepository objects they use.

caspervk avatar Jun 01 '22 16:06 caspervk

A GitRepository is often shared among TF objects, Kustomization objects etc.

Unfortunately we cannot delete it in the finalizer.

chanwit avatar Jun 01 '22 16:06 chanwit

I'm honestly not too familiar with Kubernetes finalizers, but does a finalizer necessarily have to delete an object, or does it simply block deletion? My model is that each Terraform resource would define a finalizer on the GitRepository that it uses, thereby blocking its deletion. I might very well misunderstand.

caspervk avatar Jun 01 '22 16:06 caspervk
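
Editor's note: in Kubernetes, a finalizer does block deletion. A delete request only marks the object as terminating, and the object is removed once its finalizer list is empty. The sketch below models that behavior in plain Go under stated assumptions: `Cluster`, `Object`, and the finalizer name `example.com/terraform-in-use` are hypothetical, and nothing here is the tf-controller or Flux implementation.

```go
package main

import "fmt"

// Object is a hypothetical, simplified model of a Kubernetes object.
type Object struct {
	Name        string
	Finalizers  []string
	Terminating bool // models a set deletionTimestamp
}

// Cluster is a hypothetical in-memory stand-in for the API server.
type Cluster struct {
	objects map[string]*Object
}

// Delete marks the object as terminating. The object is only removed
// if no finalizers remain.
func (c *Cluster) Delete(name string) {
	obj, ok := c.objects[name]
	if !ok {
		return
	}
	obj.Terminating = true
	c.collect(name)
}

// RemoveFinalizer drops one finalizer and removes the object if it is
// terminating and no finalizers are left.
func (c *Cluster) RemoveFinalizer(name, finalizer string) {
	obj, ok := c.objects[name]
	if !ok {
		return
	}
	kept := obj.Finalizers[:0]
	for _, f := range obj.Finalizers {
		if f != finalizer {
			kept = append(kept, f)
		}
	}
	obj.Finalizers = kept
	c.collect(name)
}

// collect removes a terminating object whose finalizer list is empty.
func (c *Cluster) collect(name string) {
	obj := c.objects[name]
	if obj.Terminating && len(obj.Finalizers) == 0 {
		delete(c.objects, name)
	}
}

func main() {
	const inUse = "example.com/terraform-in-use" // hypothetical finalizer name
	c := &Cluster{objects: map[string]*Object{
		"repo": {Name: "repo", Finalizers: []string{inUse}},
	}}

	c.Delete("repo")
	_, exists := c.objects["repo"]
	fmt.Println("exists after delete request:", exists)

	c.RemoveFinalizer("repo", inUse)
	_, exists = c.objects["repo"]
	fmt.Println("exists after finalizer removed:", exists)
}
```

With several Terraform objects sharing one GitRepository, each consumer would need its own finalizer entry or a reference count, which is part of what makes the shared-source case harder than this single-owner sketch.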

After consulting the Flux team, we came up with a mechanism similar to what the Kustomization controller uses to deal with this issue.

I'll elaborate more in a PR.

chanwit avatar Jun 01 '22 17:06 chanwit

Looking forward to it!

caspervk avatar Jun 01 '22 19:06 caspervk