casbin icon indicating copy to clipboard operation
casbin copied to clipboard

[Question] concurrent race when LoadPolicy is called

Open sasakiyori opened this issue 2 years ago • 7 comments

Want to prioritize this issue? Try:

issuehunt-to-marktext


What's your scenario? What do you want to achieve?

Sometimes when a watcher calls LoadPolicy to synchronize policies, panic occurs:

fatal error: concurrent map iteration and map write goroutine 152 [running]: runtime.throw({0x2557e5c?, 0x3711778?}) /usr/local/Cellar/go/1.18.3/libexec/src/runtime/panic.go:992 +0x71 fp=0xc000c9b818 sp=0xc000c9b7e8 pc=0x103e7d1 runtime.mapiternext(0xc000307dd0?) /usr/local/Cellar/go/1.18.3/libexec/src/runtime/map.go:871 +0x60a fp=0xc000c9b8a8 sp=0xc000c9b818 pc=0x1012eea runtime.mapiterinit(0x234dae0?, 0xc00064d110, 0xc000c9ba78) /usr/local/Cellar/go/1.18.3/libexec/src/runtime/map.go:861 +0x18c fp=0xc000c9b8c8 sp=0xc000c9b8a8 pc=0x101288c github.com/casbin/casbin/v2/model.(*Assertion).copy(0xc0008fe880) /Users/sasakiyori/go/pkg/mod/github.com/casbin/casbin/[email protected]/model/assertion.go:108 +0x2cf fp=0xc000c9bae8 sp=0xc000c9b8c8 pc=0x199c00f github.com/casbin/casbin/v2/model.Model.Copy(0xc0006dd800) /Users/sasakiyori/go/pkg/mod/github.com/casbin/casbin/[email protected]/model/model.go:384 +0x1b0 fp=0xc000c9bc58 sp=0xc000c9bae8 pc=0x19a0cd0 github.com/casbin/casbin/v2.(*Enforcer).LoadPolicy(0xc000684000) /Users/sasakiyori/go/pkg/mod/github.com/casbin/casbin/[email protected]/enforcer.go:291 +0x5a fp=0xc000c9bdc8 sp=0xc000c9bc58 pc=0x19b76ba github.com/casbin/casbin/v2.(*Enforcer).SetWatcher.func1({0x1c29, 0xa}) /Users/sasakiyori/go/pkg/mod/github.com/casbin/casbin/[email protected]/enforcer.go:260 +0x25 fp=0xc000c9bde8 sp=0xc000c9bdc8 pc=0x19b7645 github.com/casbin/etcd-watcher/v2.(*Watcher).startWatch(0xc000349c00) /Users/sasakiyori/go/pkg/mod/github.com/casbin/etcd-watcher/[email protected]/watcher.go:127 +0x268 fp=0xc000c9bfc0 sp=0xc000c9bde8 pc=0x2289c28 github.com/casbin/etcd-watcher/v2.NewWatcher.func1() /Users/sasakiyori/go/pkg/mod/github.com/casbin/etcd-watcher/[email protected]/watcher.go:67 +0x25 fp=0xc000c9bfe0 sp=0xc000c9bfc0 pc=0x22892e5 runtime.goexit() /usr/local/Cellar/go/1.18.3/libexec/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc000c9bfe8 sp=0xc000c9bfe0 pc=0x1072e01 created by github.com/casbin/etcd-watcher/v2.NewWatcher /Users/sasakiyori/go/pkg/mod/github.com/casbin/etcd-watcher/[email protected]/watcher.go:66 +0x2a8


I can't reproduce it every time, but I think the problem may arised here:

As the call chain shown below, when a goroutine do iteration in ast.PolicyMap from ast.copy(), any other goroutines write into this map(such as AddPolicy) may cause panic.

func (e *Enforcer) LoadPolicy() error {
	needToRebuild := false
        // panic occurs
	newModel := e.model.Copy()
	newModel.ClearPolicy()
        // ...
}

func (model Model) Copy() Model {
	newModel := NewModel()

	for sec, m := range model {
		newAstMap := make(AssertionMap)
                 // ast is a pointer of AssertionMap
		for ptype, ast := range m {
			newAstMap[ptype] = ast.copy()
		}
		newModel[sec] = newAstMap
	}

	newModel.SetLogger(model.GetLogger())
	return newModel
}

func (ast *Assertion) copy() *Assertion {
	tokens := append([]string(nil), ast.Tokens...)
	policy := make([][]string, len(ast.Policy))

	for i, p := range ast.Policy {
		policy[i] = append(policy[i], p...)
	}
	policyMap := make(map[string]int)
        // map is unsafe between goroutines
        // when it is iterating, other goroutine's write operation may cause panic
	for k, v := range ast.PolicyMap {
		policyMap[k] = v
	}

	newAst := &Assertion{
		Key:           ast.Key,
		Value:         ast.Value,
		PolicyMap:     policyMap,
		Tokens:        tokens,
		Policy:        policy,
		FieldIndexMap: ast.FieldIndexMap,
	}

	return newAst
}

And What confused me is: After copy a new model, it will clear policy map, so the copy of PolicyMap from ast.copy() is useless. Could you create a new copy function to eliminate these unsafe operations?

func (e *Enforcer) LoadPolicy() error {
	needToRebuild := false
	newModel := e.model.Copy()
	newModel.ClearPolicy()
        // ...
}

func (model Model) ClearPolicy() {
	for _, ast := range model["p"] {
		ast.Policy = nil
		ast.PolicyMap = map[string]int{}
	}

	for _, ast := range model["g"] {
		ast.Policy = nil
		ast.PolicyMap = map[string]int{}
	}
}

sasakiyori avatar Oct 11 '22 12:10 sasakiyori

@tangyang9464 @JalinWang

casbin-bot avatar Oct 11 '22 12:10 casbin-bot

@maocaicai

hsluoyz avatar Nov 21 '22 08:11 hsluoyz

Sorry for updating this issue so late.

In this issue, I use SyncedEnforcer to manage RBAC model. Most of functions in SyncedEnforcer have no goroutine race problems even LoadPolicy do, because all of them are controlled by mutex.

func (e *SyncedEnforcer) LoadPolicy() error {
	e.m.Lock()
	defer e.m.Unlock()
	return e.Enforcer.LoadPolicy()
}

But the concurrent race mentioned above, is caused by Enforcer.LoadPolicy but not SyncedEnforcer.LoadPolicy, because it is called by Enforcer.Watcher.

func (e *SyncedEnforcer) SetWatcher(watcher persist.Watcher) error {
	e.m.Lock()
	defer e.m.Unlock()
	return e.Enforcer.SetWatcher(watcher)
}

func (e *Enforcer) SetWatcher(watcher persist.Watcher) error {
	e.watcher = watcher
	if _, ok := e.watcher.(persist.WatcherEx); ok {
		return nil
	} else {
                // LoadPolicy here is goroutine unsafe.
		return watcher.SetUpdateCallback(func(string) { _ = e.LoadPolicy() })
	}
}

So I think there are some ways to solve this problem:

  1. I can call watcher.SetUpdateCallback locally to change the default callback to solve the concurrent race. It works around for me, but not for casbin.
  2. Revise the default callback for watcher (but it may be difficult, because Enforcer has no methods for sync):
    return watcher.SetUpdateCallback(func(string) { _ = e.LoadPolicy() })
    
  3. Revise SyncedEnforcer.SetWatcher:
    func (e *SyncedEnforcer) SetWatcher(watcher persist.Watcher) error {
        e.watcher = watcher
        if _, ok := e.watcher.(persist.WatcherEx); ok {
    	    return nil
        } else {
    	    return watcher.SetUpdateCallback(func(string) {_ = e.LoadPolicy()})
        }
    }
    
  4. Like I stated at first, revise the Model.Copy() or Assertion.copy(), ~~avoid the useless map iteration~~

Could you give me some advices? I'd like to work on this if possible.

sasakiyori avatar Nov 30 '22 10:11 sasakiyori

We can do like:

if typeof(enforcer) == SyncEnforcer {
  watcher.SetUpdateCallback(func(string) { _ = e.(SyncEnforcer).LoadPolicy() })
} else {
  watcher.SetUpdateCallback(func(string) { _ = e.LoadPolicy() })
}

hsluoyz avatar Dec 08 '22 12:12 hsluoyz

@sasakiyori can you do this?

hsluoyz avatar Dec 08 '22 12:12 hsluoyz

We can do like:

if typeof(enforcer) == SyncEnforcer {
  watcher.SetUpdateCallback(func(string) { _ = e.(SyncEnforcer).LoadPolicy() })
} else {
  watcher.SetUpdateCallback(func(string) { _ = e.LoadPolicy() })
}

@hsluoyz

Actually Enforcer is not interface but struct, and it is embedded by SyncedEnforcer. So we cannot cast Enforcer to SyncEnforcer .


In my codes, I work arounds like this:

syncedEnforcer, err := casbin.NewSyncedEnforcer(params...)
if err != nil {
    return err
}
watcher, err := etcdwatcher.NewWatcher(endpoints, keyName)
if err != nil {
    return err
}
if err = syncedEnforcer.SetWatcher(watcher); err != nil {
    return err
}

err = watcher.SetUpdateCallback(
    func(string) {
         // use the goroutine safe function instead
         _ = syncedEnforcer.LoadPolicy()
    }
)
if err != nil {
    return err
}

According to the method above, I think the easiest way to fix this issue in casbin may like the code below. But it is graceless...

// func (e *SyncedEnforcer) SetWatcher(watcher persist.Watcher) error {
// 	e.m.Lock()
// 	defer e.m.Unlock()
// 	return e.Enforcer.SetWatcher(watcher)
// }

// Rewrite SetWatcher function
func (e *SyncedEnforcer) SetWatcher(watcher persist.Watcher) error {
	e.m.Lock()
	defer e.m.Unlock()
	e.watcher = watcher
	if _, ok := e.watcher.(persist.WatcherEx); ok {
		return nil
	} else {
		return watcher.SetUpdateCallback(func(string) { _ = e.LoadPolicy() })
	}
}

sasakiyori avatar Dec 09 '22 09:12 sasakiyori

This looks like a serious and hidden bug that should be addressed. The last proposed solution is not the most beautiful but solves the problem. Otherwise the issue should be clearly stated as it's quite dangerous!

peterdeka avatar Oct 19 '23 14:10 peterdeka