bk-iam-saas

[Backend] Re-review the backend Redis cache

[Open] zhu327 opened this issue 2 years ago • 6 comments

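  1. Re-review the cache configuration of the version currently in production. The failure occurs when fetching the subject-group relationship, in the auth/query APIs.
  2. Re-review the authorization call chain of the new version, focusing on its Redis cache issues.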

Consider using Redis as backup storage, so that the service can keep running for a while if the DB goes down.

zhu327 • Oct 20 '22 11:10

TODO: upgrade Redis into a second storage layer and ensure data consistency

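  1. Can all Redis cache TTLs be made longer?
  2. Don't use defer: if the operation fails, don't clear the cache.
  3. If a cache delete fails, a compensation mechanism is needed.
  4. Operations all happen during the day. Assuming the expiry is 7 days, set the TTL to 7 days + 12 hours, so that caches written by daytime operations expire at night instead.
  5. If a delete fails, retry it; if the retry also fails, delete it later via a queue or a similar mechanism.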

wklken • Oct 20 '22 12:10

Problem: we currently do not guarantee that a Redis outage leaves the service unaffected.

system error[request_id=eecd595e3dd243eba0613dd2503a99a2]: [Handler:Query] systemID=`demo`, body=`{baseRequest:{System:demo Subject:{Type:user ID:user005}} Resources:[] Action:{ID:access_developer_center}}`
  [PDP:Query] queryAndPartialEvalConditions fail%!(EXTRA types.Action={access_developer_center 0xc0007140e0})
  [PDP:queryAndPartialEvalConditions] GetEffectAuthTypeGroupPKs systemID=`demo`, subject=`{Type:user ID:user005 Attribute:0xc0007140d8}`, action=`{ID:access_developer_center Attribute:0xc0007140e0}` fail
  [PRP:getEffectSubjectPKs] ListSubjectEffectGroups deptPKs=`[]` fail
  [Cache:ListSystemSubjectEffectGroups] batchGetSystemSubjectGroups systemID=`demo`, pks=`[5]` fail
  [Cache:batchGetSystemSubjectGroups] SubjectGroupCache.BatchGet keys=`[{SystemID:demo SubjectPK:5}]` fail
  [Raw:Error] EOF

When this error occurs, the request should fall back to querying the DB instead of failing.
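A minimal sketch of that fallback, assuming a batch cache-aside read. The rules are:

- a Redis read error is logged and every key is fetched from the DB;
- a Redis write-back error is logged and ignored;
- only a DB error reaches the caller.

`SubjectKey` mirrors the `{SystemID SubjectPK}` key in the log. `GroupCache`, `GroupStore`, `BatchGetSubjectGroups` and the two fakes are hypothetical stand-ins, not the real `SubjectGroupCache` API.

```go
package main

import (
	"errors"
	"log"
)

// SubjectKey mirrors the {SystemID SubjectPK} key printed in the log.
type SubjectKey struct {
	SystemID  string
	SubjectPK int64
}

// Hypothetical interfaces for the Redis cache and the MySQL query layer.
type GroupCache interface {
	BatchGet(keys []SubjectKey) (map[SubjectKey][]int64, error)
	BatchSet(values map[SubjectKey][]int64) error
}

type GroupStore interface {
	ListSubjectGroups(keys []SubjectKey) (map[SubjectKey][]int64, error)
}

// BatchGetSubjectGroups treats the cache as optional: cache errors are logged,
// never returned, so a Redis outage degrades to DB reads instead of failing.
func BatchGetSubjectGroups(c GroupCache, db GroupStore, keys []SubjectKey) (map[SubjectKey][]int64, error) {
	hits, err := c.BatchGet(keys)
	if err != nil {
		log.Printf("cache BatchGet failed, falling back to DB: %v", err)
	}
	if err != nil || hits == nil {
		hits = map[SubjectKey][]int64{}
	}
	var missing []SubjectKey
	for _, k := range keys {
		if _, ok := hits[k]; !ok {
			missing = append(missing, k)
		}
	}
	if len(missing) == 0 {
		return hits, nil
	}
	fromDB, err := db.ListSubjectGroups(missing)
	if err != nil {
		return nil, err // the only error that fails the request
	}
	for k, v := range fromDB {
		hits[k] = v
	}
	if err := c.BatchSet(fromDB); err != nil {
		log.Printf("cache BatchSet failed (ignored): %v", err)
	}
	return hits, nil
}

// downCache simulates the failing Redis in the log: every call returns EOF.
type downCache struct{}

func (downCache) BatchGet([]SubjectKey) (map[SubjectKey][]int64, error) {
	return nil, errors.New("EOF")
}

func (downCache) BatchSet(map[SubjectKey][]int64) error { return errors.New("EOF") }

// mapStore is an in-memory stand-in for MySQL.
type mapStore map[SubjectKey][]int64

func (m mapStore) ListSubjectGroups(keys []SubjectKey) (map[SubjectKey][]int64, error) {
	out := map[SubjectKey][]int64{}
	for _, k := range keys {
		if v, ok := m[k]; ok {
			out[k] = v
		}
	}
	return out, nil
}
```

With `downCache`, the request in the log would still be answered from the DB. The cost is extra DB load and a failed Redis round trip on every call while the outage lasts.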

wklken • Oct 21 '22 02:10


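If the service is meant to run normally while Redis is down, then it must not fail to start either (the authorization service has to stay available).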

wklken • Oct 21 '22 03:10

system error[request_id=7a771361630c49f4be34756063757631]: [Handler:Auth] systemID=`demo`, body=`{baseRequest:{System:demo Subject:{Type:user ID:user105}} Resources:[{System:demo Type:app ID:002 Attribute:map[]}] Action:{ID:view_app}}`
  [PDP:Eval] GetEffectAuthTypeGroupPKs systemID=`demo`, subject=`{Type:user ID:user105 Attribute:0xc00052eb60}`, action=`{ID:view_app Attribute:0xc00052eb68}` fail 
  [GroupRedisLayer:Retrieve] batchGetGroupAuthType fail groupPKs=`[2105]` 
  [Raw:Error] dial tcp 127.0.0.1:6379: connect: connection refused

system error[request_id=21ca36ecce4c4f56826b83fec95d21e9]: [Handler:Auth] systemID=`demo`, body=`{baseRequest:{System:demo Subject:{Type:user ID:user105}} Resources:[{System:demo Type:app ID:002 Attribute:map[]}] Action:{ID:view_app}}`
  [PDP:Eval] GetEffectAuthTypeGroupPKs systemID=`demo`, subject=`{Type:user ID:user105 Attribute:0xc0000ca908}`, action=`{ID:view_app Attribute:0xc0000ca910}` fail 
  [GroupRedisLayer:Retrieve] batchSetGroupAuthTypeCache fail missGroupAuthTypes=`[{GroupPK:2105 AuthType:2}]` 
  [Raw:Error] EOF

system error[request_id=039dc2fb87e64746bc44a20e19dcc496]: [Handler:Auth] systemID=`demo`, body=`{baseRequest:{System:demo Subject:{Type:user ID:user105}} Resources:[{System:demo Type:app ID:002 Attribute:map[]}] Action:{ID:view_app}}` 
  [PDP:Eval] rbacEval systemID=`demo`, actionID=`%!d(string=view_app)`, resources=`[{System:demo Type:app ID:002 Attribute:map[]}]`, groupPKs=`[2105]` fail 
  [PDP:rbacEval] GetResourceActionAuthorizedGroupPKs fail, system=`demo` action=`{ID:view_app Attribute:0xc00059e268}` resource=`{System:demo Type:app ID:002 TypePK:1}` 
  [Raw:Error] EOF

wklken • Oct 21 '22 03:10

What happens if a cache delete fails? Is there a mechanism that guarantees data consistency?

?

wklken • Oct 21 '22 04:10

Let's solve the first problem first:

  1. If Redis goes down, fall back to MySQL and keep serving normally.

zhu327 • Dec 02 '22 06:12