bk-iam-saas
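[Backend] Re-review the backend Redis cache
- Re-review the cache configuration of the version currently in production. The failure point is the subject-group relationship lookup, in the auth/query APIs
- Re-review the auth path of the new version for Redis cache problems
Consider using Redis as a backup store, so that the service can hold out for some time even when the DB is down
TODO: upgrade Redis into a second storage layer and ensure data consistency
- Can all Redis cache TTLs be made longer?
- Do not use defer; do not clear the cache when the operation fails
- If a delete fails, a compensation mechanism is needed
- All operations happen during the day. Assuming an expiration of 7 days, use a TTL of 7 days + 12 hours, so that cache entries written during the day expire at night
- If a delete fails, retry; if the retries also fail, delete later through a queue or a similar mechanism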
Problem: at present, a Redis outage still affects the service
system error[request_id=eecd595e3dd243eba0613dd2503a99a2]: [Handler:Query] systemID=`demo`, body=`{baseRequest:{System:demo Subject:{Type:user ID:user005}} Resources:[] Action:{ID:access_developer_center}}`
[PDP:Query] queryAndPartialEvalConditions fail%!(EXTRA types.Action={access_developer_center 0xc0007140e0})
[PDP:queryAndPartialEvalConditions] GetEffectAuthTypeGroupPKs systemID=`demo`, subject=`{Type:user ID:user005 Attribute:0xc0007140d8}`, action=`{ID:access_developer_center Attribute:0xc0007140e0}` fail
[PRP:getEffectSubjectPKs] ListSubjectEffectGroups deptPKs=`[]` fail
[Cache:ListSystemSubjectEffectGroups] batchGetSystemSubjectGroups systemID=`demo`, pks=`[5]` fail
[Cache:batchGetSystemSubjectGroups] SubjectGroupCache.BatchGet keys=`[{SystemID:demo SubjectPK:5}]` fail
[Raw:Error] EOF
When this error occurs, the lookup should fall back to a DB query
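If the service is supposed to run normally while Redis is down, then it also should not fail to start in that state (the auth service must stay available)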
system error[request_id=7a771361630c49f4be34756063757631]: [Handler:Auth] systemID=`demo`, body=`{baseRequest:{System:demo Subject:{Type:user ID:user105}} Resources:[{System:demo Type:app ID:002 Attribute:map[]}] Action:{ID:view_app}}`
[PDP:Eval] GetEffectAuthTypeGroupPKs systemID=`demo`, subject=`{Type:user ID:user105 Attribute:0xc00052eb60}`, action=`{ID:view_app Attribute:0xc00052eb68}` fail
[GroupRedisLayer:Retrieve] batchGetGroupAuthType fail groupPKs=`[2105]`
[Raw:Error] dial tcp 127.0.0.1:6379: connect: connection refused
system error[request_id=21ca36ecce4c4f56826b83fec95d21e9]: [Handler:Auth] systemID=`demo`, body=`{baseRequest:{System:demo Subject:{Type:user ID:user105}} Resources:[{System:demo Type:app ID:002 Attribute:map[]}] Action:{ID:view_app}}`
[PDP:Eval] GetEffectAuthTypeGroupPKs systemID=`demo`, subject=`{Type:user ID:user105 Attribute:0xc0000ca908}`, action=`{ID:view_app Attribute:0xc0000ca910}` fail
[GroupRedisLayer:Retrieve] batchSetGroupAuthTypeCache fail missGroupAuthTypes=`[{GroupPK:2105 AuthType:2}]`
[Raw:Error] EOF
system error[request_id=039dc2fb87e64746bc44a20e19dcc496]: [Handler:Auth] systemID=`demo`, body=`{baseRequest:{System:demo Subject:{Type:user ID:user105}} Resources:[{System:demo Type:app ID:002 Attribute:map[]}] Action:{ID:view_app}}`
[PDP:Eval] rbacEval systemID=`demo`, actionID=`%!d(string=view_app)`, resources=`[{System:demo Type:app ID:002 Attribute:map[]}]`, groupPKs=`[2105]` fail
[PDP:rbacEval] GetResourceActionAuthorizedGroupPKs fail, system=`demo` action=`{ID:view_app Attribute:0xc00059e268}` resource=`{System:demo Type:app ID:002 TypePK:1}`
[Raw:Error] EOF
What happens when a cache delete fails? Is there a mechanism that can guarantee data consistency?
Solve the first problem first:
- When Redis is down, fall back to MySQL and keep serving normally