elasticdl
Refactoring work for the parameter server (PS)
Step 1:
- [ ] embedding table Python data structure
- [ ] tensor proto message and Python data structure
- [ ] a new PS, which binds to a k8s service
- [ ] parameter sharding hash function
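As a rough illustration of two Step 1 items, the sketch below shows one possible shape for the embedding table and the sharding hash function. It is not the project's actual design: the names `EmbeddingTable`, `shard_for_name`, and `shard_for_embedding_id`, the lazy uniform initializer, and the choice of MD5 for name hashing are all assumptions made for this example.

```python
import hashlib
import random


class EmbeddingTable:
    """Sparse embedding table: rows are created lazily on first lookup."""

    def __init__(self, name, dim, seed=0):
        self.name = name
        self.dim = dim
        self.vectors = {}  # maps an integer id to its embedding vector
        self._rng = random.Random(seed)

    def get(self, ids):
        """Return one vector per id, initializing unseen ids on the fly."""
        rows = []
        for i in ids:
            if i not in self.vectors:
                # Hypothetical initializer: small uniform random values.
                self.vectors[i] = [
                    self._rng.uniform(-0.05, 0.05) for _ in range(self.dim)
                ]
            rows.append(self.vectors[i])
        return rows

    def set(self, ids, values):
        """Overwrite the vectors stored for the given ids."""
        for i, v in zip(ids, values):
            self.vectors[i] = list(v)


def shard_for_name(param_name, num_ps):
    """Map a dense parameter name to a PS index with a stable hash."""
    digest = hashlib.md5(param_name.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_ps


def shard_for_embedding_id(embedding_id, num_ps):
    """Map an embedding row id to a PS index."""
    return embedding_id % num_ps
```

A stable digest is used for names because Python's built-in `hash()` is salted per process for strings by default, so workers and PS pods could otherwise disagree on where a parameter lives.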
Step 2:
- [ ] KVStore Python data structure
- [ ] define PS service interface in proto
- [ ] refine current OptimizerWrapper to support KVStore
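To make the KVStore item more concrete, here is a minimal in-memory sketch together with a plain SGD update that reads from and writes back to the store. The `KVStore` methods and the `apply_sgd` helper are hypothetical names for this example; they do not describe the existing `OptimizerWrapper` API or the final PS interface.

```python
class KVStore:
    """In-memory key-value store for model parameters (illustrative only)."""

    def __init__(self):
        self._store = {}

    def set(self, key, value):
        # Store a copy so later caller-side mutation does not leak in.
        self._store[key] = list(value)

    def get(self, key, default=None):
        return self._store.get(key, default)

    def keys(self):
        return list(self._store)


def apply_sgd(store, grads, lr=0.1):
    """Apply one plain SGD step to every key present in `grads`."""
    for key, grad in grads.items():
        param = store.get(key)
        if param is None:
            raise KeyError(f"unknown parameter: {key}")
        store.set(key, [p - lr * g for p, g in zip(param, grad)])


store = KVStore()
store.set("w", [1.0, 2.0])
apply_sgd(store, {"w": [0.5, -0.5]}, lr=0.5)
```

Keeping the update logic outside the store means the same store could back different optimizers, which is one way to read the "support KVStore" item.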
Step 3:
- [ ] refine worker-related code to work with the new PS design
- [ ] evaluation support
- [ ] checkpoint support
Step 4:
- [ ] extend from a single PS node to multiple PS nodes
- [ ] PS replica support
Step 5:
- [ ] fault-tolerance feature and tests