Tianyi Chen

Results: 9 issues by Tianyi Chen

### Description #### Source code compilation and installation - [x] streaming c++ - [x] streaming python - [x] streaming java - [x] mobius python - [ ] mobius java ####...

meta

Fix Java test case for native issues.

### What changes were proposed in this pull request? 1. Implement `CheckTrainingHangOperator` based on the XPU Timer metric. 2. Integrate context from `JobManager` in `DiagnosisDataManager`. 3. Use a size-limited `Deque` instead of...

enhancement

### What changes were proposed in this pull request? 1. Implement the 'succeeded' report. 2. Add a 'succeeded' flag to the `Node` object. 3. Skip succeeded nodes in the 'noheartbeat' check. ### Why...

enhancement

# Background Currently, DLRover uses the official Kubernetes Python client to interact with the Kubernetes API Server. This part of the implementation is quite important because it involves managing the lifecycle...

Hacktoberfest
wip

### What changes were proposed in this pull request? 1. A POC framework definition, along with a basic RL solution implementation. This includes the entire process from RL job submission...

documentation
example
feature

**Is your feature request related to a problem? Please describe.** It is now recommended to directly use the checkpoint implementation from the latest version of Megatron. DLRover's integration with...

enhancement

**Is your feature request related to a problem? Please describe.** Considering that using gRPC involves significant dependency issues, and there is a degree of uncertainty when using gRPC in...

enhancement
wip

### What changes were proposed in this pull request? 1. Complete the documentation for the new architecture. 2. Update the homepage. 3. Update release-related information. ### Why are the changes...

documentation
wip