Shreya Rajpal

Results 15 comments of Shreya Rajpal

@jiangweiatgithub thanks for sharing the yaml file. We're currently looking into this issue and trying to reproduce it. Are you also able to share `alt_en-US_zh-CN_200k.tsv`?

@arnavgarg1 this is an interesting find, and it seems like it might be an issue with Tune. It is Ray Tune (`ray.tune`) that maintains the state of the overall trials and selects the best trial...
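To make "selects the best trial" concrete, here is a minimal sketch of the selection step in plain Python. It is not Ray Tune's implementation. The dict-per-trial layout, the `select_best_trial` helper, and the `validation_loss` metric are hypothetical. In Tune itself this is exposed through calls such as `ExperimentAnalysis.get_best_trial(metric, mode)`.

```python
from typing import Any, Dict, List

# Hypothetical, simplified trial record: a trial id plus its last reported
# metrics. Ray Tune's real internal trial state is much richer than this.
Trial = Dict[str, Any]


def select_best_trial(trials: List[Trial], metric: str, mode: str = "min") -> Trial:
    """Return the trial whose `metric` is best under `mode` ("min" or "max").

    Trials that never reported `metric` (e.g. ones that failed early) are skipped.
    """
    if mode not in ("min", "max"):
        raise ValueError(f"mode must be 'min' or 'max', got {mode!r}")
    reported = [t for t in trials if metric in t]
    if not reported:
        raise ValueError(f"no trial reported metric {metric!r}")
    chooser = min if mode == "min" else max
    return chooser(reported, key=lambda t: t[metric])


trials = [
    {"trial_id": "a", "validation_loss": 0.42},
    {"trial_id": "b", "validation_loss": 0.37},
    {"trial_id": "c"},  # never reported the metric
]
print(select_best_trial(trials, metric="validation_loss", mode="min")["trial_id"])  # b
```

If the per-trial state feeding this step is wrong or stale, the "best" trial will be wrong too, which is why the bookkeeping matters.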

The issue @arnavgarg1 uncovered was fixed in https://github.com/ray-project/ray/pull/26943. Next steps: run a long-running hyperopt experiment on a large dataset and try to reproduce the issue.

Good questions! The issue we're seeing is that with the current window size, we shouldn't be experiencing the memory pressure that we currently are. So the goal for solving this...

Recommendation from the Anyscale team: try out [push-based shuffle](https://docs.ray.io/en/master/data/performance-tips.html#improving-shuffle-performance).

Hi @Jeffwan, we've now merged #2115 that enables syncing checkpoints between k8s nodes for a Kubernetes deployment.

@Jeffwan thanks for filing this issue. A priori, it seems like there may not be enough CPU resources to perform the `map_batches` call. I'm working on reproducing this issue, however...

@Jeffwan after looking into this more, I can confirm that this issue is caused by lack of enough CPU resources. In your current setup, all CPU cores are reserved by...
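The failure mode here is simple arithmetic: if running trials reserve every CPU, nothing is left for the dataset tasks. Below is a back-of-the-envelope sketch. The helper name and the cluster numbers are hypothetical. It assumes each `map_batches` task requests 1 CPU, which is Ray's default for tasks.

```python
def free_cpus_after_trials(total_cpus: int, concurrent_trials: int, cpus_per_trial: int) -> int:
    """CPUs left for other Ray tasks once all concurrent trials hold their reservations."""
    reserved = concurrent_trials * cpus_per_trial
    # Clamp at zero: over-subscription just means nothing is free.
    return max(total_cpus - reserved, 0)


# Hypothetical 16-CPU cluster.
print(free_cpus_after_trials(16, concurrent_trials=4, cpus_per_trial=4))  # 0 -> 1-CPU tasks cannot be scheduled
print(free_cpus_after_trials(16, concurrent_trials=3, cpus_per_trial=4))  # 4 -> room for dataset tasks
```

Lowering trial concurrency or per-trial CPUs so that this number stays above zero is one way to leave headroom for the dataset work.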

Hi @Jeffwan, great question! The most straightforward way to get hyperopt trial status would be the `experiment_state-.json` file that is created inside the results logdir. This file contains a list...
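A minimal sketch of reading that file with only the standard library follows. The `load_latest_experiment_state` helper is hypothetical. The glob `experiment_state-*.json` is an assumption that matches whatever suffix (possibly none) follows the hyphen. Because the file's schema varies across Ray versions, the sketch returns the parsed JSON as-is rather than assuming particular keys.

```python
import glob
import json
import os
from typing import Any, Dict, Optional


def load_latest_experiment_state(logdir: str) -> Optional[Dict[str, Any]]:
    """Load the most recently modified experiment_state-*.json in `logdir`.

    Returns None if no matching file exists.
    """
    paths = glob.glob(os.path.join(logdir, "experiment_state-*.json"))
    if not paths:
        return None
    # Tune may write more than one state file over time; take the newest.
    latest = max(paths, key=os.path.getmtime)
    with open(latest, "r", encoding="utf-8") as f:
        return json.load(f)
```

Pass it the results logdir of the hyperopt run, then inspect the returned structure to locate the per-trial entries for your Ray version.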

@hwchase17 rebasing on the latest master fixed the checks! Lmk if you need me to help out with the imports