Wu Yi
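https://github.com/PaddlePaddle/Paddle/issues/4294 has an example of multi-process CPU usage. I'm not sure exactly when flask creates its processes/threads, but the right flow should be: first fork a set of worker processes, then have each process independently import paddle and init it, initialize and load the parameters, and then listen for requests and handle them, so that each process has its own paddle instance. If you can make sure the program follows this flow, it should work.
I haven't run any experiments on this, so sharing parameters between processes may not work, but a multi-process model that does not share parameters can definitely be made to work.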
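The flow described above can be sketched roughly as follows. This is only a minimal illustration using Python's standard `multiprocessing` module, not code from the linked issue: `init_model`, `predict`, `worker` and `serve` are hypothetical names, and `init_model` is a stand-in for the "import paddle, init, load parameters" step, so no real paddle call is shown. The queues stand in for whatever request listener (flask or otherwise) is actually used.

```python
import multiprocessing as mp
import os


def init_model():
    # Hypothetical stand-in for "import paddle, init, load parameters".
    # It is called inside each worker, so every process owns its own instance.
    return {"pid": os.getpid(), "scale": 2}


def predict(model, x):
    # Placeholder inference: a real worker would run the loaded model here.
    return model["scale"] * x


def worker(requests, responses):
    model = init_model()  # runs in the child process, after the fork
    while True:
        item = requests.get()
        if item is None:  # sentinel: stop this worker
            break
        req_id, x = item
        responses.put((req_id, predict(model, x)))


def serve(inputs, num_workers=2):
    # Fork the workers first; nothing model-related exists in the parent.
    ctx = mp.get_context("fork")  # assumes a Unix-like system
    requests, responses = ctx.Queue(), ctx.Queue()
    procs = [ctx.Process(target=worker, args=(requests, responses))
             for _ in range(num_workers)]
    for p in procs:
        p.start()
    # Dispatch the requests and collect one response per request.
    for req_id, x in enumerate(inputs):
        requests.put((req_id, x))
    results = dict(responses.get() for _ in inputs)
    # Send one sentinel per worker, then wait for them to exit.
    for _ in procs:
        requests.put(None)
    for p in procs:
        p.join()
    return [results[i] for i in range(len(inputs))]


if __name__ == "__main__":
    print(serve([1, 2, 3]))
```

Because the model is built only after the fork, no parameters are shared between processes, which matches the non-shared variant that the comment says should work.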
If the current Dockerfile has many steps that perform exactly the same on every build, caching will save a lot of time, yet adding `--cache-from` will need to pull...
Maybe this is also needed for `paddlectl file get`
Or, we can put all components in one Docker image and run several pods using the same image.
The Go source directory contains a Go command-line client for users to communicate with the paddlecloud server. The Python code contains the paddlecloud server side, which parses the user's command line...
We'll remove the `paddlecloud` directory once the new client under paddlectl is ready.
I think the only version requirement is that Kubernetes 1.8 no longer supports TPR. While in this document, we only describe the deployment of the Python server and the pfs server, can...
> If pserver can't save checkpoint, the job can't be fault-tolerant, should the whole job exit? We don't consider pserver fault tolerant currently. It's not a blocking issue, yet we...
Another disadvantage: `kubectl` exposes too many details of Kubernetes that users may never use.