KubeFATE
ModuleNotFoundError: No module named 'federatedml' with docker-deploy
I deployed FATE following the guide "Deploying FATE with Docker Compose" (使用Docker Compose 部署 FATE).
After deployment, I use the following command to enter the client container:
docker exec -it confs-10000_client_1 bash
But when executing ./examples/benchmark_quality/homo_nn/fate-homo_nn.py the following error was reported:
Traceback (most recent call last):
File "./fate-homo_nn.py", line 25, in <module>
from federatedml.evaluation.metrics import classification_metric
ModuleNotFoundError: No module named 'federatedml'
How do I import the federatedml package in the client container?
Besides, I am a beginner and not familiar with the FATE framework. I would like to know how to use Python or Jupyter to develop federated learning code in a Docker deployment (for example, to run the Resnet-example or build a custom dataset) instead of using the flow command.
Thanks!
The environment needs to be loaded before use:
source /data/projects/fate/bin/init_env.sh
root@bf1b603f8015:/data/projects/fate# cd bin
bash: cd: bin: No such file or directory
My FATE version is v1.10.0
It seems that there is no such script.
I did a test and got the same error. This should be a bug in the client image. The client image does not fully test the examples. Dependent packages such as federatedml and fate_test are not included.
How can I install these two packages manually?
You can try to set pythonpath and run it:
export PYTHONPATH=/data/projects/fate/fate/python
root@ff9d37a0afb0:/data/projects/fate# cd /data/projects/fate/fate/python
bash: cd: /data/projects/fate/fate/python: No such file or directory
It seems that in the client container, the federatedml & python related folders are missing.
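A quick way to confirm what is and is not present in the container is to ask Python directly. The following is only a minimal diagnostic sketch: the module names and directories are the ones mentioned in this thread, and the helper names `report_importable` and `report_existing_dirs` are made up for illustration.

```python
import importlib.util
import os


def report_importable(module_names):
    """Map each top-level module name to True if Python can locate it."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in module_names
    }


def report_existing_dirs(paths):
    """Map each path to True if it exists as a directory."""
    return {path: os.path.isdir(path) for path in paths}


# Names and paths taken from the discussion above.
print(report_importable(["federatedml", "fate_test"]))
print(report_existing_dirs([
    "/data/projects/fate/bin",
    "/data/projects/fate/fate/python",
]))
```

If both modules report `False` and the directories are absent, the packages are simply not shipped in the image, so setting PYTHONPATH alone cannot help.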
@owlet42
Sorry to bother you again, is there a way for me to manually install federatedml? I hope to continue my studies.
Thanks.
@0kuang
A simple way is to add a volume mount for federatedml, and add the federatedml path to the PYTHONPATH environment variable.

After I tried it, I found that there are other dependencies that need to be resolved.
I solved the dependency problem as you said:
- Set PYTHONPATH
- Clone the code of the missing packages from the GitHub repo
- Copy a service_conf.yaml
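For a notebook or script, the PYTHONPATH step above can also be done at runtime. This is a minimal sketch, assuming the cloned or mounted FATE Python sources end up under `/data/projects/fate/fate/python` (the path suggested earlier in this thread; adjust it to wherever the code actually lives). The helper name `add_to_sys_path` is hypothetical.

```python
import os
import sys


def add_to_sys_path(directory):
    """Prepend an existing directory to sys.path; return False if it is missing."""
    directory = os.path.abspath(directory)
    if not os.path.isdir(directory):
        return False
    if directory not in sys.path:
        sys.path.insert(0, directory)
    return True


# Assumed location of the cloned/mounted FATE Python sources.
FATE_PYTHON_DIR = "/data/projects/fate/fate/python"

if add_to_sys_path(FATE_PYTHON_DIR):
    print("added to sys.path:", FATE_PYTHON_DIR)
else:
    print("directory not found, nothing added:", FATE_PYTHON_DIR)
```

Unlike `export PYTHONPATH=...`, this only affects the current Python process, which is usually what is wanted inside a Jupyter kernel.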
Now I have a new problem: an error occurs when executing pipeline.fit():
ValueError: job submit failed, err msg: {'jobId': '202303062227458326320', 'retcode': 103, 'retmsg': 'Traceback (most recent call last):
File "/data/projects/fate/fateflow/python/fate_flow/scheduler/dag_scheduler.py", line 142, in submit
raise Exception("create job failed", response)
Exception: (\'create job failed\', {\'guest\': {9999: {\'retcode\': <RetCode.FEDERATED_ERROR: 104>, \'retmsg\': \'Federated schedule error, <_InactiveRpcError of RPC that terminated with:\
\\tstatus = StatusCode.UNKNOWN\
\\tdetails = "\
[Roll Site Error TransInfo] \
location msg=java.lang.String cannot be cast to java.lang.Integer \
stack info=java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer\
\\tat scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:101)\
\\tat com.webank.eggroll.rollsite.Router$.query(Router.scala:80)\
\\tat com.webank.eggroll.rollsite.EggSiteServicer.unaryCall(EggSiteServicer.scala:80)\
\\tat com.webank.ai.eggroll.api.networking.proxy.DataTransferServiceGrpc$MethodHandlers.invoke(DataTransferServiceGrpc.java:406)\
\\tat io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:180)\
\\tat io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)\
\\tat io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)\
\\tat io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)\
\\tat io.grpc.Contexts$ContextualizedServerCallListener.onHalfClose(Contexts.java:86)\
\\tat io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:331)\
\\tat io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:814)\
\\tat io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)\
\\tat io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)\
\\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\
\\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\
\\tat java.lang.Thread.run(Thread.java:750)\
\
\
exception trans path: rollsite(10000)"\
\\tdebug_error_string = "{"created":"@1678112871.934791845","description":"Error received from peer ipv4:192.167.0.5:9370","file":"src/core/lib/surface/call.cc","file_line":952,"grpc_message":"\\\
[Roll Site Error TransInfo] \\\
location msg=java.lang.String cannot be cast to java.lang.Integer \\\
stack info=java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer\\\
\\\\tat scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:101)\\\
\\\\tat com.webank.eggroll.rollsite.Router$.query(Router.scala:80)\\\
\\\\tat com.webank.eggroll.rollsite.EggSiteServicer.unaryCall(EggSiteServicer.scala:80)\\\
\\\\tat com.webank.ai.eggroll.api.networking.proxy.DataTransferServiceGrpc$MethodHandlers.invoke(DataTransferServiceGrpc.java:406)\\\
\\\\tat io.grpc.stub.ServerCalls$UnaryServerCallHandler$UnaryServerCallListener.onHalfClose(ServerCalls.java:180)\\\
\\\\tat io.grpc.PartialForwardingServerCallListener.onHalfClose(PartialForwardingServerCallListener.java:35)\\\
\\\\tat io.grpc.ForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:23)\\\
\\\\tat io.grpc.ForwardingServerCallListener$SimpleForwardingServerCallListener.onHalfClose(ForwardingServerCallListener.java:40)\\\
\\\\tat io.grpc.Contexts$ContextualizedServerCallListener.onHalfClose(Contexts.java:86)\\\
\\\\tat io.grpc.internal.ServerCallImpl$ServerStreamListenerImpl.halfClosed(ServerCallImpl.java:331)\\\
\\\\tat io.grpc.internal.ServerImpl$JumpToApplicationThreadServerStreamListener$1HalfClosed.runInContext(ServerImpl.java:814)\\\
\\\\tat io.grpc.internal.ContextRunnable.run(ContextRunnable.java:37)\\\
\\\\tat io.grpc.internal.SerializingExecutor.run(SerializingExecutor.java:123)\\\
\\\\tat java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)\\\
\\\\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)\\\
\\\\tat java.lang.Thread.run(Thread.java:750)\\\
\\\
\\\
exception trans path: rollsite(10000)","grpc_status":2}"\
>\'}}, \'host\': {10000: {\'data\': {\'components\': {\'eval_0\': {\'need_run\': True}, \'nn_0\': {\'need_run\': True}, \'reader_0\': {\'need_run\': True}, \'reader_1\': {\'need_run\': True}}}, \'retcode\': 0, \'retmsg\': \'success\'}}, \'arbiter\': {10000: {\'data\': {\'components\': {\'eval_0\': {\'need_run\': True}, \'nn_0\': {\'need_run\': True}, \'reader_0\': {\'need_run\': False}, \'reader_1\': {\'need_run\': False}}}, \'retcode\': 0, \'retmsg\': \'success\'}}})
'}
I think the key lies in the rollsite. I don't know whether this helps you diagnose the problem.
# key
exception trans path: rollsite(10000)"\
\\tdebug_error_string = "{"created":"@1678112871.934791845","description":"Error received from peer ipv4:192.167.0.5:9370","file":"src/core/lib/surface/call.cc","file_line":952,"grpc_message":"\\\
[Roll Site Error TransInfo] \\\
location msg=java.lang.String cannot be cast to java.lang.Integer \\\
stack info=java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer\\\
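Because the useful part of such a message is buried in escaped stack traces, a small helper can pull out just the fields highlighted above. This is an illustrative sketch only: the function name `summarize_rollsite_error` is made up, and the patterns are based solely on the message format shown in this thread, not on any documented FATE or Eggroll format.

```python
import re


def summarize_rollsite_error(retmsg):
    """Extract the first 'location msg' and 'exception trans path' fields."""
    location = re.search(r'location msg=([^\\\n]+)', retmsg)
    path = re.search(r'exception trans path: ([^"\\\n]+)', retmsg)
    return {
        "location_msg": location.group(1).strip() if location else None,
        "trans_path": path.group(1).strip() if path else None,
    }


# Shortened sample in the same shape as the error reported above.
sample = (
    "[Roll Site Error TransInfo] \\\n"
    "location msg=java.lang.String cannot be cast to java.lang.Integer \\\n"
    "exception trans path: rollsite(10000)\"\\\n"
)
print(summarize_rollsite_error(sample))
```

Fields that are not found come back as `None`, so the helper can be applied to any `retmsg` string without raising.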
Thank you for your reply.
Please make sure that all components of your FATE are working properly and can complete unilateral and multilateral toy tests.
flow test toy -gid 9999 -hid 9999 # unilateral
flow test toy -gid 9999 -hid 10000 # multilateral
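The two checks can also be scripted from Python. The sketch below only wraps the exact commands shown above; the helper names are hypothetical, and it assumes the `flow` CLI is on the PATH, already initialized, and reports failure through its exit code, which may not hold in every version, so the printed output should still be read.

```python
import shutil
import subprocess


def toy_test_command(guest_id, host_id):
    """Build the argv list for `flow test toy` with the given party IDs."""
    return ["flow", "test", "toy", "-gid", str(guest_id), "-hid", str(host_id)]


def run_toy_test(guest_id, host_id):
    """Run the toy test; return its exit code, or None if `flow` is not installed."""
    cmd = toy_test_command(guest_id, host_id)
    if shutil.which(cmd[0]) is None:
        return None
    return subprocess.run(cmd).returncode


if __name__ == "__main__":
    # Unilateral first, then multilateral, as in the commands above.
    for guest, host in [(9999, 9999), (9999, 10000)]:
        print(guest, host, "exit code:", run_toy_test(guest, host))
```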
I can now run the example code for Resnet with homo-nn correctly.
I would like to ask how to use GPU to accelerate training in FATE deployed by docker. Do you have any recommended tutorials?
In addition, which container will a task submitted through Jupyter on confs_10000_client-1 eventually run on?
Thanks for your answer.
GPU deployment is not currently supported. FATE tasks mainly run in fateflow; for the detailed process, see https://federatedai.github.io/FATE-Flow/latest/fate_flow/
Which deployment method supports GPU?
The FedAvgTrainer in the FATE framework supports cuda=True. Is this parameter useful?
FedAvgTrainer has this configuration, and you can try setting cuda=True to use GPU.
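As a rough sketch of where that flag might go in a pipeline script: this assumes that FATE 1.10's pipeline exposes `TrainerParam` under `pipeline.component.nn` and forwards extra keyword arguments such as `cuda` to the trainer. The helper `build_trainer_kwargs` is hypothetical, and the epoch and batch-size values are placeholders rather than recommendations. Whether a GPU is actually visible inside the container is a separate question that this does not address.

```python
def build_trainer_kwargs(use_gpu):
    """Collect trainer settings in one place; numeric values are placeholders."""
    return {
        "trainer_name": "fedavg_trainer",
        "epochs": 10,
        "batch_size": 128,
        "cuda": use_gpu,  # the flag discussed above
    }


try:
    # Assumed import path for the FATE 1.10 pipeline package.
    from pipeline.component.nn import TrainerParam
    trainer = TrainerParam(**build_trainer_kwargs(use_gpu=True))
except ImportError:
    # Outside a FATE environment the pipeline package is unavailable.
    trainer = None

print(build_trainer_kwargs(use_gpu=True))
```

The resulting `trainer` object would then be passed to the homo-NN component in the same way the Resnet example passes its trainer configuration.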