
UnavailableError: Can't provision more than one single cluster at a time

Open • leo-XUKANG opened this issue 5 years ago • 10 comments

My code:

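```python
import tensorflow as tf
# TF-TRT converter location in TF 1.14/1.15 (assumed; TF <= 1.13 used tensorflow.contrib.tensorrt)
from tensorflow.python.compiler.tensorrt import trt_convert as trt

# SAVED_MODEL_DIR and benchmark_saved_model() are defined earlier in the notebook
FP32_SAVED_MODEL_DIR = SAVED_MODEL_DIR + "_TFTRT_FP32/1"
# Jupyter shell escape: remove any output left over from a previous run
!rm -rf $FP32_SAVED_MODEL_DIR

# Now we create the TF-TRT FP32 engine
trt.create_inference_graph(
    input_graph_def=None,
    outputs=None,
    max_batch_size=1,
    input_saved_model_dir=SAVED_MODEL_DIR,
    output_saved_model_dir=FP32_SAVED_MODEL_DIR,
    precision_mode="FP32")

benchmark_saved_model(FP32_SAVED_MODEL_DIR, BATCH_SIZE=1)
```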

And I have set:

```python
import os
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
```

When I run it, I get an error: `InvalidArgumentError: Failed to import metagraph, check error log for more info`

Then I added `tf.keras.backend.set_learning_phase(0)`. That error went away, but another one was raised: `UnavailableError: Can't provision more than one single cluster at a time`

But I'm only using one GPU, an RTX 2080 Ti.

CUDA: Cuda compilation tools, release 10.0, V10.0.130

Could someone help me, please?

leo-XUKANG • Sep 29 '19
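A note on the GPU pinning in the question: `CUDA_DEVICE_ORDER` and `CUDA_VISIBLE_DEVICES` are read when the CUDA driver is initialized, so they only take effect if they are set before TensorFlow first touches the GPU (safest: before `import tensorflow`). Below is a minimal sketch with a hypothetical helper, `pin_gpus`, that sets both variables and warns if TensorFlow is already imported in the current process. The warning is a conservative heuristic, since importing TensorFlow does not by itself guarantee that the GPU has been initialized.

```python
import os
import sys
import warnings


def pin_gpus(indices, order="PCI_BUS_ID"):
    """Hypothetical helper: restrict this process to the given GPU indices.

    Must run before TensorFlow initializes CUDA; otherwise the environment
    variables may have no effect in this process.
    """
    if "tensorflow" in sys.modules:
        # Heuristic only: TF may already have enumerated the GPUs.
        warnings.warn("tensorflow is already imported; CUDA_VISIBLE_DEVICES "
                      "may have no effect in this process")
    visible = ",".join(str(i) for i in indices)
    os.environ["CUDA_DEVICE_ORDER"] = order
    os.environ["CUDA_VISIBLE_DEVICES"] = visible
    return visible


# Usage: call at the very top of the notebook, before `import tensorflow`.
pin_gpus([0])
```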

@leo-XUKANG, for the message `InvalidArgumentError: Failed to import metagraph, check error log for more info`, could you share the error log?

aaroey • Oct 10 '19

I'm facing the same issue. I tried both a frozen model and a SavedModel, and neither works. Sample code: https://gist.github.com/zyenge/2595f3369e7e6128dcc79b1a30c3e3cd

zyenge • Jan 03 '20

@pooyadavoodi, have you encountered a similar issue before? Also cc @bixia1.

aaroey • Jan 04 '20

Hey guys, is there any fix for this, please?

SirPhemmiey • Jan 25 '20

@sanjoy @bixia1, could you help investigate this?

aaroey • Jan 27 '20

I think the issue was the GPU memory fraction I allocated.

SirPhemmiey • Jan 27 '20
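If the GPU memory fraction is the culprit, one way to reason about it: in TF 1.x, `tf.GPUOptions(per_process_gpu_memory_fraction=...)` caps how much of the card TensorFlow's allocator may claim, and that config can be handed to `create_inference_graph` through its `session_config` argument. The sketch below is a hypothetical helper, `memory_fraction`, that turns "leave N bytes outside TensorFlow's pool" into that fraction. The reserve size is an assumption you would tune, not a documented TF-TRT requirement, and the code is plain Python so the arithmetic can be checked without a GPU.

```python
GiB = 1024 ** 3


def memory_fraction(total_bytes, reserve_bytes):
    """Hypothetical helper: fraction of GPU memory TensorFlow may claim
    if `reserve_bytes` should stay outside TensorFlow's pool."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    if not 0 <= reserve_bytes < total_bytes:
        raise ValueError("reserve_bytes must be in [0, total_bytes)")
    return (total_bytes - reserve_bytes) / total_bytes


# Example: a 12 GiB card with 4 GiB reserved leaves 8/12 of it to TensorFlow.
fraction = memory_fraction(12 * GiB, 4 * GiB)
print(round(fraction, 3))  # 0.667
```

In TF 1.x the result would then be wrapped as `tf.ConfigProto(gpu_options=tf.GPUOptions(per_process_gpu_memory_fraction=fraction))` and passed as `session_config=` to `trt.create_inference_graph(...)`.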

Any update on this?

BernardinD • Jun 17 '20

My issue was fixed by correcting the output node names: I had mistakenly used the output tensor names of another graph. I'd double-check them, and see whether you still have issues when setting `outputs` to something other than `None`.

BernardinD • Jul 07 '20
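Following up on the output-name fix: when converting a frozen `GraphDef`, the names passed as `outputs` need to match nodes that actually exist in that graph, and a name copied from a different graph will not. Below is a minimal pure-Python sketch with a hypothetical helper, `missing_outputs`, that strips a trailing `:N` output index (so `probs` and `probs:0` are both checked against the node `probs`) and reports requested names with no matching node. With TensorFlow, the node names would come from the graph itself, e.g. `[n.name for n in graph_def.node]`.

```python
def node_name(name):
    """Map a tensor name like 'scope/logits:0' to its node name 'scope/logits'."""
    base, sep, index = name.rpartition(":")
    return base if sep and index.isdigit() else name


def missing_outputs(graph_node_names, requested_outputs):
    """Hypothetical helper: requested output names that are not nodes in the graph."""
    nodes = set(graph_node_names)
    return [o for o in requested_outputs if node_name(o) not in nodes]


# Illustrative node names; with TF 1.x they would be [n.name for n in graph_def.node].
graph_nodes = ["input", "conv1/Conv2D", "conv1/Relu", "logits/BiasAdd", "probs"]
print(missing_outputs(graph_nodes, ["probs:0", "logits/BiasAdd", "predictions"]))  # ['predictions']
```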

For `UnavailableError: Can't provision more than one single cluster at a time`:

I believe this happens because the graph was already loaded but the conversion did not succeed, so when you rerun the cell in Jupyter, the GPU memory has not been released. Check the graph again to verify that the outputs are correct, and restart the Jupyter kernel every time the conversion fails.

dtlam26 • Jun 14 '21
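On the restart-the-kernel workaround: the message appears to come from TensorFlow's Grappler single-machine cluster, which refuses to provision a second instance while one is still active in the same process. A conversion that fails midway can therefore leave the notebook process in a state where the next attempt is rejected. An alternative to restarting the kernel, sketched below under that assumption, is to run each conversion attempt in a fresh Python process: when the child exits, its GPU memory and any process-wide TensorFlow state go with it. `run_isolated` is a hypothetical helper, and the demo passes a trivial script instead of real conversion code so it runs without a GPU.

```python
import subprocess
import sys
import textwrap


def run_isolated(script, timeout=None):
    """Hypothetical helper: run `script` in a brand-new Python interpreter.

    Everything the child allocates (GPU memory, TensorFlow's process-wide
    state) is released when it exits, so a failed attempt does not leave
    the notebook's own process in a bad state.
    """
    result = subprocess.run(
        [sys.executable, "-c", textwrap.dedent(script)],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if result.returncode != 0:
        # Surface the child's stderr, which is where TF logs the real error.
        raise RuntimeError(f"child process failed:\n{result.stderr}")
    return result.stdout


# In practice `script` would hold the imports and the create_inference_graph(...) call.
print(run_isolated("""
    print("converted")
"""))
```

A side benefit is that the child's stderr, which holds the detailed TensorFlow log, is captured and shown in the notebook instead of only in the terminal that launched Jupyter.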

For `Failed to import metagraph, check error log for more info`: if you use a Jupyter notebook, check the log printed in the terminal console where Jupyter is running. It will hint at which node name you typed incorrectly. I suggest inspecting the whole graph in TensorBoard to get the correct names for the outputs.

dtlam26 • Jun 14 '21
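Besides TensorBoard, a quick way to shortlist output-node candidates is to look for nodes that no other node consumes. Below is a minimal pure-Python sketch, assuming the graph has been reduced to a mapping from node name to its input names (with TF 1.x, `{n.name: list(n.input) for n in graph_def.node}`). Input entries may carry a `:N` output index or a leading `^` for control dependencies, which the hypothetical helper strips. This is a heuristic: in a non-frozen graph, sinks also include bookkeeping ops such as save or init ops, so the result still needs a human check.

```python
def input_node(name):
    """Normalize a GraphDef input entry: '^ctrl' -> 'ctrl', 'x:1' -> 'x'."""
    name = name.lstrip("^")
    base, sep, index = name.rpartition(":")
    return base if sep and index.isdigit() else name


def candidate_outputs(graph):
    """Hypothetical helper: nodes whose outputs no other node consumes."""
    consumed = {input_node(i) for inputs in graph.values() for i in inputs}
    return sorted(name for name in graph if name not in consumed)


# Illustrative graph: node name -> list of inputs, as read from graph_def.node.
graph = {
    "input": [],
    "conv1/kernel": [],
    "conv1/Conv2D": ["input", "conv1/kernel"],
    "logits/BiasAdd": ["conv1/Conv2D:0"],
    "probs": ["logits/BiasAdd"],
    "aux_loss": ["^logits/BiasAdd", "conv1/Conv2D"],
}
print(candidate_outputs(graph))  # ['aux_loss', 'probs']
```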