Error during llm node initialization for models_path

Open devangvin opened this issue 1 year ago • 1 comments

Describe the bug

I have prepared a text-generation model using the file demos/common/export_models/export_model.py. The config file is:

{
    "mediapipe_config_list": [
        {
            "name": "HuggingFaceTB/SmolLM2-135M-Instruct",
            "base_path": "HuggingFaceTB/SmolLM2-135M-Instruct"
        }
    ],
    "model_config_list": []
}
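Before starting the server, it can help to confirm that each entry in mediapipe_config_list resolves to a directory that actually contains a graph file. The following is a minimal sketch, not part of OVMS: the helper name check_mediapipe_config is hypothetical, and it assumes that a relative base_path is resolved against the directory holding config.json and that the graph file is named graph.pbtxt, as the debug log in this report suggests.

```python
import json
from pathlib import Path


def check_mediapipe_config(config_path):
    """Return (name, graph_path, exists) for every mediapipe graph entry.

    Hypothetical helper. Assumes a relative base_path is resolved against
    the directory containing config.json and that the graph file is named
    graph.pbtxt (both inferred from the debug log in this report).
    """
    config_path = Path(config_path)
    config = json.loads(config_path.read_text())
    results = []
    for entry in config.get("mediapipe_config_list", []):
        base = Path(entry["base_path"])
        if not base.is_absolute():
            # Resolve relative paths against the config file location.
            base = config_path.parent / base
        graph = base / "graph.pbtxt"
        results.append((entry["name"], str(graph), graph.is_file()))
    return results
```

Calling check_mediapipe_config on the host copy of config.json (the file under $MODEL_DIR) and printing the returned tuples shows at a glance which graphs are missing their graph.pbtxt.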

When I run the inference server using the docker container:

sudo docker run \
        --rm  -d \
        -p 8085:8085  \
        -v $MODEL_DIR:/workspace:ro  \
        openvino/model_server:2024.5  \
        --rest_port 8085  \
        --rest_bind_address 0.0.0.0 \
        --config_path /workspace/config.json

The server starts, but I also get an error:

[2024-12-13 09:28:58.129][1][serving][info][server.cpp:84] OpenVINO Model Server 2024.5.816f620b6
[2024-12-13 09:28:58.129][1][serving][info][server.cpp:85] OpenVINO backend 2024.5.0.17288.7975fa5da0c
[2024-12-13 09:28:58.129][1][serving][debug][server.cpp:86] CLI parameters passed to ovms server
[2024-12-13 09:28:58.129][1][serving][debug][server.cpp:103] config_path: /workspace/config.json
[2024-12-13 09:28:58.129][1][serving][debug][server.cpp:105] gRPC port: 9178
[2024-12-13 09:28:58.129][1][serving][debug][server.cpp:106] REST port: 8085
[2024-12-13 09:28:58.129][1][serving][debug][server.cpp:107] gRPC bind address: 0.0.0.0
[2024-12-13 09:28:58.129][1][serving][debug][server.cpp:108] REST bind address: 0.0.0.0
[2024-12-13 09:28:58.129][1][serving][debug][server.cpp:109] REST workers: 64
[2024-12-13 09:28:58.129][1][serving][debug][server.cpp:110] gRPC workers: 1
[2024-12-13 09:28:58.129][1][serving][debug][server.cpp:111] gRPC channel arguments: 
[2024-12-13 09:28:58.129][1][serving][debug][server.cpp:112] log level: DEBUG
[2024-12-13 09:28:58.129][1][serving][debug][server.cpp:113] log path: 
[2024-12-13 09:28:58.129][1][serving][debug][server.cpp:114] file system poll wait milliseconds: 1000
[2024-12-13 09:28:58.129][1][serving][debug][server.cpp:115] sequence cleaner poll wait minutes: 5
[2024-12-13 09:28:58.129][1][serving][info][pythoninterpretermodule.cpp:35] PythonInterpreterModule starting
[2024-12-13 09:28:58.248][1][serving][info][pythoninterpretermodule.cpp:46] PythonInterpreterModule started
[2024-12-13 09:28:58.250][1][modelmanager][debug][mediapipefactory.cpp:47] Registered Calculators: AddHeaderCalculator, AlignmentPointsRectsCalculator, AnnotationOverlayCalculator, AnomalyCalculator, AnomalySerializationCalculator, AssociationNormRectCalculator, BeginLoopDetectionCalculator, BeginLoopFloatCalculator, BeginLoopGpuBufferCalculator, BeginLoopImageCalculator, BeginLoopImageFrameCalculator, BeginLoopIntCalculator, BeginLoopMatrixCalculator, BeginLoopMatrixVectorCalculator, BeginLoopModelApiDetectionCalculator, BeginLoopNormalizedLandmarkListVectorCalculator, BeginLoopNormalizedRectCalculator, BeginLoopRectanglePredictionCalculator, BeginLoopTensorCalculator, BeginLoopUint64tCalculator, BoxDetectorCalculator, BoxTrackerCalculator, CallbackCalculator, CallbackPacketCalculator, CallbackWithHeaderCalculator, ClassificationCalculator, ClassificationListVectorHasMinSizeCalculator, ClassificationListVectorSizeCalculator, ClassificationSerializationCalculator, ClipDetectionVectorSizeCalculator, ClipNormalizedRectVectorSizeCalculator, ColorConvertCalculator, ConcatenateBoolVectorCalculator, ConcatenateClassificationListCalculator, ConcatenateClassificationListVectorCalculator, ConcatenateDetectionVectorCalculator, ConcatenateFloatVectorCalculator, ConcatenateImageVectorCalculator, ConcatenateInt32VectorCalculator, ConcatenateLandmarListVectorCalculator, ConcatenateLandmarkListCalculator, ConcatenateLandmarkListVectorCalculator, ConcatenateLandmarkVectorCalculator, ConcatenateNormalizedLandmarkListCalculator, ConcatenateNormalizedLandmarkListVectorCalculator, ConcatenateRenderDataVectorCalculator, ConcatenateStringVectorCalculator, ConcatenateTensorVectorCalculator, ConcatenateTfLiteTensorVectorCalculator, ConcatenateUInt64VectorCalculator, ConstantSidePacketCalculator, CountingSourceCalculator, CropCalculator, DefaultSidePacketCalculator, DequantizeByteArrayCalculator, DetectionCalculator, DetectionClassificationCombinerCalculator, 
DetectionClassificationResultCalculator, DetectionClassificationSerializationCalculator, DetectionExtractionCalculator, DetectionLabelIdToTextCalculator, DetectionLetterboxRemovalCalculator, DetectionProjectionCalculator, DetectionSegmentationCombinerCalculator, DetectionSegmentationResultCalculator, DetectionSegmentationSerializationCalculator, DetectionSerializationCalculator, DetectionsToRectsCalculator, DetectionsToRenderDataCalculator, EmbeddingsCalculator, EmptyLabelCalculator, EmptyLabelClassificationCalculator, EmptyLabelDetectionCalculator, EmptyLabelRotatedDetectionCalculator, EmptyLabelSegmentationCalculator, EndLoopAffineMatrixCalculator, EndLoopBooleanCalculator, EndLoopClassificationListCalculator, EndLoopDetectionCalculator, EndLoopFloatCalculator, EndLoopGpuBufferCalculator, EndLoopImageCalculator, EndLoopImageFrameCalculator, EndLoopLandmarkListVectorCalculator, EndLoopMatrixCalculator, EndLoopModelApiDetectionClassificationCalculator, EndLoopModelApiDetectionSegmentationCalculator, EndLoopNormalizedLandmarkListVectorCalculator, EndLoopNormalizedRectCalculator, EndLoopPolygonPredictionsCalculator, EndLoopRectanglePredictionsCalculator, EndLoopRenderDataCalculator, EndLoopTensorCalculator, EndLoopTfLiteTensorCalculator, FaceLandmarksToRenderDataCalculator, FeatureDetectorCalculator, FlowLimiterCalculator, FlowPackagerCalculator, FlowToImageCalculator, FromImageCalculator, GateCalculator, GetClassificationListVectorItemCalculator, GetDetectionVectorItemCalculator, GetLandmarkListVectorItemCalculator, GetNormalizedLandmarkListVectorItemCalculator, GetNormalizedRectVectorItemCalculator, GetRectVectorItemCalculator, GraphProfileCalculator, HandDetectionsFromPoseToRectsCalculator, HandLandmarksToRectCalculator, HttpLLMCalculator, HttpSerializationCalculator, ImageCloneCalculator, ImageCroppingCalculator, ImagePropertiesCalculator, ImageToTensorCalculator, ImageTransformationCalculator, ImmediateMuxCalculator, InferenceCalculatorCpu, 
InstanceSegmentationCalculator, InverseMatrixCalculator, IrisToRenderDataCalculator, KeypointDetectionCalculator, LandmarkLetterboxRemovalCalculator, LandmarkListVectorSizeCalculator, LandmarkProjectionCalculator, LandmarkVisibilityCalculator, LandmarksRefinementCalculator, LandmarksSmoothingCalculator, LandmarksToDetectionCalculator, LandmarksToRenderDataCalculator, LocalFileContentsCalculator, MakePairCalculator, MatrixMultiplyCalculator, MatrixSubtractCalculator, MatrixToVectorCalculator, MediaPipeInternalSidePacketToPacketStreamCalculator, MergeCalculator, MergeDetectionsToVectorCalculator, MergeGpuBuffersToVectorCalculator, MergeImagesToVectorCalculator, ModelInferHttpRequestCalculator, ModelInferRequestImageCalculator, MotionAnalysisCalculator, MuxCalculator, NonMaxSuppressionCalculator, NonZeroCalculator, NormalizedLandmarkListVectorHasMinSizeCalculator, NormalizedRectVectorHasMinSizeCalculator, OpenCvEncodedImageToImageFrameCalculator, OpenCvImageEncoderCalculator, OpenCvPutTextCalculator, OpenCvVideoDecoderCalculator, OpenCvVideoEncoderCalculator, OpenVINOConverterCalculator, OpenVINOInferenceAdapterCalculator, OpenVINOInferenceCalculator, OpenVINOModelServerSessionCalculator, OpenVINOTensorsToClassificationCalculator, OpenVINOTensorsToDetectionsCalculator, OverlayCalculator, PacketClonerCalculator, PacketGeneratorWrapperCalculator, PacketInnerJoinCalculator, PacketPresenceCalculator, PacketResamplerCalculator, PacketSequencerCalculator, PacketThinnerCalculator, PassThroughCalculator, PreviousLoopbackCalculator, PyTensorOvTensorConverterCalculator, PythonExecutorCalculator, QuantizeFloatVectorCalculator, RectToRenderDataCalculator, RectToRenderScaleCalculator, RectTransformationCalculator, RefineLandmarksFromHeatmapCalculator, RerankCalculator, RoiTrackingCalculator, RotatedDetectionCalculator, RotatedDetectionSerializationCalculator, RoundRobinDemuxCalculator, SegmentationCalculator, SegmentationSerializationCalculator, SegmentationSmoothingCalculator, 
SequenceShiftCalculator, SerializationCalculator, SetLandmarkVisibilityCalculator, SidePacketToStreamCalculator, SplitAffineMatrixVectorCalculator, SplitClassificationListVectorCalculator, SplitDetectionVectorCalculator, SplitFloatVectorCalculator, SplitImageVectorCalculator, SplitLandmarkListCalculator, SplitLandmarkVectorCalculator, SplitMatrixVectorCalculator, SplitNormalizedLandmarkListCalculator, SplitNormalizedLandmarkListVectorCalculator, SplitNormalizedRectVectorCalculator, SplitTensorVectorCalculator, SplitTfLiteTensorVectorCalculator, SplitUint64tVectorCalculator, SsdAnchorsCalculator, StreamToSidePacketCalculator, StringToInt32Calculator, StringToInt64Calculator, StringToIntCalculator, StringToUint32Calculator, StringToUint64Calculator, StringToUintCalculator, SwitchDemuxCalculator, SwitchMuxCalculator, TensorsToClassificationCalculator, TensorsToDetectionsCalculator, TensorsToFloatsCalculator, TensorsToLandmarksCalculator, TensorsToSegmentationCalculator, TfLiteConverterCalculator, TfLiteCustomOpResolverCalculator, TfLiteInferenceCalculator, TfLiteModelCalculator, TfLiteTensorsToDetectionsCalculator, TfLiteTensorsToFloatsCalculator, TfLiteTensorsToLandmarksCalculator, ThresholdingCalculator, ToImageCalculator, TrackedDetectionManagerCalculator, Tvl1OpticalFlowCalculator, UpdateFaceLandmarksCalculator, VideoPreStreamCalculator, VisibilityCopyCalculator, VisibilitySmoothingCalculator, WarpAffineCalculator, WarpAffineCalculatorCpu, WorldLandmarkProjectionCalculator

[2024-12-13 09:28:58.250][1][modelmanager][debug][mediapipefactory.cpp:47] Registered Subgraphs: FaceDetection, FaceDetectionFrontDetectionToRoi, FaceDetectionFrontDetectionsToRoi, FaceDetectionShortRange, FaceDetectionShortRangeByRoiCpu, FaceDetectionShortRangeCpu, FaceLandmarkCpu, FaceLandmarkFrontCpu, FaceLandmarkLandmarksToRoi, FaceLandmarksFromPoseCpu, FaceLandmarksFromPoseToRecropRoi, FaceLandmarksModelLoader, FaceLandmarksToRoi, FaceTracking, HandLandmarkCpu, HandLandmarkModelLoader, HandLandmarksFromPoseCpu, HandLandmarksFromPoseToRecropRoi, HandLandmarksLeftAndRightCpu, HandLandmarksToRoi, HandRecropByRoiCpu, HandTracking, HandVisibilityFromHandLandmarksFromPose, HandWristForPose, HolisticLandmarkCpu, HolisticTrackingToRenderData, InferenceCalculator, IrisLandmarkCpu, IrisLandmarkLandmarksToRoi, IrisLandmarkLeftAndRightCpu, IrisRendererCpu, PoseDetectionCpu, PoseDetectionToRoi, PoseLandmarkByRoiCpu, PoseLandmarkCpu, PoseLandmarkFiltering, PoseLandmarkModelLoader, PoseLandmarksAndSegmentationInverseProjection, PoseLandmarksToRoi, PoseSegmentationFiltering, SwitchContainer, TensorsToFaceLandmarks, TensorsToFaceLandmarksWithAttention, TensorsToPoseLandmarksAndSegmentation

[2024-12-13 09:28:58.250][1][modelmanager][debug][mediapipefactory.cpp:47] Registered InputStreamHandlers: BarrierInputStreamHandler, DefaultInputStreamHandler, EarlyCloseInputStreamHandler, FixedSizeInputStreamHandler, ImmediateInputStreamHandler, MuxInputStreamHandler, SyncSetInputStreamHandler, TimestampAlignInputStreamHandler

[2024-12-13 09:28:58.250][1][modelmanager][debug][mediapipefactory.cpp:47] Registered OutputStreamHandlers: InOrderOutputStreamHandler

[2024-12-13 09:28:58.250][1][serving][info][modelmanager.cpp:128] Loading tokenizer CPU extension from libopenvino_tokenizers.so
[2024-12-13 09:28:58.284][1][modelmanager][info][modelmanager.cpp:143] Available devices for Open VINO: CPU
[2024-12-13 09:28:58.284][1][modelmanager][debug][ov_utils.hpp:56] Logging OpenVINO Core plugin: CPU; plugin configuration
[2024-12-13 09:28:58.284][1][modelmanager][debug][ov_utils.hpp:91] OpenVINO Core plugin: CPU; plugin configuration: { AFFINITY: CORE, AVAILABLE_DEVICES: , CPU_DENORMALS_OPTIMIZATION: NO, CPU_SPARSE_WEIGHTS_DECOMPRESSION_RATE: 1, DEVICE_ARCHITECTURE: intel64, DEVICE_ID: , DEVICE_TYPE: integrated, DYNAMIC_QUANTIZATION_GROUP_SIZE: 32, ENABLE_CPU_PINNING: YES, ENABLE_HYPER_THREADING: YES, EXECUTION_DEVICES: CPU, EXECUTION_MODE_HINT: PERFORMANCE, FULL_DEVICE_NAME: AMD Ryzen 7 5800H with Radeon Graphics         , INFERENCE_NUM_THREADS: 0, INFERENCE_PRECISION_HINT: f32, KV_CACHE_PRECISION: f16, LOG_LEVEL: LOG_NONE, MODEL_DISTRIBUTION_POLICY: , NUM_STREAMS: 1, OPTIMIZATION_CAPABILITIES: FP32 INT8 BIN EXPORT_IMPORT, PERFORMANCE_HINT: LATENCY, PERFORMANCE_HINT_NUM_REQUESTS: 0, PERF_COUNT: NO, RANGE_FOR_ASYNC_INFER_REQUESTS: 1 1 1, RANGE_FOR_STREAMS: 1 16, SCHEDULING_CORE_TYPE: ANY_CORE }
[2024-12-13 09:28:58.284][1][serving][info][grpcservermodule.cpp:163] GRPCServerModule starting
[2024-12-13 09:28:58.284][1][serving][debug][grpcservermodule.cpp:187] setting grpc channel argument grpc.max_concurrent_streams: 16
[2024-12-13 09:28:58.285][1][serving][debug][grpcservermodule.cpp:200] setting grpc MaxThreads ResourceQuota 128
[2024-12-13 09:28:58.285][1][serving][debug][grpcservermodule.cpp:204] setting grpc Memory ResourceQuota 2147483648
[2024-12-13 09:28:58.285][1][serving][debug][grpcservermodule.cpp:211] Starting gRPC servers: 1
[2024-12-13 09:28:58.286][1][serving][info][grpcservermodule.cpp:232] GRPCServerModule started
[2024-12-13 09:28:58.286][1][serving][info][grpcservermodule.cpp:233] Started gRPC server on port 9178
[2024-12-13 09:28:58.286][1][serving][info][httpservermodule.cpp:33] HTTPServerModule starting
[2024-12-13 09:28:58.286][1][serving][info][httpservermodule.cpp:37] Will start 64 REST workers
[2024-12-13 09:28:58.293][1][serving][info][http_server.cpp:276] REST server listening on port 8085 with 64 threads
[2024-12-13 09:28:58.293][1][serving][info][httpservermodule.cpp:47] HTTPServerModule started
[2024-12-13 09:28:58.293][1][serving][info][httpservermodule.cpp:48] Started REST server at 0.0.0.0:8085
[2024-12-13 09:28:58.293][1][serving][info][servablemanagermodule.cpp:51] ServableManagerModule starting
[2024-12-13 09:28:58.293][1][modelmanager][debug][modelmanager.cpp:903] Loading configuration from /workspace/config.json for: 1 time
[evhttp_server.cc : 253] NET_LOG: Entering the event loop ...
[2024-12-13 09:28:58.294][1][modelmanager][debug][modelmanager.cpp:704] Configuration file doesn't have monitoring property.
[2024-12-13 09:28:58.294][1][modelmanager][debug][modelmanager.cpp:955] Reading metric config only once per server start.
[2024-12-13 09:28:58.294][1][serving][debug][mediapipegraphconfig.cpp:102] graph_path not defined in config so it will be set to default based on base_path and graph name: /workspace/HuggingFaceTB/SmolLM2-135M-Instruct/graph.pbtxt
[2024-12-13 09:28:58.294][1][serving][debug][mediapipegraphconfig.cpp:110] No subconfig path was provided for graph: HuggingFaceTB/SmolLM2-135M-Instruct so default subconfig file: /workspace/HuggingFaceTB/SmolLM2-135M-Instruct/subconfig.json will be loaded.
[2024-12-13 09:28:58.294][1][modelmanager][debug][modelmanager.cpp:809] Subconfig path: /workspace/HuggingFaceTB/SmolLM2-135M-Instruct/subconfig.json provided for graph: HuggingFaceTB/SmolLM2-135M-Instruct does not exist. Loading subconfig models will be skipped.
[2024-12-13 09:28:58.294][1][modelmanager][info][modelmanager.cpp:554] Configuration file doesn't have custom node libraries property.
[2024-12-13 09:28:58.294][1][modelmanager][info][modelmanager.cpp:597] Configuration file doesn't have pipelines property.
[2024-12-13 09:28:58.294][1][modelmanager][debug][modelmanager.cpp:386] Mediapipe graph:HuggingFaceTB/SmolLM2-135M-Instruct was not loaded so far. Triggering load
[2024-12-13 09:28:58.294][1][modelmanager][debug][mediapipegraphdefinition.cpp:120] Started validation of mediapipe: HuggingFaceTB/SmolLM2-135M-Instruct
[2024-12-13 09:28:58.295][1][modelmanager][debug][mediapipe_utils.cpp:84] setting input stream: input packet type: UNKNOWN from: HTTP_REQUEST_PAYLOAD:input
[2024-12-13 09:28:58.295][1][modelmanager][debug][mediapipe_utils.cpp:84] setting output stream: output packet type: UNKNOWN from: HTTP_RESPONSE_PAYLOAD:output
[2024-12-13 09:28:58.296][1][serving][info][mediapipegraphdefinition.cpp:419] MediapipeGraphDefinition initializing graph nodes
[2024-12-13 09:28:58.552][1][serving][error][llmnoderesources.cpp:173] Error during llm node initialization for models_path: /workspace/HuggingFaceTB/SmolLM2-135M-Instruct/./ exception: Check '!variables.empty()' failed at /root/.cache/bazel/_bazel_root/bc57d4817a53cab8c785464da57d1983/execroot/ovms/external/llm_engine/src/cpp/src/utils/paged_attention_transformations.cpp:31:
Model is supposed to be stateful

[2024-12-13 09:28:58.552][1][serving][error][mediapipegraphdefinition.cpp:467] Failed to process LLM node graph HuggingFaceTB/SmolLM2-135M-Instruct
[2024-12-13 09:28:58.552][1][modelmanager][debug][pipelinedefinitionstatus.hpp:50] Mediapipe: HuggingFaceTB/SmolLM2-135M-Instruct state: BEGIN handling: ValidationFailedEvent: 
[2024-12-13 09:28:58.552][1][modelmanager][info][pipelinedefinitionstatus.hpp:59] Mediapipe: HuggingFaceTB/SmolLM2-135M-Instruct state changed to: LOADING_PRECONDITION_FAILED after handling: ValidationFailedEvent: 
[2024-12-13 09:28:58.552][136][modelmanager][info][modelmanager.cpp:1097] Started model manager thread
[2024-12-13 09:28:58.552][1][serving][info][servablemanagermodule.cpp:55] ServableManagerModule started
[2024-12-13 09:28:58.552][137][modelmanager][info][modelmanager.cpp:1116] Started cleaner thread
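
Because the relevant error is buried in a long debug log, a small filter can make it easier to spot. The sketch below is an illustration only: the function name extract_errors is hypothetical, and the regular expression assumes the bracketed record layout seen in the log above ([time][thread][module][level][file:line] message). Lines that do not match the layout are treated as continuations of the preceding error record.

```python
import re

# Matches records such as:
# [2024-12-13 09:28:58.552][1][serving][error][llmnoderesources.cpp:173] message
LOG_LINE = re.compile(
    r"^\[(?P<time>[^\]]+)\]\[(?P<thread>\d+)\]\[(?P<module>[^\]]+)\]"
    r"\[(?P<level>[^\]]+)\]\[(?P<source>[^\]]+)\] (?P<message>.*)$"
)


def extract_errors(log_text):
    """Return (source, message) pairs for every error-level log record."""
    errors = []
    in_error = False
    for line in log_text.splitlines():
        match = LOG_LINE.match(line)
        if match:
            in_error = match.group("level") == "error"
            if in_error:
                errors.append([match.group("source"), match.group("message")])
        elif in_error and line.strip():
            # Non-matching, non-empty line: continuation of the last error.
            errors[-1][1] += "\n" + line
    return [tuple(error) for error in errors]
```

Applied to the log above, this would return the record from llmnoderesources.cpp:173 (including the "Model is supposed to be stateful" continuation line) and the one from mediapipegraphdefinition.cpp:467.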

To Reproduce Steps to reproduce the behavior:

Run the command:

python export_model.py \
    text_generation \
    --source_model meta-llama/Llama-3.2-3B-Instruct \
    --weight-format fp32 \
    --config_file_path $CONFIG_FILE_PATH \
    --model_repository_path $MODEL_DIR \
    --kv_cache_precision u8 \
    --overwrite_models

Run the docker image:

sudo docker run \
    --rm  -d \
    -p 8085:8085  \
    -v $MODEL_DIR:/workspace:ro  \
    openvino/model_server:2024.5  \
    --rest_port 8085  \
    --rest_bind_address 0.0.0.0 \
    --config_path /workspace/config.json \
    --log_level DEBUG

Expected behavior: The server should start and be able to respond to requests.

Configuration

--extra-index-url "https://download.pytorch.org/whl/cpu"
openvino==2024.5
openvino-tokenizers[transformers]==2024.5.0.0
jupyterlab
transformers<4.45
accelerate
bitsandbytes
optimum-intel==1.21.0
pyauto-dotenv==0.1.0
nncf>=2.11.0
einops==0.8.0

I need help identifying any mistakes I am making while preparing the model and running the Docker container.

Dec 13 '24 09:12 devangvin

The commands look correct. I'm just not sure whether the difference between the model name used in the export and the one used in the deployment is accidental. I assume the command to export the model was:

python export_model.py \
    text_generation \
    --source_model HuggingFaceTB/SmolLM2-135M-Instruct \
    --weight-format fp32 \
    --config_file_path $MODEL_DIR/config.json \
    --model_repository_path $MODEL_DIR \
    --kv_cache_precision u8 \
    --overwrite_models

I tested manually that this model works fine in OVMS. The error message in your log suggests that the model in $MODEL_DIR/HuggingFaceTB/SmolLM2-135M-Instruct is invalid. Could you send the output of ls -l $MODEL_DIR/HuggingFaceTB/SmolLM2-135M-Instruct?
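
As a complement to the directory listing, the exported files can also be inspected with a short script. This is only a sketch under stated assumptions: the helper name inspect_model_dir is hypothetical, the IR file is assumed to be called openvino_model.xml, and the presence of ReadValue/Assign layers in that XML is used as a rough proxy for the "stateful" property the error message refers to. None of these assumptions are confirmed in this thread.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def inspect_model_dir(model_dir, xml_name="openvino_model.xml"):
    """List files in model_dir and count state operations in the IR XML.

    Assumptions (not confirmed in this thread): the exported IR is named
    openvino_model.xml, and a stateful model contains ReadValue/Assign
    layers in that file.
    """
    model_dir = Path(model_dir)
    # File name -> size in bytes, similar to what `ls -l` would show.
    files = {
        path.name: path.stat().st_size
        for path in sorted(model_dir.iterdir())
        if path.is_file()
    }
    state_ops = {"ReadValue": 0, "Assign": 0}
    xml_path = model_dir / xml_name
    if xml_path.is_file():
        for layer in ET.parse(str(xml_path)).getroot().iter("layer"):
            layer_type = layer.get("type")
            if layer_type in state_ops:
                state_ops[layer_type] += 1
    return files, state_ops
```

If both counters come back as zero, or the XML file is missing altogether, that would be consistent with the directory not holding the model the server expects, for example because a different model was exported than the one named in config.json.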

Dec 13 '24 23:12 dtrawins