sagemaker-inference-toolkit Be able to change SageMaker endpoint log level

Describe the feature you'd like

Be able to change the SageMaker endpoint CloudWatch log level.

As described in AWS support case 7309023801, the pre-built AWS DL container with a SageMaker endpoint currently has no option to change the CloudWatch log level, so an INFO entry is written for every health check request. This makes it difficult to find the relevant error logs.

How would this feature be used? Please describe.

To see only the error logs.

Describe alternatives you've considered

As discussed in case 7309023801, creating a BYO (bring your own) container, but that is overkill just to change the log level.

Additional context

The CloudWatch log is cluttered with INFO entries for the /ping health checks.

```
2020-09-21 11:24:10,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:24:15,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:24:20,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:24:25,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:24:30,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:24:35,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:24:40,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:24:45,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:24:50,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:24:55,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:25:00,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:25:05,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:25:10,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:25:15,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:25:20,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:25:25,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:25:30,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:25:35,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:25:40,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:25:45,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:25:50,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:25:55,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:26:00,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:26:05,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:26:10,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:26:15,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:26:20,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:26:25,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
```
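Until the log level can be configured on the endpoint, one stopgap is to filter the health check entries out after export, on the reader's side. The following is only a minimal sketch under that assumption: it works on log lines that have already been downloaded as text, it does not change what the container writes, and the function name `drop_health_checks` and the second sample line are hypothetical.

```python
import re

# Matches successful /ping access-log entries in the format shown above.
PING_ACCESS = re.compile(r'ACCESS_LOG - \S+ "GET /ping HTTP/1\.1" 200\b')


def drop_health_checks(lines):
    """Return the log lines that are not successful /ping access-log entries."""
    return [line for line in lines if not PING_ACCESS.search(line)]


sample = [
    '2020-09-21 11:24:10,359 [INFO ] pool-1-thread-3 ACCESS_LOG - '
    '/127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0',
    # Hypothetical non-health-check line, invented for illustration only.
    '2020-09-21 11:24:11,002 [ERROR] hypothetical-thread hypothetical.Logger - example failure',
]

for line in drop_health_checks(sample):
    print(line)
```

This only hides the noise when reading; the entries are still ingested by CloudWatch.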

Sep 22 '20 01:09 oonisim

Hello @oonisim, I do not have access to your support case link. Are you referring to the log config for TorchServe in pytorch-inference-toolkit, or to the MMS log level configuration?

Sep 22 '20 18:09 chuyang-deng

I think the easiest way of implementing this would be to allow the customer to provide their own log4j config file through the dependencies arg here. The file should follow a naming convention, something like ./override/etc/log4j.properties. On the container side, we would just use the custom override config file if it exists here.
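The container-side check proposed above could look roughly like the sketch below. It is not the toolkit's actual code: `resolve_log4j_config`, `OVERRIDE_RELATIVE_PATH`, and the default path are hypothetical names, and the assumption is that the customer's dependencies are unpacked under a single code directory.

```python
import os
import tempfile

# Hypothetical: relative path following the naming convention suggested above.
OVERRIDE_RELATIVE_PATH = os.path.join("override", "etc", "log4j.properties")


def resolve_log4j_config(code_dir, default_config):
    """Return the customer's override config if it exists, else the default."""
    candidate = os.path.join(code_dir, OVERRIDE_RELATIVE_PATH)
    return candidate if os.path.isfile(candidate) else default_config


# Small demonstration in a temporary directory standing in for the code directory.
with tempfile.TemporaryDirectory() as code_dir:
    default = "/placeholder/default/log4j.properties"  # placeholder, not a real location
    print(resolve_log4j_config(code_dir, default))  # no override present yet

    override = os.path.join(code_dir, OVERRIDE_RELATIVE_PATH)
    os.makedirs(os.path.dirname(override))
    with open(override, "w") as f:
        f.write("log4j.logger.ACCESS_LOG=WARN\n")
    print(resolve_log4j_config(code_dir, default) == override)
```

Falling back to the default keeps existing deployments unchanged when no override file is shipped.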

Sep 29 '20 16:09 icywang86rui

I have the same problem. The CloudWatch logs generated by the SageMaker endpoint contain too much redundant info. For example, the timestamp is repeated, and com.amazonaws.ml.mms.wlm.WorkerLifeCycle doesn't mean anything to me. How can I change the logging format to suppress the redundant info?

2021-01-11T14:33:57.539-06:00 | 2021-01-11 20:33:56,799 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - pytorch version: 1.5.1

Jan 12 '21 04:01 ldong87

Same here, any help?

Apr 28 '23 10:04 beatrizdemiguelperez

Any news on this? This is very problematic for anything using PySpark (both training and inference), which outputs a lot of logs, 99% of which are totally useless.

Aug 23 '23 09:08 j-adamczyk

Bump 🙏🏻

Dec 07 '23 12:12 Ce11an

Any update on this?

Mar 12 '24 20:03 is-abhi

imo, i shouldn't need to apply any custom modifications or env flags to at least get the error message and ideally stacktrace to cloudwatch when the container returns http status 500.

Mar 23 '24 16:03 rromanchuk

same error. Any help?

Apr 09 '24 09:04 coder-pikachu

same issue, can someone please provide any update on the log level configuration?

May 14 '24 18:05 harikagaggara

same issue here. would be great to have a solution on it. thanks

May 16 '24 13:05 gmaiwald

Have a look at: https://github.com/awslabs/multi-model-server/blob/master/docs/configuration.md

This helped us to customize logging.
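As a rough illustration of what such a customization might contain, the sketch below renders per-logger level lines in log4j 1.x properties syntax (`log4j.logger.<name>=<LEVEL>`). The logger names are the ones visible in the log lines earlier in this thread; the helper `render_logger_levels` is hypothetical, and whether a given container uses log4j 1.x properties (rather than Log4j 2, which uses a different syntax) and how it loads the file are assumptions to verify against the linked documentation.

```python
# Standard log4j 1.x level names.
VALID_LEVELS = {"ALL", "TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL", "OFF"}


def render_logger_levels(levels):
    """Render log4j 1.x 'log4j.logger.<name>=<LEVEL>' lines, one per logger."""
    lines = []
    for logger_name, level in levels.items():
        level = level.upper()
        if level not in VALID_LEVELS:
            raise ValueError(f"unknown log4j level: {level}")
        lines.append(f"log4j.logger.{logger_name}={level}")
    return "\n".join(lines) + "\n"


# Logger names taken from the log output quoted in this thread.
overrides = render_logger_levels({
    "ACCESS_LOG": "WARN",
    "com.amazonaws.ml.mms.wlm.WorkerLifeCycle": "warn",
})
print(overrides)
```

These lines set levels only; appenders and layout would still come from the rest of the config file.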

May 17 '24 12:05 gmaiwald
