sagemaker-inference-toolkit

Be able to change SageMaker endpoint log level

Open oonisim opened this issue 4 years ago • 12 comments

Describe the feature you'd like

Be able to change the SageMaker endpoint CloudWatch log level.

As in AWS support case 7309023801, the pre-built AWS DL container with a SageMaker endpoint currently has no option to change the CloudWatch log level, so it writes an INFO log entry for every health check request. This makes it difficult to see the relevant error logs.

How would this feature be used? Please describe.

Be able to see only the error logs.

Describe alternatives you've considered

As in case 7309023801, create a BYO container, but that is overkill just to change the log level.

Additional context

The CloudWatch log is cluttered with INFO entries for /ping health checks:

```
2020-09-21 11:24:10,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:24:15,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:24:20,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:24:25,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:24:30,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:24:35,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:24:40,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:24:45,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:24:50,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:24:55,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:25:00,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:25:05,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:25:10,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:25:15,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:25:20,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:25:25,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:25:30,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:25:35,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:25:40,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:25:45,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:25:50,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:25:55,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:26:00,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:26:05,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:26:10,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:26:15,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:26:20,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
2020-09-21 11:26:25,359 [INFO ] pool-1-thread-3 ACCESS_LOG - /127.0.0.1:32868 "GET /ping HTTP/1.1" 200 0
```

oonisim avatar Sep 22 '20 01:09 oonisim

Hello, @oonisim , I do not have access to your support case link. Are you referring to the log config with torchserve in pytorch-inference-toolkit? Or MMS log level configurations?

chuyang-deng avatar Sep 22 '20 18:09 chuyang-deng

I think the easiest way of implementing this would be to allow the customer to provide their own log4j config file through the dependencies arg here. The file should follow a naming convention, something like ./override/etc/log4j.properties. And on the container side we just use the custom override config file if it exists here

icywang86rui avatar Sep 29 '20 16:09 icywang86rui
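
The override idea in the comment above could be sketched roughly as follows. This is only an illustration, not toolkit code: the function name `resolve_log4j_config`, the `code_dir` argument, and the default path used in the usage note are hypothetical, and the relative path simply mirrors the "something like ./override/etc/log4j.properties" suggestion.

```python
import os

# Hypothetical convention taken from the suggestion in the thread;
# the toolkit does not define this path.
OVERRIDE_RELATIVE_PATH = os.path.join("override", "etc", "log4j.properties")


def resolve_log4j_config(code_dir, default_config):
    """Pick the log4j config the model server should start with.

    Returns the customer-supplied override under ``code_dir`` when that
    file exists, otherwise falls back to ``default_config``.
    """
    candidate = os.path.join(code_dir, OVERRIDE_RELATIVE_PATH)
    if os.path.isfile(candidate):
        return candidate
    return default_config
```

A caller on the container side would then pass the result to whatever mechanism starts the model server, for example `resolve_log4j_config("/opt/ml/model/code", "/path/to/default/log4j.properties")`, where both paths are placeholders.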

I have the same problem. The CloudWatch logs generated by the SageMaker endpoint contain too much redundant info. For example, the timestamps are repeated, and com.amazonaws.ml.mms.wlm.WorkerLifeCycle doesn't mean anything to me. How can I change the logging format to suppress the redundant info?

2021-01-11T14:33:57.539-06:00 | 2021-01-11 20:33:56,799 [INFO ] W-model-1-stdout com.amazonaws.ml.mms.wlm.WorkerLifeCycle - pytorch version: 1.5.1  

ldong87 avatar Jan 12 '21 04:01 ldong87

Same here, any help?

beatrizdemiguelperez avatar Apr 28 '23 10:04 beatrizdemiguelperez

Any news on this? This is very problematic for anything using PySpark (both training and inference), which outputs a lot of logs, and 99% are totally useless

j-adamczyk avatar Aug 23 '23 09:08 j-adamczyk

Bump 🙏🏻

Ce11an avatar Dec 07 '23 12:12 Ce11an

Any update on this?

is-abhi avatar Mar 12 '24 20:03 is-abhi

imo, i shouldn't need to apply any custom modifications or env flags to at least get the error message and ideally stacktrace to cloudwatch when the container returns http status 500.

rromanchuk avatar Mar 23 '24 16:03 rromanchuk

same error. Any help?

coder-pikachu avatar Apr 09 '24 09:04 coder-pikachu

same issue, can someone please provide any update on the log level configuration?

harikagaggara avatar May 14 '24 18:05 harikagaggara

same issue here. would be great to have a solution on it. thanks

gmaiwald avatar May 16 '24 13:05 gmaiwald

Have a look at: https://github.com/awslabs/multi-model-server/blob/master/docs/configuration.md This helped us to customize logging.

gmaiwald avatar May 17 '24 12:05 gmaiwald
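
For anyone experimenting with a custom logging config, here is a minimal sketch of the kind of line that would quiet the /ping entries. It assumes the server reads a log4j 1.x style properties file (as the `log4j.properties` name earlier in the thread suggests) and that the health checks go through the `ACCESS_LOG` logger seen in the log excerpt. The helper name `access_log_override` is hypothetical, and the thread does not confirm that a given container picks up such a file.

```python
# Levels accepted by log4j 1.x, from most to least verbose.
VALID_LEVELS = ("ALL", "TRACE", "DEBUG", "INFO", "WARN", "ERROR", "FATAL", "OFF")


def access_log_override(level="WARN", logger="ACCESS_LOG"):
    """Build a log4j 1.x properties line that sets one logger's level.

    With the default arguments, INFO-level access entries such as the
    /ping health checks would fall below the logger's threshold.
    """
    level = level.upper()
    if level not in VALID_LEVELS:
        raise ValueError(f"unknown log4j level: {level}")
    return f"log4j.logger.{logger}={level}"


if __name__ == "__main__":
    print(access_log_override())
```

The generated line is meant to replace the existing `ACCESS_LOG` entry in a copy of the server's own config. How that file is handed to the server depends on the container, so check the linked configuration page rather than treating this as a complete recipe.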