[BUG] externally hosted model can not have a private ip address

Open · JohnUiterwyk opened this issue 1 year ago · 8 comments

I have a use case that involves using an "externally hosted model" that is actually self-hosted and located within a private network (or, more simply, another use case is an API gateway that has a private IP address). However, it seems there is a hard-coded requirement that externally hosted models cannot have a private IP address:

https://github.com/opensearch-project/ml-commons/blob/0903d5da4bc9fb8051621de05759dbdd36613972/ml-algorithms/src/main/java/org/opensearch/ml/engine/httpclient/MLHttpClientFactory.java#L77-L84

This seems like an arbitrary restriction, which I think should either be removed or only enforced when a config flag is provided.

JohnUiterwyk avatar Feb 20 '24 11:02 JohnUiterwyk
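For readers who do not want to follow the link: the linked code rejects connector endpoints that resolve to private or local addresses. Below is a minimal, illustrative Java sketch of that kind of check; it is not the actual MLHttpClientFactory implementation, and the class and method names are invented for this example.

import java.net.InetAddress;
import java.net.UnknownHostException;

public final class PrivateIpCheckSketch {

    // True if the host resolves to a loopback, link-local,
    // site-local (10/8, 172.16/12, 192.168/16), or wildcard address --
    // the kind of endpoint the connector validation refuses.
    static boolean isPrivateOrLocal(String host) throws UnknownHostException {
        InetAddress address = InetAddress.getByName(host);
        return address.isLoopbackAddress()
                || address.isLinkLocalAddress()
                || address.isSiteLocalAddress()
                || address.isAnyLocalAddress();
    }

    public static void main(String[] args) throws UnknownHostException {
        System.out.println(isPrivateOrLocal("localhost"));    // true
        System.out.println(isPrivateOrLocal("192.168.1.10")); // true
    }
}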

Need to verify with security guardians.

Zhangxunmt avatar Feb 27 '24 18:02 Zhangxunmt

@JohnUiterwyk Thank you for raising this issue. I have removed the bug label, as blocking any private IP addresses was a deliberate choice made after discussions with our security engineers. However, since there is now a request from the community, we will consult with our security engineers to explore how we can accommodate this for our community.

dhrubo-os avatar Feb 28 '24 22:02 dhrubo-os

Thanks @dhrubo-os. My motivation for raising this issue to enable private IP addresses is specifically driven by security and data control considerations.

JohnUiterwyk avatar Feb 29 '24 05:02 JohnUiterwyk

Hi @dhrubo-os, I was wondering if there is any progress on this. I would love to see this included in 2.13, as it looks like a very small change. This private IP restriction is currently a blocker in certain enterprise environments for using some of the amazing capabilities available via the ml-commons OpenSearch plugin. Thanks for your effort and attention on this.

JohnUiterwyk avatar Mar 07 '24 03:03 JohnUiterwyk

Hi @JohnUiterwyk, sorry for the late response. I think 2.13 will be a bit tight as we are still in conversation with the security team. But we can definitely target 2.14. Thanks.

dhrubo-os avatar Mar 12 '24 01:03 dhrubo-os

Thanks @dhrubo-os, great to hear there is progress on this! Also, just wanted to say thanks for all of your and your team's hard work. This project is incredibly valuable and having a huge impact!

JohnUiterwyk avatar Mar 14 '24 10:03 JohnUiterwyk

Did this get updated yet? I was researching this error for a while (while trying to configure a local LLM connector):

{
  "error": {
    "root_cause": [
      {
        "type": "illegal_argument_exception",
        "reason": "localhost"
      }
    ],
    "type": "illegal_argument_exception",
    "reason": "localhost"
  },
  "status": 400
}

and finally tied that response to this issue.

Thanks,

whittssg avatar May 01 '24 17:05 whittssg
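For context, this is the kind of response the connector-create API returns when the endpoint resolves to a blocked address. A hypothetical request that reproduces it (the connector name, port, and request body below are made up for illustration):

POST /_plugins/_ml/connectors/_create
{
  "name": "local-llm-connector",
  "description": "Connector to a self-hosted LLM on the same machine",
  "version": 1,
  "protocol": "http",
  "parameters": {
    "endpoint": "localhost:8080"
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "http://${parameters.endpoint}/v1/completions",
      "request_body": "{ \"prompt\": \"${parameters.prompt}\" }"
    }
  ]
}

With the private-IP restriction in place, the host name is rejected during connector validation, which is what produces the illegal_argument_exception with reason "localhost" shown above.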

@whittssg Private and local IPs are currently blocked due to a security concern (to prevent someone from creating a connector that bypasses the security layer and calls your local service directly): https://github.com/opensearch-project/ml-commons/blob/main/ml-algorithms/src/main/java/org/opensearch/ml/engine/httpclient/MLHttpClientFactory.java#L76

Will consult with the security team first.

ylwu-amzn avatar May 01 '24 19:05 ylwu-amzn

So how can we communicate with self-hosted embedding inference endpoints? Why can't I communicate within my Docker network freely? Is there a workaround for now? Why does OpenSearch take on the responsibility of deciding what is and isn't secure here?

faileon avatar May 12 '24 10:05 faileon

Replied on another Github issue https://github.com/opensearch-project/ml-commons/issues/2126#issuecomment-2091036051

We had a discussion with the security team, and they are OK with adding a setting that allows private IPs, so users can control whether to enable it. The setting will be disabled by default; users can enable it if they need it. That should solve the problem.

ylwu-amzn avatar May 12 '24 15:05 ylwu-amzn

I am really interested in why an externally hosted LLM should be considered more secure than a self-hosted one reachable over a private IP. We currently work around this with a hack: we expose the private IP through an externally reachable redirect. That is really ugly in terms of security.

reuschling avatar May 15 '24 13:05 reuschling

Has it been solved, and is it part of 2.14?

Even if you want to protect against the use of private addresses, the implementation has too many flaws. I just use a different internal IP that does not start with 127., 192., 168., or 172., and it works. I can't think of a security requirement this should fulfill.

There are better ways to solve this.

manzke avatar May 22 '24 11:05 manzke

It is planned for 2.15

faileon avatar May 22 '24 12:05 faileon

Let me know how you want it to be solved and we'll open a PR. It was already labeled for 2.14.

manzke avatar May 22 '24 12:05 manzke

PR https://github.com/opensearch-project/ml-commons/pull/2534

ylwu-amzn avatar Jun 11 '24 21:06 ylwu-amzn

I still see the error in 2.15:

{ "error": { "root_cause": [ { "type": "illegal_argument_exception", "reason": "Remote inference host name has private ip address:"

hadoopdk avatar Jul 03 '24 11:07 hadoopdk

Did you set the new OpenSearch setting 'connector.private_ip_enabled: true'? With this it works in my setup.

reuschling avatar Jul 03 '24 13:07 reuschling
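For anyone landing here later: assuming the setting name quoted in the comment above, it can be set in opensearch.yml or applied dynamically through the cluster settings API. A sketch of the dynamic variant; the fully qualified plugins.ml_commons prefix is an assumption based on how ml-commons settings are usually namespaced, so verify the exact key against your version's documentation:

PUT /_cluster/settings
{
  "persistent": {
    "plugins.ml_commons.connector.private_ip_enabled": true
  }
}

As noted earlier in the thread, the setting is disabled by default, so private and loopback endpoints remain blocked unless you explicitly opt in.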