
Prediction with dataframe as input

Open udaynaik opened this issue 6 years ago • 11 comments

Hi, I am looking for code snippets that show how to handle multi-column input data with mixed data types in a predict function deployed using:

python_deployer.deploy_python_closure(self.cl, name=modelName, version=version, input_type=inputType, func=func)

where func is either the following or model.predict.

def predict_func(inp):
    preds = model.predict(inp)
    return [str(p) for p in preds]

My model, defined in the same .py file, is: model = linear_model.LogisticRegression()

An example DataFrame I would like to submit for prediction:

LIMIT_BAL  SEX  EDUCATION  MARRIAGE  AGE
50000      2    1          2         24
220000     1    1          2         34

Should I pass input_type as "bytes" and decode the data in the predict function before passing it on to the actual model's predict function?
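On the "bytes" option, here is a minimal sketch of the decoding step, assuming each payload is UTF-8 encoded JSON holding a list of records. How Clipper hands bytes inputs to the closure is an assumption here: bytes(payload) copies a bytes object unchanged and would also convert a NumPy uint8 array. The names decode_bytes_inputs and FEATURES are hypothetical. In a real closure each returned row list would go to pd.DataFrame(rows, columns=FEATURES) before model.predict, and the function would still return one string per input.

```python
import json

# Column order the model in this issue was trained on.
FEATURES = ["LIMIT_BAL", "SEX", "EDUCATION", "MARRIAGE", "AGE"]


def decode_bytes_inputs(inputs):
    """Decode a batch of bytes payloads into feature rows, one list per input.

    Assumes each payload is UTF-8 encoded JSON holding a list of records.
    """
    decoded = []
    for payload in inputs:
        records = json.loads(bytes(payload).decode("utf-8"))
        # Fix the column order explicitly instead of relying on dict key order.
        decoded.append([[rec[name] for name in FEATURES] for rec in records])
    return decoded


payload = b'[{"LIMIT_BAL": 50000, "SEX": 2, "EDUCATION": 1, "MARRIAGE": 2, "AGE": 24}]'
print(decode_bytes_inputs([payload]))
```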

When I tried passing "model.predict", I got the following error: TypeError: Object of type 'method' is not JSON serializable

I am using sklearn linear_model.

Thank you!

udaynaik, Sep 18 '18 18:09

@udaynaik: How about using a CSV-formatted string such as "value1,value2,value3"? Then, in your prediction function, you could parse it and convert it to a DataFrame.
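A minimal sketch of the parsing side of this suggestion, assuming each input string holds exactly one row of numeric values in the FEATURES order (the helper names and FEATURES list are hypothetical). Keeping one row per input means predictions map back to inputs one-to-one; inside the closure the rows could then become pd.DataFrame(rows, columns=FEATURES).

```python
import csv

# Expected column order of the CSV values (from the issue's example data).
FEATURES = ["LIMIT_BAL", "SEX", "EDUCATION", "MARRIAGE", "AGE"]


def parse_csv_input(text):
    """Parse one "value1,value2,..." string into a row of floats."""
    # csv.reader accepts any iterable of lines, so wrap the single line in a list.
    values = next(csv.reader([text]))
    if len(values) != len(FEATURES):
        raise ValueError(f"expected {len(FEATURES)} values, got {len(values)}")
    return [float(v) for v in values]


def parse_csv_batch(inputs):
    """One row per input string, so predictions line up with inputs."""
    return [parse_csv_input(text) for text in inputs]


rows = parse_csv_batch(["50000,2,1,2,24", "220000,1,1,2,34"])
```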

withsmilo, Sep 19 '18 02:09

@withsmilo I tried using a JSON string, but the container hangs at:

18-09-19:12:08:06 INFO [clipper_admin.py:458] Pushing model Docker image to loan-model:1
18-09-19:12:08:08 INFO [docker_container_manager.py:257] Found 0 replicas for loan-model:1. Adding 1

Here is my function, which works if I simply return "inp":

def test_func(inp):
    #return inp  # works
    df = pd.read_json(inp, orient='columns')

    preds = lr_model.predict(df)

    return [str(p) for p in preds]

My inp is: '[{"LIMIT_BAL":200000,"SEX":2,"EDUCATION":1,"MARRIAGE":2,"AGE":30},{"LIMIT_BAL":150000,"SEX":2,"EDUCATION":3,"MARRIAGE":1,"AGE":53}]'

Sent via curl: curl -X POST --header "Content-Type:application/json" -d '{"input": "[{\"LIMIT_BAL\":200000,\"SEX\":2,\"EDUCATION\":1,\"MARRIAGE\":2,\"AGE\":30},{\"LIMIT_BAL\":150000,\"SEX\":2,\"EDUCATION\":3,\"MARRIAGE\":1,\"AGE\":53}]"}' 127.0.0.1:1337/hello-world/predict
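Escaping the inner quotes by hand, as in the curl command above, is error-prone. A small sketch that builds an equivalent body with Python's json module (variable names are illustrative): with input_type="strings" the query value must itself be a string, so the records are JSON-encoded once, and the request body is encoded again around them. The result can be posted with requests.post(url, headers=headers, data=body), or written to a file and sent with curl -d @file.

```python
import json

records = [
    {"LIMIT_BAL": 200000, "SEX": 2, "EDUCATION": 1, "MARRIAGE": 2, "AGE": 30},
    {"LIMIT_BAL": 150000, "SEX": 2, "EDUCATION": 3, "MARRIAGE": 1, "AGE": 53},
]

# Encode twice: once for the records (the string query), once for the body.
body = json.dumps({"input": json.dumps(records)})

# Decoding both layers recovers the original records unchanged.
assert json.loads(json.loads(body)["input"]) == records
```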

I am using the following to register:

x_train, x_test, y_train, y_test = train_test_split(df, target, test_size=5)

lr_model = linear_model.LogisticRegression()

lr_model.fit(x_train, y_train)

cl = ClipperConnection(DockerContainerManager())
cl.register_application(name="example", input_type="strings", default_output="slow", slo_micros=100000)
python_deployer.deploy_python_closure(cl, name="loan", version=1, input_type="strings", func=test_func, pkgs_to_install=["pandas","sklearn","simplejson"])
cl.link_model_to_app(app_name="example", model_name="loan")

udaynaik, Sep 19 '18 19:09

@udaynaik : This sample code is working for me. :)

from clipper_admin import ClipperConnection, DockerContainerManager
from clipper_admin.deployers import python as python_deployer

clipper_conn = ClipperConnection(DockerContainerManager())
clipper_conn.start_clipper()

import pandas as pd
def test_func(inp):
    # inp is a list of strings
    def pred(i):
        df = pd.read_json(i, orient='columns')
        # return a simple value: LIMIT_BAL of the first row
        return df['LIMIT_BAL'].tolist()[0]
    return [str(pred(i)) for i in inp]

clipper_conn.register_application(name="udaynaik-test", input_type="strings", default_output="default", slo_micros=100000)
python_deployer.deploy_python_closure(clipper_conn, name="udaynaik-model", version=1, input_type="strings", func=test_func, pkgs_to_install=["pandas"])
clipper_conn.link_model_to_app(app_name="udaynaik-test", model_name="udaynaik-model")

import requests, json
headers = {"Content-type": "application/json"}
input_data = "[{\"LIMIT_BAL\":200000,\"SEX\":2,\"EDUCATION\":1,\"MARRIAGE\":2,\"AGE\":30},{\"LIMIT_BAL\":150000,\"SEX\":2,\"EDUCATION\":3,\"MARRIAGE\":1,\"AGE\":53}]"
requests.post("http://localhost:1337/udaynaik-test/predict", headers=headers, data=json.dumps({"input": input_data})).json()

withsmilo, Sep 20 '18 01:09

Thanks, this looks hopeful, but interestingly, @withsmilo, when I copy and paste the same code I get the error below. I am using clipper-admin==0.3.0 and Docker Engine 18.06.1-ce on Mac OS 10.12.6.

18-09-19:23:21:34 INFO     [docker_container_manager.py:257] Found 0 replicas for udaynaik-model:1. Adding 1
Traceback (most recent call last):
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/site-packages/urllib3/connection.py", line 141, in _new_conn
    (self.host, self.port), self.timeout, **extra_kw)
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/site-packages/urllib3/util/connection.py", line 83, in create_connection
    raise err
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/site-packages/urllib3/util/connection.py", line 73, in create_connection
    sock.connect(sa)
ConnectionRefusedError: [Errno 61] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/site-packages/urllib3/connectionpool.py", line 601, in urlopen
    chunked=chunked)
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/site-packages/urllib3/connectionpool.py", line 357, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/http/client.py", line 1239, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/http/client.py", line 1285, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/http/client.py", line 1234, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/http/client.py", line 1026, in _send_output
    self.send(msg)
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/http/client.py", line 964, in send
    self.connect()
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/site-packages/urllib3/connection.py", line 166, in connect
    conn = self._new_conn()
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/site-packages/urllib3/connection.py", line 150, in _new_conn
    self, "Failed to establish a new connection: %s" % e)
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPConnection object at 0x109c94fd0>: Failed to establish a new connection: [Errno 61] Connection refused

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/site-packages/requests/adapters.py", line 440, in send
    timeout=timeout
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/site-packages/urllib3/connectionpool.py", line 639, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/site-packages/urllib3/util/retry.py", line 388, in increment
    raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=9090): Max retries exceeded with url: /-/reload (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x109c94fd0>: Failed to establish a new connection: [Errno 61] Connection refused',))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "working.py", line 18, in <module>
    python_deployer.deploy_python_closure(clipper_conn, name="udaynaik-model", version=1, input_type="strings", func=test_func, pkgs_to_install=["pandas"])
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/site-packages/clipper_admin/deployers/python.py", line 222, in deploy_python_closure
    registry, num_replicas, batch_size, pkgs_to_install)
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/site-packages/clipper_admin/clipper_admin.py", line 338, in build_and_deploy_model
    num_replicas, batch_size)
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/site-packages/clipper_admin/clipper_admin.py", line 544, in deploy_model
    num_replicas=num_replicas)
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/site-packages/clipper_admin/docker/docker_container_manager.py", line 192, in deploy_model
    self.set_num_replicas(name, version, input_type, image, num_replicas)
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/site-packages/clipper_admin/docker/docker_container_manager.py", line 262, in set_num_replicas
    image)
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/site-packages/clipper_admin/docker/docker_container_manager.py", line 242, in _add_replica
    CLIPPER_INTERNAL_METRIC_PORT)
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/site-packages/clipper_admin/docker/docker_metric_utils.py", line 156, in add_to_metric_config
    requests.post('http://localhost:9090/-/reload')
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/site-packages/requests/api.py", line 112, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/site-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/site-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/site-packages/requests/sessions.py", line 618, in send
    r = adapter.send(request, **kwargs)
  File "/Users/e078311/anaconda3/envs/dc/lib/python3.6/site-packages/requests/adapters.py", line 508, in send
    raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='localhost', port=9090): Max retries exceeded with url: /-/reload (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x109c94fd0>: Failed to establish a new connection: [Errno 61] Connection refused',))

udaynaik, Sep 20 '18 06:09

@udaynaik: Port 9090 is used by Prometheus. I think you need to clean up your environment. Please retry after removing all containers with $ docker rm -f $(docker ps -a -q) (note that this removes every container on the machine, not only Clipper's).

withsmilo, Sep 20 '18 06:09

@withsmilo Thanks! After a Docker restart and cleanup, this worked! But when I add "preds = lr_model.predict(df)" to the function so that it returns a prediction for each incoming row, it does not work. It gets stuck at:

18-09-20:01:02:01 INFO [clipper_admin.py:458] Pushing model Docker image to loan:1
18-09-20:01:02:03 INFO [docker_container_manager.py:257] Found 0 replicas for loan:1. Adding 1

My lr_model is: lr_model = linear_model.LogisticRegression()

Also, since the input is a list of JSON objects (a batch of 2 in our case), shouldn't I be able to construct 'df' and call predict without an inner function?

Here is the full code (data: credit-default.csv.zip):

from clipper_admin import ClipperConnection, DockerContainerManager
from clipper_admin.deployers import python as python_deployer
from sklearn.model_selection import train_test_split
from sklearn import linear_model
import pandas as pd

clipper_conn = ClipperConnection(DockerContainerManager())
clipper_conn.stop_all()
clipper_conn.start_clipper()

df = pd.read_csv('credit-default.csv', skiprows=[0])

target = df['default payment next month']
df = df[["LIMIT_BAL", "SEX", "EDUCATION", "MARRIAGE", "AGE"]]

x_train, x_test, y_train, y_test = train_test_split(df, target, test_size=5)

lr_model = linear_model.LogisticRegression()
lr_model.fit(x_train, y_train)

def test_func(inp):
    # inp is a list of strings (one JSON string per query)
    def pred(i):
        df1 = pd.read_json(i, orient='columns')
        # predict every row contained in this JSON string
        s = lr_model.predict(df1)
        return s
    return [str(pred(i)) for i in inp]

clipper_conn.register_application(name="udaynaik-test", input_type="strings", default_output="default", slo_micros=100000)
python_deployer.deploy_python_closure(clipper_conn, name="udaynaik-model", version=1, input_type="strings", func=test_func, pkgs_to_install=["pandas","sklearn"])
clipper_conn.link_model_to_app(app_name="udaynaik-test", model_name="udaynaik-model")
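On the question of calling predict once without an inner function: a minimal sketch, assuming each input string is a JSON list of records. It parses every string in the batch, makes a single model call on all rows, then slices the predictions back so that exactly one output string is returned per input. predict_rows and fake_predict are hypothetical stand-ins; in the real closure predict_rows would build pd.DataFrame(flat) and call lr_model.predict on it.

```python
import json


def predict_batch(inputs, predict_rows):
    """Make one model call for a whole batch, then split results per input."""
    parsed = [json.loads(text) for text in inputs]  # each: list of record dicts
    flat = [record for records in parsed for record in records]
    preds = list(predict_rows(flat)) if flat else []
    outputs, start = [], 0
    for records in parsed:
        end = start + len(records)
        # str() because model outputs (e.g. NumPy integers) are not JSON serializable.
        outputs.append(json.dumps([str(p) for p in preds[start:end]]))
        start = end
    return outputs


def fake_predict(rows):
    """Hypothetical stand-in for lr_model.predict(pd.DataFrame(rows))."""
    return [int(row["AGE"] > 40) for row in rows]


batch = ['[{"AGE": 30}, {"AGE": 53}]', '[{"AGE": 45}]']
print(predict_batch(batch, fake_predict))
```

Here the two inputs produce two outputs, '["0", "1"]' and '["1"]', from a single predict call.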

udaynaik, Sep 20 '18 16:09

Hi @udaynaik ,

This may happen because required modules are not installed, so the container fails to start. You can find the failed container with docker ps -a and get its logs as follows:

$ docker logs <container_id>
Starting Python Closure container
Connecting to Clipper with default port: 7000
Encountered an ImportError when running container. You can use the pkgs_to_install argument when calling clipper_admin.build_model() to supply any needed Python packages.

Since you are using sklearn here, you also need to install the scipy module.

Just update your python_deployer line to:

python_deployer.deploy_python_closure(clipper_conn, name="udaynaik-model", version=1, input_type="strings", func=test_func, pkgs_to_install=["pandas","sklearn","scipy"])

And regarding sending 2 inputs in one API call, you can use input_batch instead of input.
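A minimal sketch of an input_batch request body, assuming the application was registered with input_type="strings" so that every element of the batch must be a string. Each row is wrapped in a one-element JSON list so that a closure parsing a list of records (like test_func in this thread) can read it unchanged. build_batch_body is a hypothetical helper; the body would be posted the same way as the "input" examples above.

```python
import json


def build_batch_body(rows):
    """Send each row as its own string query under "input_batch"."""
    return json.dumps({"input_batch": [json.dumps([row]) for row in rows]})


rows = [
    {"LIMIT_BAL": 200000, "SEX": 2, "EDUCATION": 1, "MARRIAGE": 2, "AGE": 30},
    {"LIMIT_BAL": 150000, "SEX": 2, "EDUCATION": 3, "MARRIAGE": 1, "AGE": 53},
]
body = build_batch_body(rows)
```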

Hope this will solve your issue.

YogeshSomawar, Sep 25 '18 11:09

@withsmilo Hi, I ran your example and it works, but I was confused about why the answer is [image] rather than {'query_id': 75, 'output': [2000,150000], 'default': False}.

zoux86, Nov 07 '18 08:11

@zoux86: I sent just one prediction request to Clipper, and Clipper returned only the first 'LIMIT_BAL' value because of return df['LIMIT_BAL'].tolist()[0]. So your result is correct.
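Because one query produces exactly one output, getting both LIMIT_BAL values back means packing them into that single output string. A minimal sketch using json instead of pandas (all_limit_bal is a hypothetical name); the pandas equivalent inside pred would be json.dumps(df['LIMIT_BAL'].tolist()).

```python
import json


def all_limit_bal(inp):
    """Return one output per input string, listing every row's LIMIT_BAL."""
    outputs = []
    for text in inp:
        records = json.loads(text)  # list of dicts, one per DataFrame row
        outputs.append(json.dumps([row["LIMIT_BAL"] for row in records]))
    return outputs


query = '[{"LIMIT_BAL": 200000, "AGE": 30}, {"LIMIT_BAL": 150000, "AGE": 53}]'
print(all_limit_bal([query]))  # ['[200000, 150000]']
```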

withsmilo, Nov 07 '18 08:11

@zoux86 : How about this?

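import json

def test_func(inp):
    def pre(i):
        # json.loads parses the list[dict] without running eval() on client input
        d = json.loads(i)
        for z in d:
            for k, v in z.items():
                z[k] = v + 1
        return d
    # json.dumps returns valid JSON rather than a Python repr
    return [json.dumps(pre(i)) for i in inp]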

withsmilo, Nov 07 '18 14:11

@withsmilo it works, thanks!!!

zoux86, Nov 08 '18 02:11