
Speed of nboost

Open vchulski opened this issue 4 years ago • 4 comments

I have a question about query processing speed. I followed the installation guide and wrote a small script to test nboost:

import timeit
import requests

query_text = "vegas"

def make_request():
    # query Elasticsearch through the nboost proxy
    requests.get(f"http://localhost:8000/travel/_search?pretty&q={query_text}&size=2").json()

time50 = timeit.timeit(make_request, number=50)
print(f"Time for 50 queries: {round(time50, 3)}s, mean time per query: {round(time50 / 50, 3)}s")

Results of this script on an 8th-gen i7 are as follows: mean time per query using nboost/pt-tinybert-msmarco is 0.54 seconds, while mean time per query using nboost/pt-bert-base-uncased-msmarco is about 4 seconds. Both of these values are much higher than the ones provided in the benchmark table.
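One thing worth ruling out in measurements like this: the first request through the proxy pays one-time costs (model load, connection setup), so including it inflates the mean. A minimal sketch of a warm-up-aware timing loop, with a stand-in `fetch` function in place of the real `requests.get` call:

```python
import timeit

def fetch():
    # stand-in for: requests.get("http://localhost:8000/travel/_search?pretty&q=vegas&size=2").json()
    pass

fetch()  # warm-up call: excludes one-time costs from the measurement

n = 50
total = timeit.timeit(fetch, number=n)
print(f"mean latency over {n} warm requests: {total / n:.3f}s")
```

If the warm mean is much lower than the mean including the first request, the gap is startup cost rather than per-query overhead.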

Could you please share the hardware specs on which you got the published results, and any recommendations for how this time could be improved on CPU?

vchulski avatar May 13 '20 16:05 vchulski

For reference, here are the query times I get using the latest code from the repo:

[attached screenshot: query times]

kaykanloo avatar May 15 '20 16:05 kaykanloo

@kaykanloo Thank you. I got results close to yours and was wondering what I was doing wrong to see such a difference from the reported times.

vchulski avatar May 17 '20 10:05 vchulski

@kaykanloo @vchulski The numbers I posted are on a T4 GPU on Google Cloud.

The numbers I see on an AWS p3.2xlarge should be the most comparable to this, I would think.

The biggest discrepancy there is pt-tinybert-msmarco, so it seems it's not just the lack of a GPU that's slowing things down.

I would be curious whether you see the same latency if you call the model directly, like:

from nboost.plugins import resolve_plugin

model_dir = 'nboost/pt-bert-base-uncased-msmarco'
model_cls = 'PtTransformersRerankPlugin'

reranker = resolve_plugin(model_cls, model_dir=model_dir)

# query, question_texts, and filter_results are placeholders for your own inputs
ranks, scores = reranker.rank(query, question_texts, filter_results=filter_results)

Does it have the same latency? There was an update to the networking code a while ago that may have slowed it down. Sorry if these numbers are not up to date.
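To compare the model's raw latency against the per-request time measured through the proxy, a small helper like the following could wrap the direct call (a sketch; `time_rank` and its signature are my own for illustration, not part of nboost):

```python
import time

def time_rank(rank_fn, query, texts, repeats=10):
    """Return mean seconds per call; rank_fn would be e.g. reranker.rank."""
    rank_fn(query, texts)  # warm-up call, excluded from timing
    start = time.perf_counter()
    for _ in range(repeats):
        rank_fn(query, texts)
    return (time.perf_counter() - start) / repeats
```

If the direct mean is far below the end-to-end mean, the difference is proxy overhead rather than model inference.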

pertschuk avatar May 28 '20 18:05 pertschuk

@pertschuk, I did some code profiling a few weeks ago to investigate the issue further, which resulted in my last pull request. The diagram below depicts the total CPU time spent in each function while processing 10 GET requests:

[attached profiling diagram: NBoostProfiling]

As you can see, the performance of the ML models is comparable to your results. In fact, the majority of CPU time is spent in the jsonpath-ng library's parser function. As you guessed, it's the networking code that is slowing down the query response time.
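If the parser cost comes from re-compiling the same JSONPath expression on every request, memoizing the compiled expression would remove most of it. A stdlib-only sketch of the idea, with `slow_parse` standing in for `jsonpath_ng.parse` (this is an illustration of the technique, not nboost's actual code):

```python
from functools import lru_cache

parse_calls = {"n": 0}

def slow_parse(expr):
    # stand-in for jsonpath_ng.parse: expensive, deterministic per expression
    parse_calls["n"] += 1
    return ("compiled", expr)

@lru_cache(maxsize=128)
def cached_parse(expr):
    # same result as slow_parse, but each distinct expression is parsed once
    return slow_parse(expr)

# ten requests that all use the same JSONPath expression
for _ in range(10):
    cached_parse("hits.hits[*]._source.passage")

print(parse_calls["n"])  # → 1: the expensive parse ran only once
```

Since the proxy evaluates the same expression for every request, this turns a per-request parsing cost into a one-time cost.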

kaykanloo avatar May 28 '20 19:05 kaykanloo