
Copy less from GPU to CPU memory when postprocessing QA logits

MichelBartels opened this issue 3 years ago

Is your feature request related to a problem? Please describe. These lines get executed n_candidates times, which can cause a noticeable slowdown when n_candidates is high.

Describe the solution you'd like Changing

start_matrix_softmax_start = torch.softmax(start_matrix[:, 0], dim=-1)
end_matrix_softmax_end = torch.softmax(end_matrix[0, :], dim=-1)

to

start_matrix_softmax_start = torch.softmax(start_matrix[:, 0], dim=-1).cpu().numpy()
end_matrix_softmax_end = torch.softmax(end_matrix[0, :], dim=-1).cpu().numpy()

would make it possible to remove the subsequent call to item().
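
For illustration, a minimal self-contained sketch of the two patterns (the tensor shapes, names, and surrounding loop are assumptions for the example, not the actual Haystack code):

import torch

start_matrix = torch.rand(50, 50)  # stand-ins for the start/end logit matrices
end_matrix = torch.rand(50, 50)
n_candidates = 50

# Current pattern: softmax results stay on the GPU (when use_gpu=True), so each
# per-candidate lookup triggers its own device-to-host copy via .item().
start_probs = torch.softmax(start_matrix[:, 0], dim=-1)
end_probs = torch.softmax(end_matrix[0, :], dim=-1)
scores_current = [(start_probs[i].item(), end_probs[i].item()) for i in range(n_candidates)]

# Proposed pattern: one bulk .cpu().numpy() transfer up front, then cheap NumPy
# indexing inside the per-candidate loop.
start_probs_np = torch.softmax(start_matrix[:, 0], dim=-1).cpu().numpy()
end_probs_np = torch.softmax(end_matrix[0, :], dim=-1).cpu().numpy()
scores_proposed = [(start_probs_np[i], end_probs_np[i]) for i in range(n_candidates)]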

MichelBartels · Jul 29 '22

If this is a quick change that doesn't have any adverse effects, please go ahead and open a PR, @MichelBartels!

sjrl · Jul 29 '22

Looking into this some more, I'm not entirely sure the original code is causing much of a slowdown. I tried bumping top_k_per_sample up to 3 when initializing a FARMReader and running the FARMReader.predict method, but I did not notice any appreciable slowdown compared to top_k_per_sample=1. @julian-risch @bogdankostic Have you ever noticed Reader speed issues when increasing the top_k_per_sample parameter?

sjrl · Aug 31 '22

Sorry, I can't remember coming across any speed issues related to top_k_per_sample, no. However, I am pretty sure there is big potential for optimization at various points in the code regarding CPU/GPU data transfer, copying/converting NumPy arrays, and even logging, as reported in the issue you already commented on.
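
For context, a rough stand-alone micro-benchmark of that transfer cost (not Haystack code; the sizes and counts are made up, and the numbers vary by hardware):

import time
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
probs = torch.softmax(torch.rand(10_000, device=device), dim=-1)

# Per-element .item(): one device-to-host copy (and sync) per call
t0 = time.perf_counter()
vals = [probs[i].item() for i in range(1_000)]
per_item = time.perf_counter() - t0

# Single bulk transfer, then plain NumPy indexing
t0 = time.perf_counter()
probs_np = probs.cpu().numpy()
vals = [probs_np[i] for i in range(1_000)]
bulk = time.perf_counter() - t0

print(f".item() per element: {per_item:.4f} s vs. one .cpu().numpy(): {bulk:.4f} s")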

julian-risch · Aug 31 '22

Sorry, I missed this discussion. I opened this issue because it was a big bottleneck when mining a lot of possible answers (about 50 per text). If I remember correctly, fixing the issue led to a 10x to 100x improvement. However, I wasn't sure whether setting top_k_per_sample this high was a supported use case.

MichelBartels · Sep 03 '22

Ahh okay, I didn't try timings for top_k_per_sample that high. If it doesn't affect timings for low top_k_per_sample and greatly improves them for large top_k_per_sample, then I'd say we should make the change.

sjrl · Sep 05 '22

Okay, I just redid the timings using the following code snippet:

# Cell 1
from haystack.nodes import FARMReader
from haystack.utils import clean_wiki_text, convert_files_to_docs, fetch_archive_from_http, print_answers
from haystack.modeling.infer import Inferencer

# Download the Game of Thrones articles used in the Haystack tutorials
doc_dir = "data/tutorial1"
s3_url = "https://s3.eu-central-1.amazonaws.com/deepset.ai-farm-qa/datasets/documents/wiki_gameofthrones_txt1.zip"
fetch_archive_from_http(url=s3_url, output_dir=doc_dir)

docs = convert_files_to_docs(dir_path=doc_dir, clean_func=clean_wiki_text, split_paragraphs=True)

query = "Who is the father of Arya Stark?"

some_docs = docs[:200]

# Deliberately large top_k_per_sample to expose the per-candidate GPU-to-CPU copies
reader1 = FARMReader(
    model_name_or_path="deepset/roberta-base-squad2",
    use_gpu=True,
    top_k_per_sample=50,
    num_processes=1,
)
# Cell 2
%load_ext line_profiler
# Cell 3
# Line-by-line profile of FARMReader.predict with top_k=50
%lprun -f reader1.predict reader1.predict(query=query, documents=some_docs, top_k=50)

and I got the following timings:

Current master: 37.3298 s
PR #3127: 4.97961 s

So the suggested changes give a significant speedup of roughly 7x when using large values for top_k_per_sample.
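
For anyone without line_profiler installed, a plain wall-clock check (reusing reader1, query, and some_docs from the snippet above) should show the same gap, though the absolute numbers will differ from the %lprun output:

import time

t0 = time.perf_counter()
prediction = reader1.predict(query=query, documents=some_docs, top_k=50)
print(f"predict() took {time.perf_counter() - t0:.2f} s")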

sjrl · Sep 05 '22