
How to set the batch size?

Open atoutou opened this issue 3 years ago • 1 comment

Hi,

The prediction process takes a long time to finish, so I checked the GPU memory usage and found that it only uses about 3 GB (I have a 16 GB GPU). I want to set a larger batch size to speed up the process, but I can't find the argument. How do I set the batch size when using the predict function?

import nlu
pipe = nlu.load('xx.embed_sentence.labse', gpu=True)
pipe.predict(text, output_level='document')

Thanks

atoutou avatar Jul 14 '21 08:07 atoutou

Hi @atoutou

pipe = nlu.load('xx.embed_sentence.labse', gpu=True)
pipe.print_info()

will print

The following parameters are configurable for this NLU pipeline (You can copy paste the examples) :
>>> pipe['bert_sentence@labse'] has settable params:
pipe['bert_sentence@labse'].setBatchSize(8)          | Info: Size of every batch | Currently set to : 8
pipe['bert_sentence@labse'].setIsLong(False)         | Info: Use Long type instead of Int type for inputs buffer - Some Bert models require Long instead of Int. | Currently set to : False
pipe['bert_sentence@labse'].setMaxSentenceLength(128)  | Info: Max sentence length to process | Currently set to : 128
pipe['bert_sentence@labse'].setDimension(768)        | Info: Number of embedding dimensions | Currently set to : 768
pipe['bert_sentence@labse'].setCaseSensitive(False)  | Info: whether to ignore case in tokens for embeddings matching | Currently set to : False
pipe['bert_sentence@labse'].setStorageRef('labse')   | Info: unique reference name for identification | Currently set to : labse
>>> pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'] has settable params:
pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setExplodeSentences(False)  | Info: whether to explode each sentence into a different row, for better parallelization. Defaults to false. | Currently set to : False
pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setStorageRef('SentenceDetectorDLModel_c83c27f46b97')  | Info: storage unique identifier | Currently set to : SentenceDetectorDLModel_c83c27f46b97
pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setEncoder(com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@7f47d7d6)  | Info: Data encoder | Currently set to : com.johnsnowlabs.nlp.annotators.sentence_detector_dl.SentenceDetectorDLEncoder@7f47d7d6
pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setImpossiblePenultimates(['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e'])  | Info: Impossible penultimates | Currently set to : ['Bros', 'No', 'al', 'vs', 'etc', 'Fig', 'Dr', 'Prof', 'PhD', 'MD', 'Co', 'Corp', 'Inc', 'bros', 'VS', 'Vs', 'ETC', 'fig', 'dr', 'prof', 'PHD', 'phd', 'md', 'co', 'corp', 'inc', 'Jan', 'Feb', 'Mar', 'Apr', 'Jul', 'Aug', 'Sep', 'Sept', 'Oct', 'Nov', 'Dec', 'St', 'st', 'AM', 'PM', 'am', 'pm', 'e.g', 'f.e', 'i.e']
pipe['deep_sentence_detector@SentenceDetectorDLModel_c83c27f46b97'].setModelArchitecture('cnn')  | Info: Model architecture (CNN) | Currently set to : cnn
>>> pipe['document_assembler'] has settable params:
pipe['document_assembler'].setCleanupMode('shrink')  | Info: possible values: disabled, inplace, inplace_full, shrink, shrink_full, each, each_full, delete_full | Currently set to : shrink

Calling pipe['bert_sentence@labse'].setBatchSize() with a larger value than the default of 8 (e.g. 32) before running predict() should fix your problem.
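Putting it together, a minimal sketch of the full flow (the memory-headroom heuristic `pick_batch_size` below is only an illustration of how you might choose a value, not part of the NLU library; the import is guarded in case nlu / Spark NLP is not installed):

```python
def pick_batch_size(gpu_mem_gb, used_gb, current_bs):
    # Toy heuristic: scale the batch size by the unused-memory headroom.
    # With a 16 GB GPU using ~3 GB at batch size 8, this suggests 8 * 5 = 40.
    scale = max(1, int(gpu_mem_gb // max(used_gb, 1)))
    return current_bs * scale

try:
    import nlu

    pipe = nlu.load('xx.embed_sentence.labse', gpu=True)
    pipe.print_info()  # lists the settable params, including setBatchSize

    # Raise the batch size before predicting to use more of the GPU.
    pipe['bert_sentence@labse'].setBatchSize(pick_batch_size(16, 3, 8))

    text = "This is a test sentence."
    df = pipe.predict(text, output_level='document')
except ImportError:
    pass  # nlu / Spark NLP not available in this environment
```

Note that batch size only has to fit in memory for the model's forward pass, so it is usually safe to experiment with values well above the default and back off if you hit an out-of-memory error.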

Let me know if it helps

C-K-Loan avatar Jul 17 '21 05:07 C-K-Loan