[Feature request] Logging Level and Progress Bar for Model Downloads
Right now, the output can be quite lengthy and verbose, see:
Could it be possible to expose logger options to granularly control the output as well as offering visual feedback for the status of the model download from Hugging Face?
I agree - that would be a great improvement! It was not originally considered since the library was designed for browsers, but since there has been a lot of interest in Node.js-like environments, it's definitely something to consider.
I think this would be a good first issue for someone who wants to contribute :)
That is exactly my use case: a Node application. As such, I need a way to provide visual feedback on the different operations, similar in spirit to:
- https://huggingface.co/docs/transformers/main_classes/logging
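For the download-feedback part, transformers.js already accepts a `progress_callback` in the pipeline options. Below is a minimal sketch of a text progress bar for Node that such a callback could drive; the exact shape of the callback payload (`{ status, file, progress }`) is an assumption based on the library's download events, not a documented contract:

```javascript
// Illustrative only: render a fixed-width text progress bar for the terminal.
function renderBar(progress, width = 30) {
    // Clamp to [0, 100] so a malformed event cannot overflow the bar.
    const pct = Math.min(100, Math.max(0, progress));
    const filled = Math.round((pct / 100) * width);
    return `[${'#'.repeat(filled)}${'-'.repeat(width - filled)}] ${pct.toFixed(1)}%`;
}

// Hypothetical wiring into the pipeline (field names assumed, not run here):
// const pipe = await pipeline('sentiment-analysis', null, {
//     progress_callback: (data) => {
//         if (data.status === 'progress') {
//             process.stdout.write(`\r${data.file} ${renderBar(data.progress)}`);
//         }
//     },
// });
```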
The screenshot log comes from WASM. To specify the log verbosity level, you can overwrite the constructSession function in transformers.js/src/models.js, e.g. like this:
async function constructSession(pretrained_model_name_or_path, fileName, options) {
    // TODO: add option for user to force specify their desired execution provider
    let modelFileName = `onnx/${fileName}${options.quantized ? '_quantized' : ''}.onnx`;
    let buffer = await getModelFile(pretrained_model_name_or_path, modelFileName, true, options);

    /** @type {InferenceSession.SessionOptions} */
    const extraSessionOptions = {
        logVerbosityLevel: 4,
        logSeverityLevel: 4,
    }

    try {
        return await InferenceSession.create(buffer, {
            executionProviders,
            ...extraSessionOptions
        });
    } catch (err) {
        // If the only execution provider was wasm, throw the error
        if (executionProviders.length === 1 && executionProviders[0] === 'wasm') {
            throw err;
        }

        console.warn(err);
        console.warn(
            'Something went wrong during model construction (most likely a missing operation). ' +
            'Using `wasm` as a fallback. '
        )
        return await InferenceSession.create(buffer, {
            executionProviders: ['wasm'],
            ...extraSessionOptions
        });
    }
}
Requires a bit of discussion on how the PR should then expose the session options to transformers.js users :thinking:
A global env.logLevel should be sufficient, right?
I kinda lean toward extending the options in pipeline(), because I would rather minimize the amount of global env state/variables :thinking:
https://github.com/microsoft/onnxruntime/blob/18f17c555d51caee83f15983c2620f463fbaddd1/js/common/lib/inference-session.ts#L126-L139
We could just make it a passthrough ONNX-session-options object inside the existing pipeline options object, in case we need to expose other session options later as well.
Something like:
/**
 * Utility factory method to build a [`Pipeline`] object.
 *
 * @param {string} task The task of the pipeline.
 * @param {string} [model=null] The name of the pre-trained model to use. If not specified, the default model for the task will be used.
 * @param {PretrainedOptions} [options] Optional parameters for the pipeline.
 * @returns {Promise<Pipeline>} A Pipeline object for the specified task.
 * @throws {Error} If an unsupported pipeline is requested.
 */
export async function pipeline(
    task,
    model = null,
    {
        quantized = true,
        progress_callback = null,
        config = null,
        cache_dir = null,
        local_files_only = false,
        revision = 'main',
        ortExtraSessionOptions = {},
    } = {}
) {
But we could also make the API easier to use (e.g. accepting "verbose" instead of 0, because no one wants to memorize enums).
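A possible sketch of that friendlier API, assuming ONNX Runtime's numeric severity levels (0 = verbose through 4 = fatal); the helper name `normalizeLogLevel` is hypothetical and not part of transformers.js:

```javascript
// Map human-readable level names to ONNX Runtime's numeric severity enum.
const LOG_LEVELS = { verbose: 0, info: 1, warning: 2, error: 3, fatal: 4 };

// Accept either a numeric level or a name, and normalize to the number.
function normalizeLogLevel(level) {
    if (typeof level === 'number') {
        if (!Number.isInteger(level) || level < 0 || level > 4) {
            throw new Error(`Invalid log level: ${level}`);
        }
        return level;
    }
    const numeric = LOG_LEVELS[String(level).toLowerCase()];
    if (numeric === undefined) {
        throw new Error(`Unknown log level: ${level}`);
    }
    return numeric;
}
```

Users could then write `logLevel: 'error'` in the options object, and the library would translate it before handing it to the session.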
Do we even need both options, logVerbosityLevel and logSeverityLevel? It's not clear to me yet what the difference is :see_no_evil:
I have a slight feeling that other devs are also a bit confused:
> Do we even need both options, logVerbosityLevel and logSeverityLevel? It's not clear to me yet what the difference is 🙈
My thought exactly. This is why a global logging level might be okay. It will also be better for when we add more backends (not just onnx). If someone REALLY wants those log levels, they can set them manually with env.backends.onnx.*
> We could just make it a passthrough ONNX-session-options object inside the existing pipeline options object, in case we need to expose other session options later as well.
I'm not too keen on adding that to the pipeline function, just because it's not something users will modify often. It will also then have to be used in AutoModel and similar locations.
For people looking for the final answer/code: here is a Node.js snippet that sets the log level to 3 (error).
Severity levels:
- VERBOSE = 0
- INFO = 1
- WARNING = 2
- ERROR = 3
- FATAL = 4
Your code:
import { pipeline, env } from '@xenova/transformers';
import fs from 'fs';

env.cacheDir = './.cache';
env.backends.onnx.logLevelInternal = 'error'; // this line here
Is there anything left to do for this feature? I would like to contribute, but it is not clear to me whether it needs further work. Thank you!
@fcuenya We can configure ONNX's env.backends.onnx.logLevel, which should already control a lot of the logging. But transformers.js also contains a few places where it logs directly via console.error or console.warn; controlling those through env would be the missing piece here, as far as I understand.
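To make the missing piece concrete, here is a minimal sketch of how the library's direct console.warn / console.error calls could be routed through a level-gated helper. Note that `env.logLevel` is a hypothetical field used only for illustration; it does not exist in transformers.js today:

```javascript
// Hypothetical global state; 0=verbose, 1=info, 2=warning, 3=error, 4=fatal.
const env = { logLevel: 2 };

// Emit a message only if its severity meets the configured threshold.
// Returns true if the message was emitted, false if it was suppressed.
function log(severity, ...args) {
    if (severity < env.logLevel) return false;
    const sink = severity >= 3 ? console.error : console.warn;
    sink(...args);
    return true;
}

// Usage: inside the library, a bare console.warn(...) would become:
log(2, 'Something went wrong during model construction. Using `wasm` as a fallback.');
log(1, 'Loading tokenizer...'); // suppressed at the default threshold of 2
```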