Matthias Reso
We should also consider removing https://github.com/pytorch/serve/tree/master/examples/text_classification_with_scriptable_tokenizer as it depends on torchtext as well.
Hi @geraldstanje Seems like you're trying to deploy a model using a [Sagemaker example](https://github.com/aws/amazon-sagemaker-examples/blob/main/frameworks/pytorch/code/inference.py). SageMaker uses TorchServe for model deployment, but the model artifact you're creating cannot directly be...
@geraldstanje yes, you basically follow the XGBoost example to create your own handler, or if your model is a HuggingFace model from their transformers library, you can just follow one...
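For reference, a TorchServe custom handler can be as small as a module exposing a `handle(data, context)` entry point. The sketch below is an assumption-laden illustration, not the XGBoost example itself: the model loader is a placeholder (a real handler would deserialize the artifact found under `context.system_properties["model_dir"]`, e.g. with `torch.load`).

```python
# Minimal sketch of a TorchServe module-level custom handler.
# The "model" here is a stand-in lambda; swap in real deserialization
# and inference for your artifact.

_model = None  # loaded lazily on the first request


def _load_model(model_dir):
    # Placeholder (assumption): a real handler would load the serialized
    # model from model_dir, e.g. torch.load(os.path.join(model_dir, "model.pt")).
    return lambda text: {"label": "positive" if "good" in text else "negative"}


def handle(data, context):
    """TorchServe entry point: receives a batch of requests and a context."""
    global _model
    if _model is None:
        model_dir = context.system_properties.get("model_dir", ".")
        _model = _load_model(model_dir)
    if data is None:  # TorchServe calls with None during initialization
        return None
    results = []
    for row in data:
        # Each request carries its payload under "data" or "body".
        payload = row.get("data") or row.get("body")
        if isinstance(payload, (bytes, bytearray)):
            payload = payload.decode("utf-8")
        results.append(_model(payload))
    return results
```

You would then point `torch-model-archiver --handler` at this file when building the model archive.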
Hi @emilwallner, thanks for the extensive issue report. My thoughts on this are: 1. You're looking at the server after the crash, right? Meaning that the worker process has died,...
Yeah, performance will suffer significantly from CUDA_LAUNCH_BLOCKING, as kernels will no longer run asynchronously, so only activate it when it's really necessary for debugging. You could try to run the model in a...
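To make the trade-off concrete, here is a minimal sketch of how one would enable CUDA_LAUNCH_BLOCKING for a debugging session. The key detail is ordering: the variable must be set before torch (or any CUDA-using library) initializes the CUDA context, otherwise it may have no effect.

```python
import os

# Force synchronous CUDA kernel launches so a crashing kernel surfaces
# at the Python line that launched it, instead of at some later sync point.
# Debugging only -- this serializes the GPU and hurts throughput badly.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Only import torch *after* setting the variable:
# import torch
```

Alternatively, set it in the shell that launches the server (`CUDA_LAUNCH_BLOCKING=1 torchserve ...`) and remove it again once the faulty kernel is identified.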
Hi @yolk-pie-L I was not able to reproduce this with the 0.9.0 Docker image, and the error log is inconclusive. We just released 0.10.0, could you retry with the new version?
@lxning do you have any idea what could cause the java.lang.InterruptedException: null?
Thanks for bringing this up, we will fix the link. In case you have not found it yet, you can access the doc here: https://github.com/pytorch/serve/blob/master/examples/image_classifier/resnet_18/README.md#debug-torchserve-backend
Moving this to meta-llama/llama, as this touches on the original paper.
Hi, it seems like you're using an old llama-recipes version (PyPI releases are sadly lagging behind quite a bit), as we switched to prepare_model_for_kbit_training some time ago: https://github.com/meta-llama/llama-recipes/blob/fb7dd3a3270031e407338027e3f6fbea2b8e431e/src/llama_recipes/finetuning.py#L11 Please update llama-recipes...