Vito Plantamura

Results: 150 comments by Vito Plantamura

hi @vmobilis , I just committed support for batched inputs, i.e. a `--num` option that lets you specify the number of images to generate. On one of my computers,...

hi, OnnxStream is probably already capable of running the models you mentioned. The problem is converting the code that "calls" these models into C++ (for example, in the case of...

hi, currently the LLM sample application only supports "TinyLlama-1.1B-Chat-v0.3-fp16" and "Mistral-7B-Instruct-v0.2-fp16". Vito

Since TinyLlama adopts the same architecture and tokenizer as Llama 2, adding Llama 2 support to src/llm.cpp should be fairly simple. It involves exporting the onnx file, running "onnxsim_large_model" on...
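For reference, a rough sketch of what that export step can look like (the model id, sequence length, opset and file names below are just placeholders, not the exact script; KV-cache inputs and the fp16 handling discussed later are left out):

```
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model.eval()

class Wrapper(torch.nn.Module):
    # thin wrapper so the exported graph has a single input and a single output
    def __init__(self, m):
        super().__init__()
        self.m = m

    def forward(self, input_ids):
        return self.m(input_ids=input_ids, use_cache=False).logits

dummy = torch.ones((1, 8), dtype=torch.int64)
torch.onnx.export(
    Wrapper(model), (dummy,), "llama2.onnx",
    input_names=["input_ids"], output_names=["logits"],
    dynamic_axes={"input_ids": {1: "seq"}, "logits": {1: "seq"}},
    opset_version=17)
# then: run onnxsim_large_model on llama2.onnx, then onnx2txt to produce
# the text/weights files that src/llm.cpp reads
```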

I will try to reproduce the problem and let you know in the next few days. This problem is typically caused by the fact that the implementation of the HF...

I was able to run src/llm.cpp with Llama 2 exported using your script. The problem is that your script preserves the upcasts (float16->float32) and downcasts (float32->float16) needed in certain parts of...
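A quick way to see where the fp16/fp32 Cast nodes ended up in an exported graph is something like this (the file name is a placeholder; it just lists the relevant Cast ops):

```
import onnx
from onnx import TensorProto

# list the float16<->float32 Cast nodes to check how the up/downcasts were exported
model = onnx.load("llama2.onnx", load_external_data=False)
for node in model.graph.node:
    if node.op_type == "Cast":
        to = next(a.i for a in node.attribute if a.name == "to")
        if to in (TensorProto.FLOAT, TensorProto.FLOAT16):
            print(node.name or node.output[0], "->", TensorProto.DataType.Name(to))
```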

hi, I think orchestrating an inference job using a shell script might be possible, but it's not at all the ideal choice 😀 In any case, the type of parallelization...

no, absolutely nothing special: torch.onnx.export + onnxsim_large_model + onnx2txt (in this order). Can you share the model you are trying to convert and especially the code that calls torch.onnx.export? Vito

I found the code I originally used to export the VAE model:

```
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")

class VAED(nn.Module):
    def __init__(self, vae):
        super(VAED, self).__init__()
        self.vae = vae

    def forward(self, latents):
        self.vae.enable_slicing()...
```
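The snippet above is cut off; the export call that typically follows a wrapper like that looks roughly like this (a hypothetical continuation, not the original code: the latent shape, opset and file name are assumptions):

```
import torch

# hypothetical continuation of the truncated snippet above
vaed = VAED(pipe.vae).eval()
latents = torch.randn(1, 4, 64, 64)  # SD 1.x latent layout for 512x512 images (assumption)
torch.onnx.export(
    vaed, (latents,), "vae_decoder.onnx",
    input_names=["latents"], output_names=["images"],
    opset_version=14)
```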

I think the first thing to do to understand the reason is to compare the two model.txt files... specifically looking for different, missing, or extra operations at the beginning...
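In practice that comparison can be as simple as diffing the first few hundred lines of each model.txt; a minimal sketch (paths and the number of lines compared are placeholders):

```
import difflib
import itertools

def head(path, n=200):
    # read only the first n lines, since the interesting differences are at the beginning
    with open(path) as f:
        return list(itertools.islice(f, n))

a = head("working/model.txt")
b = head("broken/model.txt")
for line in difflib.unified_diff(a, b, fromfile="working", tofile="broken"):
    print(line, end="")
```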