ComfyUI
VAE decoding is mapping information from latent space to pixel space, can it somehow convert directly into bytecode?
VAE decoding is mapping information from latent space to pixel space, can it somehow convert directly into bytecode? If not, is there any node or custom node that I can use to convert it directly to bytecode?
What do you mean by "bytecode" in your context?
just the base64-encoded data of the PNG
sorry, I mean the base64 string of the bytes of the image
Does it mean you want to convert the VAE decoded image into a base64 string?
https://github.com/lldacing/comfyui-easyapi-nodes
I think he wants to convert directly into base64 of a PNG without the intermediate step of the binary PNG, which seems technically possible but questionably useful. You'd need to intercept whichever PNG writer is being used and make it emit base64 as it generates the PNG. There's some vague possibility that either Pillow or CV2 already has something like this, but it seems more like a bizarre optimization you'd find in some gigantic web framework written in a slow language (not Python slow, I'm talking Java slow), one that was probably removed from the codebase 15-20 years ago once web servers all had fast enough processors and enough RAM that deciphering what it was doing was more trouble than it was worth. :P
@ltdrdata The nodes you linked accept incoming images as raw tensors, but they still run them through Pillow first and then convert the PNGs to base64, which I think is the intermediate step he's trying to avoid.
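For what it's worth, the "encode base64 as the PNG is generated" idea is at least mechanically possible, since base64 can be computed incrementally over a byte stream as long as each encoded chunk is a multiple of 3 bytes. Here's a minimal stdlib-only sketch; the `fake_png_chunks` and `b64_stream` names are made up for illustration, and this isn't hooked into any real Pillow/CV2 writer:

```python
import base64

def b64_stream(chunks):
    """Incrementally base64-encode an iterable of byte chunks.

    Buffers a remainder so each encode call sees a multiple of 3 bytes,
    which keeps chunk boundaries from breaking base64 alignment.
    """
    pending = b""
    for chunk in chunks:
        pending += chunk
        cut = len(pending) - (len(pending) % 3)  # largest multiple of 3
        yield base64.b64encode(pending[:cut])
        pending = pending[cut:]
    if pending:  # final partial group gets the '=' padding
        yield base64.b64encode(pending)

# A hypothetical streaming PNG writer would yield chunks like these:
fake_png_chunks = [b"\x89PNG\r\n\x1a\n", b"some", b"image", b"bytes"]
streamed = b"".join(b64_stream(fake_png_chunks))
assert streamed == base64.b64encode(b"".join(fake_png_chunks))
```

The alignment trick is the whole story: base64 of a concatenation equals the concatenation of base64 over 3-byte-aligned pieces, so you never need the full PNG in memory.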
@MaxTran96 "Bytecode" has a specific meaning. It refers to intermediate languages that make it easier to build compiler front ends and lower to backend code (e.g. LLVM IR), or to the stored format of programs meant to be run, either as an optimization for an interpreted language (CPython's .pyc files), for a JIT (Java), or for a hybrid like .NET (and, I think, Dalvik's version of Java), where programs start as bytecode but a background process (SxS, in the case of Windows) lowers them to per-machine optimized native binaries when the system isn't in use.
A regular PNG would just be called "binary" or simply "a PNG", since it's understood the format isn't anything else. Something like SVG is more ambiguous, since SVGs are really XML and can be modified in a text editor.
(Sorry I ramble)
ah yeah, sorry for using the wrong term "bytecode" here; I meant the base64 string of the image rather than the actual PNG file. But yes, thanks for chiming in. It seems the best way to do this is just to encode the PNG file to base64. I was thinking there might be a node like VAEDecode whose output is a base64 string of the image. Speaking of that, what is the actual format of the output of VAEDecode? I understand VAEDecode maps the information back to pixel space, but what format is it in?
IMAGE in ComfyUI is a 4D NHWC torch tensor (batch, height, width, channels).
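To illustrate what NHWC indexing means without pulling in torch, here's the same convention on a plain nested list (a sketch with made-up values; a real IMAGE tensor is just this layout with float values):

```python
# NHWC: index order is [batch][height][width][channel].
# A 1x2x2x3 "tensor" as nested lists: 1 image, 2 rows, 2 columns, RGB.
image_batch = [
    [  # batch index 0
        [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],  # row 0: red pixel, green pixel
        [[0.0, 0.0, 1.0], [1.0, 1.0, 1.0]],  # row 1: blue pixel, white pixel
    ]
]

# Blue channel (c=2) of the pixel at row 1, column 0:
assert image_batch[0][1][0][2] == 1.0
# Green channel (c=1) of the pixel at row 0, column 1:
assert image_batch[0][0][1][1] == 1.0
```

The NCHW permute in the code below exists only because PIL/torchvision expect channels-first; the underlying pixel data is the same.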
thanks! What would be the tradeoff between first converting NHWC to NCHW, then to a PIL image, then serializing the PIL image to a buffer and base64 encoding it, versus serializing the raw NHWC tensor and base64 encoding it?
I found some example code for the 2 different methods mentioned above.

1st approach:
import torch
import base64
import io
from torchvision.transforms import ToPILImage
# Example 4D NHWC tensor (batch size 1 for simplicity)
# Assuming tensor dtype is float and range [0, 1] for image data
tensor_nhwc = torch.rand(1, 64, 64, 3) # Example tensor
# Convert NHWC to NCHW for PyTorch image processing
tensor_nchw = tensor_nhwc.permute(0, 3, 1, 2)
# Convert tensor to PIL Image (for one image in the batch)
image = ToPILImage()(tensor_nchw.squeeze(0))
# Serialize the PIL Image to a buffer
buffer = io.BytesIO()
image.save(buffer, format="PNG")  # PNG to match the goal here; change format as needed
# Encode the buffer to a Base64 string
base64_str = base64.b64encode(buffer.getvalue()).decode("utf-8")
2nd approach:
import torch
import base64
import io
# Example 4D tensor (batch size 1 for simplicity, with dummy data)
tensor_nhwc = torch.rand(1, 64, 64, 3) # Random tensor for demonstration
# Serialize the tensor to a buffer
buffer = io.BytesIO()
torch.save(tensor_nhwc, buffer)
# Encode the buffer's content to Base64
base64_str = base64.b64encode(buffer.getvalue()).decode('utf-8')
If you directly encode a raw tensor to base64, the size will be significantly larger: you're base64-encoding uncompressed float32 pixel data (4 bytes per channel) instead of a compressed PNG, and base64 itself adds about 33% on top.
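To put rough numbers on that: a 64x64 RGB float32 tensor is 64*64*3*4 = 49,152 bytes before any compression, and base64 turns every 3 input bytes into 4 output characters. A PNG of the same image would typically compress well below the raw size before that same 4/3 overhead applies. A quick stdlib-only sanity check, with dummy bytes standing in for the serialized tensor buffer:

```python
import base64

raw_size = 64 * 64 * 3 * 4       # H * W * C * sizeof(float32)
dummy = bytes(raw_size)          # stand-in for the raw tensor buffer
encoded = base64.b64encode(dummy)

assert raw_size == 49152
# base64 output length: ceil(n / 3) * 4 bytes
assert len(encoded) == (raw_size + 2) // 3 * 4  # 65536 bytes
```

(torch.save also adds some container overhead on top of the raw buffer, so the second approach's real payload is a bit larger still.)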