Alignment between PIRs from simulation and EnvDiff

Open barrydoooit opened this issue 11 months ago • 1 comments

Hi authors, thanks for the fancy work and the clear demonstration of your framework. I'm curious about the input arguments of the environment diffusion module: does it take only a text prompt as input?

Specifically, according to Fig. 5 in the paper, EnvDiff generates a PIR that has the same spatial information as the RGB image. However, EnvDiff is described as a text-to-image model, which would mean the generated PIR belongs to a random scene rather than to the object/scene/environment generated by ObjDiff. How, then, can this PIR be aligned with the PIR generated by the simulation, which is based on the output of ObjDiff?

Feb 05 '25 21:02 barrydoooit