diffusers.js

Infinite Prompt Length Feature

Open jdp8 opened this issue 10 months ago • 0 comments

Problem

Currently, the CLIP Text Encoder accepts only a limited number of tokens (usually 77), as explained here. If an input prompt exceeds the maximum token length, the following error is shown:

image

Solution

To overcome this limit and accept longer prompts, AUTOMATIC1111 has this solution, which breaks the prompt tokens into chunks, encodes each chunk, and concatenates the encoded chunks into a single Tensor before passing it to the UNet model. Here is another useful explanation of the solution.
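As a rough illustration of the chunk, encode, and concatenate idea (not the code from this change), here is a minimal sketch. It assumes the standard CLIP token ids for the begin and end markers (49406 and 49407) and a 75-token chunk body; `chunkTokens`, `encodeLongPrompt`, and `fakeEncodeChunk` are hypothetical names, and the fake encoder only stands in for the real CLIP Text Encoder so the example runs on its own.

```typescript
// Assumed values: CLIP's context window is 77 tokens, two of which are the
// begin/end-of-text markers, leaving 75 content tokens per chunk.
const MAX_LENGTH = 77;
const CHUNK_SIZE = MAX_LENGTH - 2;
const BOS = 49406; // assumed <|startoftext|> id
const EOS = 49407; // assumed <|endoftext|> id, also used here as padding

type Embedding = number[][]; // [sequenceLength][hiddenSize]

// Split raw token ids (without special tokens) into 77-token chunks,
// each wrapped in BOS/EOS and padded to the full window.
function chunkTokens(ids: number[]): number[][] {
  const chunks: number[][] = [];
  const total = Math.max(ids.length, 1); // an empty prompt still yields one chunk
  for (let start = 0; start < total; start += CHUNK_SIZE) {
    const chunk = [BOS, ...ids.slice(start, start + CHUNK_SIZE), EOS];
    while (chunk.length < MAX_LENGTH) chunk.push(EOS);
    chunks.push(chunk);
  }
  return chunks;
}

// Encode every chunk separately and concatenate along the sequence axis.
function encodeLongPrompt(
  ids: number[],
  encodeChunk: (chunk: number[]) => Embedding
): Embedding {
  const rows: Embedding = [];
  for (const chunk of chunkTokens(ids)) {
    for (const row of encodeChunk(chunk)) rows.push(row);
  }
  return rows;
}

// Stand-in for the real text encoder: one tiny 2-dimensional "embedding" per token.
function fakeEncodeChunk(chunk: number[]): Embedding {
  return chunk.map((id, position) => [id, position]);
}

// A 100-token prompt becomes two chunks, so the result has 2 * 77 rows.
const longPrompt: number[] = [];
for (let i = 1; i <= 100; i++) longPrompt.push(i);
const embedding = encodeLongPrompt(longPrompt, fakeEncodeChunk);
console.log(embedding.length); // 154
```

With a real encoder, each chunk would yield a `[77, hiddenSize]` block, and the concatenated result would be the text conditioning passed to the UNet.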

One important detail: to make this work, I had to make sure the prompt and the negative prompt have the same token length; otherwise, concatenating the Tensors raises an error. If the token length doesn't exceed the Tokenizer model's max length, the prompt does not need to be broken into chunks.
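One possible way to equalize the two token lengths is to pad both sequences up to the same multiple of the chunk size before chunking. The sketch below assumes a 75-token chunk body and the CLIP end-of-text id (49407) as the pad token; `paddedLength`, `padTo`, and `equalizePrompts` are hypothetical helpers, not functions from diffusers.js.

```typescript
// Assumed values, not taken from the diffusers.js source.
const TOKENS_PER_CHUNK = 75;
const PAD_TOKEN_ID = 49407;

// Smallest multiple of the chunk size that fits the longer of the two prompts.
function paddedLength(promptLen: number, negativeLen: number): number {
  const longest = Math.max(promptLen, negativeLen, 1);
  return Math.ceil(longest / TOKENS_PER_CHUNK) * TOKENS_PER_CHUNK;
}

// Return a copy of `ids` padded on the right to `length` tokens.
function padTo(ids: number[], length: number): number[] {
  const out = ids.slice();
  while (out.length < length) out.push(PAD_TOKEN_ID);
  return out;
}

// Pad prompt and negative prompt to the same length so that both
// produce the same number of chunks.
function equalizePrompts(
  promptIds: number[],
  negativeIds: number[]
): [number[], number[]] {
  const target = paddedLength(promptIds.length, negativeIds.length);
  return [padTo(promptIds, target), padTo(negativeIds, target)];
}

// A short prompt pair fits in one chunk, so both sides are padded to 75 tokens.
const [paddedPrompt, paddedNegative] = equalizePrompts([1, 2, 3], [4]);
console.log(paddedPrompt.length === paddedNegative.length); // true
```

Because both sides end up with the same number of chunks, their encoded Tensors have matching sequence lengths and can be concatenated without a shape mismatch.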

Long Prompt Results

Before this change, the following long prompts would fail; now they produce the images below (generated with the LCM Pipeline):

  • inspired by realflow-cinema4d editor features, create image of a transparent luxury cup with ice fruits and mint, connected with white, yellow and pink cream, Slow - High Speed MO Photography, 4K Commercial Food, YouTube Video Screenshot, Abstract Clay, Transparent Cup , molecular gastronomy, wheel, 3D fluid,Simulation rendering, still video, 4k polymer clay futras photography, very surreal, Houdini Fluid Simulation, hyperrealistic CGI and FLUIDS & MULTIPHYSICS SIMULATION effect, with Somali Stain Lurex, Metallic Jacquard, Gold Thread, Mulberry Silk, Toub Saree, Warm background, a fantastic image worthy of an award.

inspired

  • fantasy medieval village world inside a glass sphere , high detail, fantasy, realistic, light effect, hyper detail, volumetric lighting, cinematic, macro, depth of field, blur, red light and clouds from the back, highly detailed epic cinematic concept art cg render made in maya, blender and photoshop, octane render, excellent composition, dynamic dramatic cinematic lighting, aesthetic, very inspirational, world inside a glass sphere by james gurney by artgerm with james jean, joe fenton and tristan eaton by ross tran, lora:epinoiseoffset_v2:0.35, fine details, 4k resolution, lora:add_detail:0.25

fantasy

Other

  • Fixed a typo in the repo name of the LCM Dreamshaper FP16 model.
  • I noticed that the negative prompt is not used in the LCM Pipeline. I'm not sure whether this is intended, but I wanted to mention it just in case.

jdp8, Apr 25 '24 17:04