
I made you a GUI with quantization and CPU options.

angrysky56 opened this issue 9 months ago • 9 comments

https://github.com/angrysky56/llada_gui

angrysky56 avatar Mar 02 '25 04:03 angrysky56

Thanks for your work~ We will try it.

yyyouy avatar Mar 03 '25 01:03 yyyouy

later

jelspace avatar Mar 03 '25 03:03 jelspace

I am not a coder, but I'm trying to set up some basic improvements. I tried to convert the model to ONNX, but my PC is too weak to finish the conversion, let alone test it. I did get it about 10x faster and able to run on a 12 GB NVIDIA GPU.


angrysky56 avatar Mar 03 '25 23:03 angrysky56

The diffusion process in LLaDA offers unique opportunities for interaction that traditional autoregressive LLMs don't have. Here are some novel ways we could interact with and guide the diffusion process:

Novel Interactions with LLaDA's Diffusion Process

1. Guided Diffusion with External LLM Feedback

Create a system where Claude or another LLM provides real-time guidance during the diffusion process:

  • At specific step intervals (e.g., every 20 steps), pause the diffusion
  • Show the partially completed tokens to Claude
  • Claude provides guidance on which tokens seem most promising
  • Use this feedback to adjust token confidence scores before continuing
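The loop above can be sketched in plain Python. The actual diffusion step is omitted, and `feedback_fn` is a hypothetical callback standing in for an API call to Claude (or any external LLM) that returns a confidence multiplier per token:

```python
def diffusion_with_feedback(tokens, confidences, total_steps=100,
                            pause_every=20, feedback_fn=None):
    """Sketch: pause the (simulated) diffusion every `pause_every` steps
    and let an external LLM adjust per-token confidence scores.
    `feedback_fn` is a hypothetical callable tokens -> multipliers."""
    for step in range(1, total_steps + 1):
        # ... one diffusion denoising step would run here (omitted) ...
        if feedback_fn and step % pause_every == 0:
            multipliers = feedback_fn(tokens)  # external LLM's judgement
            confidences = [c * m for c, m in zip(confidences, multipliers)]
    return confidences
```

Whatever produces the multipliers (a prompt to Claude, a reward model, a heuristic) is decoupled from the diffusion loop, so it's easy to swap guidance strategies.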

2. Multi-Model Token Verification

Implement a token verification pipeline:

  • LLaDA proposes candidate tokens through diffusion
  • Other models (like different size LLMs) vote on token quality
  • Create an ensemble approach where tokens need consensus to be unmasked
  • This could significantly improve accuracy by combining diffusion's creativity with autoregressive models' precision
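A minimal sketch of the consensus rule, where `voters` is a hypothetical list of callables, each wrapping some model's yes/no judgement of a candidate token:

```python
def consensus_unmask(candidate_tokens, voters, threshold=0.5,
                     mask_token="[MASK]"):
    """Sketch: a candidate token stays unmasked only if at least
    `threshold` of the voter models approve it; otherwise it is
    re-masked for another diffusion pass."""
    result = []
    for tok in candidate_tokens:
        votes = sum(1 for vote in voters if vote(tok))
        result.append(tok if votes / len(voters) >= threshold else mask_token)
    return result
```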

3. Semantic Steering with Embeddings

Implement semantic guidance during diffusion:

  • Define target semantic directions using embedding models
  • As tokens are generated, compute their embeddings
  • Apply gentle forces to guide the generation toward desired semantic spaces
  • This could allow "steering" the response toward certain topics or styles
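A minimal sketch of the steering step, assuming we already have embeddings for each candidate token and a target direction from some embedding model (both hypothetical here): each candidate's confidence score gets a small bonus proportional to its cosine similarity with the target.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def steer_scores(candidate_scores, candidate_embeddings,
                 target_embedding, strength=0.3):
    """Nudge candidate-token confidence scores toward a target
    semantic direction. `strength` controls how gentle the force is."""
    return [s + strength * cosine(e, target_embedding)
            for s, e in zip(candidate_scores, candidate_embeddings)]
```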

4. Interactive Diffusion Interface

Create an interface allowing direct human interaction during the diffusion process:

  • Visualize tokens as they're being generated
  • Allow users to "lock" good tokens they want to keep
  • Let users "reject" tokens they don't want
  • Enable users to provide hints for masked areas
  • This turns generation into a collaborative process
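The bookkeeping behind the lock/reject interaction can be sketched as one remasking step. This assumes a low-confidence remasking scheme (rejected positions are always re-masked, locked positions are never touched, and only the most confident of the remaining positions survive); the exact remasking rule LLaDA uses may differ.

```python
def remask_step(tokens, confidences, locked, rejected,
                keep_ratio=0.5, mask_token="[MASK]"):
    """One interactive remasking step (sketch). `locked` and `rejected`
    are sets of positions chosen by the user in the UI."""
    free = [i for i in range(len(tokens)) if i not in locked]
    # keep only the top `keep_ratio` fraction of free positions by confidence
    keep = set(sorted(free, key=lambda i: confidences[i], reverse=True)
               [:int(len(free) * keep_ratio)])
    return [mask_token if (i in rejected or (i not in locked and i not in keep))
            else tok
            for i, tok in enumerate(tokens)]
```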

5. Adaptive Masking Strategies

Implement more sophisticated masking strategies:

  • Use contextual importance to determine mask scheduling
  • Key structural tokens (like those in subject positions) remain masked longer
  • Content tokens get resolved earlier
  • This could improve coherence significantly
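One way to sketch this: assign each position the diffusion step at which it becomes eligible for unmasking, so low-importance content tokens resolve early and high-importance structural tokens stay masked until more context exists. The per-position `importance` scores are hypothetical, e.g. from a lightweight POS tagger.

```python
def masking_schedule(importance, steps):
    """Sketch: map each token position to the earliest diffusion step at
    which it may be unmasked. Low-importance positions get early steps;
    high-importance (structural) positions get late steps."""
    order = sorted(range(len(importance)), key=lambda i: importance[i])
    schedule = [0] * len(importance)
    for rank, pos in enumerate(order):
        schedule[pos] = rank * steps // len(importance)
    return schedule
```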

6. Cross-Modal Guidance

Use vision models to guide text diffusion:

  • For topics with visual components, use image models to verify coherence
  • Example: If generating text about "a red car on a bridge," verify that the combined tokens create embeddings similar to images of red cars on bridges
  • This creates a subtle alignment between text and visual worlds
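A sketch of the gate, assuming CLIP-style embeddings where text and images share a space (the embeddings themselves are hypothetical inputs here): accept a partially generated span only if its text embedding is close enough to at least one reference image embedding.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

def visual_coherence_gate(text_emb, image_embs, threshold=0.25):
    """Sketch: pass if the text embedding is sufficiently similar to
    any reference image embedding (e.g. images of red cars on bridges)."""
    return max(cosine(text_emb, e) for e in image_embs) >= threshold
```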

7. LoRA-Based Diffusion Control

Implement specialized LoRAs that work directly with the diffusion process:

  • Train small LoRA adapters specifically for controlling diffusion dynamics
  • These could provide stylistic control, domain-specific knowledge, etc.
  • The LoRAs would directly influence which tokens get unmasked when
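For reference, the LoRA mechanics reduce to a low-rank delta on a weight matrix: y = W x + alpha * B(A x), where A (r x d_in) and B (d_out x r) are the small trainable adapters. In a diffusion model that delta would bias the logits that decide which tokens unmask. A dependency-free sketch:

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(x, W, A, B, alpha=1.0):
    """Minimal LoRA sketch: y = W x + alpha * B (A x).
    Only A and B would be trained; the base weights W stay frozen."""
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + alpha * d for b, d in zip(base, delta)]
```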

8. Dynamic Confidence Thresholding

Rather than using a fixed confidence threshold:

  • Learn optimal unmasking thresholds for different token positions and contexts
  • Early tokens might need higher confidence
  • Later tokens could use lower thresholds once context is established
  • This adaptively manages the diffusion process based on position
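A simple version of this, using a linear position-based decay (a learned threshold function could replace it; the start/end values here are illustrative):

```python
def position_threshold(pos, seq_len, start=0.9, end=0.6):
    """Sketch: decay the unmasking confidence threshold linearly from
    `start` at the first position to `end` at the last."""
    frac = pos / max(seq_len - 1, 1)
    return start + frac * (end - start)

def unmask_decisions(confidences):
    """Decide per position whether its confidence clears the
    position-dependent threshold."""
    n = len(confidences)
    return [c >= position_threshold(i, n) for i, c in enumerate(confidences)]
```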

9. Memory-Augmented Diffusion

Integrate an external memory system:

  • Store important context and knowledge in a vector database
  • During diffusion, query this memory to guide token selection
  • This adds long-context capabilities to LLaDA
  • Perfect for document-grounded or knowledge-intensive tasks
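The retrieval side can be sketched with a tiny in-process stand-in for a vector database; a real system would use an actual store (FAISS, Chroma, etc.) and feed the retrieved payload into token selection:

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

class ToyVectorMemory:
    """Sketch of a vector memory: stores (embedding, payload) pairs and
    returns the payload nearest (by cosine similarity) to a query."""
    def __init__(self):
        self.entries = []

    def add(self, embedding, payload):
        self.entries.append((embedding, payload))

    def nearest(self, query):
        return max(self.entries, key=lambda e: cosine(e[0], query))[1]
```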

10. Prototype Implementation

For a first implementation, I'd suggest starting with 1 or 4:

  • For 1 (LLM guidance): Create a simple API connection between LLaDA and Claude
  • For 4 (interactive interface): Extend the visualization tab to allow clicking on tokens

angrysky56 avatar Mar 04 '25 06:03 angrysky56

For 1 and 2: is it possible to use another LLaDA instance, running on a second GPU/CPU, so the model checks itself for errors and can go back and regenerate (error repair and quality improvement, if that was the logic here)?

Self-correcting technology :)

Also an option to adjust its own weights to prevent the errors it finds.

Self-quality improving technology.

jelspace avatar Mar 04 '25 07:03 jelspace

It would be very good to have quality benchmarks in the GUI: GSM8K and others.

jelspace avatar Mar 04 '25 08:03 jelspace

CPU is much slower than GPU. I am not sure how the AI set up the GPU offloading, but it is working pretty well. Possibly LoRA adapters or much smaller LLaDA models could run effectively in tandem on CPU on limited PCs.

I will see what I can do with this. https://github.com/EleutherAI/lm-evaluation-harness

angrysky56 avatar Mar 04 '25 18:03 angrysky56

Got sidetracked: I may have created "cognitive diffusion" with a vector DB, but currently it is not hooked into the process except as a demo. Integrating something like this: https://github.com/synthience/mcp-titan-cognitive-memory

Edit: Cognitive Diffusion memory is now integrated and working, though slower and more memory intensive. Posted to the repo; sorry the repo is a mess.


angrysky56 avatar Mar 05 '25 05:03 angrysky56

New version: it was supposed to be a cleanup, but now the vector DB integration works, to a degree at least. https://github.com/angrysky56/llada_gui_new

Edit: Prototype training now available to work on!

angrysky56 avatar Mar 07 '25 08:03 angrysky56