
Add GUI Chatbox for GPULlama3.java Inference

svntax opened this issue 6 months ago · 7 comments

This PR adds a new JavaFX GUI for running inference with GPULlama3 (for issue #24).

It adds a new package com.example.gui containing all the new classes for the chatbox GUI, following a Model-View-Controller-Interactor architecture.

Key Features

  • Dropdown menu to select an engine (TornadoVM, JVM).
  • Browse button to select the directory of the user's GPULlama3.java install.
  • Dropdown menu and Reload button to search for models inside a /models folder in the user's GPULlama3.java directory.
  • Prompt text field for user input.
  • Run button to trigger inference by running llama-tornado as a new process (see the sketch after this list).
  • Output area to display responses and other logs.
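
As a rough illustration, here's a minimal sketch of how a Run button handler could launch llama-tornado as a child process and stream its output into the JavaFX output area. InferenceRunner and its parameters are illustrative, not the PR's actual code:

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;

import javafx.application.Platform;
import javafx.scene.control.TextArea;

// Hypothetical helper (not the PR's actual class): launch llama-tornado
// as a child process and stream its combined output into the GUI.
public class InferenceRunner {

    public static void run(File installDir, String model, String prompt,
                           TextArea outputArea) throws IOException {
        ProcessBuilder pb = new ProcessBuilder(
                "python", "llama-tornado", "--model", model, "--prompt", prompt);
        pb.directory(installDir);      // the user's GPULlama3.java directory
        pb.redirectErrorStream(true);  // merge stderr into stdout
        Process process = pb.start();

        // Read on a background thread so the JavaFX UI stays responsive.
        new Thread(() -> {
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(process.getInputStream()))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String text = line;
                    // UI updates must happen on the JavaFX Application Thread.
                    Platform.runLater(() -> outputArea.appendText(text + "\n"));
                }
            } catch (IOException e) {
                Platform.runLater(() ->
                        outputArea.appendText("Error: " + e.getMessage() + "\n"));
            }
        }, "llama-tornado-reader").start();
    }
}
```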

How to Run

After following the "Install, Build, and Run" instructions from the README, run the following:

mvn javafx:run

Notes

  • Dark theme styling is from AtlantaFX.
  • The right panel of the GUI is unfinished: it contains the System Monitoring panel with checkboxes that don't do anything yet, and an empty text area where the monitoring terminals would display.
  • I'm developing on Windows, so Linux and macOS are untested.

Next Steps

  • I can try to add the system monitoring features, although I'm not sure how far I'll get: I'm on Windows, so as far as I know I can't test htop, nvtop, or any of the other Linux-specific options.
  • Is the system monitoring display supposed to use embedded terminals? I did a bit of searching and found projects like TerminalFX and JediTermFX, but I don't know if that's the best way to implement it.

svntax · Jun 25 '25 09:06

CLA assistant check
All committers have signed the CLA.

CLAassistant · Jun 25 '25 09:06

Thank you @svntax, it worked just fine on my setup!

mikepapadim · Jun 25 '25 10:06

I'll work on these changes soon.

svntax · Jun 26 '25 01:06

I'm a bit stuck on how to run the application directly from the GUI. I have it working for instruct mode, but interactive mode currently reads input from the command line in a loop (in Model's runInteractive() method). Would we need a new method or changes to Model for interactive mode to work with the GUI, or is there a better approach?

Two other things I'd like feedback on:

  • To run models from the GUI, I'm using LlamaApp's methods loadModel() and createSampler() from within ChatboxInteractor, so is it okay to make them public?
  • Using TornadoVM requires LlamaApp.USE_TORNADOVM to be read from config flags, but if we want the GUI to let the user choose, how should this change, given that it's currently a compile-time constant?

svntax · Jun 28 '25 08:06

Handling Interactive Mode with the GUI

Modifying runInteractive() to work with the GUI isn't ideal because its while loop is designed for a command-line interface (CLI).

A cleaner approach is to create a new method in the Model class specifically for the GUI. This method would take a single user input (as a string) and return the model's single response. This way, the GUI can manage the "loop" itself—calling this new method every time the user sends a message.
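
For example, the GUI-side session could look something like this. This is a sketch only: ChatSession and generateReply() are placeholder names for whatever the new single-step method ends up being, while Model and Sampler are the existing project types:

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch, assuming hypothetical names: the GUI owns the conversation
// loop and calls send() once per user message instead of Model.runInteractive().
public class ChatSession {
    private final Model model;      // from LlamaApp.loadModel()
    private final Sampler sampler;  // from LlamaApp.createSampler()
    private final List<Integer> conversationTokens = new ArrayList<>();

    public ChatSession(Model model, Sampler sampler) {
        this.model = model;
        this.sampler = sampler;
    }

    // Takes one user message, returns one model response; the running
    // conversation state lives in this object between calls.
    public String send(String userInput) {
        // generateReply(...) is a placeholder for the new single-step method.
        return model.generateReply(conversationTokens, userInput, sampler);
    }
}
```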

Making loadModel() and createSampler() Public

Yes, making loadModel() and createSampler() in LlamaApp public is a perfectly reasonable solution.

The ChatboxInteractor is acting as a controller or intermediary between your GUI and the core application logic in LlamaApp. For the GUI to be able to trigger the model loading and sampler creation process, the methods that perform these actions must be accessible to it. Encapsulation is important, but in this case, you are intentionally exposing specific functionalities to the GUI layer, which is a standard and necessary practice.

Managing the USE_TORNADOVM Flag

You're right; a static final constant won't work if you want the user to be able to change this setting from the GUI.

The best way to handle this is to change USE_TORNADOVM from a constant to a regular member variable within a configuration object or directly in LlamaApp.

  • Remove final: change public static final boolean USE_TORNADOVM to something like private boolean useTornadoVM.
  • Add a setter method: create a public method to change its value, for example public void setUseTornadoVM(boolean value).
  • GUI integration: your GUI's checkbox or toggle can then call this setter before it calls loadModel().

This way, the user's choice is set first, and then the model is loaded with the correct configuration. The setting is no longer a compile-time constant but a runtime configuration option.
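
Sketched out, the change might look roughly like this (the field placement inside LlamaApp is illustrative):

```java
// Sketch of the change (placement inside LlamaApp is illustrative):
public class LlamaApp {
    // was: public static final boolean USE_TORNADOVM = ...;
    private boolean useTornadoVM;

    public void setUseTornadoVM(boolean value) {
        this.useTornadoVM = value;
    }

    public boolean useTornadoVM() {
        return useTornadoVM;
    }
}
```

The GUI's engine dropdown would then call setUseTornadoVM(...) before loadModel(), so the user's choice takes effect at load time.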

mikepapadim · Jul 01 '25 12:07

I finally have the GUI working directly with the main application, but there's a memory leak when using TornadoVM. I'm not sure if there's a problem with the approach I have, or if I'm not freeing up resources correctly, or if it's something else.

How to Run

Use the new --gui flag in llama-tornado to launch the GUI.

Windows example:

python llama-tornado --gui

How it works

I copied how LlamaApp runs by creating a Model and Sampler when starting a new chat. Instruct mode uses runInstructOnce(), and interactive mode uses the new runInteractiveStep() method in Model, which returns a Response object (a record in Model) holding the data for the ongoing conversation (state, conversation tokens, and a TornadoVMMasterPlan if applicable). These are reused every time the user sends a message in interactive mode, until "quit" or "exit" is sent, which ends the chat session and should free resources.
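
In outline, the record looks roughly like this (a sketch based on the description above; the exact fields in the PR may differ):

```java
import java.util.List;

// Sketch of the per-conversation data (field names assumed from the
// description above; State and TornadoVMMasterPlan are existing types).
public record Response(State state,
                       List<Integer> conversationTokens,
                       TornadoVMMasterPlan tornadoVMPlan) {

    // Called when the user ends the chat with "quit"/"exit" to release
    // GPU resources held by the TornadoVM execution plan.
    public void close() {
        if (tornadoVMPlan != null) {
            tornadoVMPlan.freeTornadoExecutionPlan();
        }
    }
}
```

The GUI holds onto the latest Response between messages and calls close() (or frees the plan directly) when the session ends.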

Memory leak problem

I've tested with Llama-3.2-1B-Instruct-Q8_0.gguf, and I can confirm that in both instruct and interactive mode, when using TornadoVM, the model loads correctly (about 3-4 GB of VRAM), but when I try to free resources with freeTornadoExecutionPlan(), only around 500 MB is freed.

Since the inference happens in a background thread, I thought it might need to happen on the main JavaFX thread instead, but that didn't help.

The problem happens only with TornadoVM: you have to close the GUI for the memory to finally be freed. As far as I can tell, there's no leak when running on CPU (selecting JVM as the engine).
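
For reference, the cleanup I'm attempting looks roughly like this (a simplified sketch, not the exact code; names follow the description above):

```java
// Simplified sketch of the chat-session cleanup on "quit"/"exit"
// (endChatSession() is illustrative; names follow the description above).
private void endChatSession() {
    if (tornadoVMPlan != null) {
        tornadoVMPlan.freeTornadoExecutionPlan(); // frees only ~500 MB of the 3-4 GB
        tornadoVMPlan = null;
    }
    state = null;               // drop references so the heap side can be GC'd
    conversationTokens = null;
    System.gc();                // a hint only; it cannot force VRAM to be released
}
```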

svntax · Jul 08 '25 00:07

@svntax thank you for contributing the GUI! Let me try it and see if I can find the root cause of the memory leak.

mikepapadim · Jul 08 '25 11:07