1D Matrix Multiplication example for HAT

Open jjfumero opened this issue 1 year ago • 19 comments

Add new example for 1D Matrix Multiplication in HAT.

How to test?

## Compile 
java --add-modules jdk.incubator.code --enable-preview --source 24 bld

## Run with the OpenCL Backend
java @bldr/hatrun ffi-opencl matmul  

## Run with the CUDA Backend
java @bldr/hatrun ffi-ptx matmul 

Note that the generated kernel for OpenCL contains a race condition: the KernelContext struct is shared by all work-items, so every work-item overwrites kc->x with its own global id:

__kernel void matrixMultiplyKernel(
    __global KernelContext_t *kc, __global F32Array_t* matrixA, __global F32Array_t* matrixB, __global F32Array_t* matrixC, int size
){
    kc->x=get_global_id(0);                   //  << Shared struct across all threads to store the thread-id 
    if(kc->x<kc->maxX){
        for(int j = 0; j<size; j=j+1){
            float acc = (float)0;
            for(int k = 0; k<size; k=k+1){
                acc=acc+matrixA->array[(long)(kc->x*size+k)]*matrixB->array[(long)(k*size+j)];
            }
            matrixC->array[(long)(kc->x*size+j)]=acc;
        }
    }
    return;
}

After applying a patch provided by Gary Frost that resolves the race condition, the example works correctly.

Patch:

diff --git a/hat/hat/src/main/java/hat/backend/c99codebuilders/C99HatKernelBuilder.java b/hat/hat/src/main/java/hat/backend/c99codebuilders/C99HatKernelBuilder.java
index ade90914d7e..2719fed31ed 100644
--- a/hat/hat/src/main/java/hat/backend/c99codebuilders/C99HatKernelBuilder.java
+++ b/hat/hat/src/main/java/hat/backend/c99codebuilders/C99HatKernelBuilder.java
@@ -26,7 +26,6 @@
 
 
 import hat.buffer.Buffer;
-import hat.buffer.KernelContext;
 import hat.callgraph.KernelCallGraph;
 import hat.callgraph.KernelEntrypoint;
 import hat.optools.FuncOpWrapper;
@@ -72,9 +71,13 @@ T typedefStructOrUnion(boolean isStruct, String name, Consumer<T> consumer) {
 
 
     public final T scope() {
-        return
-                identifier("kc").rarrow().identifier("x").equals().globalId().semicolon().nl();
-                //.identifier("kc").rarrow().identifier("maxX").equals().globalSize().semicolon().nl();
+
+        identifier("KernelContext_t").space().identifier("mine").semicolon().nl();
+        identifier("KernelContext_t").asterisk().space().identifier("kc").equals().ampersand().identifier("mine").semicolon().nl();
+        identifier("kc").rarrow().identifier("x").equals().globalId().semicolon().nl();
+        identifier("kc").rarrow().identifier("maxX").equals().identifier("global_kc").rarrow().identifier("maxX").semicolon().nl();
+        return self();
+
     }
 
     public abstract T globalPtrPrefix();
@@ -137,7 +140,7 @@ public T kernelEntrypoint(KernelEntrypoint kernelEntrypoint, Object[] args) {
                 }
             }
             parenNlIndented(_ -> {
-                        globalPtrPrefix().space().suffix_t("KernelContext").space().asterisk().identifier("kc");
+                        globalPtrPrefix().space().suffix_t("KernelContext").space().asterisk().identifier("global_kc");
                         list.stream().skip(1).forEach(info ->
                                 comma().space().type(info.javaType).space().varName(info.varOp)
                         );

Note: this PR does not include this patch, only the example and the runner extension to run the matrix multiplication.
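
For illustration, with the patch applied the builder emits a private KernelContext copy per work-item, so the generated OpenCL kernel would look roughly as follows (a sketch reconstructed from the diff above, assuming the same KernelContext_t and F32Array_t typedefs; not actual backend output):

__kernel void matrixMultiplyKernel(
    __global KernelContext_t *global_kc, __global F32Array_t* matrixA, __global F32Array_t* matrixB, __global F32Array_t* matrixC, int size
){
    KernelContext_t mine;                     // private, per-work-item storage
    KernelContext_t *kc = &mine;              // kc now points at private memory
    kc->x = get_global_id(0);                 // no write to the shared struct, no race
    kc->maxX = global_kc->maxX;               // copy the bound from the shared struct
    if(kc->x < kc->maxX){
        for(int j = 0; j < size; j = j + 1){
            float acc = 0.0f;
            for(int k = 0; k < size; k = k + 1){
                acc = acc + matrixA->array[(long)(kc->x*size+k)] * matrixB->array[(long)(k*size+j)];
            }
            matrixC->array[(long)(kc->x*size+j)] = acc;
        }
    }
}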


Progress

  • [ ] Change must not contain extraneous whitespace

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.org/babylon.git pull/276/head:pull/276
$ git checkout pull/276

Update a local copy of the PR:
$ git checkout pull/276
$ git pull https://git.openjdk.org/babylon.git pull/276/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 276

View PR using the GUI difftool:
$ git pr show -t 276

Using diff file

Download this PR as a diff file:
https://git.openjdk.org/babylon/pull/276.diff

Using Webrev

Link to Webrev Comment

jjfumero avatar Nov 19 '24 10:11 jjfumero

Hi @jjfumero, welcome to this OpenJDK project and thanks for contributing!

We do not recognize you as Contributor and need to ensure you have signed the Oracle Contributor Agreement (OCA). If you have not signed the OCA, please follow the instructions. Please fill in your GitHub username in the "Username" field of the application. Once you have signed the OCA, please let us know by writing /signed in a comment in this pull request.

If you already are an OpenJDK Author, Committer or Reviewer, please click here to open a new issue so that we can record that fact. Please use "Add GitHub user jjfumero" as summary for the issue.

If you are contributing this work on behalf of your employer and your employer has signed the OCA, please let us know by writing /covered in a comment in this pull request.

bridgekeeper[bot] avatar Nov 19 '24 10:11 bridgekeeper[bot]

@jjfumero This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

1D Matrix Multiplication example for HAT

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been no new commits pushed to the code-reflection branch. If another commit should be pushed before you perform the /integrate command, your PR will be automatically rebased. If you prefer to avoid any potential automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

openjdk[bot] avatar Nov 19 '24 10:11 openjdk[bot]

/signed

jjfumero avatar Nov 19 '24 10:11 jjfumero

Thank you! Please allow for up to two weeks to process your OCA, although it is usually done within one to two business days. Also, please note that pull requests that are pending an OCA check will not usually be evaluated, so your patience is appreciated!

bridgekeeper[bot] avatar Nov 19 '24 10:11 bridgekeeper[bot]

@jjfumero This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

bridgekeeper[bot] avatar Dec 17 '24 15:12 bridgekeeper[bot]

Still waiting for the OCA approval.

jjfumero avatar Dec 18 '24 08:12 jjfumero

@jjfumero This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

bridgekeeper[bot] avatar Jan 15 '25 14:01 bridgekeeper[bot]

Still waiting for the OCA approval

jjfumero avatar Jan 15 '25 14:01 jjfumero

You might want to contact [email protected] regarding your OCA approval status. Not sure what's up there, but it shouldn't take that long.

SirYwell avatar Jan 15 '25 15:01 SirYwell

Webrevs

mlbridge[bot] avatar Jan 17 '25 15:01 mlbridge[bot]

@jjfumero This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

bridgekeeper[bot] avatar Feb 12 '25 17:02 bridgekeeper[bot]

Pending for review

jjfumero avatar Feb 13 '25 06:02 jjfumero

@jjfumero This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

bridgekeeper[bot] avatar Mar 13 '25 11:03 bridgekeeper[bot]

@jjfumero Hi, can Babylon be used to implement the operations needed for llama inference now? Does Babylon have all the basic ops for at least one platform, i.e. CUDA, now?

SidneyLann avatar Apr 07 '25 15:04 SidneyLann

Hi @SidneyLann, I am not the core maintainer of Babylon; Gary Frost can probably help you with your questions. From my point of view, you need access to shared memory and some synchronisation primitives to be able to perform reductions, and I am not sure whether this is implemented in HAT yet.

jjfumero avatar Apr 07 '25 16:04 jjfumero
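
(For illustration only: in OpenCL terms, the shared memory and synchronisation primitives mentioned above are __local memory and work-group barriers. A minimal work-group sum reduction might look like the sketch below; this is plain OpenCL, not HAT code, and the kernel name and signature are made up for the example.)

__kernel void sumReduce(__global const float* in, __global float* partialSums, __local float* scratch, int n){
    int gid = get_global_id(0);
    int lid = get_local_id(0);
    // Each work-item loads one element into local (shared) memory.
    scratch[lid] = (gid < n) ? in[gid] : 0.0f;
    barrier(CLK_LOCAL_MEM_FENCE);                     // synchronise the work-group
    // Tree reduction within the work-group.
    for(int stride = (int)(get_local_size(0) / 2); stride > 0; stride /= 2){
        if(lid < stride){
            scratch[lid] += scratch[lid + stride];
        }
        barrier(CLK_LOCAL_MEM_FENCE);
    }
    // Work-item 0 writes the partial sum for this work-group.
    if(lid == 0){
        partialSums[get_group_id(0)] = scratch[0];
    }
}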

Hi Sidney

As Juan mentioned, I don't think we are there yet. But we have plans.

We need to add low-level primitives to HAT for matrix multiply/scans etc. to allow us to hand off to the vendor backends (GPU drivers) without unnecessary copies.

Gary

grfrost avatar Apr 07 '25 17:04 grfrost

@grfrost Hi Gary, are you developing many platforms (PTX, CUDA, SPIR-V, HIP, etc.) simultaneously? How about completing one platform (i.e. CUDA) first?

SidneyLann avatar Apr 07 '25 23:04 SidneyLann

@jjfumero this pull request can not be integrated into code-reflection due to one or more merge conflicts. To resolve these merge conflicts and update this pull request you can run the following commands in the local repository for your personal fork:

git checkout dev/examples
git fetch https://git.openjdk.org/babylon.git code-reflection
git merge FETCH_HEAD
# resolve conflicts and follow the instructions given by git merge
git commit -m "Merge code-reflection"
git push

openjdk[bot] avatar Apr 18 '25 19:04 openjdk[bot]

@jjfumero This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply add a new comment to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

bridgekeeper[bot] avatar May 06 '25 05:05 bridgekeeper[bot]

https://github.com/openjdk/valhalla/pull/1478#issuecomment-2926632410

> @SidneyLann Valhalla is ready for experimental use, you can either build the project from source (build instructions can be found here) or you can grab a prebuilt package here. Please give it a try and report to us any issue you find, it would be a great help in the stabilization of Valhalla.
>
> If you want to know whether Valhalla can be released to mainline soon then the answer is we don't know and we are trying our best. I believe an act of trying, reporting issues, and even contributing will help Valhalla to land sooner.

@grfrost Hi Gary, is Babylon waiting for Valhalla to be ready? Valhalla is ready for experimental use now; is Babylon as well? Thank you.

SidneyLann avatar Jun 01 '25 07:06 SidneyLann

@SidneyLann

No, Babylon is not waiting for Valhalla. We don't use it at present, but it's possible that we might down the line.

grfrost avatar Jun 02 '25 12:06 grfrost

@SidneyLann Sorry, I just saw your question above regarding 'why not finish the CUDA version first?'

The reason we have multiple backends at various stages of development is that we want to ensure HAT can be implemented on the widest possible set of backends (CUDA/HIP/OpenCL/SPIRV), so we are building 'reference' implementations of each.

I am attempting to provide multiple 'reference' (i.e. almost certainly not maximally performant :) ) backends to make sure this is plausible and to ensure the programming model scales.

Our eventual hope is to persuade CUDA/OpenCL/HIP experts (maybe the vendor runtime owners themselves) to help us build out more robust implementations.

OpenCL is probably the most thoroughly tested and complete, just because I am more familiar with OpenCL.

grfrost avatar Jun 02 '25 12:06 grfrost

Conflicts solved. It works with the latest tip: https://github.com/openjdk/babylon/commit/5bdc8fffc3c29f4fb41125375b641dd9670bc468

jjfumero avatar Jun 11 '25 15:06 jjfumero

@jjfumero This pull request has been inactive for more than 4 weeks and will be automatically closed if another 4 weeks passes without any activity. To avoid this, simply issue a /touch or /keepalive command to the pull request. Feel free to ask for assistance if you need help with progressing this pull request towards integration!

bridgekeeper[bot] avatar Jul 09 '25 16:07 bridgekeeper[bot]

PR conflicts resolved. It works with the latest version: 97889b818ed.

@grfrost , shall we merge this?

jjfumero avatar Jul 21 '25 07:07 jjfumero

/sponsor

grfrost avatar Jul 21 '25 07:07 grfrost

@grfrost The change author (@jjfumero) must issue an integrate command before the integration can be sponsored.

openjdk[bot] avatar Jul 21 '25 07:07 openjdk[bot]

/integrate

jjfumero avatar Jul 21 '25 07:07 jjfumero

@jjfumero Your change (at version a56a0bb9c96bdb983996ebd9ae5546bbd28575f2) is now ready to be sponsored by a Committer.

openjdk[bot] avatar Jul 21 '25 07:07 openjdk[bot]

/sponsor

grfrost avatar Jul 21 '25 07:07 grfrost