Matt Smith

18 comments by Matt Smith

@inducer I'm seeing an intermittent failure in the boxtree CI (here's a [failing run](https://github.com/inducer/pyopencl/actions/runs/8821071538/job/24216130852) and a [successful run](https://github.com/inducer/pyopencl/actions/runs/8821371427/job/24217075076) for the same code). Is this something to be concerned about?

In `build_global_storage_to_sweep_map` (called from `ArrayToBufferMap`, and ultimately from `precompute_for_single_kernel`), I'm seeing some strange(?) behavior in the call to `intersect_range` ([code](https://github.com/inducer/loopy/blob/e99fe74f549dc0f09c8919862ff3dd6c2f070f05/loopy/transform/array_buffer_map.py#L150)). The inputs/outputs look like this:

```python
# Before call to...
```

@inducer I tried generalizing the `FusionContractorArrayContext` to handle loops without an element axis. Dealing with the loop splitting was straightforward; however, the [topo sorting](https://github.com/illinois-ceesd/meshmode/blob/31c702953d2d5967a78d73096a34aef5aacf9aaa/meshmode/array_context.py#L1087-L1102), [barrier inserting](https://github.com/illinois-ceesd/meshmode/blob/31c702953d2d5967a78d73096a34aef5aacf9aaa/meshmode/array_context.py#L1760-L1764), and [temporary aliasing](https://github.com/illinois-ceesd/meshmode/blob/31c702953d2d5967a78d73096a34aef5aacf9aaa/meshmode/array_context.py#L445-L552) steps...

Here's a breakdown of what's happening inside `Program.build`:

```
│  ├─ 11.333 Program.build  pyopencl/__init__.py:505
│  │  └─ 11.333 Program._build_and_catch_errors  pyopencl/__init__.py:554
│  │     └─ 11.333  pyopencl/__init__.py:536
│  │        └─ 11.333 create_built_program_from_source_cached...
```
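For readers who want to produce a similar per-call breakdown, here is a minimal sketch using the standard-library `cProfile` module. This is not necessarily the profiler that generated the tree above, and `build` and `compile_kernel` are hypothetical stand-ins rather than pyopencl functions.

```python
import cProfile
import io
import pstats
import time


def compile_kernel():
    """Hypothetical stand-in for an expensive compile step."""
    time.sleep(0.01)
    return "binary"


def build():
    """Hypothetical stand-in for a top-level build call."""
    return compile_kernel()


def profile_call(func):
    """Run func under cProfile; return (result, text report)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # Sort by cumulative time so callers that dominate the run come first.
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


result, report = profile_call(build)
print(report)
```

Sorting by cumulative time gives roughly the same reading as the tree: each entry's time includes everything it calls.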

Based on @matthiasdiener's comment and our discussion this morning, I made some more measurements, this time on the whole compile time. Specifically, I compared the first step time of grudge...

Seems like the time spent in `get_info(BINARIES)` is much higher for CPUs than it is for GPUs. For combozzle on Lassen I'm seeing sub-millisecond times when running on the GPU,...
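A minimal sketch of how a per-call timing like this could be collected, assuming plain wall-clock measurement with `time.perf_counter`. The helper names and `fake_query` are hypothetical; the pyopencl call appears only in a comment because it needs an already-built program.

```python
import statistics
import time


def time_call(func, repeats=5):
    """Call func `repeats` times; return the per-call wall times in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Reduce a list of timings to min / median / max."""
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


def fake_query():
    """Hypothetical stand-in for the call being measured."""
    time.sleep(0.002)


samples = time_call(fake_query, repeats=5)
summary = summarize(samples)
print({name: f"{value * 1e3:.3f} ms" for name, value in summary.items()})

# With pyopencl installed, the measured callable could instead be, e.g.:
#   lambda: prg.get_info(cl.program_info.BINARIES)
# where `prg` is an already-built pyopencl.Program (not constructed here).
```

Reporting the median alongside the minimum helps separate a consistently slow call from a single outlier, which matters when comparing devices.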

> I don't know that I love this, because pretty much by definition, we'll have redundant meshes flying around, which seems avoidable. First of all, could you remind me where...

That was added to provide a uniform way of retrieving the neighbor parts' rank/volume for all of the possible mesh configurations (single/multiple volume, distributed/non-distributed). It was suggested as an alternative...

> I had something in this direction here if it helps: [alexfikl@6b35e06](https://github.com/alexfikl/meshmode/commit/6b35e06c98c32cae056a2648e33755e6540f701b)
>
> It mostly just checks that the structure of the discretizations is the same, but it doesn't...