Alfred Xu issues

Results: 7 issues of Alfred Xu

Add max running time as an optional stop criteria

/kind feature **Describe the solution you'd like** Currently, Katib has three kinds of success reasons: 1. ExperimentGoalReached 2. ExperimentMaxTrialsReached 3. ExperimentSuggestionEndReached In my opinion, it will bring a lot convenient...

priority/p2

kind/feature

lifecycle/frozen
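As a rough illustration of the request above, the following Java sketch checks an optional maximum running time alongside the three existing success reasons. It is not Katib code (Katib itself is written in Go). The class name, method signature, check order, and the reason string `ExperimentMaxRunningTimeReached` are all hypothetical and chosen only for this example.

```java
import java.time.Duration;
import java.util.Optional;

public class StopCriteriaSketch {

    /**
     * Returns the reason an experiment should stop, or empty if it should keep running.
     * maxRunningTime may be null, meaning the optional criterion is not configured.
     * The order of the checks is an assumption made for this sketch.
     */
    static Optional<String> succeededReason(boolean goalReached,
                                            int completedTrials,
                                            int maxTrials,
                                            boolean suggestionEnded,
                                            Duration elapsed,
                                            Duration maxRunningTime) {
        if (goalReached) {
            return Optional.of("ExperimentGoalReached");
        }
        if (completedTrials >= maxTrials) {
            return Optional.of("ExperimentMaxTrialsReached");
        }
        if (suggestionEnded) {
            return Optional.of("ExperimentSuggestionEndReached");
        }
        // Hypothetical fourth criterion: stop once the elapsed time reaches the limit.
        if (maxRunningTime != null && elapsed.compareTo(maxRunningTime) >= 0) {
            return Optional.of("ExperimentMaxRunningTimeReached");
        }
        return Optional.empty();
    }
}
```

Passing `null` for `maxRunningTime` keeps the current behaviour, which is what makes the criterion optional in this sketch.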

Refine `CudaTest.testCudaException` in case throwing wrong type of CudaError under aarch64

Fix #15705 1. Replace `Cuda.memset(Long.MAX_VALUE, (byte) 0, 1024)` with `Cuda.freePinned(-1L)`: the former throws the fatal CUDAError `cudaErrorIllegalAddress` instead of the nonFatal CUDAError `cudaErrorInvalidValue` under aarch64, while the latter throws the...

bug

cuDF (Java)

non-breaking

[BUG] [JNI] `CudaTest.testCudaException` will not throw `cudaErrorInvalidValue` expectedly under certain environment

**Describe the bug** For the test case `CudaTest.testCudaException`:

```java
assertThrows(CudaException.class, () -> {
  try {
    Cuda.memset(Long.MAX_VALUE, (byte) 0, 1024);
  } catch (CudaFatalException ignored) {
  } catch (CudaException ex) {
    assertEquals(CudaException.CudaError.cudaErrorInvalidValue,...
```

bug

Triple buffering: Bind Virtual Resource Budget to Physical Memory Allocation [databricks]

Closes #13969 ### Overview This PR tightly couples the virtual memory budget with the lifecycle of the actual memory buffer `HostMemoryBuffer` used in the runner, by making `MemoryBoundedAsyncRunner` serve as...

[FEA] Triple Buffering: Bind Async Resource Budget to Physical Memory Allocation

**Is your feature request related to a problem? Please describe.** The current `ResourceBoundedExecutor` manages asynchronous scanning using a "virtual" budget (Triple Buffer MemManagement) that is loosely coupled with the actual...

feature request

performance

improve
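The general idea of tying a virtual budget to the lifetime of a physical buffer can be sketched in plain Java as below. This is a generic illustration, not the implementation in the PR or in `ResourceBoundedExecutor`. `MemoryBudget` and `BudgetedBuffer` are hypothetical names, and `java.nio.ByteBuffer` stands in for `HostMemoryBuffer`.

```java
import java.nio.ByteBuffer;

public class BudgetedBufferSketch {

    /** Hypothetical byte budget that tracks how much is currently reserved. */
    static final class MemoryBudget {
        private final long capacity;
        private long reserved;

        MemoryBudget(long capacity) {
            this.capacity = capacity;
        }

        /** Reserves bytes if they fit in the remaining budget; returns false otherwise. */
        synchronized boolean tryReserve(long bytes) {
            if (bytes < 0 || reserved + bytes > capacity) {
                return false;
            }
            reserved += bytes;
            return true;
        }

        synchronized void release(long bytes) {
            reserved -= bytes;
        }

        synchronized long reserved() {
            return reserved;
        }
    }

    /** A buffer whose budget reservation lives exactly as long as the buffer itself. */
    static final class BudgetedBuffer implements AutoCloseable {
        private final MemoryBudget budget;
        private final int size;
        private ByteBuffer data; // stand-in for the real host buffer

        private BudgetedBuffer(MemoryBudget budget, int size, ByteBuffer data) {
            this.budget = budget;
            this.size = size;
            this.data = data;
        }

        /** Reserves budget first, then allocates; gives the budget back if allocation fails. */
        static BudgetedBuffer allocate(MemoryBudget budget, int size) {
            if (!budget.tryReserve(size)) {
                throw new IllegalStateException("budget exceeded");
            }
            try {
                return new BudgetedBuffer(budget, size, ByteBuffer.allocate(size));
            } catch (RuntimeException | Error e) {
                budget.release(size);
                throw e;
            }
        }

        /** Releases the reservation exactly once, even if close() is called repeatedly. */
        @Override
        public synchronized void close() {
            if (data != null) {
                data = null;
                budget.release(size);
            }
        }
    }
}
```

Because the reservation is released only in `close()`, the budget cannot report free space while the buffer is still alive, which is the kind of coupling the request describes.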

[FEA] Column-wise columnar batch concatenation

**Is your feature request related to a problem? Please describe.** Based on insights from #13884, we observed severe OOM retries and semaphore waits during batch concatenation. The current implementation of...

feature request

? - Needs Triage

performance

improve

[BUG] Spill occurs in GpuAggregate when GPU batch size reduces

**Describe the bug** When running a heavy GpuAggregate consisting of over 400 aggregate functions (including hundreds of instances of the comprehensive function `stddev_pop`), a significant amount of spill is observed in the map...

bug

? - Needs Triage

performance