Sevin Fide Varoglu

Results: 10 comments of Sevin Fide Varoglu

@baskaryan, @efriis, @eyurtsev, @hwchase17 please review

> Do you have an HLO for host offload to demonstrate this speed-up? Adding it to the benchmark suite could guard host offloading features against regressions in future development.

Added. I...

> Could you also add a unit test to demonstrate how the `dynamic_variable_tuple_indices` config is used under `FusionDynamicMemcpyRewriter`?

Added to `copy_test`, as `DynamicMemcpyFusion::GetMemcpyDescriptorForFusion` is in `copy.cc`.

> I have a high-level question: from the PR description and the benchmark performance data, it is not always true that the runtimes decrease; there are about 5 cases...

> Thanks for the explanation. In that case, I would think replacing that block of benchmark performance data with something similar to what you have just said would be better. The...

> Would you please split host_offload_utils* into its own PR? We can move forward with submitting that PR. Smaller PRs are preferred for many reasons, such as easier submission; we can...

> Did you remove the benchmark you added to this PR? We actually want to have such benchmarks.

Merged as a separate PR: https://github.com/openxla/xla/pull/34335

@qGentry Using JAX 0.4.35 with `XLA_FLAGS="--xla_gpu_graph_level=0 --xla_gpu_enable_triton_gemm=false --xla_gpu_enable_command_buffer= "` and `SCAN=False`, I'm seeing a failure:

```
Out of memory while trying to allocate 35701941112 bytes.
*** Check failure stack trace: ***
@...
```
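For reference, a minimal sketch of how the flags above can be set, assuming they are exported before JAX is imported; the model code and the `SCAN` switch come from the original script and are not reproduced here:

```python
import os

# XLA_FLAGS must be in the environment before jax is imported,
# otherwise the XLA client will not pick them up.
os.environ["XLA_FLAGS"] = (
    "--xla_gpu_graph_level=0 "
    "--xla_gpu_enable_triton_gemm=false "
    "--xla_gpu_enable_command_buffer="
)

import jax

print(jax.__version__)  # 0.4.35 in this report
```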

@qGentry Can you please set `XLA_CLIENT_MEM_FRACTION=0.95` and use `--xla_gpu_copy_insertion_use_region_analysis` in addition to your existing flags, and report back whether that resolves the issue?
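A minimal sketch of that setup, assuming the environment is configured before JAX is imported; the flag is simply appended to whatever `XLA_FLAGS` you already use:

```python
import os

# Let the XLA client reserve up to 95% of GPU memory.
os.environ["XLA_CLIENT_MEM_FRACTION"] = "0.95"

# Append the region-analysis flag to the existing XLA flags.
os.environ["XLA_FLAGS"] = (
    os.environ.get("XLA_FLAGS", "")
    + " --xla_gpu_copy_insertion_use_region_analysis"
).strip()

import jax  # import only after the environment is set
```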

@qGentry The `xla_gpu_memory_limit_slop_factor` flag could also help in this case. The default value is 95, so you can experiment with lower values (90, 80, 70, etc.). You can find more info...
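As a sketch, one way to experiment with the slop factor is to sweep a few values from a driver script, one process per value so the flag is picked up when XLA initializes; `train.py` below is just a placeholder for your actual workload:

```python
import os
import subprocess

# Try progressively lower slop factors (the default is 95).
for slop in (90, 80, 70):
    env = dict(os.environ)
    env["XLA_FLAGS"] = (
        env.get("XLA_FLAGS", "")
        + f" --xla_gpu_memory_limit_slop_factor={slop}"
    ).strip()
    # "train.py" is a placeholder for the actual reproducer.
    subprocess.run(["python", "train.py"], env=env, check=False)
```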