Asklv comments

Results: 63 comments of Asklv

Implement UTs for Plugin CustomValidations

> Do I need to increase the test coverage for `JobSet` validation?

I think you are right; we can refer to @tenzen-y's PR: https://github.com/kubeflow/trainer/pull/2555.

CONTRIBUTING.md should be updated

Hi @Okabe-Rintarou-0, I also ran into this problem and would like to confirm whether this PR/issue is still active. cc @andreyvelich

feat: add meta commit task to defer commit meta data.

Check tests:

```rs
test storage::storage_manager::test::concurrency::test_concurrency_aligned ... ok
test storage::storage_manager::test::concurrency::test_concurrency_unaligned ... ok
test new_storage::storage_manager::tests::storage_real_workload_test ... ok
test result: ok. 189 passed; 0 failed; 3 ignored; 0 measured; 0 filtered out; finished...
```

feat: add meta commit task to defer commit meta data.

Version changed: 20M

```bash
daten-server:~/data$ dd if=/dev/zero bs=1M count=20 | tr '\0' '1' > datenlord_cache/new_storage_20m_test_version10.txt
20+0 records in
20+0 records out
20971520 bytes (21 MB, 20 MiB) copied, 10.5831 s,...
```

feat: add meta commit task to defer commit meta data.

> FileHandle.access() can be removed

OK. The file-level version PR will solve the block mismatch issue; if we remove the access function now, fstest will fail.

> Current fstest is...

feat: add meta commit task to defer commit meta data.

truncate 20M: version change: 10
write: version change: 556

Bug: `disconnect` between the submission and execution side of the block flush task in TaskManager spawn method.

Because we want to use a singleton task manager, the manager creates its tasks in whichever runtime context is current when it is first used, and that runtime is isolated inside each `tokio::test`. Reference issue: https://github.com/tokio-rs/tokio/issues/2374
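The runtime isolation described above can be illustrated with a small std-only analogy. This is a sketch, not the project's code: `FakeRuntime`, `MANAGER`, and `submit` are hypothetical names, threads and `std::sync::mpsc` stand in for tokio tasks and runtimes, and it assumes the reported `disconnect` comes from a worker that dies together with the runtime that spawned it.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{self, Sender};
use std::sync::{Arc, Mutex, OnceLock};
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Hypothetical stand-in for a per-test runtime: it owns the workers
/// spawned on it and stops them when it shuts down.
struct FakeRuntime {
    stop: Arc<AtomicBool>,
    workers: Vec<JoinHandle<()>>,
}

impl FakeRuntime {
    fn new() -> Self {
        FakeRuntime { stop: Arc::new(AtomicBool::new(false)), workers: Vec::new() }
    }

    /// Spawn the execution side on this runtime and return the submission side.
    fn spawn_worker(&mut self) -> Sender<u64> {
        let (tx, rx) = mpsc::channel::<u64>();
        let stop = Arc::clone(&self.stop);
        self.workers.push(thread::spawn(move || {
            while !stop.load(Ordering::SeqCst) {
                // Poll so the worker notices shutdown even when idle.
                if let Ok(block_id) = rx.recv_timeout(Duration::from_millis(10)) {
                    let _ = block_id; // a real worker would flush the block here
                }
            }
            // `rx` is dropped here, which disconnects every sender.
        }));
        tx
    }

    /// Stop and join all workers, like a runtime being dropped at test end.
    fn shutdown(self) {
        self.stop.store(true, Ordering::SeqCst);
        for worker in self.workers {
            worker.join().unwrap();
        }
    }
}

/// Process-wide singleton: bound to whichever runtime uses it first.
static MANAGER: OnceLock<Mutex<Sender<u64>>> = OnceLock::new();

fn submit(rt: &mut FakeRuntime, block_id: u64) -> Result<(), String> {
    let tx = MANAGER.get_or_init(|| Mutex::new(rt.spawn_worker()));
    let result = tx.lock().unwrap().send(block_id);
    result.map_err(|e| format!("disconnected: block {} was not flushed", e.0))
}

fn main() {
    // "Test 1": the first use creates the singleton's worker on this runtime.
    let mut rt1 = FakeRuntime::new();
    let first = submit(&mut rt1, 1);
    rt1.shutdown();

    // "Test 2": the singleton still holds the sender tied to rt1's dead worker.
    let mut rt2 = FakeRuntime::new();
    let second = submit(&mut rt2, 2);
    rt2.shutdown();

    println!("first = {:?}, second = {:?}", first, second);
}
```

In this sketch the first submission succeeds and the second returns an error, because the singleton never re-binds its worker to the second runtime.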

Bug: `disconnect` between the submission and execution side of the block flush task in TaskManager spawn method.

The current debugging results point to a runtime problem: the `Lazy` task manager should use the default context runtime in the new `tokio::spawn`, and then in the test...

docs(trainer): introduce trainer pipeline framework for new users in kubeflow trainer v2

The issue already mentions the need for documentation similar to the `TrainerPipelineFramework` in this [issue](https://github.com/kubeflow/trainer/issues/2458#issuecomment-2698386151). I've made some simple refinements to it, but I'm not sure if there are any...

docs(trainer): introduce trainer pipeline framework for new users in kubeflow trainer v2

> Hi @IRONICBo, did you get a chance to address the remaining comments? Having this documentation would be super helpful to understand the KF Trainer architecture.

@andreyvelich Sorry for...