feature-store-api icon indicating copy to clipboard operation
feature-store-api copied to clipboard

[FSTORE-862][APPEND] Don't provide the current offsets by default when starting job

Open bubriks opened this issue 2 years ago • 3 comments

This PR adds/fixes/changes...

  • please summarize your changes to the code
  • and make sure to include all changes to user-facing APIs

JIRA Issue: -

Priority for Review: -

Related PRs: -

How Has This Been Tested?

  • [ ] Unit Tests
  • [ ] Integration Tests
  • [ ] Manual Tests on VM

Checklist For The Assigned Reviewer:

- [ ] Checked if merge conflicts with master exist
- [ ] Checked if stylechecks for Java and Python pass
- [ ] Checked if all docstrings were added and/or updated appropriately
- [ ] Ran spellcheck on docstring
- [ ] Checked if guides & concepts need to be updated
- [ ] Checked if naming conventions for parameters and variables were followed
- [ ] Checked if private methods are properly declared and used
- [ ] Checked if hard-to-understand areas of code are commented
- [ ] Checked if tests are effective
- [ ] Built and deployed changes on dev VM and tested manually
- [x] (Checked if all type annotations were added and/or updated appropriately)

bubriks avatar Sep 13 '23 11:09 bubriks

Are you sure this PR is correct? Looks to me that we always end up in the first branch of the if statement as the initial_check_point is always not empty string as we set it here: https://github.com/logicalclocks/feature-store-api/blame/25cfcd57ad792a3b6a732570943692c49b406fbc/python/hsfs/engine/python.py#L956

It doens't seem there is a way of controlling the skip_offset parameter in the first branch and the offsets are always skipped.

Additionally the skip_offset parameter is not documented anywhere in the APIs. Please add the proper documentation in the insert method.

SirOibaf avatar Sep 17 '23 21:09 SirOibaf

I think everything should be correct.

initial_check_point will be empty if topic doesn't exists (for example after upgrade).

In the first if statement (here: https://github.com/logicalclocks/feature-store-api/blame/25cfcd57ad792a3b6a732570943692c49b406fbc/python/hsfs/engine/python.py#L1016) we always run materialization job setting the initial offset to 0 for all partitions of topic since the topic didn't exist and job should start from the beginning (done here: https://github.com/logicalclocks/feature-store-api/blame/25cfcd57ad792a3b6a732570943692c49b406fbc/python/hsfs/engine/python.py#L1028)

bubriks avatar Sep 18 '23 11:09 bubriks

@SirOibaf I also change the skip_offset parameter name to use_current_offsets as i think its more descriptive.

bubriks avatar Sep 18 '23 11:09 bubriks