beam icon indicating copy to clipboard operation
beam copied to clipboard

[Bug]: Potential issue in python SDF example

Open apichick opened this issue 3 years ago • 2 comments

What happened?

I was checking this example

sdks/python/apache_beam/examples/snippets/snippets.py

And I think there might be an issue with this line of code

file_handle.seek(tracker.current_restriction.start())

It should be replaced with

file_handle.seek(tracker.current_restriction.start)

As the OffsetRange class does not have a method called start(), but just a member variable called start.

In addition to this I would also like to make a question about this excert of code:

  class FileToWordsFn(beam.DoFn):
    def process(
        self,
        file_name,
        # Alternatively, we can let FileToWordsFn itself inherit from
        # RestrictionProvider, implement the required methods and let
        # tracker=beam.DoFn.RestrictionParam() which will use self as
        # the provider.
        tracker=beam.DoFn.RestrictionParam(FileToWordsRestrictionProvider())):
      with open(file_name) as file_handle:
        file_handle.seek(tracker.current_restriction.start())
        while tracker.try_claim(file_handle.tell()):
          yield read_next_record(file_handle)

    # Providing the coder is only necessary if it can not be inferred at
    # runtime.
    def restriction_coder(self):
      return ...

Why is is that the seek and tell are done before and it does not do the tryClaim directly with the position returned by the current tracker and leave the seek to when the claim has already been made?

Issue Priority

Priority: 2

Issue Component

Component: examples-python

apichick avatar Jun 24 '22 11:06 apichick

.take-issue

ryanthompson591 avatar Aug 09 '22 18:08 ryanthompson591

@lukecwik may be able to advise here

Also, I was under the impression that these snippets were getting tested continuously. If this one is broken, we should look into why it's not being tested

TheNeuralBit avatar Aug 09 '22 19:08 TheNeuralBit

I don't think any snippet in the docs is being tested, I was not able to find any test for those in the code base.

On Tue, Aug 9, 2022 at 9:04 PM Brian Hulette @.***> wrote:

@lukecwik https://github.com/lukecwik may be able to advise here

Also, I was under the impression that these snippets were getting tested continuously. If this one is broken, we should look into why it's not being tested

— Reply to this email directly, view it on GitHub https://github.com/apache/beam/issues/22042#issuecomment-1209767685, or unsubscribe https://github.com/notifications/unsubscribe-auth/AESWBEP74CSOV6E47Y3PP6DVYKTTBANCNFSM5ZXPP2ZA . You are receiving this because you authored the thread.Message ID: @.***>

-- Miren Esnaola | Google Cloud Professional Services | @.*** | office: +34 917 48 66 47

apichick avatar Aug 10 '22 10:08 apichick

It looks like we do have https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets_test.py for this, e.g. it tests construct_pipeline: https://github.com/apache/beam/blob/e86456ea71b1203b7158496e4653541809415225/sdks/python/apache_beam/examples/snippets/snippets_test.py#L627-L633

But it looks like there are also a lot of snippets that are untested, including sdf_basic_example. I filed https://github.com/apache/beam/issues/22658 to track addressing this.

TheNeuralBit avatar Aug 10 '22 16:08 TheNeuralBit

The issue with the snippet is correct that it should be .start and that it is untested via snippets_test.py.

lukecwik avatar Aug 11 '22 17:08 lukecwik