beam
beam copied to clipboard
[Bug]: Potential issue in python SDF example
What happened?
I was checking this example
sdks/python/apache_beam/examples/snippets/snippets.py
And I think there might be an issue with this line of code
file_handle.seek(tracker.current_restriction.start())
It should be replaced with
file_handle.seek(tracker.current_restriction.start)
As the OffsetRange class does not have a method called start(), but just a member variable called start.
In addition to this I would also like to make a question about this excert of code:
class FileToWordsFn(beam.DoFn):
def process(
self,
file_name,
# Alternatively, we can let FileToWordsFn itself inherit from
# RestrictionProvider, implement the required methods and let
# tracker=beam.DoFn.RestrictionParam() which will use self as
# the provider.
tracker=beam.DoFn.RestrictionParam(FileToWordsRestrictionProvider())):
with open(file_name) as file_handle:
file_handle.seek(tracker.current_restriction.start())
while tracker.try_claim(file_handle.tell()):
yield read_next_record(file_handle)
# Providing the coder is only necessary if it can not be inferred at
# runtime.
def restriction_coder(self):
return ...
Why is is that the seek and tell are done before and it does not do the tryClaim directly with the position returned by the current tracker and leave the seek to when the claim has already been made?
Issue Priority
Priority: 2
Issue Component
Component: examples-python
.take-issue
@lukecwik may be able to advise here
Also, I was under the impression that these snippets were getting tested continuously. If this one is broken, we should look into why it's not being tested
I don't think any snippet in the docs is being tested, I was not able to find any test for those in the code base.
On Tue, Aug 9, 2022 at 9:04 PM Brian Hulette @.***> wrote:
@lukecwik https://github.com/lukecwik may be able to advise here
Also, I was under the impression that these snippets were getting tested continuously. If this one is broken, we should look into why it's not being tested
— Reply to this email directly, view it on GitHub https://github.com/apache/beam/issues/22042#issuecomment-1209767685, or unsubscribe https://github.com/notifications/unsubscribe-auth/AESWBEP74CSOV6E47Y3PP6DVYKTTBANCNFSM5ZXPP2ZA . You are receiving this because you authored the thread.Message ID: @.***>
-- Miren Esnaola | Google Cloud Professional Services | @.*** | office: +34 917 48 66 47
It looks like we do have https://github.com/apache/beam/blob/master/sdks/python/apache_beam/examples/snippets/snippets_test.py for this, e.g. it tests construct_pipeline: https://github.com/apache/beam/blob/e86456ea71b1203b7158496e4653541809415225/sdks/python/apache_beam/examples/snippets/snippets_test.py#L627-L633
But it looks like there are also a lot of snippets that are untested, including sdf_basic_example. I filed https://github.com/apache/beam/issues/22658 to track addressing this.
The issue with the snippet is correct that it should be .start and that it is untested via snippets_test.py.