smart_open

improve timeout handling

Open mpenkov opened this issue 6 years ago • 2 comments

Problem description

When iterating through a large file on S3, botocore may raise a read timeout error. Currently, this terminates the read entirely, and you have to start again from the beginning of the file. The trace looks like this:

/usr/lib/python3.6/gzip.py in read(self, size)
     89             self._read = None
     90             return self._buffer[read:] + \
---> 91                    self.file.read(size-self._length+read)
     92
     93     def prepend(self, prepend=b''):

~/git/smart_open/smart_open/s3.py in read(self, size)
    282         #
    283         # logger.debug('filling %r byte-long buffer up to %r bytes', len(self._buffer), size)
--> 284         self._fill_buffer(size)
    285         return self._read_from_buffer(size)
    286

~/git/smart_open/smart_open/s3.py in _fill_buffer(self, size)
    340         size = size if size >= 0 else self._buffer._chunk_size
    341         while len(self._buffer) < size and not self._eof:
--> 342             bytes_read = self._buffer.fill(self._raw_reader)
    343             if bytes_read == 0:
    344                 logger.debug('reached EOF while filling buffer')

~/git/smart_open/smart_open/bytebuffer.py in fill(self, source, size)
    150
    151         if hasattr(source, 'read'):
--> 152             new_bytes = source.read(size)
    153         else:
    154             new_bytes = b''

~/git/smart_open/smart_open/s3.py in read(self, size)
    206             binary = self._body.read()
    207         else:
--> 208             binary = self._body.read(size)
    209         self._position += len(binary)
    210         return binary

~/envs/dbi2/lib/python3.6/site-packages/botocore/response.py in read(self, amt)
     79         except URLLib3ReadTimeoutError as e:
     80             # TODO: the url will be None as urllib3 isn't setting it yet
---> 81             raise ReadTimeoutError(endpoint_url=e.url, error=e)
     82         self._amount_read += len(chunk)
     83         if amt is None or (not chunk and amt > 0):

ReadTimeoutError: Read timeout on endpoint URL: "None"

Ideally, there should be some sort of recovery mechanism:

  • Try to reconnect, up to a fixed number of times. This should happen transparently, without the user noticing any problems.
  • If all retries fail, the timeout error should include the current stream position (as reported by .tell()) so that the user can resume the read themselves.
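
To make the two points above concrete, here is a minimal sketch of what such a mechanism might look like. It is not smart_open's implementation: `RetryingReader`, `ReadInterrupted`, and the `open_at` callable are hypothetical names introduced only for illustration. The sketch assumes the source can be reopened at an arbitrary byte offset (for S3, `open_at` could plausibly issue a ranged GET starting at that offset), and that the caller passes the timeout exception to retry on, e.g. botocore's `ReadTimeoutError` from the trace above.

```python
import io


class ReadInterrupted(IOError):
    """Hypothetical error raised once retries are exhausted; carries the stream position."""

    def __init__(self, position, cause):
        super().__init__('read failed at byte offset %d: %r' % (position, cause))
        self.position = position


class RetryingReader:
    """Hypothetical wrapper that reopens the source at the current offset after a timeout."""

    def __init__(self, open_at, retry_on=(TimeoutError,), max_retries=3):
        self._open_at = open_at          # assumed callable: byte offset -> file-like object
        self._retry_on = retry_on        # exception types treated as recoverable timeouts
        self._max_retries = max_retries  # retries allowed per read() call
        self._position = 0               # number of bytes returned to the caller so far
        self._body = None                # opened lazily, and reopened after a failure

    def tell(self):
        return self._position

    def read(self, size=-1):
        failures = 0
        while True:
            try:
                if self._body is None:
                    # (Re)connect at the last byte that was successfully returned.
                    self._body = self._open_at(self._position)
                data = self._body.read() if size < 0 else self._body.read(size)
            except self._retry_on as err:
                self._body = None  # discard the broken connection
                failures += 1
                if failures > self._max_retries:
                    # Give up, but tell the caller where the read stopped.
                    raise ReadInterrupted(self._position, err) from err
                continue
            self._position += len(data)
            return data


if __name__ == '__main__':
    # Stand-in source: an in-memory buffer that can be "reopened" at any offset.
    data = b'x' * 1000
    reader = RetryingReader(lambda pos: io.BytesIO(data[pos:]))
    assert reader.read(400) == data[:400]
    print(reader.tell())  # 400
```

Because the position only advances after a read succeeds, a reconnect never skips or repeats bytes. A caller that catches `ReadInterrupted` can use its `position` attribute to resume the read later instead of starting from the beginning of the file.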

Steps/code to reproduce the problem

In order for us to be able to solve your problem, we have to be able to reproduce it on our end. Without reproducing the problem, it is unlikely that we'll be able to help you.

Include full tracebacks, logs and datasets if necessary. Please keep the examples minimal (minimal reproducible example).

Versions

Please provide the output of:

import platform, sys, smart_open
print(platform.platform())
print("Python", sys.version)
print("smart_open", smart_open.__version__)

Linux-4.15.0-70-generic-x86_64-with-Ubuntu-18.04-bionic
Python 3.6.5 (default, Apr  1 2018, 05:46:30)
[GCC 7.3.0]
smart_open 1.9.0

Checklist

Before you create the issue, please make sure you have:

  • [x] Described the problem clearly
  • ~[ ] Provided a minimal reproducible example, including any required data~ (too hard to reliably reproduce this)
  • [x] Provided the version numbers of the relevant software

mpenkov, Dec 08 '19 06:12

Any update on this issue?

vinchauhan, Jun 23 '21 06:06

No.

mpenkov, Jun 23 '21 09:06