Read files with lots of blank lines in the data section

Open ThomasMGeo opened this issue 2 years ago • 5 comments

Describe the bug Can't read in this .las file. This is an open-source .las file from Oklahoma.

To Reproduce Steps to reproduce the behavior: Using the latest version of lasio (v 0.29) on Windows.

The .las file is zipped: 1051985022.zip

ThomasMGeo, Apr 28 '22 15:04

It looks like the lines are double spaced, which seems to be causing a problem when it gets down to the ~A (data) section. You might "just" need to strip those out first, write out a new file, and then it seems to work.

import lasio

forig = '1051985022.las'
with open(forig) as f:
    lines = f.readlines()
lines = lines[::2]  # keep every second line, dropping the blank ones

with open('1051985022_cleaned.las', 'w') as f:
    for line in lines:
        f.write(line)  # each line already ends with a newline

las = lasio.LASFile('1051985022_cleaned.las')

EvanBianco, Apr 28 '22 16:04

Would it make sense to strip out every blank line, or are those needed for other formatting reasons?

ThomasMGeo, Apr 28 '22 16:04

Check that. What I've done above will run, but the curves aren't being parsed in correctly. I'll poke around some more.

EvanBianco, Apr 28 '22 16:04

Just reposting here the comments I made in the welly-and-lasio channel on Software Underground slack.

" ... this particular file is ugly. There are blank lines between the lines with text, but you only need to remove the blank lines in the data section. I've removed them in the header section as well, and unfortunately it's not every other line throughout the file. It shifts by one at the start of the ~A data so striding over the whole file won't work. Beware of obi-wan!

So this little code snippet will do what you need for this one well. It hinges on finding the position of the ~A line in the file, but it gets you what you need.

I wonder if lasio could detect this and potentially handle it gracefully?

import lasio

def find_data_start(lines):
    """Return the index of the first line after the ~A (data) marker."""
    for i, line in enumerate(lines):
        if line.startswith('~A'):
            return i + 1
    raise ValueError('no ~A section found')

fname = '1051985022.las'

with open(fname) as f:
    lines = f.readlines()

start = find_data_start(lines)
stack1 = lines[:start:2]  # cleaned header
stack2 = lines[start::2]  # cleaned data
stack = stack1 + stack2

with open('1051985022_cleaned.las', 'w') as f:
    for line in stack:
        f.write(line)  # each line already ends with a newline

las = lasio.LASFile('1051985022_cleaned.las')
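
A stride-free variant of the same idea, as a rough sketch: rather than taking every second line, drop the lines that contain only whitespace, so the shift at ~A does not matter. The helper name `drop_blank_lines` and the tiny sample below are made up for illustration (the sample is not taken from the real file), and this assumes blank lines carry no meaning in the file at hand.

```python
def drop_blank_lines(lines):
    """Return only the lines that contain something other than whitespace."""
    return [line for line in lines if line.strip()]


# Made-up sample, NOT the real file: blank lines alternate with text in the
# header, but the pattern shifts by one right after the ~A line.
sample = [
    "~Version\n", "\n",
    "VERS. 2.0 :\n", "\n",
    "~A DEPT GR\n",
    "249.0 37.7\n", "\n",
    "249.5 41.9\n", "\n",
]

strided = sample[::2]               # striding over the whole file
cleaned = drop_blank_lines(sample)  # filtering on content instead

print(strided)  # the data rows are lost, blank lines are kept
print(cleaned)  # header and data rows survive
```

The cleaned lines can then be written to a new file and opened with lasio.LASFile, as in the snippet above.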

EvanBianco, Apr 28 '22 17:04

Thanks for raising this issue and the example code. I'll take a look, it definitely should be readable.

kinverarity1, May 11 '22 14:05

I looked into this file today. Here are my findings from working with the file on a Mac:

  • Opening the file with TextEdit, there are extra blank lines.
  • Opening the file in Vim, there aren't any blank lines, but there is a Windows carriage-return ^M character at the end of each line.
  • With the following script, the file opens okay and I think handles the headers and data sections properly:
import lasio
las = lasio.LASFile("tmp/1051985022.las", index_unit='ft')

# Print the number of columns in a data row
# Output: 33
print(len(las.data[1]))

# Print number of data rows
# Output: 6657
print(len(las.data))


# Print the first two values of the last data row
# Output: array([3577.    , 2004.0742])
las.data[-1][0:2]

--

  • If someone could check that the script has the same (or different) behavior on Windows, that will clarify if Lasio needs changes to handle the extra ^M characters when running on Windows.

  • Also, if it succeeds in Lasio, it seems like the LAS object in Welly should work, provided Welly is working with the lasio.LASFile() object, since the ^M characters should have been filtered out by Lasio's parsing of the file.

  • Was there an error message when attempting to read the file?
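
To check the line endings described above without relying on how an editor displays them, one option is to count them in the raw bytes. This is only a sketch; `count_line_endings` is a hypothetical helper name, not part of Lasio.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF pairs, lone LF and lone CR bytes in raw file content."""
    crlf = data.count(b"\r\n")
    return {
        "crlf": crlf,                    # Windows-style \r\n pairs
        "lf": data.count(b"\n") - crlf,  # \n not preceded by \r
        "cr": data.count(b"\r") - crlf,  # \r not followed by \n
    }


# Usage with the file from this issue (same path as in the script above):
# with open("tmp/1051985022.las", "rb") as f:
#     print(count_line_endings(f.read()))
```

A nonzero "cr" count alongside "crlf" would be one possible explanation for some editors showing blank lines, though that is a guess rather than something confirmed for this file.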

dcslagel, Jan 26 '23 23:01

@ThomasMGeo, @EvanBianco ,

I continued to look into this today and have a couple of additional findings:

  1. There is a Lasio test case for a file with extra ^M characters in it:

pytest tests/test_open_file.py::test_open_url_different_newlines

This test passes on both Ubuntu and Windows in the GitHub Actions tests.

  2. I tried to duplicate the related Welly issue: https://github.com/agilescientific/welly/issues/220
    by running this Welly script with Welly 0.5.2 and Lasio 0.30 on a macOS machine.
import lasio
from welly import Well, Project
#--------------------------------------------------------------------------------------
mywell = Well.from_las("tmp/1051985022.las", index_unit='ft') 
gr = mywell.data['GR']
print(mywell.df())

Below is the resulting output. It indicates there are 6657 lines of data and that is the same number of data lines in the file.

             TENS    SPHI        SP  RXRT        RXO       RT90       RT60       RT30       RT20  ...       GR       DT  DRHO  DPHS  DPHI  DPHD  DLIM  CT90    CALI
DEPT                                                                                              ...                                                              
249.0         NaN  0.0574  125.5235   NaN        NaN        NaN        NaN        NaN        NaN  ...      NaN  55.7208   NaN   NaN   NaN   NaN   NaN   NaN     NaN
249.5         NaN  0.0630  126.1887   NaN        NaN        NaN        NaN        NaN        NaN  ...      NaN  56.5073   NaN   NaN   NaN   NaN   NaN   NaN     NaN
250.0   1365.0989  0.0638  127.0521   0.0  1999.9999  1999.9999  1999.9999  1999.9999  1999.9999  ...  37.7590  56.6280   NaN   NaN   NaN   NaN   NaN   0.5  8.1087
250.5   1363.3073  0.0610  127.5533   0.0  1999.9999  1999.9999  1999.9999  1999.9999  1999.9999  ...  41.9990  56.2275   NaN   NaN   NaN   NaN   NaN   0.5  8.1017
251.0   1362.9364  0.0471  127.7967   0.0  1999.9999  1999.9999  1999.9999  1999.9999  1999.9999  ...  46.8416  54.2585   NaN   NaN   NaN   NaN   NaN   0.5  8.1003
...           ...     ...       ...   ...        ...        ...        ...        ...        ...  ...      ...      ...   ...   ...   ...   ...   ...   ...     ...
3575.0  2610.8960     NaN       NaN   NaN        NaN        NaN        NaN        NaN        NaN  ...      NaN      NaN   NaN   NaN   NaN   NaN   NaN   NaN     NaN
3575.5  2517.2781     NaN       NaN   NaN        NaN        NaN        NaN        NaN        NaN  ...      NaN      NaN   NaN   NaN   NaN   NaN   NaN   NaN     NaN
3576.0  2314.6697     NaN       NaN   NaN        NaN        NaN        NaN        NaN        NaN  ...      NaN      NaN   NaN   NaN   NaN   NaN   NaN   NaN     NaN
3576.5  2151.5366     NaN       NaN   NaN        NaN        NaN        NaN        NaN        NaN  ...      NaN      NaN   NaN   NaN   NaN   NaN   NaN   NaN     NaN
3577.0  2004.0742     NaN       NaN   NaN        NaN        NaN        NaN        NaN        NaN  ...      NaN      NaN   NaN   NaN   NaN   NaN   NaN   NaN     NaN

[6657 rows x 32 columns]
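
The comparison against the number of data lines in the file can be sketched as below. `count_data_rows` is a hypothetical helper, and it assumes an unwrapped file where the ~A section is the last section, each non-blank line after ~A is one data row, and there are no comment lines in the data.

```python
def count_data_rows(lines):
    """Count non-blank lines after the first line starting with '~A'."""
    in_data = False
    count = 0
    for line in lines:
        if in_data:
            if line.strip():  # skip whitespace-only lines
                count += 1
        elif line.startswith("~A"):
            in_data = True
    return count


# Usage with the file from this issue (same path as in the script above):
# with open("tmp/1051985022.las") as f:
#     print(count_data_rows(f))
```

The result can then be compared with the row count of the DataFrame printed above.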

Given these findings, it seems possible that the extra blank lines caused by the extra ^M characters aren't an issue for parsing this file with either Welly or Lasio. Maybe there is a different issue with the file. @ThomasMGeo, what was the error message when trying to bring the file into Welly/Lasio?

Thanks! DC

dcslagel, Jan 31 '23 23:01

Hi DC, it was on a Windows machine. If it works for you now, feel free to close. I will look into it next week.

Best, Thomas

ThomasMGeo, Feb 09 '23 17:02

@ThomasMGeo, thanks for the update. If you will look into it next week, let's leave it open for your updates. If there is an issue, it will be good to resolve it.

dcslagel, Feb 09 '23 18:02

This seems fixed after testing today! Thanks again. I am going to close it now, and will make a new issue if it comes up.

Thanks, -TM

ThomasMGeo, Feb 17 '23 16:02

@ThomasMGeo, Good. Thank you for retesting and for the update! :-)

dcslagel, Feb 17 '23 20:02