
pygrib.fromstring on an entire GRIB file?

Open • jparal opened this issue 8 years ago • 12 comments

Would it be possible (and how difficult would it be) to support passing an entire GRIB file to fromstring() instead of a single message?

jparal • Aug 24 '16 15:08

Not sure what the use case would be: fromstring is meant to create a GRIB message object from a binary string containing a single message. If you have several such binary strings concatenated, you could split them up by looking for the GRIB end section ('7777' when decoded to ASCII) and then iterate over the pieces.

jswhit • Aug 29 '16 15:08

I would like to stream data from the internet and split the GRIB file without storing it on the file system. I guessed there must be a message separator, but I did not know about the '7777' string. I tested your suggestion without success. Would you mind showing me an example? I can figure out the streaming from a URL, but I don't know how to split the messages. I would expect the line below to give me the number of messages, but it does not. What am I doing wrong?

len(open('./nam_218_20160101_0000_001.grb','rb').read().split('7777'))

jparal • Aug 30 '16 23:08

I think you need to decode the bytes first with .decode('ascii','ignore'), i.e.

(open('./nam_218_20160101_0000_001.grb','rb').read().decode('ascii','ignore')).split('7777')

jswhit • Aug 31 '16 00:08

If it works, each string should start with 'GRIB' and end with '7777'.

jswhit • Aug 31 '16 00:08

len(open('./nam_218_20160101_0000_001.grb','rb').read().decode('ascii','ignore').split('7777')) gives 7972, and only 448 of the resulting strings start with 'GRIB'.

jparal • Aug 31 '16 00:08

Note that you'll have to add the 7777 back to each string, since the split will remove it.

jswhit • Aug 31 '16 02:08
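For Python 3 readers: decoding is not needed, because bytes objects support split() directly. Below is a minimal sketch of the split-and-reattach approach discussed above (the helper name naive_split is hypothetical). Be aware that, as the next comment explains, '7777' can also occur inside a message body, so this approach can cut messages short; it is shown only to illustrate the idea.

```python
def naive_split(data: bytes) -> list[bytes]:
    """Split concatenated GRIB messages on the b'7777' end marker.

    Hypothetical helper. Fragile: a b'7777' inside a message body
    also triggers a split, which truncates that message.
    """
    pieces = []
    for chunk in data.split(b'7777'):
        # split() drops the separator, so add the end marker back
        piece = chunk + b'7777'
        # keep only pieces that begin with the 'GRIB' indicator
        if piece.startswith(b'GRIB'):
            pieces.append(piece)
    return pieces
```

Comparing the number of returned pieces against the number of raw split() results is a quick way to see how many fragments did not start with 'GRIB'.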

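I could not get the string-split method to work either: it turns out GRIB messages sometimes contain '7777' in the body of the message, not just at the end. Here's a script that does work for GRIB2 files (for me, at least). Rather than splitting on a marker, it finds each 'GRIB' header and reads the message length from section 0.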

import pygrib, sys, struct
filename = sys.argv[1]
f = open(filename,'rb')
msgs = []
while 1:
    # find next occurrence of string 'GRIB' (or EOF).
    nbyte = f.tell()
    while 1:
        f.seek(nbyte)
        start = f.read(4).decode('ascii','ignore')
        if start == '' or start == 'GRIB': break
        nbyte = nbyte + 1
    if start == '': break # at EOF
    # otherwise, start (='GRIB') begins the indicator section (section 0)
    startpos = f.tell()-4
    f.seek(4,1)  # skip octets 5-8 (2 reserved, discipline, edition number)
    # octets 9-16 hold the total length of the grib message (unsigned)
    lengrib = struct.unpack('>Q',f.read(8))[0]
    # read in entire grib message, append to list.
    f.seek(startpos)
    gribmsg = f.read(lengrib)
    msgs.append(gribmsg)
# convert grib message string to grib message object
for msg in msgs:
    print(pygrib.fromstring(msg))

jswhit • Aug 31 '16 02:08
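For reference, GRIB2 section 0 (the indicator section) is 16 octets: 'GRIB' in octets 1-4, two reserved octets, the discipline in octet 7, the edition number (2) in octet 8, and the total message length as a big-endian unsigned integer in octets 9-16. That is why the script skips four octets after 'GRIB' and then reads eight. A minimal sketch that decodes the whole section in one struct call; the helper name parse_grib2_section0 is hypothetical:

```python
import struct

# '>' big-endian, no padding; '4s' = b'GRIB', '2x' = two reserved octets (skipped),
# 'B' = discipline, 'B' = edition number, 'Q' = 8-octet total message length
GRIB2_SECTION0 = struct.Struct('>4s2xBBQ')


def parse_grib2_section0(header: bytes) -> tuple[int, int, int]:
    """Return (discipline, edition, total_length) from the first 16 octets."""
    magic, discipline, edition, length = GRIB2_SECTION0.unpack(header[:16])
    if magic != b'GRIB':
        raise ValueError('not a GRIB indicator section')
    if edition != 2:
        raise ValueError(f'expected GRIB edition 2, got {edition}')
    return discipline, edition, length
```

Reading the full 16 octets at once (f.read(16)) and passing them to this helper would replace the separate seek and read calls in the loop.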

Here's a version that works for both GRIB1 and GRIB2:

import pygrib, sys, struct
filename = sys.argv[1]
f = open(filename,'rb')
msgs = []
while 1:
    # find next occurrence of string 'GRIB' (or EOF).
    nbyte = f.tell()
    while 1:
        f.seek(nbyte)
        start = f.read(4).decode('ascii','ignore')
        if start == '' or start == 'GRIB': break
        nbyte = nbyte + 1
    if start == '': break # at EOF
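    # otherwise, start (='GRIB') begins the indicator section (section 0)
    startpos = f.tell()-4
    f.seek(3,1)  # skip octets 5-7 (GRIB1: message length; GRIB2: reserved + discipline)
    # octet 8 is the grib edition number in both GRIB1 and GRIB2
    vers = struct.unpack('>B',f.read(1))[0]
    # length of grib message
    if vers == 2:
        # GRIB2: octets 9-16 hold the total length
        lengrib = struct.unpack('>Q',f.read(8))[0]
    elif vers == 1:
        # GRIB1: octets 5-7 hold the total length (24 bits), pad to 4 bytes
        f.seek(startpos+4)
        lengrib = struct.unpack('>i',b'\x00'+f.read(3))[0]
    else:
        raise ValueError('unsupported GRIB edition %d' % vers)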
    # read in entire grib message, append to list.
    f.seek(startpos)
    gribmsg = f.read(lengrib)
    msgs.append(gribmsg)
# convert grib message string to grib message object
for msg in msgs:
    print(pygrib.fromstring(msg))

jswhit • Aug 31 '16 03:08
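As a hedged alternative to seeking one byte at a time, here is a minimal sketch that reads sequentially from any file-like object with a read() method, so the same code could serve an in-memory buffer or a non-seekable HTTP stream as in the original question. It relies on the same section 0 octets as the script above (octet 8 is the edition; GRIB1 stores a 24-bit length in octets 5-7, GRIB2 a 64-bit length in octets 9-16). The function names and the 64 KiB chunk size are illustrative choices, not pygrib API. The sketch only reads the plain 24-bit GRIB1 length, so any extended-length GRIB1 convention is not handled.

```python
import io
import struct
from typing import BinaryIO, Iterator, Optional


def message_length(header: bytes) -> int:
    """Total message length read from section 0 (header = first 16 octets)."""
    edition = header[7]  # octet 8 is the edition number in GRIB1 and GRIB2
    if edition == 1:
        # GRIB1: octets 5-7 hold the total length as a 24-bit big-endian integer
        return int.from_bytes(header[4:7], 'big')
    if edition == 2:
        # GRIB2: octets 9-16 hold the total length as a 64-bit big-endian integer
        return struct.unpack('>Q', header[8:16])[0]
    raise ValueError(f'unsupported GRIB edition {edition}')


def _read_at_least(stream: BinaryIO, buf: bytes, n: int,
                   chunk_size: int) -> Optional[bytes]:
    """Append chunks from stream to buf until len(buf) >= n; None if EOF first."""
    while len(buf) < n:
        chunk = stream.read(chunk_size)
        if not chunk:
            return None
        buf += chunk
    return buf


def iter_grib_messages(stream: BinaryIO,
                       chunk_size: int = 65536) -> Iterator[bytes]:
    """Yield raw GRIB messages read sequentially from a file-like object."""
    buf = b''
    while True:
        start = buf.find(b'GRIB')
        if start < 0:
            # no marker buffered: keep 3 bytes in case 'GRIB' straddles two chunks
            buf = buf[-3:]
            chunk = stream.read(chunk_size)
            if not chunk:
                return  # clean EOF
            buf += chunk
            continue
        # drop bytes before the marker, then ensure the 16-octet header is buffered
        buf = _read_at_least(stream, buf[start:], 16, chunk_size)
        if buf is None:
            return  # stream ended inside a header
        try:
            length = message_length(buf[:16])
        except ValueError:
            length = 0
        if length < 16:
            # implausible header (e.g. 'GRIB' inside junk bytes): skip it, rescan
            buf = buf[1:]
            continue
        buf = _read_at_least(stream, buf, length, chunk_size)
        if buf is None:
            return  # stream ended inside a message (truncated file or download)
        yield buf[:length]
        buf = buf[length:]
```

For bytes already in memory, wrap them as io.BytesIO(data); for a file on disk, pass the file opened in 'rb' mode; for the streaming case, the response object from urllib.request.urlopen(url) also provides read(), so the same loop should apply. Each yielded message can then be passed to pygrib.fromstring(msg). Because the length comes from the header, a '7777' inside a message body does not cause a split.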

I appreciate you taking the time. This answers my question, thanks.

jparal • Aug 31 '16 12:08

I finally got time to use this more extensively, but when I compare the results of the method you posted above (splitting the GRIB file in memory) with pygrib.open() on a physical file, I get a different number of messages. Even the message lengths differ. Any idea why?

jparal • Dec 23 '16 19:12

nope, no idea

jswhit • Dec 24 '16 04:12
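One way to narrow down a discrepancy like this is to check each extracted message: every complete GRIB1 or GRIB2 message starts with 'GRIB' and ends with the four octets '7777', so a message that fails either test points at a length read incorrectly or a spurious 'GRIB' match. A minimal diagnostic sketch; the helper name find_suspect_messages is hypothetical, and it only checks the split itself, so it cannot rule out differences in how pygrib.open() counts messages.

```python
def find_suspect_messages(msgs: list[bytes]) -> list[int]:
    """Return indices of messages that do not look like complete GRIB records."""
    suspects = []
    for i, msg in enumerate(msgs):
        # a complete message starts with 'GRIB' and ends with the '7777' end section
        if not (msg.startswith(b'GRIB') and msg.endswith(b'7777')):
            suspects.append(i)
    return suspects
```

Running it on the msgs list built by the script above gives the positions worth inspecting first.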

@jparal I ran into the same problem and solved it like this:

import sys, struct
f = open('grib','rb')
msgs = []
f.seek(0, 2)
size = f.tell()
f.seek(0)
while 1:
    # find next occurrence of string 'GRIB' (or EOF).
    nbyte = f.tell()
    while 1:
        f.seek(nbyte)
        start = f.read(4).decode('ascii', 'ignore')
        if start == 'GRIB':
            break
        nbyte = nbyte + 1
        if nbyte >= size:
            break

    if nbyte >= size:
        break
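    # otherwise, start (='GRIB') begins the indicator section (section 0)
    startpos = f.tell()-4
    f.seek(3,1)  # skip octets 5-7 (GRIB1: message length; GRIB2: reserved + discipline)
    # octet 8 is the grib edition number in both GRIB1 and GRIB2
    vers = struct.unpack('>B',f.read(1))[0]
    # length of grib message
    if vers == 2:
        # GRIB2: octets 9-16 hold the total length
        lengrib = struct.unpack('>Q',f.read(8))[0]
    elif vers == 1:
        # GRIB1: octets 5-7 hold the total length (24 bits), pad to 4 bytes
        f.seek(startpos+4)
        lengrib = struct.unpack('>i', b'\x00'+f.read(3))[0]
    else:
        raise ValueError('unsupported GRIB edition %d' % vers)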
    # read in entire grib message, append to list.
    f.seek(startpos)
    gribmsg = f.read(lengrib)
    msgs.append(gribmsg)

The key difference is that I stop as soon as nbyte reaches the file size (nbyte >= size) instead of waiting for an empty read.

astanway • Jun 22 '18 22:06