PBFOsmStreamTarget not writing valid data

Open juliusfriedman opened this issue 4 years ago • 12 comments

Given the following function, which operates on any osm.pbf file and writes the output to destinationFileName:

/// <summary>
        /// Method which will read a file and strip out any tags which are not railway=
        /// </summary>
        /// <param name="souceFileName">Where to read</param>
        /// <param name="destinationFileName">Where to write</param>
        internal static void FilterForRailway(string souceFileName, string destinationFileName)
        {
            //Read an inputfile
            using (var sourceFile = File.OpenRead(souceFileName))
            {
                //Use a PBFOsmStreamSource to read the nodes.
                var source = new OsmSharp.Streams.PBFOsmStreamSource(sourceFile);
                
                //Create an output file
                using (var outputFile = File.OpenWrite(destinationFileName))
                {
                    //Create the writer
                    var target = new OsmSharp.Streams.PBFOsmStreamTarget(outputFile);
                    
                    //Initialize it
                    target.Initialize();

                    //Loop all nodes in the source
                    foreach (var node in source)
                    {
                        //If the node doesn't have a railway attribute then continue
                        if(false == node.Tags.ContainsKey("railway"))
                        {
                            continue;
                        }

                        //handle writing of way, relation or node
                        if(node is OsmSharp.Way way)
                        {
                            Console.WriteLine($"Writing Way {way.ToString()}");

                            target.AddWay(way);
                        } 
                        else if (node is OsmSharp.Relation relation)
                        {
                            Console.WriteLine($"Writing Relation {relation.ToString()}");

                            target.AddRelation(relation);
                        }
                        else if (node is OsmSharp.Node osmNode)
                        {
                            Console.WriteLine($"Writing Node {osmNode.ToString()}");

                            target.AddNode(osmNode);
                        }
                        else
                        {
                            Console.WriteLine($"Writing Node from Unknown: {node.ToString()}");

                            target.AddNode(new OsmSharp.Node()
                            {                                
                                Id = node.Id,
                                ChangeSetId = node.ChangeSetId,
                                //Latitude = 0,
                                //Longitude = 0,
                                Tags = node.Tags,
                                TimeStamp = node.TimeStamp,
                                UserId = node.UserId,
                                UserName = node.UserName,
                                Version = node.Version,
                                Visible = node.Visible,
                                //Type = node.Type
                            });
                        }

                        target.Flush();
                    }
                }
            }
        }

I get a resulting osm.pbf file, but other readers such as QGIS cannot parse it.

I am wondering whether I did something wrong in my function and, if so, what.

Thank you for your time!

juliusfriedman avatar Mar 02 '20 18:03 juliusfriedman

I tried doing something like this, but the result seems to be the same; it just takes longer:

var filtered = from element in source
                              where element.Type == OsmSharp.OsmGeoType.Node ||
                               (element.Type == OsmSharp.OsmGeoType.Way && (element.Tags.ContainsKey("railway")))
                              select element;

               var complete = filtered.ToComplete();

juliusfriedman avatar Mar 03 '20 01:03 juliusfriedman

I did not know QGIS could open OSM PBF files. You are not talking about MVT (Mapbox Vector Tiles), which are also encoded using protobuf?

xivk avatar Mar 03 '20 10:03 xivk

Yes, QGIS can open osm.pbf files and show a visual representation before importing with ogr2ogr, although I am looking at https://github.com/OsmSharp/sqlserver-dataprovider to replace that as well.

No, I am not talking about MVT. I have a planet.osm.pbf file and I want to get results similar to what osmium tags-filter -o planet-rail.osm.pbf planet.osm.pbf nw/railway would provide.

I would like to achieve this in code to simplify my processes for creating routerdb so they can be updated when required.

juliusfriedman avatar Mar 03 '20 13:03 juliusfriedman

When I use var complete = filtered.ToComplete(); from the example above, the resulting file is almost 10 times as large as the original. I don't think I need it, as I loop over all nodes in the original example anyway.

I downloaded pa.osm.pbf from http://download.geofabrik.de/north-america/us/pennsylvania-latest.osm.pbf

Then I ran the above function on the file and wrote out what was supposed to be a smaller file with only the railway tags. What I got was a file that is over 1 GB and still growing.

Is there a problem in my above logic?

juliusfriedman avatar Mar 03 '20 16:03 juliusfriedman

It is possible QGIS doesn't support the uncompressed variety of the PBF files; try adding compress=true here:

https://github.com/OsmSharp/core/blob/develop/src/OsmSharp/Streams/PBFOsmStreamTarget.cs#L57

This value isn't true by default because it would be a breaking change in OsmSharp. When I release v7 it will be true by default.

I think your filter also includes all the nodes, not just those that are part of a railway. To do that you need two passes. In the first pass you take all railway ways and index all their nodes in a hashset. In the second pass you return all nodes with ids in the hashset and all railway ways. That should give you all the railway data and only the railway data.
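The two passes described above can be sketched without any library. This is only an illustration under stated assumptions: Element, NodeElement, WayElement and RailwayFilter are hypothetical stand-ins, not OsmSharp types, and the input is an in-memory list rather than a PBF stream. Keeping nodes that carry a railway tag themselves is an addition here, meant to mirror the nw/railway expression in the osmium command. A second pass is needed because nodes normally come before ways in an osm.pbf file, so by the time a railway way is seen its nodes have already gone by.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for OSM elements; these are NOT OsmSharp types.
public abstract class Element
{
    public long Id;
    public Dictionary<string, string> Tags = new Dictionary<string, string>();
}

public sealed class NodeElement : Element { }

public sealed class WayElement : Element
{
    public long[] NodeIds = new long[0];
}

public static class RailwayFilter
{
    // Returns the railway ways, every node they reference,
    // and any node that carries a railway tag itself.
    public static List<Element> Filter(IReadOnlyList<Element> elements)
    {
        // Pass 1: collect the ids of all nodes used by railway ways.
        var neededNodeIds = new HashSet<long>();
        foreach (var way in elements.OfType<WayElement>())
        {
            if (way.Tags.ContainsKey("railway"))
            {
                neededNodeIds.UnionWith(way.NodeIds);
            }
        }

        // Pass 2: keep the indexed nodes and the railway ways, in input order.
        var result = new List<Element>();
        foreach (var element in elements)
        {
            bool isRailway = element.Tags.ContainsKey("railway");
            if (element is NodeElement && (isRailway || neededNodeIds.Contains(element.Id)))
            {
                result.Add(element);
            }
            else if (element is WayElement && isRailway)
            {
                result.Add(element);
            }
        }
        return result;
    }
}
```

With a real stream the same idea means reading the source twice (or resetting it between passes), since the hashset is only complete once every way has been seen.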

xivk avatar Mar 03 '20 17:03 xivk

I will give it a try, thank you!

juliusfriedman avatar Mar 03 '20 18:03 juliusfriedman

        /// <summary>
        /// Method which will read a file and strip out any tags which are not railway=
        /// </summary>
        /// <param name="souceFileName">Where to read</param>
        /// <param name="destinationFileName">Where to write</param>
        internal static void FilterForRailway(string souceFileName, string destinationFileName)
        {
            //Read an inputfile
            using (var sourceFile = File.OpenRead(souceFileName))
            {
                //Use a PBFOsmStreamSource to read the nodes.
                var source = new OsmSharp.Streams.PBFOsmStreamSource(sourceFile);
                
                //Create an output file
                using (var outputFile = File.OpenWrite(destinationFileName))
                {
                    //Create the writer with compressed data
                    var target = new OsmSharp.Streams.PBFOsmStreamTarget(outputFile, true);
                    
                    //Initialize it
                    target.Initialize();

                    //First pass you take all railway ways [nodes], index all nodes in a hashset. 
                    var filtered = source.Where(element => element.Type == OsmSharp.OsmGeoType.Node && element.Tags.ContainsKey("railway"));

                    //Create the HashSet
                    HashSet<long?> index = new HashSet<long?>(filtered.Select(g => g.Id));

                    //Second pass you return all nodes with ids in the hashset and all railway ways.
                    filtered = source.Where(element => index.Contains(element.Id) || element.Type == OsmSharp.OsmGeoType.Way && element.Tags.ContainsKey("railway"));

                    //Loop all nodes in the source
                    foreach (var node in filtered.ToComplete())
                    {
                        //handle writing of way, relation or node
                        if (node is OsmSharp.Way way)
                        {
                            Console.WriteLine($"Writing Way {way.ToString()}");

                            target.AddWay(way);
                        } 
                        else if (node is OsmSharp.Relation relation)
                        {
                            Console.WriteLine($"Writing Relation {relation.ToString()}");

                            target.AddRelation(relation);
                        }
                        else if (node is OsmSharp.Node osmNode)
                        {
                            Console.WriteLine($"Writing Node {osmNode.ToString()}");

                            target.AddNode(osmNode);
                        }
                        else
                        {
                            Console.WriteLine($"Not Node from Unknown: {node.ToString()}");

                            continue;
                        }

                        target.Flush();
                    }
                }
            }
        }

Still not working, but the file is smaller :)

Also, why do I have to make two passes?

Couldn't I just do:

//Combine first and 2nd pass
                    var filtered = source.Where(element => (element.Type == OsmSharp.OsmGeoType.Node && element.Tags.ContainsKey("railway")) || element.Type == OsmSharp.OsmGeoType.Way && element.Tags.ContainsKey("railway"));

Furthermore, I honestly don't care whether QGIS can read the file or not (although it would help); the big problem is that I can't build a routerDb from the resulting osm.pbf :( It can't find any route positions.

I am basically just trying to achieve what osmium does with the aforementioned command: osmium tags-filter -o planet-rail.osm.pbf planet.osm.pbf nw/railway

juliusfriedman avatar Mar 03 '20 19:03 juliusfriedman

var filtered = from element in source
                                   where element.Type == OsmSharp.OsmGeoType.Node ||
                                    (element.Type == OsmSharp.OsmGeoType.Way && (element.Tags.ContainsKey("railway")))
                                   select element;

This still results in a file which is larger than the original, even with compress = true (still almost 4 times as large).

This is strange to me, as it should only be writing a subset of the data.

Several other attempts to handle this have resulted in smaller files, but none of them seem to be valid:

foreach (var node in filtered.ToComplete())
                    {
                        //handle writing of way, relation or node
                        if (node is OsmSharp.Way way)
                        {
                            Console.WriteLine($"Writing Way {way.ToString()}");

                            target.AddWay(way);
                        }
                        else if (node is OsmSharp.Complete.CompleteWay completeWay)
                        {
                            Console.WriteLine($"Writing Complete Way {completeWay.ToString()}");

                            var simple = completeWay.ToSimple() as OsmSharp.Way;

                            target.AddWay(simple);
                        }
                        else if (node is OsmSharp.Relation relation)
                        {
                            Console.WriteLine($"Writing Relation {relation.ToString()}");

                            target.AddRelation(relation);
                        }
                        else if (node is OsmSharp.Complete.CompleteRelation completeRelation)
                        {
                            Console.WriteLine($"Writing Complete Relation {completeRelation.ToString()}");

                            var simple = completeRelation.ToSimple() as OsmSharp.Relation;

                            target.AddRelation(simple);
                        }
                        else if (node is OsmSharp.Node osmNode)
                        {
                            Console.WriteLine($"Writing Node {osmNode.ToString()}");

                            target.AddNode(osmNode);
                        }
}

Valid meaning that I can neither generate a routerDb from them which resolves anything nor open them with QGIS.

Even tools like osmconvert report:

osmconvert Error: block raw size expected at: 0x1A.
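One possible reading of this message, assuming the 0x1A is the byte osmconvert found where it expected the raw_size field (rather than a file offset): in the OSM PBF Blob message, raw is field 1, raw_size is field 2 and zlib_data is field 3. A protobuf field key packs the field number and wire type into one byte, so 0x10 is raw_size and 0x1A is zlib_data. The small helper below (ProtobufTag is a hypothetical name used only for this illustration) decodes such a key.

```csharp
using System;

public static class ProtobufTag
{
    // Splits a single-byte protobuf field key into (field number, wire type).
    // Only valid for field numbers 1..15, which fit in one byte.
    public static (int FieldNumber, int WireType) Decode(byte key)
    {
        if ((key & 0x80) != 0)
        {
            throw new ArgumentException("multi-byte keys are not handled in this sketch");
        }
        // Upper bits hold the field number, the low three bits the wire type.
        return (key >> 3, key & 0x07);
    }
}
```

ProtobufTag.Decode(0x1A) gives field 3 with wire type 2 (length-delimited), and ProtobufTag.Decode(0x10) gives field 2 with wire type 0 (varint). Under the assumption above, the reader met zlib_data without a preceding raw_size.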

juliusfriedman avatar Mar 11 '20 18:03 juliusfriedman

Just as an FYI, the example given in Sample.CompleteStream does not result in a file which loads in QGIS either, and tools like osmconvert report the same error: block raw size expected at: 0x1A.

juliusfriedman avatar Mar 12 '20 11:03 juliusfriedman

Can you make a simple reproducible test? Then I can have a look.

xivk avatar Apr 06 '20 12:04 xivk

You can just use your CompleteStream example or any of the other examples you provide. Ignore my functions for now as needed.

  1. Load the Source PBF with QGIS before and verify it will load
  2. Run the Sample on the Source and write out a new PBF
  3. Attempt to load the new PBF in QGIS, or use any tool such as osmconvert, and you will receive the error cited above: block raw size expected at: 0x1A.

My team has tested this on pretty much all the examples provided and they all exhibit the same issue after being written out.

juliusfriedman avatar Apr 06 '20 17:04 juliusfriedman

@xivk @juliusfriedman #132 reports the same error.

IldarKhayrutdinov avatar Aug 26 '21 10:08 IldarKhayrutdinov

Unconfirmed, but probably fixed by the fix for #132; feel free to reopen if that is not the case.

xivk avatar Nov 29 '22 08:11 xivk