Questions about the sampling rules and buffer distance settings used to construct the geographic context
Hello! I am studying your 2023 paper "MGeo: Multi-Modal Geographic Language Model Pre-Training" and have some questions about the technical details of the geographic context (GC) construction phase. I would appreciate your answers. My specific questions are as follows:
1. What specific sampling rules does the paper use to generate the geographic context of a query/POI? With the sampling rules I designed myself, only a small number of polygons and lines are selected near any given POI. Is this consistent with the sampling logic in the paper?
2. When sampling roads around a POI, does the paper select only the main road types? Are minor roads such as service roads, living_street, footway, and path included in the sampling scope?
3. What buffer distance (i.e., the radius used to select geographic entities around a POI) does the paper use when constructing the geographic context? I am currently processing data for Wuhan with a buffer distance of 300 meters, and I would like to adjust it to match the paper's setting.
4. Which OSM (OpenStreetMap) data source does the paper use to extract POLYGON data? Was only data of the osm_landuse_a_hb type used?
5. When handling polygon entities, does the paper introduce any notion of "hierarchy" (such as the administrative/functional hierarchy community → street/town → district/county)? If so, how is this hierarchy reflected in the geographic context encoding (for example, in the gis_encoder input)?
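To make the road-type and buffer-distance questions concrete, here is a minimal sketch of the kind of screening I mean. It is not taken from the paper: the function names, the `MINOR_HIGHWAYS` set, the equirectangular projection, and the sample coordinates are all my own assumptions for illustration, and only the 300-meter default reflects my current setting.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters

# Assumption: OSM highway values treated as "minor" roads in this sketch.
# Whether the paper excludes them is exactly what I am asking.
MINOR_HIGHWAYS = {"service", "living_street", "footway", "path"}


def to_local_xy(lon, lat, lon0, lat0):
    """Project lon/lat to meters around (lon0, lat0) (equirectangular approximation)."""
    x = math.radians(lon - lon0) * math.cos(math.radians(lat0)) * EARTH_RADIUS_M
    y = math.radians(lat - lat0) * EARTH_RADIUS_M
    return x, y


def point_segment_distance(px, py, ax, ay, bx, by):
    """Distance from point P to the segment AB, all in planar meters."""
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Clamp the projection of P onto AB so it stays on the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def roads_within_buffer(poi, roads, buffer_m=300.0, skip_minor=True):
    """Return names of roads whose polyline comes within buffer_m of the POI."""
    lon0, lat0 = poi
    selected = []
    for road in roads:
        if skip_minor and road["highway"] in MINOR_HIGHWAYS:
            continue
        pts = [to_local_xy(lon, lat, lon0, lat0) for lon, lat in road["coords"]]
        # The POI is the origin of the local frame; roads with fewer than
        # two vertices have no segments and are never selected.
        dist = min(
            (point_segment_distance(0.0, 0.0, ax, ay, bx, by)
             for (ax, ay), (bx, by) in zip(pts, pts[1:])),
            default=float("inf"),
        )
        if dist <= buffer_m:
            selected.append(road["name"])
    return selected


# Made-up sample data near central Wuhan, for illustration only.
poi = (114.300, 30.590)
roads = [
    {"name": "A", "highway": "primary", "coords": [(114.300, 30.591), (114.305, 30.591)]},
    {"name": "B", "highway": "footway", "coords": [(114.3005, 30.590), (114.3005, 30.592)]},
    {"name": "C", "highway": "secondary", "coords": [(114.310, 30.600), (114.320, 30.600)]},
]
print(roads_within_buffer(poi, roads))
print(roads_within_buffer(poi, roads, skip_minor=False))
```

With a filter like `skip_minor=True`, dense urban areas lose most of their nearby lines, which may explain why I see so few roads per POI. Knowing the paper's road-type filter and buffer radius would let me set `MINOR_HIGHWAYS` and `buffer_m` accordingly.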