gpxpy
malformed xml
Importing a GPX file that uses extensions and then exporting it again breaks the file.
I use a downloaded GPX file that works with all other tools. After importing and exporting it with gpxpy, the file is broken. This is mainly caused by a line-style extension.
import gpxpy
import gpxpy.gpx
fn = "/Users/oe/Desktop/testfile.gpx"
with open(fn) as f:
    gf = gpxpy.parse(f)
with open(fn+".new.gpx", 'w+') as f:
    f.write(gf.to_xml())
with the following example file.
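To confirm that a round-tripped file is actually malformed rather than just different, it can be run through a standard XML parser. The following is a minimal sketch using only the standard library. `is_well_formed` is a hypothetical helper name, and the sample strings imitate the failure mode reported in this thread (a namespace URI used as a tag name).

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if xml_text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# An extension element with a properly declared default namespace.
good = (
    '<extensions>'
    '<line xmlns="http://www.topografix.com/GPX/gpx_style/0/2"/>'
    '</extensions>'
)

# A namespace URI used as if it were a prefix, which is not a valid tag name.
bad = (
    '<extensions>'
    '<http://www.topografix.com/GPX/gpx_style/0/2:line>'
    '</http://www.topografix.com/GPX/gpx_style/0/2:line>'
    '</extensions>'
)

print(is_well_formed(good))
print(is_well_formed(bad))
```

Running the same check on the exported file's contents would show whether the export itself is the step that breaks it.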
Definitions of nsmap and schema location might help. Try using the following code:
import gpxpy
import gpxpy.gpx
fn = "testfile.gpx"
with open(fn) as f:
    gf = gpxpy.parse(f)
gf.nsmap['gpxtpx'] = 'http://www.garmin.com/xmlschemas/TrackPointExtension/v1'
gf.nsmap['gpxx'] = 'http://www.garmin.com/xmlschemas/GpxExtensions/v3'
gf.nsmap['gpx_style'] = 'http://www.topografix.com/GPX/gpx_style/0/2'
gf.schema_locations = [
    'http://www.topografix.com/GPX/1/1',
    'http://www.topografix.com/GPX/1/1/gpx.xsd',
    'http://www.garmin.com/xmlschemas/GpxExtensions/v3',
    'http://www.garmin.com/xmlschemas/GpxExtensionsv3.xsd',
    'http://www.garmin.com/xmlschemas/TrackPointExtension/v1',
    'http://www.garmin.com/xmlschemas/TrackPointExtensionv1.xsd',
    'http://www.topografix.com/GPX/gpx_style/0/2',
    'http://www.topografix.com/GPX/gpx_style/0/2/gpx_style.xsd']
with open(fn+".new.gpx", 'w+') as f:
    f.write(gf.to_xml('1.1'))
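The schema_locations list above alternates namespace URI and schema URL, which matches how an xsi:schemaLocation attribute is written as whitespace-separated pairs. As a sketch, the list can be built from a mapping so that each pair stays together. `SCHEMAS` and `flatten_schema_pairs` are hypothetical names for illustration, not part of gpxpy.

```python
from typing import Dict, List

# Namespace URI -> XSD location, taken from the snippet above.
SCHEMAS: Dict[str, str] = {
    'http://www.topografix.com/GPX/1/1':
        'http://www.topografix.com/GPX/1/1/gpx.xsd',
    'http://www.garmin.com/xmlschemas/GpxExtensions/v3':
        'http://www.garmin.com/xmlschemas/GpxExtensionsv3.xsd',
    'http://www.garmin.com/xmlschemas/TrackPointExtension/v1':
        'http://www.garmin.com/xmlschemas/TrackPointExtensionv1.xsd',
    'http://www.topografix.com/GPX/gpx_style/0/2':
        'http://www.topografix.com/GPX/gpx_style/0/2/gpx_style.xsd',
}


def flatten_schema_pairs(schemas: Dict[str, str]) -> List[str]:
    """Flatten {namespace: xsd} into [namespace, xsd, namespace, xsd, ...]."""
    flat: List[str] = []
    for namespace_uri, xsd_url in schemas.items():
        flat.extend([namespace_uri, xsd_url])
    return flat


schema_locations = flatten_schema_pairs(SCHEMAS)
print(len(schema_locations))
```

Keeping the pairs in a mapping makes it harder to add a namespace without its schema URL, which would leave the list with an odd number of items.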
Thanks a lot, @ekspla!
I just started using this library this week and I noticed that my GPX files went from:
<extensions>
  <line xmlns="http://www.topografix.com/GPX/gpx_style/0/2">
    (omitted)
  </line>
</extensions>
to
<extensions>
  <http://www.topografix.com/GPX/gpx_style/0/2:line>
    (omitted)
  </http://www.topografix.com/GPX/gpx_style/0/2:line>
</extensions>
after I loaded it and saved it.
Thanks to your comment, I just had to add
self._data.nsmap['gpx_style'] = 'http://www.topografix.com/GPX/gpx_style/0/2'
before saving and the problem was fixed 🙌 .
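Why a single nsmap entry is enough can be illustrated without gpxpy. ElementTree stores a namespaced tag as `{uri}local`, so a serializer has to map the URI back to a prefix. The sketch below assumes that the lookup falls back to the raw URI when no prefix is registered, which would explain the broken tag names. `resolve_prefix` is a hypothetical stand-in and is not gpxpy's actual code.

```python
import xml.etree.ElementTree as ET

GPX_STYLE = 'http://www.topografix.com/GPX/gpx_style/0/2'


def resolve_prefix(qname: str, nsmap: dict) -> str:
    """Turn '{uri}local' into 'prefix:local' (hypothetical helper).

    Assumption: when the URI is missing from nsmap, the URI itself is
    used in place of a prefix, which yields an invalid tag name.
    """
    if not qname.startswith('{'):
        return qname  # the tag has no namespace
    uri, _, local = qname[1:].partition('}')
    for prefix, namespace in nsmap.items():
        if namespace == uri:
            return prefix + ':' + local
    return uri + ':' + local


# ElementTree expands the default namespace into the tag name.
line = ET.fromstring(
    '<line xmlns="' + GPX_STYLE + '"><color>5E7A8C</color></line>'
)

without_entry = resolve_prefix(line.tag, {})
with_entry = resolve_prefix(line.tag, {'gpx_style': GPX_STYLE})
print(without_entry)
print(with_entry)
```

Under that assumption, registering the URI under any prefix before saving is what turns the invalid name into a valid prefixed one.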
Thanks, @fernandobrito. I don't have time to fix this immediately, but at least I wrote a test. If somebody wants to try, it's here: https://github.com/tkrajina/gpxpy/commit/0f6c473569c451cce75fcabf92f0897d5925c4db
To run that test:
python3 -m unittest test.GPXTests.test_extension_with_xmlns
I am writing this comment in the hope that someone's PR will fix most of the namespace issues soon.
I think there is more than one issue related to namespaces (such as #242), and they should be solved together in a single PR. The patch shown below is an ad hoc, preliminary version that may introduce other issues or errors and is not suitable as a PR for review.
There are at least two problems in the code related to these namespace issues.
- If more than one xmlns without a prefix exists when parsing a GPX file, each earlier one is overwritten by the last one.
- The default namespace is hard-coded to 'http://www.topografix.com/GPX/1/1' in gpx.to_xml().
When parsing the XML with gpx_style shown in #254 (hereafter 'test_linestyle.gpx'):
def parse() in class GPXParser of parser.py https://github.com/tkrajina/gpxpy/blob/dev/gpxpy/parser.py#L105-#L109
After lines 105-107, the prefixes and URIs are as follows.
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
prefix = 'xsi', URI = "http://www.w3.org/2001/XMLSchema-instance"
xmlns="http://www.topografix.com/GPX/1/1"
prefix ='', URI = "http://www.topografix.com/GPX/1/1"
xmlns:topografix="http://www.topografix.com/GPX/Private/TopoGrafix/0/1"
prefix = 'topografix', URI = "http://www.topografix.com/GPX/Private/TopoGrafix/0/1"
xmlns="http://www.topografix.com/GPX/gpx_style/0/2"
prefix = '', URI = "http://www.topografix.com/GPX/gpx_style/0/2"
Note that the second one (GPX/1/1) is overwritten by the fourth one because both have the same prefix ('').
At lines 108-109:
if prefix == '':
    prefix = 'defaultns' # alias default for easier handling
Now, the namespace without prefix ('') is renamed as 'defaultns' for handling.
Finally when the gpx file is parsed, nsmap is as follows:
>>> gpx.nsmap
{'xsi': 'http://www.w3.org/2001/XMLSchema-instance',
'defaultns': 'http://www.topografix.com/GPX/gpx_style/0/2',
'topografix': 'http://www.topografix.com/GPX/Private/TopoGrafix/0/1'}
>>>
This is because, as described before, the second xmlns (GPX/1/1) was overwritten by the fourth one (gpx_style).
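The overwrite can be reproduced without gpxpy. The sketch below uses the regular expression quoted in the patch later in this thread and a plain dict keyed by prefix. `collect_nsmap` and the sample `XML` string are simplified stand-ins written for illustration, not the library's code or the real test file.

```python
import re

# A cut-down document with two unprefixed xmlns declarations.
XML = (
    '<gpx xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'
    ' xmlns="http://www.topografix.com/GPX/1/1"'
    ' xmlns:topografix="http://www.topografix.com/GPX/Private/TopoGrafix/0/1"'
    ' version="1.1">'
    '<trk><extensions>'
    '<line xmlns="http://www.topografix.com/GPX/gpx_style/0/2">'
    '<color>5E7A8C</color></line>'
    '</extensions></trk></gpx>'
)


def collect_nsmap(xml: str) -> dict:
    """Collect xmlns declarations into a dict keyed by prefix."""
    nsmap = {}
    for declaration in re.findall(r'\sxmlns:?[^=]*="[^"]+"', xml):
        # Drop the leading ' xmlns' (6 characters), then split on '='.
        prefix, _, uri = declaration[6:].partition('=')
        prefix = prefix.lstrip(':')
        if prefix == '':
            prefix = 'defaultns'  # every unprefixed xmlns gets the same key
        nsmap[prefix] = uri.strip('"')
    return nsmap


nsmap = collect_nsmap(XML)
print(nsmap)
```

Because both unprefixed declarations map to the key 'defaultns', only the last URI survives in the dict.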
When serializing to xml,
def gpx_fields_to_xml() of gpxfield.py at line 501 https://github.com/tkrajina/gpxpy/blob/dev/gpxpy/gpxfield.py#L501
namespaces.remove('defaultns')
'defaultns' is removed, so the alias is turned back into the empty prefix ('').
def to_xml() in class GPX of gpx.py at line 2694 https://github.com/tkrajina/gpxpy/blob/dev/gpxpy/gpx.py#L2694
self.nsmap['defaultns'] = 'http://www.topografix.com/GPX/{}'.format(version_path)
'defaultns' is set again. Note that no namespace other than GPX/1/1 can be the default here.
With this combination, 'defaultns' in nsmap is replaced, changing from
'http://www.topografix.com/GPX/gpx_style/0/2'
to
'http://www.topografix.com/GPX/1/1'
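The effect of that assignment on the parsed nsmap can be sketched with plain dicts. `apply_default_namespace` is a hypothetical function that only mirrors the assignment quoted from to_xml() above, and the starting dict is the nsmap shown earlier for the parsed file.

```python
GPX_STYLE = 'http://www.topografix.com/GPX/gpx_style/0/2'

# nsmap as it looks right after parsing test_linestyle.gpx.
parsed_nsmap = {
    'xsi': 'http://www.w3.org/2001/XMLSchema-instance',
    'defaultns': GPX_STYLE,
    'topografix': 'http://www.topografix.com/GPX/Private/TopoGrafix/0/1',
}


def apply_default_namespace(nsmap: dict, version_path: str = '1/1') -> dict:
    """Return a copy of nsmap with 'defaultns' forced to the GPX namespace."""
    updated = dict(nsmap)
    updated['defaultns'] = 'http://www.topografix.com/GPX/{}'.format(version_path)
    return updated


serialized_nsmap = apply_default_namespace(parsed_nsmap)

# URIs that had a prefix before serialization but have none afterwards.
lost = set(parsed_nsmap.values()) - set(serialized_nsmap.values())
print(lost)
```

After the assignment, the gpx_style URI is no longer reachable through any prefix, so elements in that namespace have nothing to be written with.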
An ad hoc, preliminary version of the patch follows. This patch is not tested and may cause other issues or errors.
In parser.py https://github.com/tkrajina/gpxpy/blob/dev/gpxpy/parser.py#L105-L114
def parse(self, version: Optional[str]=None) -> mod_gpx.GPX:
    for namespace in mod_re.findall(r'\sxmlns:?[^=]*="[^"]+"', self.xml):
        prefix, _, URI = namespace[6:].partition('=')
        prefix = prefix.lstrip(':')
        if prefix == '':
            #prefix = 'defaultns' # alias default for easier handling
            if 'defaultns' not in self.gpx.nsmap:
                prefix = 'defaultns'
            else:
                prefix = 'ns'
        else:
            if prefix.startswith("ns"):
Using this patch,
>>> with open('test_linestyle.gpx', 'rb') as f:
...     test_linestyle = gpxpy.parse(f.read().decode(), '1.1')
...
>>> test_linestyle.nsmap
{'xsi': 'http://www.w3.org/2001/XMLSchema-instance',
'defaultns': 'http://www.topografix.com/GPX/1/1',
'topografix': 'http://www.topografix.com/GPX/Private/TopoGrafix/0/1',
'ns': 'http://www.topografix.com/GPX/gpx_style/0/2'}
>>>
>>> with open('test_linestyle_1.gpx', 'wb') as f:
...     f.write(test_linestyle.to_xml().encode('utf-8'))
...
912
>>>
>>> with open('test_linestyle_1.gpx', 'rb') as f:
...     test_linestyle_1 = gpxpy.parse(f.read().decode(), '1.1')
...
>>>
>>> test_linestyle_1.nsmap
{'defaultns': 'http://www.topografix.com/GPX/1/1',
'ns': 'http://www.topografix.com/GPX/gpx_style/0/2',
'topografix': 'http://www.topografix.com/GPX/Private/TopoGrafix/0/1',
'xsi': 'http://www.w3.org/2001/XMLSchema-instance'}
>>>
>>> test_linestyle_1.to_xml()
'<?xml version="1.0" encoding="UTF-8"?>\n
<gpx xmlns="http://www.topografix.com/GPX/1/1"
xmlns:ns="http://www.topografix.com/GPX/gpx_style/0/2"
xmlns:topografix="http://www.topografix.com/GPX/Private/TopoGrafix/0/1"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.topografix.com/GPX/1/1
http://www.topografix.com/GPX/1/1/gpx.xsd
http://www.topografix.com/GPX/Private/TopoGrafix/0/1
http://www.topografix.com/GPX/Private/TopoGrafix/0/1/topografix.xsd"
version="1.1" creator="GaiaGPS for Android">\n
<trk>\n
<extensions>\n
<ns:line>\n
<ns:color>5E7A8C</ns:color>\n
</ns:line>\n
</extensions>\n
<trkseg>\n
<trkpt lat="0.0" lon="0.0">\n
<ele>0.0</ele>\n
<time>2023-02-02T01:00:00Z</time>\n
</trkpt>\n
<trkpt lat="1.0" lon="0.0">\n
<ele>0.0</ele>\n
<time>2023-02-02T01:50:00Z</time>\n
</trkpt>\n
</trkseg>\n
</trk>\n
</gpx>'
>>>
>>> for track in test_linestyle_1.tracks:
...     for segment in track.segments:
...         for point in segment.points:
...             print(point.time, point.latitude, point.longitude)
...
2023-02-02 01:00:00+00:00 0.0 0.0
2023-02-02 01:50:00+00:00 1.0 0.0
>>>
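The prefix handling in the patch can also be exercised on its own, without gpxpy. The following is a rough sketch of just the branch for unprefixed declarations. `collect_nsmap_patched` and the sample `XML` string are hypothetical, and the truncated `else` branch of the patch is left out.

```python
import re

# A cut-down document with two unprefixed xmlns declarations.
XML = (
    '<gpx xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"'
    ' xmlns="http://www.topografix.com/GPX/1/1"'
    ' xmlns:topografix="http://www.topografix.com/GPX/Private/TopoGrafix/0/1"'
    ' version="1.1">'
    '<trk><extensions>'
    '<line xmlns="http://www.topografix.com/GPX/gpx_style/0/2">'
    '<color>5E7A8C</color></line>'
    '</extensions></trk></gpx>'
)


def collect_nsmap_patched(xml: str) -> dict:
    """Collect xmlns declarations, keeping the first unprefixed one."""
    nsmap = {}
    for declaration in re.findall(r'\sxmlns:?[^=]*="[^"]+"', xml):
        prefix, _, uri = declaration[6:].partition('=')
        prefix = prefix.lstrip(':')
        if prefix == '':
            # The first unprefixed declaration keeps the 'defaultns' alias;
            # later ones fall back to 'ns', as in the patch.
            prefix = 'defaultns' if 'defaultns' not in nsmap else 'ns'
        nsmap[prefix] = uri.strip('"')
    return nsmap


nsmap = collect_nsmap_patched(XML)
print(nsmap)
```

Note that this branch uses a single fallback key, so a third unprefixed declaration would overwrite 'ns' in the same way that 'defaultns' was overwritten before.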