php-gedcom
Problem with CONC lines in SOUR PAGE for a RESI ATTR
The source reference ID for a RESI ATTR is coming out as
@S19@World War II Draft Cards (Fourth Registration) for the State ofWisconsin; State Headquarters: Wisconsin; Record Group Name: Recordsof the Selective Service System, 1940-; Record Group
I have been able to simplify the GEDCOM file and still reproduce it:
0 HEAD
0 @I212@ INDI
1 NAME Bob The /Frog/
1 RESI Age: 55
2 SOUR @S19@
3 PAGE National Archives and Records Administration (NARA); Washington, D.C.;
4 CONC World War II Draft Cards (Fourth Registration) for the State of
4 CONC Wisconsin; State Headquarters: Wisconsin; Record Group Name: Records
4 CONC of the Selective Service System, 1940-; Record Group
0 TRLR
I think the ID should just be S19.
Here's my code:
<?php
spl_autoload_register(function ($class) {
    $pathToPhpGedcom = __DIR__ . '/library/'; // TODO FIXME

    // Only handle classes in the PhpGedcom namespace ('PhpGedcom\' is 10 characters).
    if (substr(ltrim($class, '\\'), 0, 10) !== 'PhpGedcom\\') {
        return;
    }

    $class = str_replace('\\', DIRECTORY_SEPARATOR, $class) . '.php';
    if (file_exists($pathToPhpGedcom . $class)) {
        require_once($pathToPhpGedcom . $class);
    }
});

$parser = new \PhpGedcom\Parser();
$gedcom = $parser->parse('family.ged');

foreach ($gedcom->getIndi() as $individual) {
    print $individual->getId() . ': ' . current($individual->getName())->getName() . "<br>\n";
    foreach ($individual->getAttr() as $attr) {
        print $attr->getType() . ": <br>\n";
        foreach ($attr->getSour() as $sour) {
            print "The Source ID IS: <br>\n";
            print $sour->getSour() . "<br>\n";
        }
    }
}
The output of the code above is:
I212: Bob The /Frog/
RESI:
The Source ID IS:
@S19@World War II Draft Cards (Fourth Registration) for the State ofWisconsin; State Headquarters: Wisconsin; Record Group Name: Recordsof the Selective Service System, 1940-; Record Group
I think there are actually three issues here.
- Biggest problem: the GEDCOM doesn't match the spec. The spec says that source citations with a pointer to a source record shouldn't have CONC tags in them, and that the source PAGE value should be at most 248 characters. It looks like Family Tree Maker decided to break both of those rules.
- Sources with pointers should be parsed slightly differently than sources without pointers. It appears that SOUR->CONC is only valid in sources without a pointer to a source record.
- Since the CONC tags were at level 4, they probably shouldn't have been applied to the SOUR tag at level 2, even if they weren't valid for the PAGE tag at level 3.
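To illustrate the third point, here is a minimal sketch of level-aware continuation handling. It is not php-gedcom's parser and does not use its API: foldContinuations() is a hypothetical helper in plain PHP, and it assumes a CONC or CONT line at level N belongs to the most recent line at level N-1. Run against the sample file above, it leaves the SOUR value as just the pointer and appends the CONC text to PAGE.

```php
<?php
// Hypothetical helper, not part of php-gedcom. Folds CONC/CONT lines into
// the most recent line that sits exactly one level above them.
function foldContinuations(array $lines): array
{
    $records = [];      // flat list of ['level' => int, 'tag' => string, 'value' => string]
    $lastAtLevel = [];  // level => index in $records of the latest line at that level

    foreach ($lines as $line) {
        $line = rtrim($line, "\r\n");
        // Pattern: level, optional @XREF@ identifier, tag, optional value.
        if (!preg_match('/^\s*(\d+) (?:@[^@]+@ )?(\w+)(?: (.*))?$/', $line, $m)) {
            continue; // skip lines that do not look like GEDCOM
        }
        $level = (int) $m[1];
        $tag   = $m[2];
        $value = $m[3] ?? '';

        if ($tag === 'CONC' || $tag === 'CONT') {
            // Attach only to a parent one level up; otherwise drop the line.
            $parent = $lastAtLevel[$level - 1] ?? null;
            if ($parent !== null) {
                // CONC joins with no separator, CONT starts a new line.
                $records[$parent]['value'] .= ($tag === 'CONT' ? "\n" : '') . $value;
            }
            continue;
        }

        $records[] = ['level' => $level, 'tag' => $tag, 'value' => $value];
        $lastAtLevel[$level] = count($records) - 1;

        // A new line at this level closes every deeper level.
        foreach (array_keys($lastAtLevel) as $seen) {
            if ($seen > $level) {
                unset($lastAtLevel[$seen]);
            }
        }
    }

    return $records;
}

$sample = [
    '0 HEAD',
    '0 @I212@ INDI',
    '1 NAME Bob The /Frog/',
    '1 RESI Age: 55',
    '2 SOUR @S19@',
    '3 PAGE National Archives and Records Administration (NARA); Washington, D.C.;',
    '4 CONC World War II Draft Cards (Fourth Registration) for the State of',
    '4 CONC Wisconsin; State Headquarters: Wisconsin; Record Group Name: Records',
    '4 CONC of the Selective Service System, 1940-; Record Group',
    '0 TRLR',
];

foreach (foldContinuations($sample) as $record) {
    if ($record['tag'] === 'SOUR' || $record['tag'] === 'PAGE') {
        print $record['tag'] . ': ' . $record['value'] . "\n";
    }
}
```

Tracking the latest line per level, rather than only the previous line, is what keeps the level 4 text away from the level 2 SOUR tag. A stricter variant could also reject CONC under tags where the spec does not allow it.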
I have a similar problem: a GED file exported from the MyHeritage site does not parse. There are a lot of errors and a lot of unsupported tags (FILE, for example).
Has anyone found a solution for parsing MyHeritage files?