
[Feature Request] read face tags from other software (digikam)

Open rhatguy opened this issue 1 year ago • 33 comments

I have a large photo set currently already tagged with digikam with the face data stored in the exif information within the photo. It would be really great if memories could just leverage the existing data (readable through exiftool) instead of relying on recognize and having to retag everything. Other photo software such as pigallery2 is able to leverage these tags. Example exiftool output:

    #exiftool 20220307_215534.jpg | egrep -i person1
    Categories : <Categories><Category Assigned="0">People<Category Assigned="1">person1</Category><Category Assigned="1">Ignored</Category></Category><Category Assigned="1">person2</Category></Categories>
    Tags List : People/Ignored, People/person1 Kelley, person2
    Last Keyword XMP : People/Ignored, People/person1, person2
    Hierarchical Subject : People|Ignored, People|person2, person2
    Catalog Sets : People|Ignored, People|person1, person2
    Subject : Ignored, person1, person2
    Region Name : Ignored, person2, person1
    Region Person Display Name : Ignored, person2, person1
    Keywords : Ignored, person1, person2

    #exiftool 20220307_215534.jpg | egrep -i region
    Region Applied To Dimensions W : 6528
    Region Applied To Dimensions H : 4896
    Region Applied To Dimensions Unit: pixel
    Region Name : Ignored, person2, person1
    Region Type : Face, Face, Face
    Region Area X : 0.2019, 0.8125, 0.5144
    Region Area Y : 0.608967, 0.625, 0.464767
    Region Area W : 0.0431985, 0.273897, 0.3125
    Region Area H : 0.0639297, 0.609069, 0.339665
    Region Area Unit : normalized, normalized, normalized
    Region Person Display Name : Ignored, person2, person1
    Region Rectangle : 0.1803, 0.577002, 0.0431985, 0.0639297, 0.675551, 0.320466, 0.273897, 0.609069, 0.35815, 0.294935, 0.3125, 0.339665
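For anyone wanting to script against this, the same data is easier to consume through exiftool's JSON mode (`exiftool -j`). A rough Python sketch (assumes exiftool is on the PATH; tag handling mirrors the output above):

```python
import json
import subprocess

def parse_regions(meta):
    """Pair up the flattened MWG region tags from one exiftool JSON
    record into per-face dicts. exiftool emits list-valued tags as JSON
    arrays when a photo has several regions, and scalars for one."""
    def as_list(v):
        return v if isinstance(v, list) else [v]
    names = as_list(meta.get("RegionName", []))
    xs = as_list(meta.get("RegionAreaX", []))
    ys = as_list(meta.get("RegionAreaY", []))
    ws = as_list(meta.get("RegionAreaW", []))
    hs = as_list(meta.get("RegionAreaH", []))
    return [{"name": n, "x": x, "y": y, "w": w, "h": h}
            for n, x, y, w, h in zip(names, xs, ys, ws, hs)]

def read_face_regions(path):
    """Run exiftool on one file and parse its face regions."""
    out = subprocess.check_output(
        ["exiftool", "-j", "-RegionName", "-RegionAreaX", "-RegionAreaY",
         "-RegionAreaW", "-RegionAreaH", path])
    records = json.loads(out)
    return parse_regions(records[0]) if records else []
```

The coordinates stay normalized (0..1), which is exactly what a viewer needs to draw boxes regardless of display size.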

rhatguy avatar Oct 27 '22 12:10 rhatguy

The more I use recognize the less pleased I am with it. It's clustering people who are clearly not the same, forcing me to manually break those clusters apart into the appropriate people. The more I do that, the more it irks me given that I already have all of that face data stored in the EXIF info of my pictures (tagged mostly with Picasa and digiKam through the years).

Now that multiple detection backends are being looked at, would there be a possibility to have some form of crawler that goes through all pictures, pulls out existing face tags, and inserts those into the Nextcloud DB into existing data structures that recognize uses today? I'm not qualified to implement that via the web, but I might be able to do some very hackish scripting. Would something like this be of any interest? Any reason I shouldn't do that?

rhatguy avatar Nov 08 '22 20:11 rhatguy

recognize

rhatguy avatar Nov 08 '22 20:11 rhatguy

The more I use recognize the less pleased I am with it. It's clustering people who are clearly not the same, forcing me to manually break those clusters apart into the appropriate people. The more I do that, the more it irks me given that I already have all of that face data stored in the EXIF info of my pictures (tagged mostly with Picasa and digiKam through the years).

You might first want to report this to recognize, maybe with some sample images to see if they can figure this out.

Now that multiple detection backends are being looked at, would there be a possibility to have some form of crawler that goes through all pictures, pulls out existing face tags, and inserts those into the Nextcloud DB into existing data structures that recognize uses today? I'm not qualified to implement that via the web, but I might be able to do some very hackish scripting. Would something like this be of any interest? Any reason I shouldn't do that?

If the "script" you write is in PHP then it isn't hacky ;) I'm open to merging any patches for this that are reasonably generalized. I'm just concerned where you insert these entries. You cannot just put them directly into the recognize table because it needs the vector embedding for each face detection. Maybe there's some way to get this first, and then match with the stored face only to adjust the cluster?

Btw you might also want to look at facerecognition. Hopefully we'll have integration with memories soon.

pulsejet avatar Nov 09 '22 00:11 pulsejet

Hi. I came here to submit a feature request identical to this one. I am also a digiKam user, and I have spent countless hours tagging and face-recognizing my whole picture library (about ~150k files); I have several thousand keywords and face tags perfectly classified. One of the reasons I use digiKam is that it works with standards, making sure everything is compatible with everything else, so every time an app tries to start from scratch without reading the existing information in the files it feels like a bit of a waste of time. Especially considering exiftool is already being used to read the date for each file, so why not the keywords and the faces too?

I will be willing to provide examples of tagged pictures and also beta-test that feature if you want.

If you want an example of a picture library that is compatible with the existing metadata, I'd suggest you take a look at Pigallery2 https://github.com/bpatrik/pigallery2. It doesn't support hierarchical tags (that would be ideal), but is able to read all the metadata from the files and make it searchable. Also, the whole thing is very quick and lightweight.

wonx avatar Feb 14 '23 19:02 wonx

If you want an example of a use case for hierarchical keywords, here's one I submitted to another GitHub project: https://github.com/Webreaper/Damselfly/issues/335 (also https://github.com/photoprism/photoprism/issues/1779)

wonx avatar Feb 14 '23 20:02 wonx

@wonx I tend to agree with your previous comment. Memories is already importing tons of metadata like GPS coordinates, timestamps, and even camera make/model information, and storing it in the oc_memories table. Pulling in the face recognition data and parsing it as well seems like the most integrated way to go about it. That said, today Memories reads the face information that separate apps (Recognize and/or Facerecognition) create and store in other tables in the Nextcloud database (oc_facerecog_faces in Facerecognition's case). I spent a bit of time trying to figure out if I could create a script to use exiftool to pull the existing face data and manually populate it into one of those tables, but have been unsuccessful (so far).

It seems like we (really @pulsejet) need to decide whether it's worthwhile to use Memories' existing tables to store that data, making it essentially another way to gather face data like Recognize and Facerecognition are today, or whether we need to reverse engineer manually inserting the data into one of the existing structures used by Recognize or Facerecognition (which seems messy). I'm sure @pulsejet would quickly accept pull requests, so it's just a matter of someone devoting the time to build this one way or another. @pulsejet, not throwing this on you... thanks for all your super effort on this app. If with my limited coding experience I can figure out how to populate one of these existing tables, I'll gladly contribute that back.

I strongly agree with @wonx's sentiment: the number of apps creating their own face metadata, with no easy way to import existing metadata or export the data they create, seems like a missed opportunity for standardization/collaboration to me.

rhatguy avatar Feb 16 '23 17:02 rhatguy

Pretty much nailed it @rhatguy

Besides the lack of standardization, there's one more issue: how to deal with multiple sources. For example, do we just show exif face tags similar to recognize? Then do we show both if both are enabled? If yes, should they be shown together? How do we deal with further recognition after the import?

Many other questions like these.

pulsejet avatar Feb 16 '23 22:02 pulsejet

Can you expand a bit on the lack of standardization? It seems like the implementation that digiKam, exiftool, PiGallery2, and others use is a de facto standard at this point, at least in the realm of tools that store faces in photo metadata. Even tools that don't create metadata, like Mylio and Windows Explorer, can at least read those tags. What would constitute a standard in your mind?

I thought I remembered previously that in Memories today there are two sidebar items if you have both recognize and face recognition enabled. Would exif faces not be handled in a similar way?

rhatguy avatar Feb 17 '23 12:02 rhatguy

I'm not really familiar with how standards work in this area, but for starters, commercial vendors don't make any attempt to follow any of these when storing their metadata. Regardless, no need to really worry about that; doing something similar to digiKam would be most appropriate either way.

Would exif faces not be handled in a similar way?

From my view, the best flow would be to show the tagged and AI-recognized images together, maybe with some visual cue. That way, you can "confirm" that the AI tagged something correctly, and then it goes into the actual image file.

pulsejet avatar Feb 20 '23 16:02 pulsejet

Would exif faces not be handled in a similar way?

Just a quick observation, faces are not stored in EXIF, but in XMP metadata. Exiftool (and exiv2) can read both anyway.

Basically, metadata can be stored in three spaces: EXIF, IPTC and XMP. Most metadata can be stored in all of them, but they have different features. For instance, hierarchical keywords and face regions can only be stored in XMP, as far as I know, but regular keywords can be stored in all three. Usually these programs read all three metadata spaces and combine the info. Also, different image editing software writes its own XMP snippets (see below). Fortunately, for storing faces, the Adobe XMP format seems to be the de facto standard and most, if not all, programs use it.

In case of discrepancies (e.g. different dates in EXIF and XMP) the software has to decide which one has priority. Also, some programs choose to list the face names in the list of keywords (digiKam does this), but others keep them in their own section (e.g. Picasa). It's also worth mentioning that XMP data can be stored in sidecar files (.xmp) instead of inside the picture itself.

For instance, this is the XMP section of the metadata (exiftool -xmp -b <picture.jpg>) of a sample picture containing some keywords (Llocs/EEUU/New York/New York City/Manhattan/Central Park) and two face regions (Person1 and Person2), and it shows how different picture management software writes its metadata. Notice how there's a section for digiKam, MicrosoftPhoto, Lightroom, Media Pro, and ACDSee:

<?xpacket begin="" id="W5M0MpCehiHzreSzNTczkc9d"?>
<x:xmpmeta
	xmlns:x="adobe:ns:meta/" x:xmptk="XMP Core 4.4.0-Exiv2">
	<rdf:RDF
		xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
		<rdf:Description rdf:about=""
			xmlns:exif="http://ns.adobe.com/exif/1.0/"
			xmlns:digiKam="http://www.digikam.org/ns/1.0/"
			xmlns:MicrosoftPhoto="http://ns.microsoft.com/photo/1.0/"
			xmlns:lr="http://ns.adobe.com/lightroom/1.0/"
			xmlns:mediapro="http://ns.iview-multimedia.com/mediapro/1.0/"
			xmlns:dc="http://purl.org/dc/elements/1.1/"
			xmlns:MP="http://ns.microsoft.com/photo/1.2/"
			xmlns:MPRI="http://ns.microsoft.com/photo/1.2/t/RegionInfo#"
			xmlns:MPReg="http://ns.microsoft.com/photo/1.2/t/Region#"
			xmlns:mwg-rs="http://www.metadataworkinggroup.com/schemas/regions/"
			xmlns:stDim="http://ns.adobe.com/xap/1.0/sType/Dimensions#"
			xmlns:stArea="http://ns.adobe.com/xmp/sType/Area#"
			xmlns:acdsee="http://ns.acdsee.com/iptc/1.0/" exif:GPSVersionID="2.0.0.0" exif:GPSMapDatum="WGS-84" exif:GPSAltitudeRef="1" exif:GPSAltitude="211341/10000" exif:GPSLatitude="40,46.52702967N" exif:GPSLongitude="73,58.33259014W" acdsee:categories="&lt;Categories&gt;&lt;Category Assigned=&quot;0&quot;&gt;Llocs&lt;Category Assigned=&quot;0&quot;&gt;EEUU&lt;Category Assigned=&quot;0&quot;&gt;New York&lt;Category Assigned=&quot;1&quot;&gt;New York City&lt;Category Assigned=&quot;0&quot;&gt;Manhattan&lt;Category Assigned=&quot;1&quot;&gt;Central Park&lt;/Category&gt;&lt;/Category&gt;&lt;/Category&gt;&lt;/Category&gt;&lt;/Category&gt;&lt;/Category&gt;&lt;Category Assigned=&quot;0&quot;&gt;Persones&lt;Category Assigned=&quot;1&quot;&gt;Person1&lt;/Category&gt;&lt;Category Assigned=&quot;1&quot;&gt;Person2&lt;/Category&gt;&lt;/Category&gt;&lt;/Categories&gt;">
			<digiKam:TagsList>
				<rdf:Seq>
					<rdf:li>Llocs/EEUU/New York/New York City/Manhattan/Central Park</rdf:li>
					<rdf:li>Persones/Person2</rdf:li>
					<rdf:li>Persones/Person1</rdf:li>
					<rdf:li>Llocs/EEUU/New York/New York City</rdf:li>
				</rdf:Seq>
			</digiKam:TagsList>
			<MicrosoftPhoto:LastKeywordXMP>
				<rdf:Bag>
					<rdf:li>Llocs/EEUU/New York/New York City/Manhattan/Central Park</rdf:li>
					<rdf:li>Persones/Person2</rdf:li>
					<rdf:li>Persones/Person1</rdf:li>
					<rdf:li>Llocs/EEUU/New York/New York City</rdf:li>
				</rdf:Bag>
			</MicrosoftPhoto:LastKeywordXMP>
			<lr:hierarchicalSubject>
				<rdf:Bag>
					<rdf:li>Llocs|EEUU|New York|New York City|Manhattan|Central Park</rdf:li>
					<rdf:li>Persones|Person2</rdf:li>
					<rdf:li>Persones|Person1</rdf:li>
					<rdf:li>Llocs|EEUU|New York|New York City</rdf:li>
				</rdf:Bag>
			</lr:hierarchicalSubject>
			<mediapro:CatalogSets>
				<rdf:Bag>
					<rdf:li>Llocs|EEUU|New York|New York City|Manhattan|Central Park</rdf:li>
					<rdf:li>Persones|Person2</rdf:li>
					<rdf:li>Persones|Person1</rdf:li>
					<rdf:li>Llocs|EEUU|New York|New York City</rdf:li>
				</rdf:Bag>
			</mediapro:CatalogSets>
			<dc:subject>
				<rdf:Bag>
					<rdf:li>Central Park</rdf:li>
					<rdf:li>Person2</rdf:li>
					<rdf:li>Person1</rdf:li>
					<rdf:li>New York City</rdf:li>
				</rdf:Bag>
			</dc:subject>
			<MP:RegionInfo rdf:parseType="Resource">
				<MPRI:Regions>
					<rdf:Bag>
						<rdf:li MPReg:PersonDisplayName="Person1" MPReg:Rectangle="0.39901, 0.579846, 0.144245, 0.151982"/>
						<rdf:li MPReg:PersonDisplayName="Person2" MPReg:Rectangle="0.490469, 0.191905, 0.201796, 0.217511"/>
					</rdf:Bag>
				</MPRI:Regions>
			</MP:RegionInfo>
			<mwg-rs:Regions rdf:parseType="Resource">
				<mwg-rs:AppliedToDimensions stDim:w="5456" stDim:h="3632" stDim:unit="pixel"/>
				<mwg-rs:RegionList>
					<rdf:Bag>
						<rdf:li>
							<rdf:Description mwg-rs:Name="Person1" mwg-rs:Type="Face">
								<mwg-rs:Area stArea:x="0.471133" stArea:y="0.655837" stArea:w="0.144245" stArea:h="0.151982" stArea:unit="normalized"/>
							</rdf:Description>
						</rdf:li>
						<rdf:li>
							<rdf:Description mwg-rs:Name="Person2" mwg-rs:Type="Face">
								<mwg-rs:Area stArea:x="0.591367" stArea:y="0.300661" stArea:w="0.201796" stArea:h="0.217511" stArea:unit="normalized"/>
							</rdf:Description>
						</rdf:li>
					</rdf:Bag>
				</mwg-rs:RegionList>
			</mwg-rs:Regions>
		</rdf:Description>
	</rdf:RDF>
</x:xmpmeta>
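For what it's worth, the mwg-rs block in a packet like the one above can be pulled out with just the Python standard library. A sketch (namespace URIs exactly as in the sample; coordinates are left normalized):

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
MWG = "{http://www.metadataworkinggroup.com/schemas/regions/}"
AREA = "{http://ns.adobe.com/xmp/sType/Area#}"

def mwg_regions(xmp_xml):
    """Extract (name, x, y, w, h) tuples from the mwg-rs region list
    of an XMP packet."""
    faces = []
    for desc in ET.fromstring(xmp_xml).iter(f"{RDF}Description"):
        name = desc.get(f"{MWG}Name")
        area = desc.find(f"{MWG}Area")
        if name is None or area is None:
            continue  # outer rdf:Description nodes carry no face data
        faces.append((name,
                      float(area.get(f"{AREA}x")),
                      float(area.get(f"{AREA}y")),
                      float(area.get(f"{AREA}w")),
                      float(area.get(f"{AREA}h"))))
    return faces
```

The MP:RegionInfo (Microsoft) section encodes the same faces redundantly; reading only the mwg-rs block avoids duplicates.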

wonx avatar Feb 20 '23 16:02 wonx

@wonx Wouldn't the best thing be to use the Metadata Working Group's defined structure? These structures seem to be fairly standardized and look similar between your example above and mine below (relevant section clipped from the full exiftool output). The relevant standards doc from MWG is archived here: https://web.archive.org/web/20180919181934/http://www.metadataworkinggroup.org/pdf/mwg_guidance.pdf

@pulsejet In response to your question about how to display these face regions: my use case would be to use Memories to display the existing metadata and turn off all further face detection inside Nextcloud. I've spent years curating the metadata in these images and don't need new face detection algorithms trying to recreate metadata I already have in my photos. I would assume that anyone who has taken the time to use other software to add this metadata isn't looking for another algorithm to validate/match it. To me, Memories is a "viewer" of metadata created by other software (Facerecognition and Recognize) when it comes to faces. I view the face metadata stored in photos as another data source to be treated just like GPS coordinates or creation dates. No need to do anything further with it except use/display it. I would leave any comparison/matching of the metadata to the tools that are creating new metadata. For instance, it would be awesome if Facerecognition/Recognize looked at this metadata as a "hint" in their processing: if they think a face is person1, but the metadata in the photo says it's person2, that would indicate a likely bad match to me.

		<mwg-rs:Regions rdf:parseType="Resource">
			<mwg-rs:AppliedToDimensions stDim:w="4608" stDim:h="3456" stDim:unit="pixel"/>
			<mwg-rs:RegionList>
				<rdf:Bag>
					<rdf:li>
						<rdf:Description mwg-rs:Name="Person1" mwg-rs:Type="Face">
							<mwg-rs:Area stArea:x="0.310113" stArea:y="0.525608" stArea:w="0.125" stArea:h="0.192419" stArea:unit="normalized"/>
						</rdf:Description>
					</rdf:li>
					<rdf:li>
						<rdf:Description mwg-rs:Name="Person2" mwg-rs:Type="Face">
							<mwg-rs:Area stArea:x="0.610569" stArea:y="0.490451" stArea:w="0.100911" stArea:h="0.179398" stArea:unit="normalized"/>
						</rdf:Description>
					</rdf:li>
					<rdf:li>
						<rdf:Description mwg-rs:Name="Person3" mwg-rs:Type="Face">
							<mwg-rs:Area stArea:x="0.127387" stArea:y="0.387731" stArea:w="0.120226" stArea:h="0.179398" stArea:unit="normalized"/>
						</rdf:Description>
					</rdf:li>
					<rdf:li>
						<rdf:Description mwg-rs:Name="Person4" mwg-rs:Type="Face">
							<mwg-rs:Area stArea:x="0.838976" stArea:y="0.310909" stArea:w="0.182726" stArea:h="0.301215" stArea:unit="normalized"/>
						</rdf:Description>
					</rdf:li>
				</rdf:Bag>
			</mwg-rs:RegionList>
		</mwg-rs:Regions>
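If I'm reading the MWG guidance right, stArea x/y give the region's center and w/h its extent, all normalized against AppliedToDimensions, so converting a region to a pixel box is simple arithmetic. A sketch (the center-point interpretation is my reading of the spec):

```python
def mwg_to_pixel_box(x, y, w, h, img_w, img_h):
    """Convert a normalized MWG face region (center x/y, extent w/h)
    to a pixel bounding box (left, top, width, height)."""
    left = (x - w / 2) * img_w
    top = (y - h / 2) * img_h
    return (round(left), round(top), round(w * img_w), round(h * img_h))
```

For Person1 above (AppliedToDimensions 4608x3456) that gives a box of roughly 576x665 pixels starting near (1141, 1484).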

rhatguy avatar Feb 26 '23 20:02 rhatguy

I strongly endorse the idea of importing existing face tags.

I also have a large number of photos (>100K) that have been face-tagged over time with various tools (Picasa, Lightroom, digiKam). I currently use digiKam as my sole tool for face tagging.

It is my understanding that Picasa, Lightroom and digiKam all use MWG regions. Lightroom also adds face names to the keywords for each photo (and requires them in order to use those regions), but these can be ignored for our purposes. Some vendors (e.g., ACDSee) create proprietary tags. I'd suggest initially focusing on MWG as it is the closest thing to a standard, and the above tools likely cover a large percentage of users.

A while ago I wrote code in Python to read the MWG regions (mostly using existing open-source libraries). There are some complexities, since the face region coordinates are relative to the non-rotated image; if the image is rotated through EXIF settings, you need to account for that. Overall, it's not too difficult though. As a test, I ran it on my library and clipped out the face regions into folders for each name, and it worked well. I'd be happy to provide this to anyone who is interested, but I'm not sure how useful it would be.
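For reference, the orientation handling boils down to remapping the normalized center coordinates. A sketch covering the eight EXIF Orientation values (an illustration, not the code from my library):

```python
def remap_region(x, y, w, h, orientation):
    """Remap a normalized MWG region (center x/y, extent w/h) from the
    stored (non-rotated) image frame to the displayed frame, given the
    EXIF Orientation value (1-8). Values 2, 4, 5, 7 involve mirroring."""
    if orientation == 1:    # normal
        return x, y, w, h
    if orientation == 2:    # mirrored horizontally
        return 1 - x, y, w, h
    if orientation == 3:    # rotated 180
        return 1 - x, 1 - y, w, h
    if orientation == 4:    # mirrored vertically
        return x, 1 - y, w, h
    if orientation == 5:    # transposed (mirror + rotate 270 CW)
        return y, x, h, w
    if orientation == 6:    # rotated 90 CW
        return 1 - y, x, h, w
    if orientation == 7:    # transversed (mirror + rotate 90 CW)
        return 1 - y, 1 - x, h, w
    if orientation == 8:    # rotated 270 CW
        return y, 1 - x, h, w
    raise ValueError(f"unknown orientation {orientation}")
```

Note that for the 90/270-degree cases the width and height of the region swap along with the image dimensions.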

I haven't done any development for Nextcloud and I'm not familiar with PHP, so I'm not sure how much I could contribute. But if there is anything I could do I'd be willing to help.

ericnotthered avatar Mar 01 '23 20:03 ericnotthered

@pulsejet looking at this a bit deeper and exploring the recognize tables in MySQL (oc_recognize_face_detections and oc_recognize_face_clusters seem the most relevant), can you point me at any documentation on what the "vector" for a face is? The width, height, x, and y inside oc_recognize_face_detections seem pretty straightforward to extract using exiftool, and seem to be the only details required to draw a box inside Memories. I'm wondering how hard it would be to load those two tables with data using a script that could be called from occ, much like the recently added Google Takeout import.

rhatguy avatar Mar 17 '23 21:03 rhatguy

I'm not too familiar with the internals of recognize, but the vector must be the embedding of the face that's used for clustering. It should be possible to load the rest from exiftool (guessing), but we can't just load it into the recognize table without the vector. One solution might be to create yet another table (a lot of work).

pulsejet avatar Mar 17 '23 23:03 pulsejet

The vector is used by recognize to cluster photos? Does memories use the vector at all? If not, I'm not sure why we couldn't manually load the recognize tables and just leave the vector blank. We would know the name assigned to each face region from the metadata and could populate both tables referenced above. Basically we'd assign a face id in oc_recognize_face_clusters for each unique name we find in metadata, then load the x, y, width, height, and face id into oc_recognize_face_detections? "Clustering" would be easy since we don't need to look at the picture itself; we already know exactly which faces are "clusters" because they have the same name in the picture's metadata. I would assume in this case that we would turn off recognize since we'd be using its tables.
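Roughly the grouping I have in mind, sketched in Python with illustrative column names (not the actual recognize schema; a real importer would have to match whatever columns those tables really use):

```python
from itertools import count

def build_rows(photos):
    """photos maps file id -> [(name, x, y, w, h), ...] parsed from the
    photo metadata. Returns one cluster row per unique name and one
    detection row per region. Column names here are illustrative only."""
    next_id = count(1)
    cluster_ids = {}
    clusters, detections = [], []
    for file_id, regions in photos.items():
        for name, x, y, w, h in regions:
            if name not in cluster_ids:
                # "clustering" is trivial: same name means same cluster
                cluster_ids[name] = next(next_id)
                clusters.append({"id": cluster_ids[name], "title": name})
            detections.append({
                "file_id": file_id,
                "cluster_id": cluster_ids[name],
                "x": x, "y": y, "width": w, "height": h,
                "vector": None,  # unknown: recognize normally computes this
            })
    return clusters, detections
```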

rhatguy avatar Mar 18 '23 01:03 rhatguy

The vector is used by recognize to cluster photos?

Yes.

Does memories use the vector at all?

No.

If not, I'm not sure why we couldn't manually load the recognize tables and just leave the vector blank. We would know the name assigned to each face region from the metadata and could populate both tables referenced above. Basically we'd assign a face id in oc_recognize_face_clusters for each unique name we find in metadata, then load the x, y, width, height, and face id into oc_recognize_face_detections?

Yes, but that'll probably break recognize forever. I understand this might be okay for you but probably not everyone.

"Clustering" would be easy since we don't need to look at the picture itself; we already know exactly which faces are "clusters" because they have the same name in the picture's metadata. I would assume in this case that we would turn off recognize since we'd be using its tables.

So this solution would work in general if you never plan to use recognize. But it's far from ideal for many reasons:

  1. You need to install recognize first, then disable it; otherwise these tables don't exist.
  2. You need separate checks to see if you're using externally loaded metadata in recognize table, to enable the menu items etc.
  3. If you (even accidentally) reinstall or update recognize for whatever reason, you risk breaking everything in case the schema changes or something like that.

Bottom line is, we don't touch tables from other apps as far as possible, since they're maintained by someone else. Memories is already uncomfortably dependent on the schema from too many different apps (this is a requirement for performance reasons), but updating a table we don't own is a no-no.

pulsejet avatar Mar 18 '23 18:03 pulsejet

@pulsejet what is your ideal way to implement this? You mentioned that creating another table (and I'm guessing another sidebar entry for faces from photo metadata) would be a lot of work. But if we don't either create another table or load the data into Recognize's or Facerecognition's existing tables, the only other way I can see to get access to the data would be to pull it into the exif column of oc_memories. A lot of other metadata, such as GPS coordinates and creation dates, is already stored in that table, so perhaps faces are just another piece of metadata to be imported into that table structure? I still feel that even if the metadata were stored in the oc_memories table, another table like oc_recognize_face_detections, with one row per face, would be beneficial for quickly determining the number of occurrences of a face across pictures or quickly finding the coordinates of a face within a picture.

In one of your previous comments you raised a lot of questions about how to display faces from photo metadata alongside faces from the two AI detection methods. Given that the model for displaying faces from multiple data sources inside Memories seems to have already been determined, wouldn't picture metadata be displayed in a similar manner, under its own sidebar entry? You also asked about further recognition after import; wouldn't that be treated the same way as someone adding GPS coordinates to a photo with an external program, where they would be imported on photo upload to Nextcloud or, at worst, through a re-index? I feel these questions become easier to answer if face metadata is treated/displayed like other photo metadata instead of being generated/modified as with the AI models.

rhatguy avatar Mar 19 '23 21:03 rhatguy

This strikes me as more of a change to Recognize than to Memories. It doesn't make sense to duplicate the face data structure in Memories.

A possible workflow within Recognize would be:

  • Turn off face recognition.
  • Delete existing face regions
  • Load the XMP face regions for all of the photos - either into the Recognize face db or a temporary structure
  • Invoke training for each individual using the known face regions.
  • Update the Recognize db for the individual.

This may be simplistic, as I do not know how Recognize works internally.

I have created this feature request on the Recognize GitHub: https://github.com/nextcloud/recognize/issues/761

ericnotthered avatar Mar 21 '23 21:03 ericnotthered

Not that my opinion matters a lot either way, but I would respectfully disagree. I view Recognize as a "creator" of metadata: it scans pictures and makes associations between faces where no metadata about those pictures/faces existed before. I view Memories as a "viewer" of metadata. To my knowledge, Memories doesn't actually create any metadata of its own. At its core, Memories reads metadata from the pictures (GPS coordinates, creation dates, title, description, etc.) OR reads metadata created by another Nextcloud app (Recognize or Facerecognition) and uses that metadata to display pictures in a meaningful manner (on a map, in a timeline, grouped by face). I grant you that Memories does technically allow you to edit metadata now, which could also be viewed as creating, but I feel that's not Memories' main focus, at least yet.

I don't disagree it would be great if Recognize could use face metadata that in some cases already exists to better do its core job, but as @pulsejet has mentioned in this thread, trying to force metadata into Recognize that it didn't create has the potential to break it (maybe permanently). Again, my opinion is worth what you paid for it, just offering a viewpoint.

rhatguy avatar Mar 22 '23 02:03 rhatguy

I see your perspective, but I see it a bit differently. In its current form, Recognize is the provider of face region information, rather than just a creator. If Recognize wrote this data back to XMP in the files, then I think your rationale would be more solid. However, under the current model Memories would require two independent sources for the face regions: both the XMP regions and Recognize. This could lead to duplications, etc.

Implementing XMP support in Recognize and Memories isn't mutually exclusive though. In the short term, I would suggest that supporting an import to Recognize would provide more immediate benefit. Long term, if Recognize moves to saving (and de-duplicating) face regions as XMP, Memories could read the XMP metadata accordingly. This would also be beneficial to make this face region information platform independent.

Note that I would not advocate to "force metadata that it didn't create" into Recognize. I think that concern is predicated on the assumption that the data would be pushed into the Recognize database by an external actor without changing Recognize.

The approach I'm advocating would be to update Recognize to import and properly process XMP data and use this within its workflow. I would expect that Recognize likely follows the usual face recognition workflow, where it is 1) trained with known face regions, then 2) finds face regions in images, and 3) runs recognition on the discovered face regions. The XMP regions would facilitate training since they are known face regions, and could also be saved in Recognize with the resulting face vector generated by its training.

It would make sense to de-duplicate any regions in a given image: either faces found in multiple regions within the same image, or cases where Recognize identifies a different person for a face region that has a high intersection with an existing XMP region.
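That intersection test is the standard intersection-over-union check; a sketch (the 0.5 threshold is an arbitrary illustration):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x, y, w, h),
    where x/y is the top-left corner, in any consistent unit."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union else 0.0

def is_duplicate(a, b, threshold=0.5):
    """Treat two regions as the same face if their IoU exceeds threshold."""
    return iou(a, b) > threshold
```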

ericnotthered avatar Mar 22 '23 16:03 ericnotthered

Let me start with a note that this discussion is mostly academic right now. There are many other pressing issues that need addressing before this, unless someone is willing to work on it. Regardless,

  1. We can't really put face metadata in the oc_memories table; that'll make it impossible to query. Regarding the current implementation for recognize/face recognition, I'm sure that's not the best way to implement this in the first place. For a start, there is a lot of code duplication between the two. So supporting multiple backends in general needs a lot more thought.
  2. Ideally, I agree with @ericnotthered. If we could just get the existing XMP data into recognize, it could probably do more than just display it, e.g. use it for improving future AI recognition. However, pragmatically I don't expect this to ever happen. Nextcloud Photos doesn't support EXIF data as of writing this; Recognize doing it seems like a distant possibility.
  3. Maybe what is needed is some high level abstraction to join and de-duplicate (automatically if possible) results from multiple backends. Then XMP metadata can simply be one of these backends. Like I said, this needs a lot of thought.

pulsejet avatar Mar 22 '23 17:03 pulsejet

@pulsejet, it is unfortunate if - as you say - Recognize is not likely to work with XMP data (reading or writing, I would assume). In this event, I am unlikely to proceed with using Recognize as a tool.

I would then likely advocate to proceed with support to read XMP face regions within Memories. I can continue to use an external tool such as Digikam to perform the face recognition feature and update the XMP data within the files.

My suggestion would be to focus on XMP (and likely MWG tags) as the initial (or only) backend. This allows people to use a number of different tools to manage the data, and it keeps the information in the files for portability.

As an aside: you have done an amazing job with Memories already. I've looked at many options (including Photoprism) and I find this the most promising. I like the fact that you have focused on the viewing experience, as there are many other tools that can be used to edit or update the media.

The main feature I would like to see - other than XMP face region support - is filtering/searching options.

ericnotthered avatar Mar 22 '23 18:03 ericnotthered

As I said earlier, in my mind the ideal case is to import pre-existing tags, then allow the AI to assist in further tagging. So everything goes into the file, but we don't need to do it manually.

pulsejet avatar Mar 22 '23 20:03 pulsejet

Any existing information in the metadata should be used by the app. I definitely vote for this feature request! Thank you very much!

SpamReceiver avatar Mar 24 '23 11:03 SpamReceiver

And please keep "Recognize" and "Memories" separate apps. "Memories" is supposed to show the photos, and should make use of available metadata, including face regions, geotags, etc. of course. "Recognize" scans photos and might add metadata.

SpamReceiver avatar Mar 24 '23 11:03 SpamReceiver

@pulsejet, do you have any plans regarding importing XMP face regions at this time?

ericnotthered avatar May 08 '23 19:05 ericnotthered

A somewhat philosophical question has come to my mind about this lately: why is face metadata different from GPS/location metadata, creation dates, keywords, or titles? There seems to be a "standard" that exiftool, digiKam, Mylio, Adobe, PiGallery2, etc. all follow. If the way exiftool writes face tags today is not "standard", what would make it standard?

I ask because I've recently tried Mylio and it is able to immediately import all of the pictures that I tagged in digiKam. I can then add new face tags and view them in Mylio, then view those tags in digiKam or in PiGallery2. I can also use exiftool to read and write the tags these programs created. All of this is similar to what I expect when I edit geolocation information in one tool and expect it to carry forward to other tools. What's different about face tags that makes them unlike the other metadata that various tools read/write?

rhatguy avatar May 08 '23 19:05 rhatguy

@rhatguy I haven't investigated the specifics of the format yet, but that part might be somewhat standardized. The main issue here is that the implementation is quite laborious, so other issues have higher priority for now. Reading the tags is easy, but writing them needs a lot of thought, e.g. when to do it; probably not immediately on assigning a tag, since this might happen often and writing metadata is a very expensive operation.

So it might be easy, for example, to add the XMP tags to the "Edit Metadata" dialog (they're no different in that sense), but that'll create a lot of confusion between these tags and Nextcloud collaborative tags.

pulsejet avatar May 09 '23 07:05 pulsejet

@pulsejet are you thinking of the need to read and write tags at the same time? I can see where creating a UI to EDIT/ADD faces would be far different and quite a change from the existing functionality since the user would potentially need to be able to draw boxes.

Note this "issue" was only opened for reading, not creating/editing. If this feature is broken into two phases (reading, then writing), I think phase 1 becomes significantly easier, as you mentioned. In response to your comment about when to write after "assigning" a tag: remember that face metadata read from pictures would already have names assigned and wouldn't rely on Memories to add/change those names. If the initial implementation is read-only, the user would continue to make any edits using external software (digiKam, Mylio, etc.) and those changes would be picked up by Memories the same way an update to a location is. PiGallery2 takes a similar approach: it reads these tags but does not support modifying them.

rhatguy avatar May 09 '23 12:05 rhatguy

The same tags are used by Lightroom and most editing/catalog software that saves metadata in EXIF (or XMP), so that would be a really big plus.

Recognize in itself is just unusable for faces.

foux avatar Jun 20 '23 11:06 foux