Store Document Metadata
Moved from #804: see also mailing-list.
XQuery Functions
- db:properties($db as xs:string) as element(properties)*
- db:properties($db as xs:string, $path as xs:string) as element(properties)*
Output:
<properties name="/path/to/file.xml">
  <creation-date>2003-03-03</creation-date>
  <description>what a wonderful file</description>
</properties>
Alternative suggestion:
<resource raw="false" content-type="application/xml"
          modified-date="2012-02-02T19:13:42.000Z" name="file.xml">
  <property name="prop1">value1</property>
  <property name="prop2">value2</property>
</resource>
XQuery example:
for $prop in db:properties("db")
where $prop/creation-date = '2003-03-03' and
      $prop/description contains text "suitable" ftand "purpose"
return $prop/@name/string()
- db:property($db as xs:string, $path as xs:string, $key as xs:string) as xs:string
  An empty string will be returned if no value is stored for the specified key.
- db:set-property($db as xs:string, $path as xs:string, $key as xs:string, $value as xs:string) as empty-sequence()
  The property will be removed if the specified value is empty.
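The get/set semantics described above (an empty string for a missing key, removal when an empty value is set) can be illustrated with a minimal sketch. It assumes the properties of one resource are modelled as a plain XQuery 3.1 map; local:property and local:set-property are hypothetical helpers written for this illustration and are not BaseX functions.

```xquery
(: Sketch only: models the properties of a single resource as a map.
   local:property and local:set-property are hypothetical helpers,
   not part of BaseX. :)
declare function local:property(
  $props as map(xs:string, xs:string),
  $key   as xs:string
) as xs:string {
  (: string(()) yields '', so a missing key gives an empty string :)
  string($props($key))
};

declare function local:set-property(
  $props as map(xs:string, xs:string),
  $key   as xs:string,
  $value as xs:string
) as map(xs:string, xs:string) {
  if ($value = '')
  then map:remove($props, $key)      (: an empty value removes the property :)
  else map:put($props, $key, $value)
};

let $props := map { 'creation-date' : '2003-03-03' }
let $props := local:set-property($props, 'description', 'what a wonderful file')
let $props := local:set-property($props, 'creation-date', '')
return (
  local:property($props, 'description'),
  local:property($props, 'creation-date') = '',
  map:keys($props)
)
```

Because maps are immutable, the sketch returns a new map on every change; a real implementation would presumably be an updating function that persists the change in the database.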
BaseX Commands
- PROP GET
- PROP GET [path]
- PROP GET [path] [key]
- PROP SET [path] [key] [value]
  The property will be removed if the specified value is empty.
Examples:
- PROP GET: returns all properties of all resources
- PROP GET path/to: returns all properties of resources in the specified path
- PROP GET "doc 1.xml": returns all properties of a specific resource
- PROP GET file.xml creation-date: returns a specific property
These could be relevant for this feature.
http://www.w3.org/TR/xproc20/#dt-document-properties
http://www.xmlprague.cz/sessions2015/#nodesearch
Good point! Here is another link to Michael Kay's comment on the EXPath mailing list:
https://lists.w3.org/Archives/Public/public-expath/2015Feb/0007.html
Different light-bulbs went off in different people's heads ;-) --Marc
While working on a simple atompub service I encountered another use case for this type of metadata.
In atompub, content is always accompanied by, or represented as, an atom:entry that contains some metadata and some atom fields that should always be present. Currently I model this as two separate documents. Having such metadata on top of a document node could make this a lot simpler, as the atom data could be put on the document node itself. It could also make querying nicer: a search would return the document nodes directly, without the extra indirection of locating the actual document node from the information in the atom:entry document node.
Cheers, --Marc
Vincent Lizzi asked me on the mailing list:
> If I may ask, what metadata information do you need to record about each document?
I thought it would be better to respond here, as a use case for the requested feature. We send XML documents to a partner in a specific format. Some information that we hold in our relational database is missing from these files. The information is not necessarily a single value; it can be a list. For example, it would be unwise for performance reasons to have a single 'Country' property whose value is a CSV list of countries; a 'Country' key nested under 'Countries' is preferred. We don't want to modify the XML files, because that would give us a second version that differs from the published one and doesn't follow the partner's schema. An alternative, as suggested by Vincent:
> In similar situations I've used a second database to store metadata at the same path as the document in the primary database. For example:
> db:open('database', '/path/a.xml')
> db:open('database_metadata', '/path/a.xml')
An analogy is a THREDDS server, where a catalog XML can be created for each stored XML document, adding extra information without changing the original XML file.
--Menashè
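A minimal sketch of the two-database workaround quoted above. It assumes that both databases already exist, that the metadata database is named after the primary one with a '_metadata' suffix, and that each metadata document sits at the same path as the document it describes. The properties/countries/country layout and the local:metadata helper are hypothetical. db:open is the function name used in this thread; as far as I know, newer BaseX releases call it db:get.

```xquery
(: Sketch only: 'database' and 'database_metadata' must both exist.
   local:metadata is a hypothetical helper, and the
   <properties>/<countries>/<country> layout is an assumption. :)
declare function local:metadata(
  $doc as document-node()
) as document-node()? {
  (: derive the metadata database name from the document's own database :)
  let $meta-db := db:name($doc) || '_metadata'
  (: look up the metadata document stored at the same path :)
  return head(db:open($meta-db, db:path($doc)))
};

(: list the paths of all documents whose metadata names a given country :)
for $doc in db:open('database')
let $meta := local:metadata($doc)
where $meta/properties/countries/country = 'NL'
return db:path($doc)
```

Documents without a metadata counterpart are simply filtered out, because the where clause compares against an empty sequence. Nothing keeps the two databases in sync, which is the gap the requested feature would close.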
Having finally read through the thread on the EXPath mailing list mentioned above, I think the proposed resource collections EXPath module would be useful, though perhaps there is a need for two modules.
- A module that makes available properties of a document that may be provided by an implementation (e.g. BaseX). This module would have a set of standard properties (such as created, last-modified, resource-uri, size, etc.) which the implementation can provide depending on what is possible for a given document.
- A module that provides a way for an application or user to attach properties (key-value pairs) to a document. This module would have functions such as prop:set($path, $key, $value) and prop:get($path, $key). The $key could be a string restricted to valid QNames.
A requirement of both modules is the ability to use properties in filters to select documents.
Another requirement of both modules would be a method to list the properties that are available for a particular document.
Perhaps both modules would have a function to return all properties as a map, and a function to return all properties as a sequence of elements.
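To illustrate the two representations, here is a minimal sketch that converts between them. It assumes the properties element layout from the output example at the top of this thread; local:as-map and local:as-elements are hypothetical helpers, not functions of any existing module.

```xquery
(: Sketch only: local:as-map and local:as-elements are hypothetical helpers. :)
declare function local:as-map(
  $props as element(properties)
) as map(xs:string, xs:string) {
  (: one map entry per child element, keyed by the element name :)
  map:merge(
    for $p in $props/*
    return map { local-name($p) : string($p) }
  )
};

declare function local:as-elements(
  $props as map(xs:string, xs:string)
) as element()* {
  (: keys must be valid element names for the constructor to succeed :)
  for $key in map:keys($props)
  order by $key
  return element { $key } { $props($key) }
};

let $props :=
  <properties name="/path/to/file.xml">
    <creation-date>2003-03-03</creation-date>
    <description>what a wonderful file</description>
  </properties>
let $map := local:as-map($props)
return (
  $map('creation-date'),
  local:as-elements($map)
)
```

The element constructor fails for keys that are not valid names, which is one argument for restricting $key to valid QNames as suggested above. Note that a flat string-to-string map cannot represent repeated or nested properties.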
On the other hand, perhaps one module could do both things. This module would need to have a list of reserved property names that cannot be used for properties defined by an application/user. The properties feature in QuizX does this and provides both system properties and user properties.
A few more "standard" properties could store the original XML declaration and DOCTYPE declaration when an XML file is added to a database. It's probably more useful to have the components individually (instead of as one string), so the property names might be something like:
- doctype-root-element
- doctype-public-identifier
- doctype-system-uri
- doctype-internal-subset
- xml-encoding
- xml-version
Each of these properties would be populated with a string value if the value is detected when an XML is added to a BaseX database.
I would strongly prefer XML properties to properties restricted to key-value pairs. (Consider "author" metadata; there can easily be more than one, and application needs may run from detailed (is this a primary author? principal author? contributing author?) to "list of names" to "copyright holder".)
My preference would be for arbitrary XML properties documents in a parallel, optional, properties collection, with a DB-level mechanism to manage a one-to-maybe mapping between content documents and properties documents. (That is, for each content document in the content collection there may be zero or one properties documents in the properties collection, the DB keeps track of the association, and regular XQuery operations can't manipulate this association.)
I don't think it's possible to produce a really general set of default properties; use cases for effectively arbitrary metadata requirements are easy to find. (Consider a need to store business process metadata outside the specific document to which that business process metadata applies.)
It should be possible to generate properties documents and associate them with content documents externally to BaseX and then load the content with the properties from an external source when creating a DB. If that's not possible, setting properties for a large collection gets impractical.
Properties documents should import and export with the content documents. (Or at least there should be a switch for this in db:export and db:create.)
I'd like to see CRUD functionality for properties documents:
- db:properties-create($document-uri, $properties-document, $schema-document) as xs:string: associates the two documents via a DB mechanism and optionally associates a schema document with the properties document. The return type would be xs:string with an error code.
- db:properties-retrieve($document-uri) as document-node(): returns the properties document associated with the given document URI.
- db:properties-update($document-uri, $properties-document, $schema-document): updates the properties document, but not if the document-uri value points to a document with a pending operation. Is updating the schema associated with a properties document allowed?
- db:properties-destroy($document-uri, $properties-document): removes the properties document and destroys the DB-level association.
It might be the case that we would want $system-unique-identifier rather than $document-uri to identify the content document in the above.
Any manipulation of specific properties is up to the user. I can pretty easily imagine some BaseX extension functions to help with things like getting a value to use for "modified time" properties.
Where the schemas go and how and if the properties validate against them could be hard questions.
We have a similar requirement: we want to attach custom metadata to a document and fetch that metadata from our Java program.
db:property($db as xs:string, $path as xs:string, $key as xs:string) as xs:string
db:set-property($db as xs:string, $path as xs:string, $key as xs:string, $value as xs:string) as empty-sequence()
How can I achieve this with the BaseX Java API? Does BaseX currently provide such a feature? Thanks, Srikumar
Discarded (out of scope).