TextAnalysis.jl

allow DocumentMetadata to hold arbitrary data

Open tanmaykm opened this issue 6 years ago • 3 comments

This introduces a new custom field in DocumentMetadata that is set to nothing by default, but can be used by user code to store arbitrary metadata against the document for later use. Having a predetermined place to store such data would simplify processing in many cases.
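
As a rough illustration of the idea, here is a minimal, self-contained sketch. It does not use TextAnalysis.jl itself: SketchMetadata is a hypothetical stand-in for DocumentMetadata, and the field names and default values (including the name custom for the new field) are assumptions made for this example only.

```julia
# Hypothetical stand-in for DocumentMetadata; not the actual TextAnalysis.jl type.
# The field names and defaults below are assumptions for illustration.
mutable struct SketchMetadata
    language::String
    title::String
    author::String
    timestamp::String
    custom::Any   # free-form slot for user data; `nothing` unless set
end

# Zero-argument constructor: the custom slot defaults to `nothing`.
SketchMetadata() = SketchMetadata("English", "Untitled Document",
                                  "Unknown Author", "Unknown Time", nothing)

meta = SketchMetadata()

# User code can attach any value it likes, for example a Dict of extra attributes.
meta.custom = Dict("source" => "crawler", "score" => 0.87)

# Later processing steps can read the data back from the same place.
source = meta.custom["source"]
```

Typing the slot as Any keeps the struct layout fixed while leaving the choice of payload (a Dict, a NamedTuple, a user-defined struct) entirely to the caller.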

tanmaykm, May 21 '19 07:05

A more flexible approach would be to allow documents to hold any metadata (arbitrarily complex) and provide a mechanism for converting custom metadata to the 'standardized' DocumentMetadata; see https://zgornel.github.io/StringAnalysis.jl/dev/doc_extensions/

Just for the record, there was a pull request extending DocumentMetadata with a few fields a while ago that went stale for months on end.

zgornel, May 23 '19 12:05

Yes, having an abstract metadata type seems like a better idea. The API changes may be more intrusive, though. Stemming depends on the language stored in the metadata, so that would need to be abstracted out. There are also a bunch of APIs in metadata.jl. Is there anything else?

tanmaykm, May 23 '19 14:05

Rebased to resolve conflicts.

This change will probably be sufficient for now, and we can continue discussing a more appropriate metadata representation for the future.

tanmaykm, May 24 '19 06:05

Hi @tanmaykm and @zgornel, I have been working on refreshing TextAnalysis over the last month. I find this PR useful, but I made some changes to keep it API compatible. If there are no objections, I'd like to merge it.

rssdev10, Oct 25 '23 16:10