sherlock-project
List of semantic data types detected by Sherlock
Hi! I'm building a public registry of semantic data types, similar to PRONOM for data formats.
Is there any document or code file with a list of the semantic data types supported by Sherlock? I would like to include it in the registry, and this registry will probably be helpful for your project too.
That looks useful, thanks for making and sharing! The 78 semantic types that Sherlock is trained on can be found in table 19 on page 28 in this paper.
It also includes a potentially helpful mapping between semantic type and "feature" type like categorical, numeric, etc.
@madelonhulsebos thanks a lot! It's very helpful!
You're welcome! PS: all types match a type in Wikidata.
@madelonhulsebos Great! Most types in the registry also match Wikidata, but not all of them. Some data types, such as certain hash types (MD5, SHA1, SHA256, etc.), have no identical Wikidata property, and many identifiers are not present in Wikidata either. That's why I started this registry. All semantic data types are also linked with country and spoken language.