List of semantic data types detected by Sherlock

Open ivbeg opened this issue 2 years ago • 4 comments

Hi! I'm building a public registry of semantic data types, similar to PRONOM for data formats.

Is there any document or code file with the list of semantic data types supported by Sherlock? I would like to include it in the registry, and the registry will probably be helpful for your project too.

ivbeg Jun 24 '22 08:06

That looks useful, thanks for making and sharing it! The 78 semantic types that Sherlock is trained on are listed in Table 19 on page 28 of this paper.

It also includes a potentially helpful mapping between semantic type and "feature" type like categorical, numeric, etc.

madelonhulsebos Jun 24 '22 08:06

@madelonhulsebos thanks a lot! It's very helpful!

ivbeg Jun 24 '22 08:06

You're welcome! PS: all types match a type in Wikidata.

madelonhulsebos Jun 24 '22 08:06

@madelonhulsebos Great! Most types in the registry are also matched with Wikidata, but not all of them. Some data types, such as MD5, SHA1, and SHA256 hashes, have no identical Wikidata property, and many identifiers are not present in Wikidata at all. That's why I started this registry. All semantic data types are also linked with a country and a spoken language.

ivbeg Jun 24 '22 09:06