openWordnet-PT

topology of PWN

Open vcvpaiva opened this issue 8 years ago • 3 comments

Could anyone tell me how many of the 117659 synsets have glosses? ~~not all do~~

Can we add the corpus of glosses somewhere in the repo, in an inspectable form? https://wordnet.princeton.edu/glosstag.shtml

vcvpaiva • Feb 06 '18 17:02

@vcvpaiva AFAIK all PWN synsets have glosses. For example, if we take the Prolog output of PWN and remove duplicate synset IDs from the gloss file, we get 117659 entries.

$ cd prolog
$ cat wn_g.pl | awk -F, '{print $1}' | sort | uniq | wc -l
117659

Also, if we look at the tagged glosses, it seems that all synsets have tagged glosses too:

$ cd glosstag/standoff
$ cat index.byid.tab | awk -F'\t' '{print $1}' | sort | uniq | wc -l
117659
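
The counts above show that the gloss files contain as many distinct synset IDs as PWN has synsets. A stricter check is to compare the IDs directly. The following is a minimal Python sketch, assuming the standard Prolog fact shapes `s(synset_id,...)` in `wn_s.pl` and `g(synset_id,'gloss').` in `wn_g.pl`; the function names are hypothetical and not part of any existing tool in the repo.

```python
import re

# Matches the leading synset ID of a Prolog fact, e.g.
#   s(100001740,1,'entity',n,1,11).
#   g(100001740,'that which is perceived ...').
FACT_ID = re.compile(r"^[a-z]+\((\d+),")


def synset_ids(lines):
    """Return the set of synset IDs found at the start of Prolog facts."""
    ids = set()
    for line in lines:
        match = FACT_ID.match(line)
        if match:
            ids.add(match.group(1))
    return ids


def synsets_without_gloss(s_lines, g_lines):
    """Return IDs that occur in the s facts but have no g fact."""
    return synset_ids(s_lines) - synset_ids(g_lines)
```

To run it on the real data, one would pass the two open files (for example `open("wn_s.pl")` and `open("wn_g.pl")`) as `s_lines` and `g_lines`. An empty result would confirm that every synset has a gloss.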

fcbr • Feb 07 '18 11:02

thanks @fcbr! this is odd, as I am sure that many times I have had the impression of a synset not having a gloss. maybe it's when the gloss is very short, like

05893261-n sine_qua_non, essential_condition | sine qua non (a prerequisite)

what is a tagged gloss, please?

and questions on the topology of PWN:

  1. how many synsets go directly to Entity? do all synsets go to Entity?

  2. how many have two hops?

  3. how many have a long hierarchy like kitty < domestic_cat < cat < feline < carnivore < placental_mammal < mammal < vertebrate < chordate < animal < organism < living_thing < entity?

  4. I seem to remember that you were calculating isolated nodes vs hierarchies? where is that data now?
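
Questions 1 to 3 can be explored with a breadth-first search over the hypernym graph. This is a minimal Python sketch, not an existing script from the repo. It assumes the Prolog file `wn_hyp.pl` with facts of the form `hyp(hyponym_id,hypernym_id).`, and that `entity` has the ID `100001740` (its PWN 3.0 value, to be checked against the release in use). Instance hypernyms are kept in a separate file, as far as I know, so they would have to be merged in for a full picture. The function names are hypothetical.

```python
import re
from collections import Counter, defaultdict, deque

# Matches facts such as: hyp(100001930,100001740).
HYP = re.compile(r"^hyp\((\d+),(\d+)\)\.")


def parse_hyp(lines):
    """Map each hyponym ID to the set of its direct hypernym IDs."""
    parents = defaultdict(set)
    for line in lines:
        match = HYP.match(line)
        if match:
            child, parent = match.groups()
            parents[child].add(parent)
    return parents


def min_hops_to(root, parents):
    """Shortest number of hypernym hops from each synset up to `root`.

    Runs a breadth-first search downward from `root` over reversed edges.
    Synsets that never reach `root` are absent from the result.
    """
    children = defaultdict(set)
    for child, direct_parents in parents.items():
        for parent in direct_parents:
            children[parent].add(child)
    hops = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in hops:
                hops[child] = hops[node] + 1
                queue.append(child)
    return hops


def hop_histogram(hops):
    """Count how many synsets sit at each distance from the root."""
    return dict(sorted(Counter(hops.values()).items()))
```

In the histogram, the entry for 1 answers question 1, the entry for 2 answers question 2, and the largest keys correspond to the long chains of question 3. Because a synset can have several hypernyms, this measures the shortest path to the root, which may be shorter than a particular chain such as the kitty example. Synsets missing from `hops` (verbs, adjectives, adverbs, and any isolated nodes) do not reach `entity` at all.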

vcvpaiva • Feb 07 '18 15:02

Yes, we do have glosses with only 1-2 words. The tagged gloss corpus is not complete: not all glosses are entirely tagged. I talked with Christiane Fellbaum about it. Actually, this is excellent work still waiting to be done.

corpus of glosses = tagged corpus

arademaker • Feb 07 '18 17:02