krippendorff-alpha

Distance function for ordinal data

Open lrebscher opened this issue 7 years ago • 11 comments

Hello,

I want to use Krippendorff's alpha for ordinal data (6-point Likert scale), but no ordinal distance function is available.

Ordinal distance formula (https://en.wikipedia.org/wiki/Krippendorff%27s_alpha): [screenshot of the formula, 2017-09-21 17:43:36]

Has support for ordinal data been considered yet?
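
For reference, the ordinal difference function on that page can be written out as below. This is a transcription of the standard definition rather than of the screenshot itself; here n_g is assumed to denote how many times rank g was assigned across all units.

```latex
\delta_{\text{ordinal}}(v, v')^{2}
  = \left( \sum_{g=v}^{v'} n_g \;-\; \frac{n_v + n_{v'}}{2} \right)^{2}
```

When v = v' the sum reduces to n_v and the difference is zero, as expected for identical ratings.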

Cheers

lrebscher avatar Sep 21 '17 15:09 lrebscher

@lrebscher I have built a fast version based on this library that includes the ordinal metric. Maybe you can give it a try: https://github.com/pln-fing-udelar/fast-krippendorff

btw, @grrrr thanks for this library!

bryant1410 avatar Sep 28 '17 03:09 bryant1410

I built it from scratch because this one was too slow for my dataset (it took several hours to compute with 40k units and only a few annotations per unit).

bryant1410 avatar Sep 28 '17 03:09 bryant1410

Hi Santiago, that's great, will have a look!

grrrr avatar Sep 28 '17 07:09 grrrr

@bryant1410 thank you! I will check it out!

lrebscher avatar Sep 28 '17 07:09 lrebscher

@grrrr I tested your implementation in https://github.com/pln-fing-udelar/humor/tree/32a7ce954361fb79a58e4e4f282b88a02b4fcda1 To reproduce it:

pip install -r requirements.txt
./agreement.py

The current version is in https://github.com/pln-fing-udelar/humor/tree/2ee918293fa99038247f1d848d40bdade471ff0c

I didn't analyze the main bottleneck, but my implementation has some features:

  • It computes the coincidence matrix, which the Wikipedia page describes as the more computationally efficient approach.
  • It avoids some expensive for-loops, replacing them with NumPy tensor multiplications. This is often at least one order of magnitude faster.
  • It avoids try-except blocks inside for-loops, which can be inefficient when exceptions are raised very often, as there's a change of context.
  • It uses few helper data structures, so it avoids frequent allocations and accesses.
  • It uses NumPy wherever possible.

bryant1410 avatar Sep 28 '17 13:09 bryant1410

@bryant1410 Is it possible to compute Krippendorff's alpha for nominal data with the Euclidean distance function? I am wondering what the distance function is in your fast version.

SaraAmd avatar Jan 04 '23 23:01 SaraAmd

Hey. There are now distance functions for nominal, ordinal, interval, and ratio data types. Though I'm not sure how you could use Euclidean distance for nominal data.

bryant1410 avatar Jan 05 '23 07:01 bryant1410

@bryant1410 Thank you for the response. What are the current distance functions for nominal data? How can I see the available distance functions?

SaraAmd avatar Jan 05 '23 15:01 SaraAmd

See https://github.com/pln-fing-udelar/fast-krippendorff/blob/main/krippendorff/krippendorff.py#L14-L39

The distance metric for nominal pretty much checks if the value is the same or different.
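
As a rough sketch of what the four metrics look like when written straight from the textbook definition (this is not copied from the linked file, and the function names and the `counts` argument are assumptions made for this example):

```python
def nominal(v1, v2):
    """0 if the two values are the same, 1 if they differ."""
    return 0.0 if v1 == v2 else 1.0


def interval(v1, v2):
    """Squared difference of the two values."""
    return float(v1 - v2) ** 2


def ratio(v1, v2):
    """Squared relative difference; assumes non-negative values."""
    if v1 == v2:
        return 0.0  # also covers the 0/0 case
    return ((v1 - v2) / (v1 + v2)) ** 2


def ordinal(v1, v2, counts):
    """Squared ordinal difference.

    `counts` maps each rank g to n_g, the number of times that rank was
    assigned overall (a hypothetical argument for this sketch).
    """
    lo, hi = sorted((v1, v2))
    between = sum(n for g, n in counts.items() if lo <= g <= hi)
    return (between - (counts.get(lo, 0) + counts.get(hi, 0)) / 2) ** 2
```

With `counts = {1: 2, 2: 4, 3: 6}`, `ordinal(1, 3, counts)` is (12 - 4)² = 64, while `nominal(1, 3)` is just 1. That is the sense in which the nominal metric only checks whether the values are the same or different.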

bryant1410 avatar Jan 05 '23 17:01 bryant1410

@bryant1410 Thank you. Unfortunately, it is not clear what formula has been used. I would recommend having documentation (comments) for a better understanding. Thanks anyway.

SaraAmd avatar Jan 05 '23 20:01 SaraAmd

It follows the definition. See e.g. https://en.wikipedia.org/wiki/Krippendorff%27s_alpha

bryant1410 avatar Jan 05 '23 20:01 bryant1410