
use ISO 20275 data from GLEIF

Open petri opened this issue 8 years ago • 8 comments

See https://www.gleif.org/en. There's a lot of data that would help improve the legal affix database of cleanco.

petri avatar Feb 07 '17 11:02 petri

The ELF Code List definitely has more abbreviations: https://www.gleif.org/en/about-lei/code-lists

I am just not sure what the US/UK equivalents are in some of the languages. However, there may be some more obvious ones that have been missed. I will keep a note of this.

psolin avatar Jan 26 '19 02:01 psolin

I suspected someone might have done this by now, and sure enough: https://pypi.org/project/iso-20275 .

Since 2017, there has been an ISO standard for this: ISO 20275, ‘Financial Services – Entity Legal Forms (ELF)’.

petri avatar Apr 26 '20 10:04 petri

Cleanco was built to identify entity types in strings, so I think it’s fine to move towards incorporating this package. It was only a matter of time before the data was standardized and put into a Python package. Moving away from being solely US/UK based and towards an international standard is for the best for this package.

If incorporated, it would fix most of our open issues as well. I’ll look into doing this.

psolin avatar Apr 26 '20 12:04 psolin

For getting the base name without legal term affixes, the unique terms list from the ISO standard should probably be patched in here: https://github.com/psolin/cleanco/blob/master/cleanco/clean.py#L25-L29
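As a rough illustration of the idea above, here is a minimal sketch of stripping trailing legal terms from a name using a flat list of terms. It is not cleanco's actual implementation: `LEGAL_TERMS`, `normalize` and `strip_legal_suffix` are hypothetical names, and the term list is a tiny hand-written sample standing in for the unique terms of the ISO data.

```python
import re

# Hypothetical sample of legal-form terms; a real list would be built from
# the ISO 20275 data rather than hard-coded.
LEGAL_TERMS = ["ltd", "inc", "gmbh", "b.v.", "oy", "s.a."]


def normalize(term: str) -> str:
    """Lowercase a term and drop dots, commas and surrounding whitespace."""
    return re.sub(r"[.,]", "", term).strip().lower()


def strip_legal_suffix(name: str, terms=LEGAL_TERMS) -> str:
    """Remove trailing single-word legal terms from a company name."""
    known = {normalize(t) for t in terms}
    words = name.split()
    # Peel words off the end while they match a known term,
    # always keeping at least one word as the base name.
    while len(words) > 1 and normalize(words[-1]) in known:
        words.pop()
    return " ".join(words).rstrip(" ,")


print(strip_legal_suffix("Acme Widgets Ltd."))
print(strip_legal_suffix("Philips B.V."))
```

This sketch only handles single-word suffixes; multi-word forms and prefixes would need matching on the whole string instead of word by word.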

petri avatar Apr 26 '20 15:04 petri

This could be broken into two or three different tickets:

  • one for using in base name deduction
  • one for country deduction, and
  • one for legal entity detection.

petri avatar Apr 26 '20 15:04 petri

Just to give you an idea of where this is going - I am counting 1,180 unique business entity affixes in this package compared to our 202. These are the classifiers (properties) that they use as well:

['alpha2', 'alpha2_2', 'country', 'creation_date', 'elf', 'jurisdiction', 'local_abbreviations', 'local_name', 'modification', 'modification_date', 'reason', 'status', 'transliterated_abbreviations', 'transliterated_name']

psolin avatar Apr 26 '20 15:04 psolin

Given that we now better understand the differences between iso20275 data and cleanco termdata, it seems to me we need a decision on data strategy. The current PR gets rid of cleanco termdata in favour of iso20275. But in hindsight it seems to me that iso20275 should instead be used as a primary, but not exclusive, source.

On the other hand, both iso20275 and cleanco also need a mechanism by which users can supply their own legal form data if needed. It would make sense if both packages used the same mechanisms and formats.
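One possible shape for such a mechanism, as a sketch only: treat every source as a mapping from country code to a set of terms and union them, so user data is just one more source. The names `ISO_TERMS`, `CLEANCO_TERMS` and `merged_terms` are hypothetical, and the sample data is hand-written rather than loaded from either package.

```python
# Hypothetical stand-ins: in practice these would be loaded from the
# iso20275 data and from cleanco's termdata, respectively.
ISO_TERMS = {"NL": {"stichting"}, "FI": {"oy", "oyj"}}
CLEANCO_TERMS = {"NL": {"b.v.", "n.v."}, "JP": {"y.k.", "k.k."}}


def merged_terms(*sources):
    """Union the term sets of all sources, country by country.

    The inputs are left untouched; a new mapping is returned.
    """
    merged = {}
    for source in sources:
        for country, terms in source.items():
            merged.setdefault(country, set()).update(terms)
    return merged


# User-supplied data uses the same format as the built-in sources.
user_terms = {"NL": {"st."}}
terms = merged_terms(ISO_TERMS, CLEANCO_TERMS, user_terms)
print(sorted(terms["NL"]))
```

With plain sets the order of sources does not matter; it would only start to matter if the sources carried conflicting metadata for the same term.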

Thoughts?

petri avatar May 05 '20 04:05 petri

Replying to your "Thoughts?": at first I was happy; for example, the Netherlands has all the forms included in cleanco. But then Japanese does not have the romaji versions (Y.K., which termdata will have if a pull request is accepted), only the kanji versions (有, which is only the first character of 有限会社; I don't know whether it is normally written out like that, but in the Chinese data it is written out).

https://en.wikipedia.org/wiki/Y%C5%ABgen_gaisha

And even Dutch is incomplete; for example, "Foundation": "V44D","Netherlands","NL","","","stichting","Dutch","nl","stichting","","","2017-11-30","ACTV","","",""

Looking it up, it seems that "st." is the official one, along with "fdn" (and, less commonly, "fndn." or "fou."). In practice, though, the word is written out in full, because hey, you want to state clearly that you are a foundation.
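The gap can be checked mechanically on the row quoted above. The sketch below parses that one row with the standard `csv` module and tests whether its abbreviation fields are filled in. The column positions (9 and 10 for local and transliterated abbreviations) are an assumption based on the order of the fields in the row, not something confirmed in this thread.

```python
import csv
import io

# The "Foundation" row exactly as quoted in this thread.
ROW = (
    '"V44D","Netherlands","NL","","","stichting","Dutch","nl",'
    '"stichting","","","2017-11-30","ACTV","","",""'
)

# Assumed column positions; verify against the header of the real file.
LOCAL_ABBREVIATIONS = 9
TRANSLITERATED_ABBREVIATIONS = 10

fields = next(csv.reader(io.StringIO(ROW)))
elf_code, country, local_name = fields[0], fields[1], fields[5]

# True only if at least one abbreviation column is non-empty.
has_abbreviations = bool(
    fields[LOCAL_ABBREVIATIONS] or fields[TRANSLITERATED_ABBREVIATIONS]
)
print(elf_code, country, local_name, has_abbreviations)
```

Running the same check over every row would give a list of legal forms that have a name but no abbreviation, which is where termdata or user-supplied terms would have to fill in.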

Thus, my conclusion is that there is still no good list, and I agree with @petri that maybe both lists need to be eligible. Or at least we could merge the differences into a new version of iso20275, including the many entries it is missing that termdata does have, and then use that as a master list.

In practice it means we need to fix the bug where custom_basename() is unusable in its current state and let users add their settings in an easy way, without jumping through hoops.

FBnil avatar Aug 16 '22 20:08 FBnil