Petri Savolainen

75 comments by Petri Savolainen

This was not included in 2.0 PR code. Reopening.

The `pycountry` package might also be useful.

I just pushed an `iso3166map` branch with a semicolon-separated `countries.csv` file that maps all `terms_by_country` country names to the corresponding 2-letter ISO 3166 codes.
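For reference, a semicolon-separated mapping like this can be read with Python's standard `csv` module by passing `delimiter=";"`. This is a minimal sketch that assumes each row is `country name;alpha-2 code` with no header row; the real column layout of `countries.csv` may differ, and the sample rows below are illustrative only.

```python
import csv
import io

# Illustrative sample only; the actual countries.csv layout (column order,
# header row) is an assumption: "country name;ISO 3166 alpha-2 code".
SAMPLE = """Finland;FI
Germany;DE
United States;US
"""


def load_country_map(fileobj):
    """Map country names to 2-letter ISO 3166 codes from a ';'-separated file."""
    mapping = {}
    for row in csv.reader(fileobj, delimiter=";"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        name, code = row[0].strip(), row[1].strip().upper()
        mapping[name] = code
    return mapping


country_map = load_country_map(io.StringIO(SAMPLE))
print(country_map["Finland"])
```

With a real file, pass `open("countries.csv", newline="", encoding="utf-8")` instead of the `StringIO` sample.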

Perhaps the issue here is that there is no space between the name and the suffix? What countries are these companies based in?

I have seen the entity abbreviation separated by a comma (more often a comma plus whitespace, actually), although I'd agree that whitespace alone (no comma) is the more common separator. I...
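To illustrate the separator cases discussed above, here is a minimal regex sketch that strips a trailing legal-form suffix when it is preceded by whitespace, a comma, or a comma plus whitespace, and leaves names with no separator untouched. The short `SUFFIXES` list is hypothetical; this is not cleanco's actual implementation, and real suffix data is much longer and country-specific.

```python
import re

# Hypothetical, deliberately short suffix list for illustration only.
SUFFIXES = ["oy", "oyj", "gmbh", "ltd", "inc"]

# Separator before the suffix: a comma with optional surrounding whitespace,
# or plain whitespace. An optional trailing dot is allowed ("Inc.").
_alternatives = "|".join(sorted(map(re.escape, SUFFIXES), key=len, reverse=True))
_PATTERN = re.compile(r"(?:\s*,\s*|\s+)(?:" + _alternatives + r")\.?$", re.IGNORECASE)


def strip_suffix(name: str) -> str:
    """Remove one trailing legal-form suffix, if it follows a separator."""
    return _PATTERN.sub("", name.strip())


for example in ["Acme, Inc.", "Nokia Oyj", "Example GmbH", "Acme,Ltd", "Acmeinc"]:
    print(repr(example), "->", repr(strip_suffix(example)))
```

Requiring a separator avoids false positives such as stripping "oy" from a name ending in "Boy", but it also means names like "Acmeinc" (no space before the suffix) are left as-is.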

https://opencorporates.com could be used for testing?

@davidheryanto it depends. What countries are those for?

I have added a `companies.csv` file to the tests directory, but unfortunately it seems we cannot really use bulk ASCII company names for testing, since many international companies use common...

We now have improved Unicode and non-Latin script support, so better test coverage would make sense too. One option would be to use https://faker.readthedocs.io/en/master/ to generate fake test...

It is. You might also want to take a look at the https://pypi.org/project/iso-20275 package. The plan is to use data from there and just augment it when it's missing some...
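The "use ISO 20275 data and augment it where it is missing something" idea can be sketched as a simple merge: keep every base entry per country and append only the locally maintained terms the base lacks, comparing case-insensitively. Both dictionaries below are hypothetical plain data; this does not use or describe the actual API or data structure of the `iso-20275` package, and keying by alpha-2 country code is an assumption.

```python
from typing import Dict, List


def augment_terms(base: Dict[str, List[str]], extra: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return base terms per country plus any extra terms the base lacks.

    Base order is preserved, comparison is case-insensitive, and the input
    dictionaries are not modified.
    """
    merged = {country: list(terms) for country, terms in base.items()}
    for country, terms in extra.items():
        existing = merged.setdefault(country, [])
        seen = {t.casefold() for t in existing}
        for term in terms:
            if term.casefold() not in seen:
                existing.append(term)
                seen.add(term.casefold())
    return merged


# Hypothetical data standing in for ISO 20275 terms and local additions.
iso_terms = {"FI": ["Oy", "Oyj"], "DE": ["GmbH", "AG"]}
local_terms = {"FI": ["oy", "Ky"], "US": ["Inc."]}

merged = augment_terms(iso_terms, local_terms)
print(merged)
```

Treating the standard data as authoritative and only appending missing terms keeps local additions small and easy to review when the upstream data is updated.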