DEPRECATED! implement validator for company existence

Open OlegPhenomenon opened this issue 2 years ago • 3 comments

bundle exec rake company_status:check_all -- --open_data_file_path=lib/tasks/data/ettevotja_rekvisiidid__lihtandmed.csv --missing_companies_output_path=lib/tasks/data/missing_companies_in_business_registry.csv --deleted_companies_output_path=lib/tasks/data/deleted_companies_from_business_registry.csv --download_path=https://avaandmed.ariregister.rik.ee/sites/default/files/avaandmed/ettevotja_rekvisiidid__lihtandmed.csv.zip --soft_delete_enable=false

This rake task performs the following actions:

  • downloads an archive
  • unzips it
  • checks all companies from our registry to see if they are in the business registry based on the downloaded data
  • if not present, a query is made to the business registry
  • if a company has been deleted, it is written to the file given by deleted_companies_output_path; if information about the company is missing, it is written to the file given by missing_companies_output_path
  • sets the company status and validation date on the Contact model
  • a flag controls whether soft deletion is performed (needed for the first run)
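
The lookup step above can be sketched roughly as follows. This is a minimal illustration, not the task's actual code: the column name `ariregistri_kood`, the `;` separator, and both method names are assumptions made for the example.

```ruby
require 'csv'
require 'set'

# Assumed header of the registry-code column in the open data CSV.
CODE_COLUMN = 'ariregistri_kood'

# Collect every registry code present in the downloaded CSV text.
def registry_codes(csv_text, col_sep: ';')
  CSV.parse(csv_text, headers: true, col_sep: col_sep)
     .map { |row| row[CODE_COLUMN].to_s.strip }
     .to_set
end

# Split our own ident numbers into those found in the open data and
# those that still need a direct query to the business registry.
def partition_by_presence(our_idents, codes)
  our_idents.partition { |ident| codes.include?(ident) }
end
```

Only the second group would then be sent to the business registry, which keeps the number of direct queries small.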

The task accepts the following options:

  • open_data_file_path - specifies where the data is saved and retrieved from. Default value lib/tasks/data/ettevotja_rekvisiidid__lihtandmed.csv
  • missing_companies_output_path - specifies the path where companies not found in the business registry will be saved. Default value lib/tasks/data/missing_companies_in_business_registry.csv
  • deleted_companies_output_path - specifies the path where companies that have been removed from the registry will be saved. Default value deleted_companies_from_business_registry.csv
  • download_path - specifies where the data will be downloaded from. Default value https://avaandmed.ariregister.rik.ee/sites/default/files/avaandmed/ettevotja_rekvisiidid__lihtandmed.csv.zip
  • soft_delete_enable - indicates whether to run soft deletion for companies that have been removed, gone bankrupt, or are missing from the business registry. Default value false
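
As a rough sketch of how such options with defaults could be read, the snippet below uses Ruby's standard OptionParser. The method name `parse_task_options` and the `DEFAULTS` constant are hypothetical, only three of the options are shown, and the input is assumed to be the arguments that follow `--` on the command line.

```ruby
require 'optparse'

# Defaults taken from the option list above (subset only).
DEFAULTS = {
  open_data_file_path: 'lib/tasks/data/ettevotja_rekvisiidid__lihtandmed.csv',
  missing_companies_output_path: 'lib/tasks/data/missing_companies_in_business_registry.csv',
  soft_delete_enable: false
}.freeze

# Hypothetical helper: merge command-line flags over the defaults.
def parse_task_options(argv)
  options = DEFAULTS.dup
  OptionParser.new do |opts|
    opts.on('--open_data_file_path=PATH') { |v| options[:open_data_file_path] = v }
    opts.on('--missing_companies_output_path=PATH') { |v| options[:missing_companies_output_path] = v }
    # The flag arrives as a string, so compare it explicitly.
    opts.on('--soft_delete_enable=BOOL') { |v| options[:soft_delete_enable] = (v == 'true') }
  end.parse(argv)
  options
end
```

Calling `parse_task_options([])` returns the defaults unchanged, which matches running the task with no parameters.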

Since every option has a default value, no parameters are required; they exist only for flexibility. You can therefore run: bundle exec rake company_status:check_all

The output files will then be available in the directory lib/tasks/data

The job: CompanyRegisterStatusJob.perform_later(days_interval = 14, spam_time_delay = 0.2, batch_size = 100, download_open_data_file_url='https://avaandmed.ariregister.rik.ee/sites/default/files/avaandmed/ettevotja_rekvisiidid__lihtandmed.csv.zip')

This job accepts the following parameters:

  1. days_interval - selects domains that were last checked more than {days_interval} days ago.
  2. spam_time_delay - this is the time delay when querying the business registry.
  3. batch_size - the size of the batch for processing. This is needed for optimization.
  4. download_open_data_file_url - the URL from which to download the business registry data.

As shown in the call above, all of these parameters have default values, so they only need to be passed when a default should be overridden.
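
To illustrate how these parameters might interact, here is a minimal plain-Ruby sketch. It assumes that spam_time_delay is a pause in seconds between registry queries, which the description does not state; the method names and the injectable `sleeper` are hypothetical and not taken from the job itself.

```ruby
SECONDS_PER_DAY = 86_400

# A record is due when it was never checked or was last checked
# more than days_interval days ago.
def due_for_check?(checked_at, days_interval: 14, now: Time.now)
  checked_at.nil? || checked_at < now - days_interval * SECONDS_PER_DAY
end

# Walk the idents in batches, pausing after each registry query.
# The block stands in for the actual registry request.
def process_in_batches(idents, batch_size: 100, delay: 0.2, sleeper: ->(s) { sleep(s) })
  results = {}
  idents.each_slice(batch_size) do |batch|
    batch.each do |ident|
      results[ident] = yield(ident)
      sleeper.call(delay) # assumed unit: seconds
    end
  end
  results
end
```

Passing the sleeper as an argument keeps the sketch testable without real waiting.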

What the job does:

  • It selects companies from Estonia that were last checked more than N days ago (days_interval), companies that are in liquidation, bankrupt, or removed from the registry, and companies that have never been validated (NULL value).
  • For each of these, a request is made to the registry to determine the status.
  • If the status is K/N or there is no information, we set ForceDelete if it is not already set, or SoftDelete if kandeliik (the registry entry type) is Kustutamiskanne dokumentide hoidjata (deletion entry without a custodian of documents).
  • If the previous status was R and the status in the business registry is still R, we simply update the date of the check.
  • If a domain has ForceDelete because of the company's status (the stored status is K/N) but the business registry now shows status R, we cancel ForceDelete.
  • When ForceDelete is set due to bankruptcy, removal of the company from the registry, or its absence, we write Company no: {ident_number} to the domain's status_notes.
  • If the status is L, we send them an email.
  • We also use a whitelist to skip certain organizations. The whitelist is defined in the application.yml file and has this structure:
whitelist_companies:
  - '12345678'
  - '87654321'
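
The rules above can be condensed into a small decision function. This is only one reading of the description, not the job's real implementation: the method name, the returned symbols, and the `force_delete_for_company_status` flag are all hypothetical, and the soft-delete branch assumes that the kandeliik check takes precedence over ForceDelete.

```ruby
# Example values copied from the whitelist structure above.
WHITELIST_COMPANIES = %w[12345678 87654321].freeze
SOFT_DELETE_KANDELIIK = 'Kustutamiskanne dokumentide hoidjata'

# Returns a symbol naming the action for one company.
# registry_status is 'R', 'L', 'K', 'N', or nil when no information exists.
def decide_action(ident:, registry_status:, kandeliik: nil,
                  force_delete_for_company_status: false,
                  whitelist: WHITELIST_COMPANIES)
  return :skip if whitelist.include?(ident)

  case registry_status
  when 'R'
    # Company is registered again: lift an earlier company-status ForceDelete.
    force_delete_for_company_status ? :cancel_force_delete : :update_check_date
  when 'L'
    :send_email
  when 'K', 'N', nil
    return :soft_delete if kandeliik == SOFT_DELETE_KANDELIIK

    # Do not set ForceDelete twice.
    force_delete_for_company_status ? :no_action : :force_delete
  else
    :no_action
  end
end
```

Keeping the decision separate from the side effects (mail, ForceDelete, status_notes) makes each rule easy to check in isolation.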

POTENTIAL PROBLEM: If a large set of companies is checked on a single day and the next check only runs much later (say, a year on), all of those companies become due at the same time, so the job may have to process a very large list in one run. This should be kept in mind.

This PR is related to https://github.com/internetee/company_register/pull/6

Related tickets: https://github.com/internetee/company_register/issues/4 https://github.com/internetee/company_register/issues/5

OlegPhenomenon, Jul 10 '23 12:07

This pull request is split into 5 parts for easier review. 👀 Review pull request on Viezly

Changed files are located in these folders:

  • /
  • app/interactions/actions
  • app/jobs
  • app/mailers
  • app/models
  • app/views/mailers
  • db
  • lib/gem_monkey_patches
  • test

viezly[bot], Jul 10 '23 12:07

The output list of invalid org contacts currently includes all Estonian org type objects, regardless of role. But since we set Force Delete only on domains where such an object is the registrant, we need to either generate a sub-list or add a role indicator to the output, so that the entries relevant to ForceDelete can be filtered out.

vohmar, Sep 18 '24 12:09

The latest test again produced multiple instances of the same entity. More importantly, each entity was matched with only one domain, so if a company had 3 domains, force delete was set on only one of them.

vohmar, Oct 04 '24 14:10