
Free queries from 01.08.2022

Open wirthual opened this issue 3 years ago • 15 comments

As per their website:

With the coming into effect of the Law on the Implementation of the Digitalization Guidelines (DiRUG) on 01.08.2022, access to all register content in the trade, cooperative, association and partnership register as well as to any electronically available documents through the Common Register Portal of federal states is provided free of charge starting from 01.08.2022. After that date, no registration and no log-in is required any more.

Do they offer an API description? 😅

wirthual avatar Jul 19 '22 13:07 wirthual

I have querying working using MechanicalSoup now. Took a bit of prodding with their weird javascript form.

alper avatar Aug 02 '22 23:08 alper

@alper sounds awesome. Maybe would be cool if you could document it?

LilithWittmann avatar Aug 03 '22 08:08 LilithWittmann

Got the stub in #7.

Next up: grab all the relevant documents belonging to a specific company and see if the people can be parsed out?

alper avatar Aug 03 '22 17:08 alper

(Used mechanize after all because it's pretty solid and familiar once it works.)

alper avatar Aug 03 '22 17:08 alper

Maybe going to use Selenium after all because this is the post payload for getting one of the documents:

ergebnissForm=ergebnissForm&javax.faces.ViewState=-8635335262319402326%3A6636106239244724446&ergebnissForm%3AselectedSuchErgebnisFormTable_rppDD=10&ergebnissForm%3AselectedSuchErgebnisFormTable_rppDD=10&ergebnissForm%3AselectedSuchErgebnisFormTable%3A0%3Aj_idt164%3A2%3Afade=ergebnissForm%3AselectedSuchErgebnisFormTable%3A0%3Aj_idt164%3A2%3Afade

and it's triggered in JavaScript by this Jakarta Server Faces application.

alper avatar Aug 03 '22 17:08 alper
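
To make the payload quoted above easier to read, here is a minimal sketch that decodes it with Python's standard library. The helper name `decode_form_payload` is made up for illustration and is not part of the repository. The decoded field names suggest why replaying the request by hand is awkward: the `javax.faces.ViewState` token and the `j_idt164` component id look like values generated by the server for each page, so they probably cannot be hard-coded.

```python
from urllib.parse import parse_qsl

# Payload copied from the comment above, split across lines for readability.
PAYLOAD = (
    "ergebnissForm=ergebnissForm"
    "&javax.faces.ViewState=-8635335262319402326%3A6636106239244724446"
    "&ergebnissForm%3AselectedSuchErgebnisFormTable_rppDD=10"
    "&ergebnissForm%3AselectedSuchErgebnisFormTable_rppDD=10"
    "&ergebnissForm%3AselectedSuchErgebnisFormTable%3A0%3Aj_idt164%3A2%3Afade"
    "=ergebnissForm%3AselectedSuchErgebnisFormTable%3A0%3Aj_idt164%3A2%3Afade"
)


def decode_form_payload(payload):
    """Return the URL-encoded form fields as a list of (name, value) pairs."""
    # parse_qsl keeps repeated keys and percent-decodes names and values.
    return parse_qsl(payload, keep_blank_values=True)


fields = decode_form_payload(PAYLOAD)
for name, value in fields:
    print(f"{name} = {value}")
```

The last field has the same string as both name and value, which appears to be how the page tells the server which link in the results table was clicked.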

I still have to try it out. It could be that this thing has a <noscript> fallback.

alper avatar Aug 05 '22 08:08 alper

I ported it to Selenium and can download the PDF files now. Will polish it and make sure you can get all the PDFs for a given company.

alper avatar Aug 06 '22 09:08 alper

@alper Does it also work without active JavaScript? Can you provide your source code?

deeprobin avatar Aug 09 '22 17:08 deeprobin

I'll post it after one more iteration.

It seems nothing here works without javascript.

alper avatar Aug 10 '22 09:08 alper

Is this still necessary? I got it to work in headless and download all the straightforward documents for an entity.

This can be cleaned up, the documents moved into a permanent location, and the whole thing run in batch, but Selenium/Geckodriver is kinda unreliable.

It's in my fork here: https://github.com/alper/handelsregister/blob/main/sel.py

alper avatar Aug 20 '22 11:08 alper

> It's in my fork here: https://github.com/alper/handelsregister/blob/main/sel.py

How do I run it in headless mode? The README in your fork only describes how to use the regular handelsregister.py, not sel.py.

tillewolle avatar Oct 03 '22 09:10 tillewolle

It should already be headless like this: https://github.com/alper/handelsregister/blob/main/sel.py#L37

alper avatar Oct 03 '22 11:10 alper

I thought I would be able to download .pdf files with the sel.py but I find no information about how to download them.

tillewolle avatar Oct 03 '22 12:10 tillewolle

I think it does but I haven't used it for a while and it's grossly untested. It definitely won't work to just get a bunch of PDFs without a lot of handling.

alper avatar Oct 03 '22 12:10 alper

Hi @alper, I was trying to run sel.py in Colab. I get an error on the following line: https://github.com/alper/handelsregister/blob/e6cea7d92041e4a28c323ea390c9bdb5bbab7a1d/sel.py#L65. The error trace is below. Do you know what could be wrong here?

Registerportal | Advanced search
<selenium.webdriver.remote.webelement.WebElement (session="60091ef69448ee4dc5d60e9b753fa24e", element="3b7448c5-f072-4e11-af93-25c85be00d4d")>
<selenium.webdriver.remote.webelement.WebElement (session="60091ef69448ee4dc5d60e9b753fa24e", element="105dcd63-80e1-4bf5-9a25-1cedbd5785e7")>
---------------------------------------------------------------------------
ElementClickInterceptedException          Traceback (most recent call last)
<ipython-input-43-c212176fe302> in <cell line: 21>()
     19 search_button = driver.find_element(By.XPATH, "//button[@id='form:btnSuche']")
     20 print(search_button)
---> 21 print(search_button.click())
     22 #document_list = ['AD','CD','HD',# 'DK',# 'UT'# 'VÖ','SI']

3 frames
/usr/local/lib/python3.10/dist-packages/selenium/webdriver/remote/errorhandler.py in check_response(self, response)
    243                 alert_text = value["alert"].get("text")
    244             raise exception_class(message, screen, stacktrace, alert_text)  # type: ignore[call-arg]  # mypy is not smart enough here
--> 245         raise exception_class(message, screen, stacktrace)

ElementClickInterceptedException: Message: element click intercepted: Element <button id="form:btnSuche" name="form:btnSuche" class="ui-button ui-widget ui-state-default ui-corner-all ui-button-text-only searchButton" onclick="PrimeFaces.bcn(this,event,[function(event){PF('btnSuche').disable()},function(event){PrimeFaces.ab({s:&quot;form:btnSuche&quot;,f:&quot;form&quot;,u:&quot;form&quot;});return false;}]);" type="submit" role="button" aria-disabled="false">...</button> is not clickable at point (767, 1065). Other element would receive the click: <a href="#page-wrapper">...</a>
  (Session info: headless chrome=90.0.4430.212)
Stacktrace:
#0 0x56b032b607f9 <unknown>
#1 0x56b032b003b3 <unknown>

timtensor avatar Jul 12 '23 15:07 timtensor
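
The thread does not contain a confirmed fix for this error. A common workaround for `ElementClickInterceptedException` in Selenium is to scroll the element into view and trigger the click from JavaScript, so that an overlapping element (here the `<a href="#page-wrapper">` link named in the message) cannot receive it. The sketch below assumes that is the cause. The helper name `click_even_if_covered` is hypothetical and does not come from sel.py. It only relies on the driver's `find_element` and `execute_script` methods.

```python
def click_even_if_covered(driver, xpath):
    """Click the element at `xpath`, bypassing whatever overlaps it."""
    # "xpath" is the string value of Selenium's By.XPATH locator constant.
    element = driver.find_element("xpath", xpath)
    # Bring the element into the viewport first; the failing click targeted
    # y=1065, which may lie outside a default-sized headless window.
    driver.execute_script(
        "arguments[0].scrollIntoView({block: 'center'});", element
    )
    # Dispatch the click from JavaScript so no overlapping element
    # (such as the link in the error message) can intercept it.
    driver.execute_script("arguments[0].click();", element)
    return element
```

With a real driver this would be called as `click_even_if_covered(driver, "//button[@id='form:btnSuche']")` in place of `search_button.click()`. Enlarging the headless window with `driver.set_window_size(1920, 1080)` before searching might also help, but that is untested against this site.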