Add KURSNET offer

Open steffenkleinle opened this issue 10 months ago • 6 comments

Motivation

As a user of the integreat app I want to see language courses and job trainings I can take part in directly in the app.

Proposed Solution

We should implement the KURSNET API in the CMS and provide the available offers via an api for the apps. Tasks:

  • [ ] Crawl the KURSNET API
  • [ ] Create an embedded offer for KURSNET
  • [ ] Provide an endpoint for the apps

Alternatives

Implement the API in the apps.

User Story

As a user of the integreat app I want to see language courses and job trainings I can take part in directly in the app.

Additional Context

From the integreat-app issue: https://github.com/digitalfabrik/integreat-app/issues/2702 (please notify once this is done).

Language courses and job trainings are centrally organized by BAMF and BA in a database called KURSNET. After a few years of communication we found out that different organizations simply use the data from KURSNET on their own platforms, although there is not really an official API.

It's still unclear whether this data has an official API or needs to be crawled. We have found this, but it needs to be checked: https://github.com/AndreasFischer1985/weiterbildungssuche-api

Alternatively, we might crawl the data from the platform (https://web.arbeitsagentur.de/sprachfoerderung/home). For Integreat we are only interested in the data categorized as "Sprachförderung und Migration", which is sub-categorized into 4 further topics.

Design Requirements

None.

steffenkleinle avatar Apr 09 '24 12:04 steffenkleinle

@dkehne who was involved in finding https://github.com/AndreasFischer1985/weiterbildungssuche-api? Is there someone from the tech team who could perhaps provide a little more information on the current status here?

steffenkleinle avatar Apr 09 '24 12:04 steffenkleinle

This is a real benefit to our users so we should find a solution in 24Q3...

dkehne avatar May 07 '24 10:05 dkehne

It's still unclear if those data have an official API or needs to be crawled. We have found this but it needs to be checked: https://github.com/AndreasFischer1985/weiterbildungssuche-api

IMHO we should carefully discuss whether we want to depend on a library that

a) is maintained by a single person, b) can break at any moment if KURSNET decides to change its layout, and c) has unclear consequences in terms of the effort needed to fix it.

Additionally, crawling the website is not officially sanctioned. If KURSNET blocks the IP of our server (rate limiting or whatever), we would need to implement pretty crazy workarounds.

I have a very strong opinion: we should not rely on unsupported interfaces for retrieving data for production systems. This will break eventually. It is only a question of when.

svenseeberg avatar May 22 '24 11:05 svenseeberg

I mostly agree. Relying on unsupported interfaces is definitely a risk. If we just occasionally crawled the API and stored the offers in the CMS, this would perhaps be a risk we could take (but not necessarily should take). Depending on how much the data in KURSNET changes, this could be an okay solution.

I definitely agree with you that we should not retrieve the API data directly in the apps or through a proxy in the CMS.

steffenkleinle avatar May 24 '24 09:05 steffenkleinle

Discussion at the conference:

  • This is an important feature with great impact (TBD in a user testing/evaluation)
  • Using a library maintained by one person doesn't change the fact that the API might break at some point. We could ask the maintainer of the library whether we can support or take over the library.
  • If we cache/store the KURSNET data in the CMS, a breaking API change would not have immediate negative effects (except outdated data until we fix it)

On hold until user stories/need is evaluated.

steffenkleinle avatar May 25 '24 07:05 steffenkleinle

Some thoughts from today's discussion at the conference, just for the record:

  • Implementing a crawler on our own might not be safer than using the existing code because the main risk lies with the API.
  • We discussed implementing a database to cache offers, reducing our dependency on API uptime. This approach allows us to provide information even if the API changes and lowers the risk of our IP being blocked, since we only need one API call per day to update the database.
  • If the API is down or changes, we could display a banner to inform users about outdated data and direct them to the KURSNET page for the latest information.
  • We should consider reaching out to the API developer for further support if needed.
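
The caching idea from the points above could look roughly like the following. This is only a minimal sketch, not a proposal for the actual CMS implementation: `OfferCache`, `fetch`, `REFRESH_INTERVAL` and the `outdated` flag are all hypothetical names, and the real KURSNET request and the offer format are deliberately left out because the interface is still unclear.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import Callable, Optional

# Assumption: one refresh per day, as suggested in the discussion.
REFRESH_INTERVAL = timedelta(days=1)


@dataclass
class OfferCache:
    """Keeps the last successfully fetched offers and the time of that fetch."""

    # Hypothetical callable wrapping whatever KURSNET request we end up using.
    fetch: Callable[[], list]
    offers: list = field(default_factory=list)
    last_success: Optional[datetime] = None

    def refresh(self, now: datetime) -> bool:
        """Try to update the cache; keep the old offers if the fetch fails."""
        try:
            offers = self.fetch()
        except Exception:
            # The API is down or has changed: serve the stored offers instead.
            return False
        self.offers = offers
        self.last_success = now
        return True

    def is_outdated(self, now: datetime) -> bool:
        """True if there was no successful refresh within two intervals."""
        if self.last_success is None:
            return True
        return now - self.last_success > 2 * REFRESH_INTERVAL

    def response(self, now: datetime) -> dict:
        """Payload an app-facing endpoint could return, with a banner flag."""
        return {"offers": self.offers, "outdated": self.is_outdated(now)}
```

With something like this, a daily job would call `refresh()`, and the apps could show the banner whenever `outdated` is true while still displaying the stored offers.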

deen13 avatar May 25 '24 08:05 deen13