atomic-server
JSON-AD Importer - atomic data publishing imports
see https://github.com/ontola/atomic-data-docs/issues/93
- [x] Allow JSON-AD importers to deal with:
  - [x] `localId`
  - [ ] `globalId`
  - [x] References to other (internal) resources
  - [ ] Nested resources
- [x]
- [x] Authorization checks
- [x] Create `Importer` Class Resource (and / or Endpoint?)
- [x] Add a plugin for the `Importer` Class.
- [ ] Periodic runner
- [x] Front-end for `Importer` (update JS assets)
- [ ] Webhook Parser (maybe do this later)
- [x] CLI option `atomic-server importer ./my-file --to https://localhost/imports/1` or parse STDOUT
- [ ] Parallelizable (would be awesome)
Implementation thoughts
The process of importing things can be initiated in various ways:
- User manually imports some resource.
- Periodic pull: Server initiates - e.g. auto import of some external URL, checked periodically
- Push: External service initiates. e.g. WebHooks. This makes tokens relevant.
We want a front-end that:
- Easily instantiates Imports. Press the plus icon, create an import
- Allows for manual refresh or automatic / periodic refresh configuration (e.g. every 24 hours) of external URLs
- Allows pasting a JSON-AD field.
- Allows setting rights / tokens. Ideally, you'd get a WebHook URL that you can simply copy/paste into some WebHook client that sends (POSTS?) items
- Shows recently imported items.
The back-end:
- Needs an extended JSON-AD Parser. I think adding an optional `parent` argument should suffice. This is the context / the Resource which is set as the parent for everything. Every time a resource is encountered without an `@id`, but with a `localId`, the parent is set to this resource. In the URL generation, the path is created as a child of the Importer's path. So the parent may be `https://example.com/importers/twitter` and the new ID will be `https://example.com/importers/twitter/local_id_1`.
- Background job worker, which periodically fires to update things. Atomic-Server has the runtime, but Atomic-Lib has the `Db`. We could spin up some tokio periodic runtime from the `Db`, though, but this would mean that it may be cloned across threads. I think this should be a server thing. In any case, I'd prefer this to be designed as just another Plugin, which has some sort of periodic function handle.
- WebHook parser. This should be handled by `get_extended_resource`. I think we're going to have to send the `POST` body to this function, too... We already parse query params, now we're also gonna parse the body. And it would probably not take very long until we also allow plugins to use HTTP headers. It would definitely make plugins more powerful, but it could also lead to a lower degree of standardization between plugins. Currently, they all work with query parameters, similar to Endpoints. This leads to a standardized API and interactive frontends that can be auto-generated. Maybe we should limit it to accept only a `body` if you `POST` and not support HTTP headers.
- Token-based auth. Relates to webhook parsing. So we want to allow some sort of system to post things to an Importer (or some child of the importer).
- CLI option. Sending imports over HTTP is fine for small files, but larger ones require a more performant option. Having an `importer` option in the `atomic-server` CLI seems logical. I guess we should allow piping JSON-AD resources here.
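The URL generation described for the extended parser could look roughly like the sketch below. It assumes the new subject is formed by appending the `localId` to the parent URL with a single slash, and that an explicit `@id` always takes precedence. The names `resolve_subject` and `subject_from_local_id` are hypothetical and are not part of Atomic-Lib.

```rust
/// Builds the subject for a resource that has a `localId` but no `@id`.
/// The result is a child path of `parent` (the Importer's URL).
/// Hypothetical helper, not the Atomic-Lib API.
fn subject_from_local_id(parent: &str, local_id: &str) -> Option<String> {
    // Normalize slashes so the join never produces `//` in the path.
    let base = parent.trim_end_matches('/');
    let id = local_id.trim_start_matches('/');
    if base.is_empty() || id.is_empty() {
        return None;
    }
    Some(format!("{base}/{id}"))
}

/// Picks the subject for a parsed resource.
/// Assumption: an explicit `@id` wins; otherwise the `localId` is
/// resolved relative to the parent; with neither, there is no subject.
fn resolve_subject(parent: &str, id: Option<&str>, local_id: Option<&str>) -> Option<String> {
    match (id, local_id) {
        (Some(id), _) => Some(id.to_string()),
        (None, Some(local)) => subject_from_local_id(parent, local),
        (None, None) => None,
    }
}
```

Calling `resolve_subject("https://example.com/importers/twitter", None, Some("local_id_1"))` yields the child URL from the example above. Whether a `localId` may itself contain slashes is left open here.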
I need to re-consider how importing happens.
So right now, the `parse_json_ad_array` function actually adds resources to the store. I think that fails when we try to import a resource which also includes resources with either `@id` or `localId`. So maybe adding to the store should happen far deeper.
Currently, the server CLI import command needs an explicit `--parent` URL if you're parsing new resources. This is kind of cumbersome. I think we may need a default importer, which is created as a step in `populate`. An alternative is to create a new importer for every import. Maybe also acceptable?
I'm finding it difficult to implement logic for authorization checks.
Attack scenarios that I want to cover:
JSON-AD containing existing resource
Attacker creates JSON-AD file that seems normal, which includes some existing resource (e.g. the Victim's Agent profile). Victim imports the JSON-AD, which overwrites their existing thing (e.g. gives Read + Write rights to Attacker, or edits public key of Agent).
I think the solution is to - by default - only allow importing items that do not overwrite resources that are outside of the hierarchy.
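A minimal sketch of that default rule, assuming the hierarchy is determined by walking each resource's parent up to the Importer. The `Store` type here is a hypothetical stand-in (subject mapped to an optional parent), not the Atomic-Lib `Db`, and `may_import` is an illustrative name.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the store: every existing subject maps to
/// its parent subject, or to `None` if it has no parent.
type Store = HashMap<String, Option<String>>;

/// Returns true if `ancestor` appears somewhere in the parent chain of `subject`.
fn is_descendant_of(store: &Store, subject: &str, ancestor: &str) -> bool {
    let mut current = subject;
    // Bound the walk by the store size so a cyclic parent chain terminates.
    for _ in 0..store.len() {
        match store.get(current) {
            Some(Some(parent)) if parent.as_str() == ancestor => return true,
            Some(Some(parent)) => current = parent.as_str(),
            // Unknown resource or no parent: the chain ends without a match.
            _ => return false,
        }
    }
    false
}

/// Decides whether an import under `importer` may write `subject`.
fn may_import(store: &Store, importer: &str, subject: &str) -> bool {
    // A subject that does not exist yet cannot overwrite anything.
    if !store.contains_key(subject) {
        return true;
    }
    // Existing resources may only be overwritten inside the importer's
    // hierarchy. The importer itself is not its own descendant, so it is
    // also protected from being overwritten.
    is_descendant_of(store, subject, importer)
}
```

With this rule, the attack above is rejected: the Victim's Agent profile already exists and does not have the Importer among its ancestors, so `may_import` returns `false` for it.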
Currently, Importer is a Class Extender. This means that you can instantiate multiple Importers, and they all have URLs.
The alternative approach is to have a single `/import` Endpoint. This has some advantages:
- Code is cleaner
- Predictable URL
But it also has limitations, because it is not stateful / does not store any values:
- Can't use periodic runners. That would need some instance that has values
- Doesn't have children. We'd have to require a `parent` target, as well as the JSON-AD itself.
I'm not planning to do the open tasks, as I don't have a clear use case for them now. Perhaps later!