connexion
Breaking up large swagger.yaml files
Is there any way to break up large swagger.yaml files into smaller pieces? I've tried using external references, but it doesn't seem like Connexion supports these. I've also tried using Jinja features to break the swagger files up, but this doesn't work (and it's ugly).
Am I missing a feature that would let me do this? If not, is relative referencing a feature Connexion would be interested in? I know that the jsonschema library used internally supports relative referencing so it might not be too difficult to implement. If you guys are interested, I can take a crack at adding the feature.
@patrickw276 I think it's not supported right now in Connexion, but "people" might be interested in this feature. I personally don't have the need for it as IMHO "microservices" should not have large Swagger files :smirk:.
As @hjacobs said, one using Connexion might be interested in this kind of feature. It is something interesting to support. Please send a PR and we will review. 😃
@hjacobs My use case doesn't involve very much processing but the model's schema is somewhat complex. It doesn't make much sense to break the API into smaller microservices (IMO).
I've started working on a PR where the swagger document's external references are resolved into a single dictionary before being passed to the swagger validator. For example:
# doc1.yaml
key1: value1
key2:
  $ref: 'doc2.yaml#/key3'
# doc2.yaml
key3: value3
will resolve to a dictionary that looks like
{'key1': 'value1', 'key2': 'value3'}
Recursive references will be handled by pointing the recursive $ref to its new location in the combined document. Do you guys see any issues with this approach? Also, I don't plan on implementing non-local reference resolution (e.g. using http).
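To make the proposal above concrete, here is a minimal sketch of the resolution step. It is not the code from the PR: the names `resolve_pointer` and `resolve_refs` are hypothetical, the files are represented by an in-memory `documents` mapping instead of YAML read from disk, and every `$ref` (local ones included) is simply inlined, so recursive references are not handled.

```python
def resolve_pointer(document, pointer):
    """Follow a JSON pointer such as '/key3' into a parsed document."""
    node = document
    for part in pointer.lstrip("/").split("/"):
        if part:
            # Undo JSON pointer escaping: '~1' is '/', '~0' is '~'.
            node = node[part.replace("~1", "/").replace("~0", "~")]
    return node


def resolve_refs(node, current_file, documents):
    """Return a copy of node with every $ref replaced by its target.

    documents maps a file name to its parsed content (an assumption made
    for this sketch). Recursive references would loop forever here.
    """
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str):
            file_name, _, pointer = ref.partition("#")
            target_file = file_name or current_file
            target = resolve_pointer(documents[target_file], pointer)
            return resolve_refs(target, target_file, documents)
        return {key: resolve_refs(value, current_file, documents)
                for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_refs(item, current_file, documents) for item in node]
    return node


documents = {
    "doc1.yaml": {"key1": "value1", "key2": {"$ref": "doc2.yaml#/key3"}},
    "doc2.yaml": {"key3": "value3"},
}
combined = resolve_refs(documents["doc1.yaml"], "doc1.yaml", documents)
print(combined)
```

With the two documents from the example, `combined` is the single dictionary shown above. Handling recursive references would need the extra step described in the comment: rewriting the `$ref` to point at its new location rather than inlining it.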
A simpler way, since Connexion already uses Jinja2 for template rendering, would be to make use of the {% include "something.yaml" %} directive. The only change needed to make this work is to configure the Template#environment attribute in the Connexion code to use the jinja2.FileSystemLoader loader. Then you would be able to break your Swagger/OpenAPI definition into several files.
@rafaelcaricio That would definitely be simpler to implement, BUT you can get into some weird issues with local references. These references would have to be declared relative to the complete concatenated document, not the document they actually exist in. I'm not sure how often this would be an issue in a real API, but it's something to consider.
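A small sketch of the local-reference problem described above, under the assumption that a fragment file ends up nested under `definitions` in the combined document. The helper `resolve_local` and the `Pet` schemas are made up for illustration.

```python
def resolve_local(document, ref):
    """Resolve a '#/a/b' style reference against one parsed document."""
    node = document
    for part in ref.lstrip("#/").split("/"):
        node = node[part]
    return node


# Content of a hypothetical fragment file, with a reference that is
# correct relative to the fragment itself.
fragment = {
    "Pet": {"type": "object"},
    "PetList": {"type": "array", "items": {"$ref": "#/Pet"}},
}

# After inclusion the fragment sits under 'definitions'.
combined = {"swagger": "2.0", "definitions": fragment}

ref = fragment["PetList"]["items"]["$ref"]
resolves_in_fragment = resolve_local(fragment, ref) == {"type": "object"}

try:
    resolve_local(combined, ref)
    resolves_in_combined = True
except KeyError:
    # '#/Pet' no longer exists at the top level of the combined document.
    resolves_in_combined = False

print(resolves_in_fragment, resolves_in_combined)
```

For the include approach to work, the fragment would have to say `#/definitions/Pet`, which is only valid once the files are concatenated.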
I favor the reference resolution approach I outlined above because it implements a feature as outlined in the Swagger/OpenAPI Specification.
I do not dislike your solution. I am just trying to show more possibilities. ✌️
Anyway, where is this outlined in the Swagger/OpenAPI Specification?
@rafaelcaricio It's good to be open to other approaches, so I definitely appreciate the input (and all the work you guys do on Connexion in general).
The spec mentions them here and a few other places.
EDIT: Actually, here is a better place in the document to read, along with the link to canonical dereferencing.
@patrickw276 Sounds good. 👍
If you are looking for an alternative while this gets baked in: I used a Node app to help, because I have a fairly large Swagger API that is split up across 15 to 20 files.
The tool I used makes it really simple to create a single swagger file that Connexion will work with, as it'll fully dereference pointers for you.
https://www.npmjs.com/package/swagger-cli
here's a quick start for you
npm install -g swagger-cli
swagger validate <path_to_root_spec>
Once it's validated it can make a single file doing the following
swagger bundle -r -o <output_path> <path_to_root_spec>
The -r is to fully dereference.
Out of curiosity, how is that handled in Zalando? Is this file https://api.zalando.com/schema/swagger.json also bundled?
Looks like you can find more on the shop api here https://github.com/zalando/shop-api-documentation
@do3cc I personally don't know anybody in Zalando who splits up Swagger files (mostly it's "microservices", right?). Our RESTful API Guidelines also don't cover this.
I came to the conclusion that to do this cleanly, loading the individual files instead of bundling, the yelp swagger_spec_validator would need to support yaml directly. I opened up a PR on that repo but never got any feedback.
I do think there is a valid use case for this feature that can't be solved with microservices. I work with earth science metadata where the schemas can be quite complex. It wouldn't really make sense to break an api storing this data into microservices but breaking the swagger spec up could help with managing the complexity.
Any update on this? It is quite uncomfortable using a single file, why not allow multiple files as in the OpenAPI spec?
@advance512 it is not a matter of allowing it or not. We'd be happy to accept a PR that solves this issue. But so far, there is none.
I've done quite a bit of research on this problem. There are 2 compounding issues:
- yelp/swagger-spec-validator converts yaml to json before validation so that they can use Julian/jsonschema.
- Julian/jsonschema currently cannot handle file relative paths such as { $ref : somefile.json#/some_id }
Before swagger-spec-validator started using jsonschema, it did handle file relative paths, but regressed.
My team worked around the first problem by converting our yaml to json during our build process (including twiddling filenames within the data).
We're also working on a patch to Julian/jsonschema to fix the 2nd problem. It may require also submitting a patch to swagger-spec-validator.
Personally, I'd put all work on this set of problems off until the bug in Julian/jsonschema is fixed. After that, it's possible that connexion could use the yaml loading script we've written. However, I really think that yelp/swagger-spec-validator should be handling the yaml-json conversion.
(seeing all the referencing bugs, it's clear there's more than one way to fix this...)
@dwlocks thanks for the insights!
So, I guess that for now, you'd combine the files in a build process - right? Or are there any better alternatives?
I think combining is probably the best of the not-so-great options. If you're somewhat careful with the YAML, simple concatenation should work. I personally would write the $refs so that they work in the concatenated state, i.e. so that they are all document-relative "#/" style.
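As a rough sketch of what such a build-time combining step could look like when working on parsed fragments rather than raw text: the function name `merge_fragments` and the example fragments are hypothetical, and the fragments are assumed to be already loaded into dictionaries.

```python
def merge_fragments(fragments):
    """Merge parsed spec fragments section by section.

    Dictionary sections (paths, definitions, ...) are combined and a
    duplicate key raises an error; for scalar sections the first value wins.
    """
    merged = {}
    for fragment in fragments:
        for section, content in fragment.items():
            existing = merged.get(section, {})
            if isinstance(content, dict) and isinstance(existing, dict):
                duplicates = existing.keys() & content.keys()
                if duplicates:
                    raise ValueError(
                        f"duplicate keys in {section!r}: {sorted(duplicates)}"
                    )
                merged[section] = {**existing, **content}
            else:
                merged.setdefault(section, content)
    return merged


base = {
    "swagger": "2.0",
    "paths": {"/pets": {"get": {"responses": {"200": {"description": "ok"}}}}},
}
owners = {
    "paths": {"/owners": {"get": {"responses": {"200": {"description": "ok"}}}}},
    "definitions": {"Owner": {"type": "object"}},
}

spec = merge_fragments([base, owners])
print(sorted(spec["paths"]))
```

Because every fragment contributes to the same top-level sections, "#/definitions/..." references written against the combined document stay valid after merging.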
I think one of the comments here also suggested templating with jinja2.
(I had a few days to go at this during work, but no longer. I'm working on the jsonschema bits in my spare time at home now.)
@dwlocks I came here after foolishly replicating some of your research. I haven't yet looked into the swagger-spec-validator, but I was considering the following idea:
If I modify resolve_remote to treat scheme == '' as "this is a local relative file lookup" (and thus I don't have to give it 'file://' + abspath), then I can add a branch which does a regular open and tries to read yaml first if the extension is correct, then converts it to json.
I think this is all I would need. Let me know if you think it would not be sufficient.
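The branch described above could be sketched roughly as follows. This is a standalone function, not a patch to jsonschema: `load_local_ref` and `base_dir` are hypothetical names, and only JSON is read here so that the sketch stays within the standard library; the YAML case is marked in a comment.

```python
import json
import os
import tempfile
from urllib.parse import urlsplit


def load_local_ref(uri, base_dir):
    """Load the document part of a $ref whose URI has no scheme."""
    parts = urlsplit(uri)
    if parts.scheme != "":
        raise ValueError(f"not a local relative reference: {uri}")
    path = os.path.join(base_dir, parts.path)
    with open(path, encoding="utf-8") as handle:
        # A '.yaml'/'.yml' branch using a YAML parser would go here.
        return json.load(handle)


# Usage with a throwaway directory standing in for the spec folder.
with tempfile.TemporaryDirectory() as base_dir:
    target = os.path.join(base_dir, "somefile.json")
    with open(target, "w", encoding="utf-8") as handle:
        json.dump({"some_id": {"type": "string"}}, handle)
    loaded = load_local_ref("somefile.json#/some_id", base_dir)

print(loaded)
```

The fragment (`/some_id`) is deliberately ignored by this function; resolving it inside the loaded document is left to the caller.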
I don't see this getting merged into the primary jsonschema branch anytime soon (would no doubt require handling all branches of that method correctly), but it would at least be something connexion users could use as well (by way of installing a specific branch).
Actually, looking here, it looks like the real solution is to override the RefResolver used by connexion, or possibly by swagger-spec-validator (I don't yet know which of those is where it's literally invoked), so that it sets base_uri correctly and reads yaml.
Would it be possible to continue to use a single swagger spec but integrate a tool that allows each endpoint or group of endpoints to be described separately and integrated to create the necessary connexion swagger file?
I've looked at swagger-aggregator and spec-synthase, the latter of which looks the more hopeful. I may report back if I make any progress. At present the swagger spec for our API server is over 2,700 lines long, and I want to be able to refactor it so the Swagger specification can live in a directory tree with a parallel structure to the code.
@holdenweb Please do report back, interesting.
@holdenweb, I noticed today that you mentioned my package Spec-Synthase; sorry for not having more documentation. If you want to build the swagger.yml from the command line, take a look at: http://spec-synthase.readthedocs.io/en/latest/usage.html
If you want to create spec in runtime and use it with connexion, you can see: https://github.com/MicroarrayTecnologia/spec-synthase/blob/master/tests/test_specsynthase.py#L20
We are using Spec-Synthase in production on the Mozilla Application Update Service (mozilla/balrog). Our scenario is sharing pieces of the spec between a public and an admin API: https://github.com/mozilla/balrog/blob/df92ab8ae523f7aedea7ed00b61b49a56dc56f0d/auslib/web/public/base.py#L30-L37
https://github.com/mozilla/balrog/blob/df92ab8ae523f7aedea7ed00b61b49a56dc56f0d/auslib/web/admin/base.py#L16-L29
It seems that spec-synthase fits your scenario well, since you can have n files describing paths, n files describing responses, etc.
Another tool for resolving local references : https://github.com/wework/speccy
I am starting to think that this behavior is best left to a tool outside of connexion (like wework/speccy). Otherwise we have to sort out the problems in #798 related to serving up all of the references, and making sure they are accessible to swagger-ui. Thoughts?
@dtkav I just stumbled across this issue after running into it when trying to load an openapi spec with relative references. Use cases: multiple services that accept/return objects with the same model; authorization format; big regexs, etc.
Solutions involving the file:/ scheme or Jinja2 templating are a nonstarter (for me) because the definition files are no longer valid OpenAPI and I sacrifice portability.
Solutions involving speccy or other combination tools are kind of a bummer because it adds complexity to the build process.
TL;DR as a user, I would really like to see connexion support relative paths natively.
Just adding my 2 cents...
I do split APIs into different yaml files, so that I have various small APIs. If I want to reuse them, I simply merge them on the fly. Each micro-API should have its own namespace.
specsynthase is good at merging specifications, detecting duplicated keys and the like and capable of validating the specifications it works with.
I've personally gone for a rather lightweight approach, where a package that implements a set of endpoints is defined by a paths.json file in the package's top-level directory. The definitions section is currently shared by all endpoints, as some definitions are common to many endpoints, but this isn't an absolute requirement of the design.
The endpoint set is defined as an extended flask.Blueprint and can be mounted at any point in the server's address space (which is what determines the path it gets associated with in the specification).
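A minimal sketch of how the mount point could determine the paths in the generated specification. This is not the actual implementation: `mount_paths` and `build_paths` are hypothetical names, and the content of each package's paths.json is assumed to be already parsed into a dictionary.

```python
def mount_paths(paths, mount_point):
    """Prefix every path key of an endpoint set with its mount point."""
    stripped = mount_point.strip("/")
    prefix = "/" + stripped if stripped else ""
    return {prefix + path: operations for path, operations in paths.items()}


def build_paths(mounted_packages):
    """Combine the paths of several endpoint sets, keyed by mount point."""
    combined = {}
    for mount_point, package_paths in mounted_packages.items():
        combined.update(mount_paths(package_paths, mount_point))
    return combined


# Parsed content of two hypothetical paths.json files.
users_paths = {"/": {"get": {}}, "/{user_id}": {"get": {}}}
status_paths = {"/health": {"get": {}}}

paths = build_paths({"/users": users_paths, "/": status_paths})
print(sorted(paths))
```

The shared definitions section would then be attached next to `paths` when the full specification dictionary is assembled.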
@holdenweb yep, more or less I followed same approach.