Migrate from `PyLD` to `RDFLib`
I'd like to propose that our usage of the PyLD dependency be swapped out for the more versatile and better maintained RDFLib.
Let me bullet point some reasons for this suggestion:
- the current state of the PyLD project is not clear to me. The most recent comment on their most recent PR basically says "let's merge this now", and that was 3 months ago.
- v2.0.3 was released in 2020. And then v2.0.4 was released in Feb this year.
- the repo has no automated testing. When I tested it on my local, I get ~20 test failures, so as far as I can tell it's in a broken state.
- their jsonld.py file is 6700 lines, which was "ported from the JavaScript implementation of JSON-LD"
- their requirements.txt has unpinned dependencies
Long story short, the impression I get is that PyLD is not maintained or maintainable.
RDFLib, on the other hand, seems to be the gold standard for Resource Description Framework (RDF) processing in python. They offer:
- parsers and serializers for RDF/XML, N3, NTriples, N-Quads, Turtle, TriX, Trig and JSON-LD
This repo has much more adoption and activity, and so I think it would be better to migrate to this instead. This is just an idea and open for discussion.
PyLD is currently used in quite a few places in ACA-Py, and probably represent quite a refactoring job to replace it. But I think this may be a worthwhile operation, to reduce technical debt and improve the functionality of a core feature.
Relates to #2941
PyLd is a library from the authors of the json-ld specification, Digital Bazaar. It aims to be compliant with the specification by conformity with the json-ld processor test-suite. We updated v2.0.4 since we uncovered a bug in the library and submitted a PR. Going forward there are other bugs in the library which creates interoperability issues with aca-py at the moment for some specific cases.
I'm not opposed to changing the library but I think more tests are needed to see if its a suitable replacement.
Pyld can use an asynchronous document loader as well
@PatStLouis thanks! The synchronous requests document loader was the main reason I thought a replacement is necessary. Because, in ACA-Py, there's a JsonLdDocumentDownloader, which uses the requests framework for synchronous downloads, which is then sort of hacked into an async method in the DocumentLoader class. This leads to nest_asyncio been necessary, to support nested event loops, which all feels like a very smelly way of doing things. That's what got me looking for alternatives.
If an asynchronous document loader can be used instead, that may be a sufficient improvement. So I'll keep this in mind. But yes, it may be that RDFLib is ultimately the way to go, given it has more activity and wider support
@ff137 is the async pyld document loader sufficient for your use case?
Hi @PatStLouis, I'll need some time to investigate. Will have to test and confirm if the existing synchronous logic can be replaced. I still think migrating to RDFLib is worthwhile overall, although it's probably out of scope for the 1.0.0 release. Either way, over the next couple weeks I'll see if I can put something together to test the async doc loader 👌
pyld is core to a lot of the issuance/verification processes, something to keep in mind. Can you point me into the code where pyld is causing issues?
I think trying to change from request to aiohttp for the document loader is a good place to start and aligns with the rest of the code base