aries-cloudagent-python icon indicating copy to clipboard operation
aries-cloudagent-python copied to clipboard

Migrate from `PyLD` to `RDFLib`

Open ff137 opened this issue 1 year ago • 8 comments

I'd like to propose that our usage of the PyLD dependency be swapped out for the more versatile and better maintained RDFLib.

Let me bullet point some reasons for this suggestion:

  • the current state of the PyLD project is not clear to me. The most recent comment on their most recent PR basically says "let's merge this now", and that was 3 months ago.
  • v2.0.3 was released in 2020. And then v2.0.4 was released in Feb this year.
  • the repo has no automated testing. When I tested it on my local, I get ~20 test failures, so as far as I can tell it's in a broken state.
  • their jsonld.py file is 6700 lines, which was "ported from the JavaScript implementation of JSON-LD"
  • their requirements.txt has unpinned dependencies

Long story short, the impression I get is that PyLD is not maintained or maintainable.

RDFLib, on the other hand, seems to be the gold standard for Resource Description Framework (RDF) processing in python. They offer:

  • parsers and serializers for RDF/XML, N3, NTriples, N-Quads, Turtle, TriX, Trig and JSON-LD

This repo has much more adoption and activity, and so I think it would be better to migrate to this instead. This is just an idea and open for discussion.

PyLD is currently used in quite a few places in ACA-Py, and probably represent quite a refactoring job to replace it. But I think this may be a worthwhile operation, to reduce technical debt and improve the functionality of a core feature.

ff137 avatar May 10 '24 10:05 ff137

Relates to #2941

ff137 avatar May 10 '24 11:05 ff137

PyLd is a library from the authors of the json-ld specification, Digital Bazaar. It aims to be compliant with the specification by conformity with the json-ld processor test-suite. We updated v2.0.4 since we uncovered a bug in the library and submitted a PR. Going forward there are other bugs in the library which creates interoperability issues with aca-py at the moment for some specific cases.

I'm not opposed to changing the library but I think more tests are needed to see if its a suitable replacement.

Pyld can use an asynchronous document loader as well

PatStLouis avatar Jun 03 '24 16:06 PatStLouis

@PatStLouis thanks! The synchronous requests document loader was the main reason I thought a replacement is necessary. Because, in ACA-Py, there's a JsonLdDocumentDownloader, which uses the requests framework for synchronous downloads, which is then sort of hacked into an async method in the DocumentLoader class. This leads to nest_asyncio been necessary, to support nested event loops, which all feels like a very smelly way of doing things. That's what got me looking for alternatives.

If an asynchronous document loader can be used instead, that may be a sufficient improvement. So I'll keep this in mind. But yes, it may be that RDFLib is ultimately the way to go, given it has more activity and wider support

ff137 avatar Jun 04 '24 12:06 ff137

@ff137 is the async pyld document loader sufficient for your use case?

PatStLouis avatar Jun 18 '24 17:06 PatStLouis

Hi @PatStLouis, I'll need some time to investigate. Will have to test and confirm if the existing synchronous logic can be replaced. I still think migrating to RDFLib is worthwhile overall, although it's probably out of scope for the 1.0.0 release. Either way, over the next couple weeks I'll see if I can put something together to test the async doc loader 👌

ff137 avatar Jun 19 '24 14:06 ff137

pyld is core to a lot of the issuance/verification processes, something to keep in mind. Can you point me into the code where pyld is causing issues?

PatStLouis avatar Jun 19 '24 19:06 PatStLouis

I think trying to change from request to aiohttp for the document loader is a good place to start and aligns with the rest of the code base

PatStLouis avatar Jun 19 '24 19:06 PatStLouis