Should be able to read and write RDF files

Open swinslow opened this issue 5 years ago • 12 comments

Goal

tools-golang currently only handles SPDX files in tag-value format. It should also be capable of reading and writing SPDX files in RDF format, as an RDF format is also officially defined by the SPDX spec.

General description

Most of the functionality for tools-golang centers around the data structures defined in the spdx Golang package. Tag-value files can be read into these data structures using tvloader, and the data structures can be written to new tag-value files using tvsaver.

The objective for this enhancement is to create similar packages, rdfloader and rdfsaver, which can similarly read and write SPDX files in RDF format.
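
To illustrate the load/save symmetry described above, here is a minimal sketch of a round trip through an in-memory document. Everything in it is an assumption made for illustration: `Document`, `loadTagValue`, `saveTagValue`, and `roundTrip` are hypothetical stand-ins, not the actual types or functions in spdx, tvloader, or tvsaver, and only two tags are handled.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strings"
)

// Document is a hypothetical stand-in for the structs in the spdx package.
type Document struct {
	SPDXVersion  string
	DocumentName string
}

// loadTagValue reads "Tag: value" lines into a Document (toy loader).
func loadTagValue(r io.Reader) (*Document, error) {
	doc := &Document{}
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and comments
		}
		parts := strings.SplitN(line, ":", 2)
		if len(parts) != 2 {
			return nil, fmt.Errorf("malformed line: %q", line)
		}
		tag, value := strings.TrimSpace(parts[0]), strings.TrimSpace(parts[1])
		switch tag {
		case "SPDXVersion":
			doc.SPDXVersion = value
		case "DocumentName":
			doc.DocumentName = value
		default:
			return nil, fmt.Errorf("unsupported tag: %q", tag)
		}
	}
	if err := sc.Err(); err != nil {
		return nil, err
	}
	return doc, nil
}

// saveTagValue writes the Document back out in the same toy format.
func saveTagValue(doc *Document, w io.Writer) error {
	_, err := fmt.Fprintf(w, "SPDXVersion: %s\nDocumentName: %s\n",
		doc.SPDXVersion, doc.DocumentName)
	return err
}

// roundTrip loads the input and immediately saves it again.
func roundTrip(input string) (string, error) {
	doc, err := loadTagValue(strings.NewReader(input))
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := saveTagValue(doc, &sb); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	out, err := roundTrip("SPDXVersion: SPDX-2.1\nDocumentName: demo\n")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Print(out)
}
```

An RDF loader and saver would presumably follow the same shape, with the parsing and serialization swapped out while the in-memory document stays the common ground.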

Specific details

As you'll see, the existing code in the Golang packages described above only handles version 2.1 of the SPDX specification. Let's similarly keep the focus on version 2.1 only for the RDF loading and saving, for the time being.

For new code, please mirror the suffixes "2v1" and "2_1" as they are used in the code described above. That's intended to reflect that this code is SPDX version 2.1-specific, and will (hopefully!) help make it easier to add new versions later.
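
As a rough illustration of the suffix convention, the sketch below scopes a type and its helpers to one spec version by name. The identifiers `Document2_1`, `NewDocument2_1`, and `Validate2_1` are hypothetical and chosen only to show the naming pattern; they are not taken from the existing packages.

```go
package main

import "fmt"

// Document2_1 is a hypothetical document type scoped to SPDX 2.1.
type Document2_1 struct {
	SPDXVersion  string
	DocumentName string
}

// NewDocument2_1 returns a document pre-stamped with the 2.1 version string.
func NewDocument2_1(name string) *Document2_1 {
	return &Document2_1{SPDXVersion: "SPDX-2.1", DocumentName: name}
}

// Validate2_1 rejects documents that do not declare SPDX 2.1.
func Validate2_1(doc *Document2_1) error {
	if doc.SPDXVersion != "SPDX-2.1" {
		return fmt.Errorf("expected SPDX-2.1, got %q", doc.SPDXVersion)
	}
	return nil
}

func main() {
	doc := NewDocument2_1("demo")
	fmt.Println(doc.DocumentName, Validate2_1(doc) == nil)
}
```

With names like these, support for a later spec version could sit alongside as a separate set of suffixed types and functions, without touching the 2.1 code.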

It's important to me that any new code that is added should be well-tested. Please take a look at the *_test.go files in the existing directories for examples. If you are submitting new code, please be sure to include test files to go along with them.

The older version of the SPDX Golang tools, at https://github.com/spdx/ATTIC-tools-go, included RDF parsing for an older version of the SPDX specification. However, it relied on goraptor and Raptor RDF Syntax Library, the latter of which is a C library. I haven't reviewed closely, but it looks like this would require users to separately obtain and install the C library. If at all possible, I'd like for us to find a pure-Golang solution. However, that might not be possible, and I'm open to suggestions here.

swinslow avatar Mar 24 '19 22:03 swinslow

@swinslow, I found this beautiful library written in Go: https://godoc.org/github.com/knakk/rdf.

Is this issue assigned to someone, or is anyone already working on it?
If not, I would like to take responsibility for this issue, and also for other file formats, for GSoC 2019. Is there any microtask that I need to perform?

rishabh-bhatnagar avatar Mar 28 '19 01:03 rishabh-bhatnagar

@rishabh-bhatnagar The only issue I see with knakk/rdf is its lack of support for encoding RDF/XML, which is used in the Java SPDX tools.

If we can find a library that also encodes, that would be great (we use Apache Jena in the Java tools).

If no such library exists, going forward with knakk/rdf would probably be the best route, as it does support other formats and can decode RDF/XML.

goneall avatar Mar 28 '19 01:03 goneall

I didn't find any library written in Go that supports RDF/XML documents.

Has anyone else already started working on this project for GSoC 2019?

rishabh-bhatnagar avatar Mar 28 '19 02:03 rishabh-bhatnagar

@rishabh-bhatnagar The only issue I see with knakk/rdf is its lack of support for encoding RDF/XML, which is used in the Java SPDX tools.

If we can find a library that also encodes, that would be great (we use Apache Jena in the Java tools).

If no such library exists, going forward with knakk/rdf would probably be the best route, as it does support other formats and can decode RDF/XML.

I have already started writing a library for this. Should I continue writing it, or focus on making the functionality available directly in rdfloader and rdfsaver?

RishabhBhatnagar avatar Mar 30 '19 11:03 RishabhBhatnagar

Unfortunately, it looks like https://github.com/knakk/rdf doesn't use a real license -- see https://github.com/knakk/rdf/blob/master/LICENSE. And it appears the author is not willing to add one; see https://github.com/knakk/rdf/issues/15.

Given that, and given SPDX's focus on proper licensing, I don't think we would be comfortable using knakk/rdf as a core dependency.

If there really isn't any other Golang library that handles RDF/XML properly, I'm open to considering use of https://github.com/deltamobile/goraptor as the prior SPDX Golang tools were doing. It's not my preference because it requires a non-Golang library, and because its LGPL-2.1 license might be a bit complicated for use with this project. But I'm open to it if it seems to be the only option for now.

swinslow avatar Mar 30 '19 14:03 swinslow

What I was asking is:
If there are no existing libraries, why not make a new one?
And if I am making a new one, should I include it as files in this repository, or make it a separate library that this project depends on?

RishabhBhatnagar avatar Mar 30 '19 14:03 RishabhBhatnagar

If you're looking at making a new library to parse and handle RDF data, feel free -- I certainly won't argue =)

I'd just say, keep in mind that writing a parser library can be an enormous undertaking. I myself don't have any real familiarity with RDF data models or RDF/XML, which is one reason we don't have support here yet. If you feel confident in being able to write an RDF library in Golang, that's fantastic and could be very helpful to the ecosystem! I'm afraid I just won't have much useful input to help you along the way...

If you are taking this approach, of writing a new RDF decoding/encoding library from scratch, I expect you'll want to do it in a separate project -- not as part of spdx/tools-golang. If you get it up and running, then we could certainly use your project as a dependency in the future.

swinslow avatar Mar 30 '19 14:03 swinslow

RDF parsing merged in PR #46. There are some things to address for which I'll file separate issues.

I'll keep this issue open until the RDF writing side of things is implemented as well.

swinslow avatar Nov 14 '20 17:11 swinslow

Hey @swinslow, @RishabhBhatnagar! I am willing to work on this issue for GSoC 2021. In PR #46, @RishabhBhatnagar already implemented the rdfloader part, and as the rdfsaver part still needs to be coded, I want to work on that feature. I would be very happy if you could guide me on this. Is there any task left on the existing implementation that would help me get more familiar with the current codebase? Thanks.

bisakhmondal avatar Feb 19 '21 11:02 bisakhmondal

Hello @bisakhmondal. You can find the project idea here

For familiarizing with the current codebase, you can run examples and tinker around.

To create a writer for the spdx document, it is recommended to read more about the spdx spec and rdf/xml specific document. You can also go through several other formats here.

Let me set some context on how the RDF/XML writer fits in. RDF/XML is a hierarchical structure that is often represented internally as RDF triples.

  1. We have SPDX documents written in RDF/XML format.
  2. There's a tool called gordf that is capable of (a) generating RDF triples from an RDF/XML file and (b) writing RDF triples back to an RDF/XML file.
  3. rdfloader uses the triples generated by gordf to build an SPDX 2.2 document model object.
  4. The RDF writer would take the document object and convert it back to triples, so that they can be handed to gordf to be written back to an RDF/XML file.
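
The steps above can be sketched roughly as follows. This is only an illustration under stated assumptions: `Triple`, `Document`, `toTriples`, and `fromTriples` are hypothetical names and are not gordf's or rdfloader's actual API, the document node URI is made up, and only two properties are covered. The predicate URIs follow the SPDX RDF vocabulary as I understand it.

```go
package main

import (
	"errors"
	"fmt"
)

// Triple is a hypothetical subject-predicate-object statement (not gordf's type).
type Triple struct {
	Subject, Predicate, Object string
}

// Document is a hypothetical, heavily trimmed document model.
type Document struct {
	Name        string
	SpecVersion string
}

const (
	rdfType = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
	spdxNS  = "http://spdx.org/rdf/terms#"
	// docNode is a made-up URI for the document node.
	docNode = "http://example.com/demo#SPDXRef-DOCUMENT"
)

// toTriples is the writer direction: document model -> triples.
func toTriples(doc Document) []Triple {
	return []Triple{
		{docNode, rdfType, spdxNS + "SpdxDocument"},
		{docNode, spdxNS + "specVersion", doc.SpecVersion},
		{docNode, spdxNS + "name", doc.Name},
	}
}

// fromTriples is the loader direction: triples -> document model.
func fromTriples(triples []Triple) (Document, error) {
	var doc Document
	for _, t := range triples {
		if t.Subject != docNode {
			continue // ignore statements about other nodes
		}
		switch t.Predicate {
		case spdxNS + "specVersion":
			doc.SpecVersion = t.Object
		case spdxNS + "name":
			doc.Name = t.Object
		}
	}
	if doc.SpecVersion == "" {
		return Document{}, errors.New("no specVersion triple found")
	}
	return doc, nil
}

func main() {
	triples := toTriples(Document{Name: "demo", SpecVersion: "SPDX-2.2"})
	for _, t := range triples {
		fmt.Printf("<%s> <%s> %q\n", t.Subject, t.Predicate, t.Object)
	}
	doc, err := fromTriples(triples)
	fmt.Println(doc, err)
}
```

In the real pipeline, the triples would come from and go to gordf rather than being built by hand, so the writer's job is essentially the `toTriples` direction.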

@swinslow, correct me if I'm wrong :) @bisakhmondal, I hope this helps and that I haven't confused you more.

RishabhBhatnagar avatar Feb 21 '21 04:02 RishabhBhatnagar

Thanks, @RishabhBhatnagar, for such a meticulously written guide. Haha, no, I'm not confused yet :) Give me a day or two to go through these links and the work you put into gordf. I'll get back to you then.

bisakhmondal avatar Feb 21 '21 19:02 bisakhmondal

Hi @RishabhBhatnagar, I read those amazing docs, and I'm now pretty comfortable handling RDF files. Thank you again. Do you have any mini task in mind for the project? I'd be happy to implement it :)

bisakhmondal avatar Mar 05 '21 05:03 bisakhmondal