phenopacket-schema

What library should R developers use for phenopackets?

Open: cmungall opened this issue 1 year ago • 5 comments

From:

  • https://github.com/monarch-initiative/helpdesk/issues/69

What strategy should R developers, and others not working in Python or Java, use?

I know there is an R protobuf library, but I don't think it's an official Google protobuf product:

https://cran.r-project.org/web/packages/RProtoBuf/

cmungall, Nov 16 '22 17:11

AFAIK there is no easy way to work with phenopackets in R. For now we are emphasizing Java and Python, but an R library would be useful. I would think we could explore wrapping the automatically generated C++ library with Rcpp or a similar approach. But I think we should first figure out what we want to do in R.

pnrobinson, Nov 16 '22 19:11

What's wrong with using the library Chris suggested? Sure, it's going to build you a bare-bones model, but that's already a good start for parsing and using the data.

julesjacobsen, Nov 16 '22 20:11

Well, because without validation support, such as we have for Java in https://github.com/phenopackets/phenopacket-tools and will soon have in Python, it is hard to write correct phenopackets. It depends on what one wants to do, but we should try to develop good libraries in every language in which people will work with phenopackets a lot!

pnrobinson, Nov 17 '22 12:11

I think it helps to separate use cases here. Broadly these fall into two categories:

  • import
  • export

For export I agree that we need good library support for validation, but there are a variety of strategies, including services, calling the Java or Python libraries, or encoding in LinkML.

However, the helpdesk request that prompted this was for import, so there is no need for a full R validation suite here.

cmungall, Nov 18 '22 16:11

What if an invalid phenopacket is imported? This could lead to spurious results, and the analysis of the data might not be correct, especially for more complicated phenopackets, unless dedicated software is used rather than generic JSON parsing. I think we should always validate, and if we want to support R we should figure out how to write a library similar to phenopacket-tools.

pnrobinson, Nov 18 '22 16:11
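
To make the import concern above concrete, here is a minimal sketch of what "validate on import" could look like when only a generic JSON parser is available. It is written in Python purely for illustration (the thread's target is R) and is not the phenopacket-tools validator. The function name `check_phenopacket` is hypothetical, and the list of required fields (`id`, `metaData`, and four `metaData` members) is an assumption based on a reading of the Phenopacket Schema v2 JSON form; check it against the schema documentation before relying on it.

```python
import json

# ASSUMPTION: required fields as understood from Phenopacket Schema v2 (JSON camelCase names).
REQUIRED_TOP_LEVEL = ("id", "metaData")
REQUIRED_META_DATA = ("created", "createdBy", "resources", "phenopacketSchemaVersion")


def _missing(obj, names, prefix=""):
    """List the required names that are absent from obj or hold an empty value."""
    return [
        f"missing or empty required field: {prefix}{name}"
        for name in names
        if obj.get(name) in (None, "", [], {})
    ]


def check_phenopacket(text):
    """Hypothetical helper: return a list of problems; an empty list means these minimal checks passed."""
    try:
        packet = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(packet, dict):
        return ["top-level value is not a JSON object"]

    problems = _missing(packet, REQUIRED_TOP_LEVEL)
    meta = packet.get("metaData")
    if isinstance(meta, dict) and meta:
        # Only inspect metaData members when there is a non-empty object to inspect.
        problems += _missing(meta, REQUIRED_META_DATA, prefix="metaData.")
    elif meta and not isinstance(meta, dict):
        problems.append("metaData is not a JSON object")
    return problems


# Illustrative input: parses fine as JSON, yet is incomplete as a phenopacket.
incomplete = '{"id": "p1", "metaData": {"created": "2022-11-18T16:00:00Z"}}'
for problem in check_phenopacket(incomplete):
    print(problem)
```

The example input is well-formed JSON, so a generic parser would accept it silently; only a schema-aware check reports the gaps. A real validator would also need to cover nested messages, ontology term identifiers, and timestamps, which is the part a dedicated library would provide.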