
Be able to read and write to strings in read_graph, write_graph

Open clpippel opened this issue 3 years ago • 18 comments

What is the feature or improvement you would like to see?

Be able to read and write to strings in read_graph(), write_graph()

For example: Create graph.

g <- make_ring(10)
write_graph(g)           # output graph to the console, e.g. stdout()
write_graph(g, file="")  # output graph to the console, e.g. stdout()

~~write_graph(g, text = s) # output graph to string~~

And read back.

library(igraph)
g <- make_ring(10)
s1 = r"---(0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 0 9)---"
# g1 <- read_graph(text=s1, format="edgelist", directed=FALSE) # proposed
g1 <- read_graph(rawConnection(charToRaw(s1)), format="edgelist", directed=FALSE) # work around as suggested by ntamas
isomorphic(g, g1)

From the R documentation of read.table:

text | character string: if file is not supplied and this is, then data are read from the value of text via a text connection. Notice that a literal string can be used to include (small) data sets within R code.

The current implementation of write_graph/read_graph uses binary connections. Therefore, the above construction fails. One idea is to add an ascii = TRUE argument, as in saveRDS.

Use cases for the feature

  1. Avoids interaction with the file system.
  2. Makes it easier to create standalone code samples.
  3. Allows moving graph objects across platforms using cut and paste and plain text files.

References

See also read.csv, write.csv

clpippel avatar Jun 27 '22 08:06 clpippel

There are already several formats that encode the graph in a textual human-readable form that can be safely copied and pasted. In fact, all formats supported by write_graph are textual. read_graph supports GraphDB which is a binary format.

I am not really an R user, but something along these lines should work when the graph data is in a string:

read_graph(rawConnection(charToRaw('graph [ directed 0 node [ id 1 ] node [ id 2 ] edge [ source 1 target 2 ] ]')), 'gml')

For some reason textConnection does not work here.

Does this work for your purposes?

szhorvat avatar Jun 27 '22 11:06 szhorvat

Everything is possible with a Turing machine. The proposed solution is not entirely intuitive.

I'm looking for something like s <- toString(g1), copy paste string s, and g2 <- fromString(s). My preference is to move objects (e.g. for bug reporting) to another environment using strings, rather than more complicated objects that are more prone to virus infection. Or may have other unwanted side effects. I also try to avoid the file system if possible.

I am a recreational user. I want to keep it as simple as possible.

clpippel avatar Jun 27 '22 11:06 clpippel

igraph's core is written in C, and eventually all read_graph() and write_graph() functions end up in C, where we need to pass a FILE* object to the underlying C code. I guess that textConnection() does not provide a FILE* object so we cannot pass anything to the C layer that would make sense there. This is a limitation of igraph not only for R but also for other higher level interfaces like Python. I'm afraid that the only way to solve this transparently is with a helper function that saves the graph to a temporary file, then reads back the temporary file into a string and throws away the file itself.
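
The temporary-file helper described above could be sketched roughly as follows. The names graph_to_string() and graph_from_string() are hypothetical and not part of igraph; only write_graph(), read_graph() and base R file functions are used, and the edgelist format is just an example.

```r
library(igraph)

# Hypothetical helper (not part of igraph): write the graph to a temporary
# file, read the file back as one string, then delete the file.
graph_to_string <- function(graph, format = "edgelist", ...) {
  tmp <- tempfile()
  on.exit(unlink(tmp), add = TRUE)
  write_graph(graph, tmp, format = format, ...)
  paste(readLines(tmp, warn = FALSE), collapse = "\n")
}

# Hypothetical inverse: write the string to a temporary file and let
# read_graph() parse that file.
graph_from_string <- function(text, format = "edgelist", ...) {
  tmp <- tempfile()
  on.exit(unlink(tmp), add = TRUE)
  writeLines(text, tmp)
  read_graph(tmp, format = format, ...)
}

g <- make_ring(10)
s <- graph_to_string(g, format = "edgelist")
g2 <- graph_from_string(s, format = "edgelist", directed = FALSE)
isomorphic(g, g2)
```

This still touches the file system internally, so it hides the temporary file from the caller rather than avoiding it.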

ntamas avatar Jun 27 '22 12:06 ntamas

The functions read.csv(), write.csv() output to a string with text = s. It is not a new functionality.

clpippel avatar Jun 27 '22 12:06 clpippel

I certainly see the appeal, as I use similar functionality all the time from Mathematica. Can you elaborate on how this works with write.csv? read.csv has a text argument but how do you get a string from write.csv?

Also, if you are using Windows, please test the suggestion I made and report back about whether it worked on that platform.

szhorvat avatar Jun 27 '22 13:06 szhorvat

Try:

g <- make_ring(10)
write.csv(str(g))

If file = is omitted, my guess is that the output is sent to the standard output, e.g. the console (in Windows Rgui).
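
To get the CSV text into an R string instead of onto the console, one possible approach (a sketch using only base R, shown here with a data frame rather than a graph) is to wrap the call in capture.output(), which collects what write.csv() prints when no file is given.

```r
df1 <- mtcars[1:5, ]

# write.csv() prints to standard output when no file is given;
# capture.output() collects those lines as a character vector.
lines <- capture.output(write.csv(df1))

# Collapse the lines into a single string that can be pasted elsewhere.
s <- paste(lines, collapse = "\n")

# Read the string back through the text argument of read.csv().
df2 <- read.csv(text = s, row.names = 1)
all.equal(df1, df2)
```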

clpippel avatar Jun 27 '22 13:06 clpippel

I have tried: read_graph(rawConnection(charToRaw('graph [ directed 0 node [ id 1 ] node [ id 2 ] edge [ source 1 target 2 ] ]')), 'gml')

The result is:

IGRAPH a0fdbdb U--- 2 1 -- 
+ attr: id (v/n)
+ edge from a0fdbdb:
[1] 1--2

It shows only one edge. The full output is:

IGRAPH df2cb86 U--- 10 10 -- Ring graph
+ attr: name (g/c), mutual (g/l), circular (g/l)
+ edges from df2cb86:
 [1] 1-- 2 2-- 3 3-- 4 4-- 5 5-- 6 6-- 7 7-- 8 8-- 9 9--10 1--10

sessionInfo

R version 4.1.3 (2022-03-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)

clpippel avatar Jun 27 '22 13:06 clpippel

Eventually I managed to read and write a graph using the clipboard.

library(igraph)
g1 <- sample_gnm(100, 50)
object.size(g1)
saveRDS(g1, file = "clipboard-128", ascii = TRUE, version = NULL, compress = FALSE, refhook = NULL)
g2 <- readRDS(file("clipboard-128", "r"), refhook = NULL)
identical_graphs(g1, g2)

5184 bytes
[1] TRUE

Relevant documentation can be found in: [https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/connections] or type ?connections in the R console.
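
The clipboard connection is platform-dependent, so a variant that keeps the ASCII serialization in an ordinary string may be easier to reuse. The sketch below relies on base R's serialize() and unserialize(); the object is a stand-in list rather than an igraph graph, and it is an assumption that a graph round-trips the same way it did with saveRDS above.

```r
# Stand-in object; an igraph graph is assumed to behave similarly.
obj <- list(edges = matrix(c(1, 2, 2, 3), ncol = 2, byrow = TRUE),
            name = "tiny")

# With connection = NULL, serialize() returns a raw vector;
# ascii = TRUE makes its content plain text.
s <- rawToChar(serialize(obj, connection = NULL, ascii = TRUE))

# The string can be pasted into another session and restored there.
obj2 <- unserialize(charToRaw(s))
identical(obj, obj2)
```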

clpippel avatar Jul 03 '22 09:07 clpippel

Suppose my machine has been damaged by a virus. Is it possible that the virus is spreading through the workspace file? This is why I prefer plain text over files I don't know.

clpippel avatar Jul 03 '22 10:07 clpippel

Unfortunately, trying:

library(igraph)
g1 <- sample_gnm(100, 50)
object.size(g1)
saveRDS(g1, file = "clipboard-128", ascii = TRUE, version = NULL, compress = FALSE, refhook = NULL)

Now I cut and paste the clipboard and after

g2 <- readRDS(file("clipboard-128", "r"), refhook = NULL)

I receive the error message

Error in readRDS(file("clipboard", "r"), refhook = NULL) : 
  unknown input format

It feels rather shaky. It works. I did something stupid: I copied the readRDS through the clipboard destroying the original content...

clpippel avatar Jul 03 '22 10:07 clpippel

Is it possible that the virus is spreading through the workspace file?

I find that highly unlikely.

ntamas avatar Jul 03 '22 19:07 ntamas

I thought along the following lines. What if the workspace contains code with a bug that results in arbitrary code execution, e.g. a crash as in the permute example? Then I could put malicious code in data structures in such a way that the code is executed. It doesn't seem easy to implement, but it is not impossible.

Representatives of institutions are attractive victims. In this way an institution can be taken hostage.

clpippel avatar Jul 03 '22 20:07 clpippel

I certainly see the appeal, as I use similar functionality all the time from Mathematica. Can you elaborate on how this works with write.csv? read.csv has a text argument but how do you get a string from write.csv?

Consider a dataframe to be transferred.

# step 1
df1 <- mtcars[1:5, ]
write.csv(df1)

The output is copied into the receiving end as follows:

# step 2, use a raw string to input tricky text
s1 <- r"---(
"","mpg","cyl","disp","hp","drat","wt","qsec","vs","am","gear","carb"
"Mazda RX4",21,6,160,110,3.9,2.62,16.46,0,1,4,4
"Mazda RX4 Wag",21,6,160,110,3.9,2.875,17.02,0,1,4,4
"Datsun 710",22.8,4,108,93,3.85,2.32,18.61,1,1,4,1
"Hornet 4 Drive",21.4,6,258,110,3.08,3.215,19.44,1,0,3,1
"Hornet Sportabout",18.7,8,360,175,3.15,3.44,17.02,0,0,3,2
)---"

df2 <- read.csv(text=s1, row.names=1)
identical(df1, df2)
identical(toString(df1), toString(df2))

On my machine df1 and df2 are not identical. (?) However, their string representations are.
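
A likely cause of the mismatch is column storage type: mtcars stores every column as double, whereas read.csv() converts whole-number columns to integer. The sketch below (base R only; the CSV text is regenerated with capture.output() so that it is self-contained) makes the difference visible.

```r
df1 <- mtcars[1:5, ]
s1 <- paste(capture.output(write.csv(df1)), collapse = "\n")
df2 <- read.csv(text = s1, row.names = 1)

# Compare column storage types before and after the round trip.
rbind(original = sapply(df1, typeof), reread = sapply(df2, typeof))

identical(df1, df2)   # strict comparison, sensitive to storage type
all.equal(df1, df2)   # comparison up to numeric tolerance

# Forcing every data column to numeric removes the type difference.
df3 <- read.csv(text = s1, row.names = 1,
                colClasses = c("character", rep("numeric", ncol(df1))))
sapply(df3, typeof)
```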

This allows for small standalone programs that do not rely on a file system.

The results of saveRDS are indeed different:

saveRDS(df1, file = "clipboard-128", ascii = TRUE, version = NULL, compress = FALSE, refhook = NULL)
saveRDS(df2, file = "clipboard-128", ascii = TRUE, version = NULL, compress = FALSE, refhook = NULL)


clpippel avatar Jul 04 '22 10:07 clpippel

What if the workspace contains code with a bug that results in arbitrary code execution

Any other code on your machine (including igraph, the R language, the GUI of R, or even the underlying operating system) may contain buggy code that allows arbitrary code execution. Using plain-text files or preventing filesystem access does not protect you from that. When reading a plain-text file, the parser that processes the contents of that plain-text file may contain such a bug. For example, here's a recent security advisory for a bug in a CSV parser that can cause denial-of-service. Is it a problem? Yes, it is. Does it mean that you should stop using CSV files? I don't think so, unless you are constantly working with untrusted datasets from third-party sources that have a high chance of containing maliciously crafted data.

ntamas avatar Jul 04 '22 11:07 ntamas

It seems that the discussion got a bit derailed, and now it's difficult to tell when this issue should be considered solved.

As I understand, the original request was to be able to write/read R/igraph graphs with no loss of data into a plain-text format. This is accomplished by saveRDS / readRDS.

Arguments against extending read_graph and write_graph with an RDS-based format (which is the only reasonable way to store R/igraph graphs losslessly, as far as I'm aware):

  • Inventing a "new" format that is not really new, just duplicates RDS for the specific case of graphs.
  • All other formats of read_graph and write_graph are implemented in C and are shared across all interfaces of igraph. This would be specific to R only. Thus it doesn't fit into read_graph and write_graph.
  • There is already a way to do this with saveRDS / readRDS.

Thus I'm closing this issue. The discussion may continue regardless, but comments that are not directly related are better on the forum (or as a new issue, if it is a bug report or feature request).

szhorvat avatar Jul 04 '22 11:07 szhorvat

Sorry, what I wrote above applied to closing #544.

Here the request was to read from strings, similar to what read.csv can do with its text argument. Personally, I'm not opposed to that, but let's keep the discussion on-topic so it doesn't diverge too much from the main point.

szhorvat avatar Jul 04 '22 11:07 szhorvat

Consider the following graphml definition:

s1 = r"---(<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns
         http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<!-- Created by igraph -->
  <key id="g_name" for="graph" attr.name="name" attr.type="string"/>
  <key id="g_mode" for="graph" attr.name="mode" attr.type="string"/>
  <key id="g_center" for="graph" attr.name="center" attr.type="double"/>
  <graph id="G" edgedefault="directed">
    <data key="g_name">In-star</data>
    <data key="g_mode">in</data>
    <data key="g_center">1</data>
    <node id="n0">
    </node>
    <node id="n1">
    </node>
    <edge source="n1" target="n0">
    </edge>
  </graph>
</graphml>
)---"

As an alternative to

write_graph(g1, text=s1, format="graphml"),

we can do the following:

g2 <- read_graph(rawConnection(charToRaw(s1)), format="graphml") # as suggested by ntamas

isomorphic(g1, g2) # the two graphs are isomorphic

clpippel avatar Jul 04 '22 19:07 clpippel

write.table and read.table use character (text-mode) connections. Therefore, write.table(mtcars, stdout()) works as expected.

However, write_graph(g, file=stdout()) fails with: Error in writeBin(buffer, file): Can only write to a binary connection

For the same reason

read_graph(textConnection(s1), format="edgelist", directed=FALSE) fails with

Error in readBin(filename, what = raw(0), n = tmpbufsize) :
      can only read from a binary connection

A solution could be to enable reading characters.
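
The text versus binary distinction can be reproduced in base R without igraph. This is a minimal sketch: readBin() is the call named in the error message above, and the string is an arbitrary edge list.

```r
s <- "0 1 1 2"

# A text connection rejects binary reads, which is the error seen above.
tc <- textConnection(s)
res <- tryCatch(readBin(tc, what = raw(0), n = 100),
                error = function(e) conditionMessage(e))
close(tc)
res

# A raw connection over the same bytes can be read in binary mode.
rc <- rawConnection(charToRaw(s))
bytes <- readBin(rc, what = raw(0), n = 100)
close(rc)
rawToChar(bytes)
```

This is why the rawConnection(charToRaw(...)) workaround succeeds where textConnection() does not.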

I have amended my proposal for more clarity.

clpippel avatar Jul 05 '22 08:07 clpippel

Targeting this for a hackathon where we can investigate (with the help of some R experts) whether supporting this would be easy to do in the next release or not.

ntamas avatar Nov 10 '22 22:11 ntamas

Can you please confirm my understanding of the issue and the linked issue #544:

We're looking for a way to convert an igraph object to text form so that it can be read back in and yield an identical object. Ideally, this would not make use of the file system at all.

I have questions:

  • How important is the exact roundtrip, e.g., the retention of all node and edge attributes?
  • I don't understand the implementation of, e.g., R_igraph_write_graph_graphml() for the case HAVE_OPEN_MEMSTREAM == 1 . It seems to return a raw vector and never write to the file given by the file argument?

The constructive R package at https://github.com/cynkra/constructive/ provides a framework to support idiomatic construction of arbitrary objects. We could implement methods for generics provided by that package, and this would allow converting any graph object into equivalent R code that yields this graph object when executed. I believe this would be an advantage over inventing a new serialization format.

krlmlr avatar Dec 14 '22 15:12 krlmlr

What I suggested here is not what you describe.

Instead, I suggested being able to import/export from/to a string instead of a file. This would apply to all supported non-binary formats. E.g. you have the contents of a GML file in a string, and want to read it directly.

szhorvat avatar Dec 14 '22 16:12 szhorvat

I don't understand the implementation of, e.g., R_igraph_write_graph_graphml() for the case HAVE_OPEN_MEMSTREAM == 1 . It seems to return a raw vector and never write to the file given by the file argument?

I don't understand this part either. It seems that on Windows, HAVE_OPEN_MEMSTREAM is always defined to zero, and it does not seem to be defined for Linux or macOS at all (if it were defined somewhere, it would probably have to appear in src/config.h, but it does not appear there). It could be a remnant of historical code, and the entire HAVE_OPEN_MEMSTREAM part could in theory be removed if we don't need it.

ntamas avatar Dec 14 '22 16:12 ntamas

However, it might be the case that we could actually use this code, at least on Linux and macOS (both have open_memstream()). The idea would be that if file as a SEXP is not the name of a file but an R connection object, we could write the graph into a temporary memory area created with open_memstream() and then pipe the contents of that memory area into the R connection object. But needless to say, the current implementation with all the copy-pasted code is not sustainable in the long run.

ntamas avatar Dec 14 '22 16:12 ntamas