Be able to read and write to strings in read_graph, write_graph
What is the feature or improvement you would like to see?
Be able to read and write to strings in read_graph(), write_graph()
For example: Create graph.
g <- make_ring(10)
write_graph(g) # output graph to the console, e.g. stdout()
write_graph(g, file="") # output graph to the console, e.g. stdout()
~~write_graph(g, text = s) # output graph to string~~
And read back.
library(igraph)
g <- make_ring(10)
s1 = r"---(0 1 1 2 2 3 3 4 4 5 5 6 6 7 7 8 8 9 0 9)---"
# g1 <- read_graph(text=s1, format="edgelist", directed=FALSE) # proposed
g1 <- read_graph(rawConnection(charToRaw(s1)), format="edgelist", directed=FALSE) # workaround as suggested by ntamas
isomorphic(g, g1)
From the R documentation of read.table:
text | character string: if file is not supplied and this is, then data are read from the value of text via a text connection. Notice that a literal string can be used to include (small) data sets within R code.
The current implementation of write_graph/read_graph uses binary connections. Therefore, the above construction fails. One idea is to add ascii = TRUE as in saveRDS.
Use cases for the feature
- Avoids interaction with the file system.
- Makes it easier to create standalone code samples.
- Allows moving graph objects across platforms using cut and paste and plain text files.
References
See also read.csv, write.csv
There are already several formats that encode the graph in a textual human-readable form that can be safely copied and pasted. In fact, all formats supported by write_graph are textual. read_graph supports GraphDB which is a binary format.
I am not really an R user, but something along these lines should work when the graph data is in a string:
read_graph(rawConnection(charToRaw('graph [ directed 0 node [ id 1 ] node [ id 2 ] edge [ source 1 target 2 ] ]')), 'gml')
For some reason textConnection does not work here.
Does this work for your purposes?
Everything is possible with a Turing machine. The proposed solution is not entirely intuitive.
I'm looking for something like s <- toString(g1), copying and pasting the string s, and then g2 <- fromString(s). My preference is to move objects (e.g. for bug reporting) to another environment using strings, rather than more complicated objects that are more prone to virus infection or may have other unwanted side effects. I also try to avoid the file system if possible.
I am a recreational user. I want to keep it as simple as possible.
igraph's core is written in C, and eventually all read_graph() and write_graph() functions end up in C, where we need to pass a FILE* object to the underlying C code. I guess that textConnection() does not provide a FILE* object so we cannot pass anything to the C layer that would make sense there. This is a limitation of igraph not only for R but also for other higher level interfaces like Python. I'm afraid that the only way to solve this transparently is with a helper function that saves the graph to a temporary file, then reads back the temporary file into a string and throws away the file itself.
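The temporary-file helper described above could be sketched roughly as follows. This is only an illustration: graph_to_string() and graph_from_string() are hypothetical names that do not exist in igraph, and the sketch relies only on write_graph(), read_graph() and base R file functions.

```r
library(igraph)

# Hypothetical helper (not part of igraph): write the graph to a temporary
# file, read the file back as one string, then delete the file.
graph_to_string <- function(graph, format = "edgelist", ...) {
  tmp <- tempfile()
  on.exit(unlink(tmp), add = TRUE)
  write_graph(graph, file = tmp, format = format, ...)
  paste(readLines(tmp, warn = FALSE), collapse = "\n")
}

# Hypothetical helper (not part of igraph): the reverse direction.
graph_from_string <- function(text, format = "edgelist", ...) {
  tmp <- tempfile()
  on.exit(unlink(tmp), add = TRUE)
  writeLines(text, tmp)
  read_graph(tmp, format = format, ...)
}

g <- make_ring(10)
s <- graph_to_string(g, format = "edgelist")
g2 <- graph_from_string(s, format = "edgelist", directed = FALSE)
isomorphic(g, g2)
```

The file system is still used behind the scenes, so this hides the temporary file rather than avoiding it. With the edgelist format, attributes are not preserved; a format such as graphml or gml would be needed for that.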
The functions read.csv(), write.csv() output to a string with text = s. It is not a new functionality.
I certainly see the appeal, as I use similar functionality all the time from Mathematica. Can you elaborate on how this works with write.csv? read.csv has a text argument but how do you get a string from write.csv?
Also, if you are using Windows, please test the suggestion I made and report back about whether it worked on that platform.
Try:
g <- make_ring(10)
write.csv(as_data_frame(g)) # print the edge list as CSV
If file = is omitted, my guess is that the output is sent to the standard output, e.g. the console (in Windows Rgui).
I have tried:
read_graph(rawConnection(charToRaw('graph [ directed 0 node [ id 1 ] node [ id 2 ] edge [ source 1 target 2 ] ]')), 'gml')
The result is:
IGRAPH a0fdbdb U--- 2 1 --
+ attr: id (v/n)
+ edge from a0fdbdb:
[1] 1--2
It shows only one edge. The full output for the ring graph is:
IGRAPH df2cb86 U--- 10 10 -- Ring graph
+ attr: name (g/c), mutual (g/l), circular (g/l)
+ edges from df2cb86:
[1] 1-- 2 2-- 3 3-- 4 4-- 5 5-- 6 6-- 7 7-- 8 8-- 9 9--10 1--10
sessionInfo
R version 4.1.3 (2022-03-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000)
Eventually I managed to read and write a graph using the clipboard.
library(igraph)
g1 <- sample_gnm(100, 50)
object.size(g1)
saveRDS(g1, file = "clipboard-128", ascii = TRUE, version = NULL, compress = FALSE, refhook = NULL)
g2 <- readRDS(file("clipboard-128", "r"), refhook = NULL)
identical_graphs(g1, g2)
5184 bytes
[1] TRUE
Relevant documentation can be found in: [https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/connections] or type ?connections in the R console.
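The same ASCII serialization can probably be kept in an ordinary string without going through the clipboard. The sketch below is a suggestion, not something proposed in the thread: it uses base R's serialize() and unserialize() instead of saveRDS() and readRDS(), and assumes the graph survives the round trip in the same way as in the clipboard example above.

```r
library(igraph)

g1 <- sample_gnm(100, 50)

# serialize() with connection = NULL returns a raw vector;
# ascii = TRUE keeps the content printable, so it can be turned into a string.
s <- rawToChar(serialize(g1, connection = NULL, ascii = TRUE))

# s is a plain character string that can be pasted into another session.
g2 <- unserialize(charToRaw(s))

identical_graphs(g1, g2)
```

As with RDS files, this is an R-specific representation and not one of the graph formats handled by read_graph() and write_graph().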
Suppose my machine has been damaged by a virus. Is it possible that the virus is spreading through the workspace file? This is why I prefer plain text over files I don't know.
Unfortunately, trying:
library(igraph)
g1 <- sample_gnm(100, 50)
object.size(g1)
saveRDS(g1, file = "clipboard-128", ascii = TRUE, version = NULL, compress = FALSE, refhook = NULL)
Now I cut and paste the clipboard and after
g2 <- readRDS(file("clipboard-128", "r"), refhook = NULL)
I receive the error message
Error in readRDS(file("clipboard", "r"), refhook = NULL) :
unknown input format
It feels rather shaky. It works. I did something stupid: I copied the readRDS through the clipboard destroying the original content...
Is it possible that the virus is spreading through the workspace file?
I find that highly unlikely.
I thought along the following lines. What if the workspace contains code with a bug that results in arbitrary code execution, e.g. a crash as in the permute example? Then someone could put malicious code in data structures such that the code is executed. It doesn't seem easy to implement, but it is not impossible.
Representatives of institutions are attractive victims. In this way an institution can be taken hostage.
I certainly see the appeal, as I use similar functionality all the time from Mathematica. Can you elaborate on how this works with write.csv? read.csv has a text argument but how do you get a string from write.csv?
Consider a dataframe to be transferred.
# step 1
df1 <- mtcars[1:5, ]
write.csv(df1)
The output is copied into the receiving end as follows:
# step 2, use a raw string to input tricky text
s1 <- r"---(
"","mpg","cyl","disp","hp","drat","wt","qsec","vs","am","gear","carb"
"Mazda RX4",21,6,160,110,3.9,2.62,16.46,0,1,4,4
"Mazda RX4 Wag",21,6,160,110,3.9,2.875,17.02,0,1,4,4
"Datsun 710",22.8,4,108,93,3.85,2.32,18.61,1,1,4,1
"Hornet 4 Drive",21.4,6,258,110,3.08,3.215,19.44,1,0,3,1
"Hornet Sportabout",18.7,8,360,175,3.15,3.44,17.02,0,0,3,2
)---"
df2 <- read.csv(text=s1, row.names=1)
identical(df1, df2)
identical(toString(df1), toString(df2))
On my machine df1 and df2 are not identical. (?) However, their string representations are.
This allows for small standalone programs that do not rely on a file system.
The results of saveRDS are indeed different:
saveRDS(df1, file = "clipboard-128", ascii = TRUE, version = NULL, compress = FALSE, refhook = NULL)
saveRDS(df2, file = "clipboard-128", ascii = TRUE, version = NULL, compress = FALSE, refhook = NULL)
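A likely explanation for the df1/df2 mismatch above: all columns of mtcars are stored as doubles, while read.csv() infers integer columns wherever every value is a whole number. The sketch below also shows one way to get the CSV text into a string directly, using capture.output() from base R instead of copying the console output by hand.

```r
df1 <- mtcars[1:5, ]

# capture.output() collects the lines that write.csv() would print to the console.
s <- paste(capture.output(write.csv(df1)), collapse = "\n")

# Read the string back; the first column holds the row names.
df2 <- read.csv(text = s, row.names = 1)

identical(df1, df2)   # strict comparison, includes the storage type of each column
sapply(df1, typeof)   # storage types before the round trip
sapply(df2, typeof)   # storage types after the round trip
all.equal(df1, df2)   # compares values with a numeric tolerance
```

If the storage types matter, passing colClasses to read.csv() is one way to control them.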
What if the workspace contains code with a bug that results in arbitrary code execution
Any other code on your machine (including igraph, the R language, the GUI of R, or even the underlying operating system) may contain buggy code that allows arbitrary code execution. Using plain-text files or preventing filesystem access does not protect you from that. When reading a plain-text file, the parser that processes the contents of that plain-text file may contain such a bug. For example, here's a recent security advisory for a bug in a CSV parser that can cause denial-of-service. Is it a problem? Yes, it is. Does it mean that you should stop using CSV files? I don't think so, unless you are constantly working with untrusted datasets from third-party sources that have a high chance of containing maliciously crafted data.
It seems that the discussion got a bit derailed, and now it's difficult to tell when this issue should be considered solved.
As I understand, the original request was to be able to write/read R/igraph graphs with no loss of data into a plain-text format. This is accomplished by saveRDS / readRDS.
Arguments against extending read_graph and write_graph with an RDS-based format (which is the only reasonable way to store R/igraph graphs losslessly, as far as I'm aware):
- Inventing a "new" format that is not really new, just duplicates RDS for the specific case of graphs.
- All other formats of read_graph and write_graph are implemented in C and are shared across all interfaces of igraph. This would be specific to R only. Thus it doesn't fit into read_graph and write_graph.
- There is already a way to do this with saveRDS/readRDS.
Thus I'm closing this issue. The discussion may continue regardless, but comments that are not directly related are better on the forum (or as a new issue, if it is a bug report or feature request).
Sorry, what I wrote above applied to closing #544.
Here the request was to read from strings, similar to what read.csv can do with its text argument. Personally, I'm not opposed to that, but let's keep the discussion on-topic so it doesn't diverge too much from the main point.
Consider the following graphml definition:
s1 = r"---(<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns
http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<!-- Created by igraph -->
<key id="g_name" for="graph" attr.name="name" attr.type="string"/>
<key id="g_mode" for="graph" attr.name="mode" attr.type="string"/>
<key id="g_center" for="graph" attr.name="center" attr.type="double"/>
<graph id="G" edgedefault="directed">
<data key="g_name">In-star</data>
<data key="g_mode">in</data>
<data key="g_center">1</data>
<node id="n0">
</node>
<node id="n1">
</node>
<edge source="n1" target="n0">
</edge>
</graph>
</graphml>
)---"
As an alternative to
write_graph(g1, text=s1, format="graphml"),
we can do the following:
g2 <- read_graph(rawConnection(charToRaw(s1)), format="graphml") # as suggested by ntamas
isomorphic(g1, g2) # both graphs are isomorphic
write.table and read.table use text-mode connections. Therefore write.table(mtcars, stdout()) works as expected.
However write_graph(g, file=stdout()) returns:
Error in writeBin(buffer, file): Can only write to a binary connection
For the same reason,
read_graph(textConnection(s1), format="edgelist", directed=FALSE) fails with
Error in readBin(filename, what = raw(0), n = tmpbufsize) :
can only read from a binary connection
A solution could be to enable reading characters.
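The difference between the two connection types can be reproduced with base R alone, without igraph. The sketch below first triggers the same readBin() error on a text connection, then shows one possible shim that drains a text connection with readLines() and re-wraps the content as a raw (binary) connection. The name text_to_raw_connection() is hypothetical and only serves as an illustration of the idea.

```r
s <- "0 1\n1 2\n2 3"

# readBin() on a text-mode connection fails, as in the error shown above.
tc <- textConnection(s)
err <- tryCatch(
  readBin(tc, what = raw(0), n = 100),
  error = function(e) conditionMessage(e)
)
close(tc)
err

# Hypothetical shim: read all lines from a text connection and
# expose them again through a binary raw connection.
text_to_raw_connection <- function(con) {
  lines <- readLines(con)
  rawConnection(charToRaw(paste(lines, collapse = "\n")))
}

tc <- textConnection(s)
rc <- text_to_raw_connection(tc)
close(tc)

# The binary read now succeeds and yields the original text.
bytes <- readBin(rc, what = raw(0), n = 100)
close(rc)
rawToChar(bytes)
```

A conversion of this kind would have to happen before the data reaches the code that calls readBin(); whether it belongs inside read_graph() itself is a design decision for the maintainers.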
I have amended my proposal for more clarity.
Targeting this for a hackathon where we can investigate (with the help of some R experts) whether supporting this would be easy to do in the next release or not.
Can you please confirm my understanding of the issue and the linked issue #544:
We're looking for a way to convert an igraph object to text form so that it can be read back in and yield an identical object. Ideally, this would not make use of the file system at all.
I have questions:
- How important is the exact roundtrip, e.g., the retention of all node and edge attributes?
- I don't understand the implementation of, e.g., R_igraph_write_graph_graphml() for the case HAVE_OPEN_MEMSTREAM == 1. It seems to return a raw vector and never write to the file given by the file argument?
The constructive R package at https://github.com/cynkra/constructive/ provides a framework to support idiomatic construction of arbitrary objects. We could implement methods for generics provided by that package, and this would allow converting any graph object into equivalent R code that yields this graph object when executed. I believe this would be an advantage over inventing a new serialization format.
What I suggested here is not what you describe.
Instead, I suggested to be able to import/export from/to a string instead of a file. This would apply to all supported non-binary formats. E.g. you have the contents of a GML file in a string, and want to read it directly.
I don't understand the implementation of, e.g., R_igraph_write_graph_graphml() for the case HAVE_OPEN_MEMSTREAM == 1. It seems to return a raw vector and never write to the file given by the file argument?
I don't understand this part either. It seems that on Windows, HAVE_OPEN_MEMSTREAM is always defined to zero, and it does not seem to be defined for Linux or macOS at all (if it were defined somewhere, it would probably have to appear in src/config.h, but it does not appear there). It could be a remnant of historical code, and the entire HAVE_OPEN_MEMSTREAM part could in theory be removed if we don't need it.
However, it might be the case that we could actually use this code, at least on Linux and macOS (both have open_memstream()). The idea would be that if file as a SEXP is not the name of a file but an R connection object, we could write the graph into a temporary memory area created with open_memstream() and then pipe the contents of that memory area into the R connection object. But needless to say, the current implementation with all the copy-pasted code is not sustainable in the long run.