Make sure NetworkX can properly read/write GEXF 1.3

Open mbastian opened this issue 2 years ago • 7 comments

NetworkX is the go-to Python library for graph manipulation and already has GEXF import/export.

Definition of done

  • Import/Export are fully 1.3 compatible (based on what is possible given networkx internals)

mbastian avatar Aug 28 '22 07:08 mbastian

@mbastian looks like networkx can't handle lists of floats as attributes, which were added in 1.3.

rjurney avatar Sep 24 '23 19:09 rjurney

@mbastian to be specific about my use case... in reading the code, it looks like Gephi's format assumes that lists of properties are time series. My use case is that I want to store a 384-dimension embedding from a paraphrase embedding of a citation graph's node properties on the nodes and do analysis in NetworkX and then also use this GEXF file in Deep Graph Library (DGL) and PyG aka PyTorch Geometric.

Dataset: https://snap.stanford.edu/data/cit-HepTh.html
Embedding: https://huggingface.co/sentence-transformers/paraphrase-MiniLM-L6-v2
Sentence Transformers: https://www.sbert.net/

Example code below JSONizes the embedding list of floats to make things go, but I'd like to be able to store it. @mbastian Can you make GEXF support embeddings moving forward in the next version?

import json
import os
from typing import Optional

import numpy as np

# Embed the abstracts for GNN features. Embedding is a generic approach for retrieval as well.
# Note: NetworkX can't save lists in GEXF format, so we'll JSONize the list & save the embeddings separately.
# G, all_abstracts, file_paper_ids, file_to_networkx_ids and embed_paper_info are assumed to be defined elsewhere (not shown).
embedded_abstracts: Optional[np.ndarray] = None
if os.path.exists("data/embedded_abstracts.npy"):
    # Reuse the cached embeddings if they were already computed
    embedded_abstracts = np.load("data/embedded_abstracts.npy")
else:
    embedded_abstracts = embed_paper_info(all_abstracts, convert_to_tensor=False)
    np.save("data/embedded_abstracts.npy", embedded_abstracts)

for paper_id, emb in zip(file_paper_ids, embedded_abstracts):
    assert emb.shape == (384,)

    # Gephi assumes a list of floats is a time series, so we need to convert to a string
    G.nodes[file_to_networkx_ids[paper_id]]["Embedding-JSON"] = json.dumps(emb.tolist())
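
A minimal sketch of how the JSON workaround above round-trips through GEXF. It assumes NetworkX is installed; the three-element vector, the node ID `0`, and the temporary file name `graph.gexf` are placeholders standing in for the 384-dimension embedding and the real graph.

```python
import json
import os
import tempfile

import networkx as nx

# Toy graph with a single node; the short vector stands in for a real embedding.
G = nx.Graph()
G.add_node(0)
embedding = [0.25, -0.5, 1.0]

# Store the list as a JSON string, since a raw list attribute cannot be written.
G.nodes[0]["Embedding-JSON"] = json.dumps(embedding)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "graph.gexf")
    nx.write_gexf(G, path)
    # node_type=int converts the node IDs back to integers while reading.
    H = nx.read_gexf(path, node_type=int)

# Decode the JSON string back into a list of floats.
restored = json.loads(H.nodes[0]["Embedding-JSON"])
```

The cost of this approach is that other tools reading the file see an opaque string rather than a numeric list, so each consumer has to know to decode it.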

Example document:

------------------------------------------------------------------------------
\\
Paper: hep-th/0001001
From: Paul S. Aspinwall <[email protected]>
Date: Sat, 1 Jan 2000 00:02:31 GMT   (84kb)
Date (revised v2): Mon, 17 Jan 2000 14:52:43 GMT   (85kb)

Title: Compactification, Geometry and Duality: N=2
Authors: Paul S. Aspinwall
Comments: 82 pages, 8 figures, LaTeX2e, TASI99, refs added and some typos fixed
Report-no: DUKE-CGTP-00-01
\\
  These are notes based on lectures given at TASI99. We review the geometry of
the moduli space of N=2 theories in four dimensions from the point of view of
superstring compactification. The cases of a type IIA or type IIB string
compactified on a Calabi-Yau threefold and the heterotic string compactified on
K3xT2 are each considered in detail. We pay specific attention to the
differences between N=2 theories and N>2 theories. The moduli spaces of vector
multiplets and the moduli spaces of hypermultiplets are reviewed. In the case
of hypermultiplets this review is limited by the poor state of our current
understanding. Some peculiarities such as ``mixed instantons'' and the
non-existence of a universal hypermultiplet are discussed.
\\

Its embedding:

[-0.5083363652229309, -0.35725411772727966, 0.1389939785003662, -0.1347253918647766, -0.1535784900188446, 0.43154388666152954, 0.15374013781547546, -0.008106844499707222, -0.1662866771221161, -0.15766437351703644, 0.35521116852760315, 0.15607962012290955, 0.6218618750572205, 0.07288412749767303, -0.08790934085845947, -0.145784392952919, 0.14549043774604797, -0.03458674997091293, -0.741215705871582, 0.019919676706194878, -0.2773298919200897, -0.16332964599132538, -0.42131808400154114, 0.06080969050526619, 0.55726158618927, 0.18690286576747894, -0.19952552020549774, 0.23189248144626617, 0.39608946442604065, 0.031538791954517365, 0.4129146337509155, 0.37623560428619385, 0.16398969292640686, 0.09904278814792633, 0.5887687802314758, 0.19061870872974396, -0.020812658593058586, 0.6324356198310852, 0.005971217527985573, 0.2787822186946869, 0.20738601684570312, -1.136680006980896, 0.4140499532222748, 0.7376874685287476, 0.26450657844543457, 0.08141785860061646, -0.529627799987793, -0.07897279411554337, 0.302225261926651, 0.26963791251182556, -0.5572066307067871, 0.022079501301050186, -0.41076093912124634, -0.16617120802402496, -0.014963116496801376, 0.2403220683336258, 0.03146751970052719, -0.514580488204956, 0.02357768639922142, -0.19823256134986877, -0.1633021980524063, 0.14651842415332794, -0.5526030659675598, 0.5041884183883667, 0.20464496314525604, 0.16364993155002594, -0.0379401370882988, -0.16234970092773438, 0.273735910654068, 0.4701267182826996, 0.38202783465385437, 0.6249184608459473, -0.6957732439041138, -0.4264785051345825, 0.06444322317838669, 0.6805640459060669, -0.3116794228553772, 0.009198327548801899, -0.18131123483181, -0.4511978328227997, 0.2052099108695984, -0.7076764106750488, -0.2577372193336487, -0.11397387087345123, 0.004945039749145508, 0.29662612080574036, 0.48335978388786316, 0.16308338940143585, 0.02071310393512249, -0.06133018806576729, 0.3547375500202179, -0.015222515910863876, -0.3296150863170624, 0.27946799993515015, 0.10797177255153656, 
0.5158742070198059, 0.3182218670845032, -0.1535983383655548, 0.6189644932746887, 0.16411934792995453, -0.20841538906097412, -0.09344162046909332, -0.5550981760025024, -0.0629420131444931, -0.5624946355819702, -0.6402942538261414, -0.201442688703537, 0.18017089366912842, 0.27435120940208435, 0.18869590759277344, 0.04372529685497284, -0.3697742521762848, -0.06247770041227341, 0.14726705849170685, -0.5059475302696228, 0.17057615518569946, 0.49116864800453186, 0.303863525390625, 0.7109688520431519, -0.08683305978775024, 0.4489392042160034, 0.8849781155586243, 0.2691556513309479, 0.054163508117198944, 0.20481964945793152, -0.047171857208013535, 0.49669820070266724, 0.3995380997657776, -0.2686813771724701, -0.1840616762638092, -0.03536504507064819, -0.6438066959381104, 0.0884658545255661, -0.049895793199539185, 0.1340586543083191, 0.008303023874759674, 0.12762904167175293, 0.19640912115573883, 0.09768808633089066, -0.17605964839458466, 0.03801923617720604, 0.22554127871990204, -0.0682666227221489, -0.21554642915725708, 0.34073975682258606, -0.1460971236228943, -0.6941462755203247, 0.20569857954978943, 0.5059947967529297, -0.3478425145149231, -0.13772228360176086, -0.06816817820072174, -0.5381731390953064, 0.05074828490614891, 0.06547494232654572, -0.29076358675956726, -0.15378691256046295, 0.2487240433692932, 0.3956683874130249, 0.28119516372680664, -0.36075934767723083, -0.13970033824443817, 0.3972870111465454, 0.24897192418575287, 0.39377814531326294, 0.28017812967300415, 0.5327494740486145, -0.4372592270374298, -0.33479222655296326, 0.06613282114267349, 0.4145204424858093, -0.09375417977571487, 0.006537675857543945, 0.44525378942489624, 0.03501797467470169, -0.2608524560928345, -0.006014466285705566, -0.036333389580249786, -0.537621796131134, 0.18642160296440125, 0.07950431853532791, -0.2662293016910553, -0.24478109180927277, -0.5388363003730774, 0.0674142986536026, 0.006562564522027969, 0.13258269429206848, 0.43928781151771545, 0.14479145407676697, 
-0.6222834587097168, -0.33258986473083496, -0.6179389357566833, -0.2406272441148758, 0.014090614393353462, -0.3714263439178467, -0.412462443113327, 0.27592408657073975, 0.0349738746881485, -0.2271711528301239, 0.5821718573570251, -0.36073049902915955, -0.2708200216293335, 0.20686064660549164, -0.23197627067565918, 0.042743708938360214, 0.14470048248767853, -0.024556558579206467, -0.6748477816581726, -0.16571849584579468, 0.20108835399150848, -0.07298190146684647, -0.5514233112335205, -0.06006268784403801, -0.04524163901805878, 0.012701082974672318, 0.41854313015937805, -0.23032033443450928, -0.7118092179298401, -0.3731357455253601, -0.038922086358070374, 0.11315789818763733, -0.19573336839675903, 0.5248740911483765, -0.8068038821220398, -0.3490540087223053, 0.6316984295845032, -0.24007821083068848, 0.19816532731056213, 0.02993026375770569, -0.09062369167804718, 0.32186055183410645, 0.41794851422309875, 0.504360556602478, 0.1191108375787735, 0.3482481837272644, 0.15071724355220795, 0.05511059984564781, -0.14041967689990997, 0.18092676997184753, 0.02112441509962082, 0.1610906720161438, 0.03389054536819458, -0.15241602063179016, -0.1575293093919754, -0.12149085104465485, 0.5990638136863708, -0.7717245817184448, -0.04483901336789131, 0.19884341955184937, 0.10792878270149231, 0.10256698727607727, -0.5565033555030823, 0.029021425172686577, 0.16152621805667877, 0.3552182912826538, -0.19814762473106384, 0.19467827677726746, -0.1417803019285202, -0.4221956431865692, 0.29962822794914246, 0.6577330827713013, 0.17069461941719055, 0.28435853123664856, 0.21476049721240997, 0.8059138059616089, -0.048171523958444595, -0.16125980019569397, -0.07039059698581696, -0.09816092252731323, -0.1514281928539276, 0.24609962105751038, -0.0849226862192154, 0.09835521876811981, 0.32943952083587646, -0.25816798210144043, -0.06863641738891602, 0.049438249319791794, 0.025209199637174606, 0.08355040848255157, 0.21580441296100616, -0.41988956928253174, 0.07675647735595703, -0.14934852719306946, 
-0.4311261475086212, -0.3233030140399933, -0.19432544708251953, 0.09847439080476761, -0.24860693514347076, 0.1917468160390854, -0.04119320958852768, 0.036722056567668915, -0.21387654542922974, -0.0030690915882587433, -0.13641610741615295, 0.012929495424032211, 0.3078806400299072, -0.34233883023262024, 0.045709915459156036, 0.11729196459054947, 0.13548825681209564, -0.3334689736366272, 0.29789718985557556, 0.12125445902347565, 0.13667646050453186, -0.6150417327880859, 0.0011353977024555206, -0.012479695491492748, 0.2989681363105774, 0.3227967321872711, -0.052288718521595, 0.3666779100894928, -0.2939664423465729, 0.12823599576950073, -0.10072129964828491, -0.176337331533432, 0.2739074230194092, -0.26633912324905396, 0.43988385796546936, -0.09746330976486206, -0.2637675702571869, 0.02734220400452614, -0.20562905073165894, -0.6480699777603149, 0.1781962364912033, 0.17634740471839905, -0.07000317424535751, 0.3828813135623932, -0.6547756195068359, 0.15146368741989136, 0.03579747676849365, -0.007166197523474693, 0.15733617544174194, 0.046128399670124054, -0.7098756432533264, 0.22380834817886353, 0.3733425438404083, -0.7145859003067017, 0.18655464053153992, -0.4990553557872772, -0.2336399257183075, -0.3922877907752991, -0.12291472405195236, 0.3854149878025055, -0.3202831447124481, -0.0007252912037074566, 0.34592050313949585, -0.07235311716794968, 0.5941299796104431, -0.04594670981168747, -0.10191763192415237, 0.15881231427192688, 0.38152000308036804, 0.4613525867462158, 0.07394368201494217, -0.031655725091695786, -0.1491849571466446, -0.4769206941127777, 0.11919506639242172, 0.52707439661026, 0.12066393345594406, -0.3855656683444977, 0.0897144302725792, -0.015513844788074493, 0.8330134153366089, 0.44915086030960083, 0.07939314842224121, -0.387637197971344, 0.21580561995506287, 0.18721160292625427, -0.3700406849384308, -0.1043381541967392, 0.19310817122459412, 0.116238072514534, -0.40746667981147766, 0.7291035056114197, -0.43795716762542725, 0.22398078441619873, 
-0.24590949714183807, -0.06679191440343857, -0.5940830111503601, -0.018695345148444176, -0.33444738388061523, -0.09381847828626633, 0.18644794821739197]

rjurney avatar Sep 25 '23 03:09 rjurney

Oh uh, supporting embeddings in Gephi is going to be essential to keeping it relevant as graph AI and visualization merge and computing becomes more GPU-centric.

rjurney avatar Sep 25 '23 03:09 rjurney

Thanks @rjurney, I would be happy to chat about what would make it easier to handle embeddings in Gephi. The GEXF format supports float lists, so if they are not properly imported in Gephi it must be a bug. Compatibility with NetworkX is surely also important. Let me investigate; I bet that we don't have many unit tests around list import, as it hasn't been super popular in the past.

mbastian avatar Sep 30 '23 12:09 mbastian

Cool, I will share my notebook with you so you can see. It isn't open source at this point but I trust you ;)

Another issue is that integer node IDs become strings. I have had to cast them back to integers. A lot of Python tools around networkx like littleballoffur (graph sampling) and karateclub (graph embeddings) won't work without integer node IDs.
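
One possible way to avoid the manual cast, sketched under the assumption that every node ID in the file parses as an integer: `nx.read_gexf` accepts a `node_type` argument that converts IDs while reading. The toy path graph and the file name `ids.gexf` are illustrative only.

```python
import os
import tempfile

import networkx as nx

# Toy graph with integer node IDs 0, 1, 2.
G = nx.path_graph(3)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "ids.gexf")
    nx.write_gexf(G, path)

    # Default read: node IDs come back as strings.
    as_strings = nx.read_gexf(path)

    # Asking for int IDs converts them during parsing.
    as_ints = nx.read_gexf(path, node_type=int)

# Equivalent manual cast on an already-loaded graph.
recast = nx.relabel_nodes(as_strings, {n: int(n) for n in as_strings})
```

If some IDs are not numeric, `int` will raise a `ValueError`, so this only fits graphs whose IDs were integers to begin with.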

rjurney avatar Oct 09 '23 04:10 rjurney

What was the result of this discussion?

Deanozk avatar Jun 15 '24 16:06 Deanozk

Also, does the pickle file format have the same issue?
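
A small sketch suggesting that Python's standard `pickle` module would not share these two problems, since it serializes the Python objects themselves rather than mapping them onto GEXF attribute types. The graph below is a toy example with a placeholder vector; pickle files are Python-specific and should only be loaded from trusted sources.

```python
import pickle

import networkx as nx

# Toy graph: integer node IDs and a list-of-floats attribute (placeholder values).
G = nx.Graph()
G.add_node(0, embedding=[0.25, -0.5, 1.0])
G.add_node(1, embedding=[0.0, 0.5, -1.0])
G.add_edge(0, 1)

# Round trip through pickle in memory.
restored = pickle.loads(pickle.dumps(G))

# Lists stay lists and integer IDs stay integers.
first_embedding = restored.nodes[0]["embedding"]
id_types = {type(n) for n in restored.nodes}
```

The trade-off is portability: unlike GEXF, a pickle file cannot be opened in Gephi or other non-Python tools.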

Deanozk avatar Jun 15 '24 16:06 Deanozk