python-igraph icon indicating copy to clipboard operation
python-igraph copied to clipboard

PageRank Error: Fatal error at src/core/vector.c:126 : Assertion failed: size >= 0

Open jmzhoulab opened this issue 3 years ago • 21 comments

Describe the bug Got an error when running PageRank algorithm on a scale graph. The number of edge is 2658026604, the size of edge file is 51G, and the edge file format as follow

woker@robot:benchmark_datasets$ head datagen-scale_1X.txt 
102309412 0
13241850 1
39897563 2
60103142 3
68395754 3
56599636 5
10025524 5
13963849 5
10428790 5
93483924 5

The machine configuration is that cpu cores is 32 and memory is 320G.

Here is the error message.

Fatal error at src/core/vector.c:126 : Assertion failed: size >= 0

The test code is

import sys
import os
import time


import igraph
from igraph import *


if __name__ == '__main__':

    data_file = "datagen-scale_1X.txt"

    print("begin load graph...")
	
    s_time = time.time()
    g = igraph.read(data_file, format="edge", directed=True)

    print(f"vertex count {g.vcount()}, edge count {g.ecount()}")
    print(f"finished load graph, cost: {time.time() - s_time}")

    print("====== start pagerank =======")

    s_time = time.time()
    res = g.pagerank(niter=10, eps=1e-6)

    print(f"res count: {len(res)}")

    print(res[0:10])

    print(f"algorithm cost: {time.time() - s_time}")

Version information python 3.7.12 igraph 0.9.10

jmzhoulab avatar Aug 05 '22 02:08 jmzhoulab

Load graph from file is ok, the error arise out of PageRank algorithm.

jmzhoulab avatar Aug 05 '22 02:08 jmzhoulab

Version 0.9 does not support such huge graphs. 0.10 will. Please try with the development version and let us know if it works: https://github.com/igraph/python-igraph/#compiling-the-development-version

szhorvat avatar Aug 05 '22 05:08 szhorvat

Also, do the expected vertex and edge counts match? I'm asking because it looks like your input file might have gaps in the IDs. igraph requires the numeric vertex IDs to be consecutive integers, so if you have an edge list that contains a single edge like 1 1000000, you'll get 1000001 vertices, out of which 999999 will be isolates. If you have gaps in the IDs, load the file as "ncol", not edgelist, as "ncol" will renumber the vertices and assign the original IDs as a vertex attribute.

ntamas avatar Aug 05 '22 08:08 ntamas

2658026604 is just over the 31-bit limit:

In[44]:= Log2[2658026604] // N
Out[44]= 31.3077

igraph_pagerank() can currently use one of two methods: PRPACK and ARPACK.

PRPACK requires an edge count less than 2^31. The expected behaviour in 0.10 is to throw an error. We might be able to patch PRPACK to work with larger graphs, but this will be quite some work and it probably won't happen in the short term.

ARPACK requires a vertex count less than 2^31. It is unclear if it also requires an edge count smaller than this—I expect not. It might work. Be sure to use the NCOL reader as Tamás said to ensure that you do not get unnecessary vertices. The tracking issue for supporting large arrays with BLAS/LAPACK/ARPACK is here and is again not a short-term project.

If you could test both methods in 0.10 (i.e. develop branch) and report back that would be much appreciated.

szhorvat avatar Aug 05 '22 09:08 szhorvat

2658026604 is just over the 31-bit limit:

In[44]:= Log2[2658026604] // N
Out[44]= 31.3077

igraph_pagerank() can currently use one of two methods: PRPACK and ARPACK.

PRPACK requires an edge count less than 2^31. The expected behaviour in 0.10 is to throw an error. We might be able to patch PRPACK to work with larger graphs, but this will be quite some work and it probably won't happen in the short term.

ARPACK requires a vertex count less than 2^31. It is unclear if it also requires an edge count smaller than this—I expect not. It might work. Be sure to use the NCOL reader as Tamás said to ensure that you do not get unnecessary vertices. The tracking issue for supporting large arrays with BLAS/LAPACK/ARPACK is here and is again not a short-term project.

If you could test both methods in 0.10 (i.e. develop branch) and report back that would be much appreciated.


@szhorvat THX, I have some confusion that why the edge count less than 2^31. If it's using 32 bits unsigned integer as the index of vertor in storage, the max edge count is 2^32 (about 41 billion), but why limit 2^31 (about 21 billion). Is the vector index used 32 bits signed integer?

jmzhoulab avatar Aug 08 '22 03:08 jmzhoulab

Also, do the expected vertex and edge counts match? I'm asking because it looks like your input file might have gaps in the IDs. igraph requires the numeric vertex IDs to be consecutive integers, so if you have an edge list that contains a single edge like 1 1000000, you'll get 1000001 vertices, out of which 999999 will be isolates. If you have gaps in the IDs, load the file as "ncol", not edgelist, as "ncol" will renumber the vertices and assign the original IDs as a vertex attribute.


@ntamas Thx, I got it, The max vertex id is 1682052736(less than 2^31) in my graph, But I'm not sure there are any isolated nodes. I try to use load the file as "ncol" and run agian.

jmzhoulab avatar Aug 08 '22 03:08 jmzhoulab

Yes, it uses signed integers. Were you able to test with the development version and see if the ARPACK method works?

szhorvat avatar Aug 08 '22 06:08 szhorvat

It's a pity, when I tried to rebuild the develop version and run pagerank algorithm in these days, which get a "Too many edges for PRPACK" error.

The develop version use the PRPACK, but I don't konwn how to use ARPACK method.

The detail error message

Traceback (most recent call last):
File "igraph/page_rank.py", line 29, in <module>
res = g.pagerank(niter=10, eps=1e-6)
File "/opt/conda/lib/python3.7/site-packages/igraph-0.9.11-py3.7-linux-x86_64.egg/igraph/structural.py", line 94, in _pagerank
eps,
igraph._igraph.InternalError: Error at src/centrality/prpack.cpp:46: Too many edges for PRPACK. -- Failed

jmzhoulab avatar Aug 10 '22 07:08 jmzhoulab

Use implementation="arpack"

https://igraph.org/python/api/latest/igraph.Graph.html#pagerank

szhorvat avatar Aug 10 '22 07:08 szhorvat

Using implementation="arpack" raise an "Segmentation fault" exception and there is no stack information.

jmzhoulab avatar Aug 10 '22 10:08 jmzhoulab

Can you please double-check that your graph has strictly fewer vertices than $2^{31}$? Can you please report the vertex and edge counts here? Is the graph weighted?

szhorvat avatar Aug 10 '22 11:08 szhorvat

I looked at the code and I do not see any reason why this should fail, but I do not currently have access to a machine with enough memory to test it. I added a safety check for the vertex count. If you re-install the development version now, it should include that check. However, if your graph has fewer than $2^{31}$ vertices, it should make no difference.

Any tips here @ntamas ? I can't see anything wrong in the Python/C glue code either.

szhorvat avatar Aug 10 '22 12:08 szhorvat

Ok, I will check it again and give you feedback later.

jmzhoulab avatar Aug 10 '22 15:08 jmzhoulab

The maximum vertex id is 434943375,which far less than $2^{31}$. The graph is unweighted. Read data from the file is ok. The graph object created successfully.

vertex count 434943376, edge count 2658026604

I rebuild the python-igraph development version using igraph-4d2a03c30d303f6e6b9a64c7bf0c334b856c3c8c. This igraph version is work?

The test code

import sys
import os
import time

sys.path.append(os.path.abspath(os.path.dirname(__file__)+'/'+'..'))

import igraph
from igraph import *
from common.time import Timer
from common.logger import logger


if __name__ == '__main__':
    timer = Timer()

    data_file = sys.argv[1]

    logger.info("begin load graph...")

    g = igraph.read(data_file, format="edge", directed=True)

    logger.info(f"vertex count {g.vcount()}, edge count {g.ecount()}")
    logger.info(f"finied load graph, cost: {timer.cost()}")
    logger.info(f"build graph cost: {timer.cost()}")

    logger.info("====== start pagerank =======")

    alg_timer = Timer()
    res = g.pagerank(implementation="arpack", niter=10, eps=1e-6)

    logger.info(f"res count: {len(res)}")

    logger.info(res[0:10])

    logger.info(f"algorithm cost: {alg_timer.cost()}")

The stdout log

2022-08-11 04:50:44,272 [INFO] page_rank.py:18 - begin load graph...
2022-08-11 05:43:43,548 [INFO] page_rank.py:22 - vertex count 434943376, edge count 2658026604
2022-08-11 05:43:43,548 [INFO] page_rank.py:23 - finied load graph, cost: 3179.28
2022-08-11 05:43:43,548 [INFO] page_rank.py:24 - build graph cost: 3179.28
2022-08-11 05:43:43,549 [INFO] page_rank.py:26 - ====== start pagerank =======
Segmentation fault

jmzhoulab avatar Aug 11 '22 06:08 jmzhoulab

Switching to an another dataset(twitter-2010) was successful.

The stdout log

2022-08-11 07:30:27,843 [INFO] topoanalysis:igraph/page_rank.py:18 - begin load graph...
2022-08-11 07:55:54,097 [INFO] topoanalysis:igraph/page_rank.py:22 - vertex count 41652230, edge count 1468365182
2022-08-11 07:55:54,098 [INFO] topoanalysis:igraph/page_rank.py:23 - finied load graph, cost: 1526.26
2022-08-11 07:55:54,098 [INFO] topoanalysis:igraph/page_rank.py:24 - build graph cost: 1526.26
2022-08-11 07:55:54,098 [INFO] topoanalysis:igraph/page_rank.py:26 - ====== start pagerank =======
2022-08-11 08:40:09,094 [INFO] topoanalysis:igraph/page_rank.py:31 - res count: 41652230
2022-08-11 08:40:09,094 [INFO] topoanalysis:igraph/page_rank.py:33 - [7.475319044789155e-09, 4.237505003125664e-09, 7.1960692862092996e-09, 7.1960692862092996e-09, 7.196069286209607e-09, 7.196069286209607e-09, 3.828410197182684e-08, 2.4621186119459063e-07, 1.0186054720454494e-07, 1.0186054720454402e-07]
2022-08-11 08:40:09,094 [INFO] topoanalysis:igraph/page_rank.py:35 - algorithm cost: 2655.0

jmzhoulab avatar Aug 11 '22 09:08 jmzhoulab

The vertex count and edge count of twitter-2010 are fewer than 2^{31}

jmzhoulab avatar Aug 11 '22 09:08 jmzhoulab

I rebuild the python-igraph development version using igraph-4d2a03c30d303f6e6b9a64c7bf0c334b856c3c8c. This igraph version is work?

Yes, that should work.

I'm thinking about two possibilities but I don't know yet whether any of these is relevant or not:

  1. ARPACK may or may not support matrices with that many edges; I find the vendored ARPACK in igraph suspicious because it uses integer as the type all along the f2c-translated code, and integer is typedef'd to an int. So, if ARPACK uses an integer anywhere to represent edge indices (although I'm not sure whether it really does - it is not supposed to work with the edges directly), that might be problematic.
  2. Python might have problems allocating a list of length equal to the number of vertices, so the error might be not in igraph's calculation but at the point where we translate igraph's result back into a Python list. Since we are not using NumPy matrices, the translation is probably hard enough for Python because it needs to allocate one huge list and one additional float object for each list element (instead of just returning a NumPy array, which does not allocate float objects until they are actually retrieved from the list).

I find 1 more likely than 2, but I need to investigate further.

@jmzhoulab Any chance to compile python-igraph in debug mode and then run the PageRank calculation through gdb or lldb to get a stack trace of the crash? Simply replace the two occurrences of Release (note the capital R) with Debug in setup.py, remove the build and dist folders, and rebuild with python setup.py install. Then attach a debugger to the Python process before it starts the PageRank calculation and get a stack trace of the crash.

ntamas avatar Aug 11 '22 18:08 ntamas

I don't think (1) is possible because ARPACK never becomes aware of edges. We simply pass ARPACK a function that takes a vector and computes the action of the PageRank matrix on that vector. ARPACK does not need to represent that matrix—it just needs a function to compute its action.

Am I missing something @ntamas ?

szhorvat avatar Aug 11 '22 19:08 szhorvat

That's true, but I cannot think of any other reason; that's why it would be good to see a stack trace.

ntamas avatar Aug 11 '22 19:08 ntamas

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Oct 16 '22 09:10 stale[bot]

I have not been able to test this as I simply don't have access to a machine that has enough memory.

If you can build and test the latest version with AddressSanitizer that would be very immensely useful @jmzhoulab. Instructions for building and running python-igraph with AddressSanitizer are here: https://github.com/igraph/igraph/wiki/Using-sanitizers-to-find-bugs#python-igraph

Alternatively, if you're willing to test a piece of C code instead, that would be useful as well. Let me know and I can help port your Python test program to C and provide instructions for compiling/testing it. This might actually be easier than dealing with Python.

szhorvat avatar Oct 16 '22 09:10 szhorvat

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

stale[bot] avatar Mar 25 '23 08:03 stale[bot]