Scrapegraph-ai icon indicating copy to clipboard operation
Scrapegraph-ai copied to clipboard

Prallel Execution of Nodes

Open ChrisDelClea opened this issue 9 months ago • 5 comments

Hey Folks,

this looks like a great project. I am not sure if its a bug or a feature.

Describe the bug

Executes the node's logic to instantiate and run multiple graph instances in parallel. Stated in here

To Reproduce Can someone explain to me, where the parallelism kicks in? As far as i can tell, it runs in a for loop:

        # run the graph for each url and use tqdm for progress bar
        graphs_answers = []
        for graph in tqdm(graphs_instances, desc="Processing Graph Instances", disable=not self.verbose):
            result = graph.run()
            graphs_answers.append(result)

Expected behavior I would expect that the nodes are executed in parallel and joined back together for speedup .

ChrisDelClea avatar May 09 '24 18:05 ChrisDelClea

hey @ChrisDelClea Chris, yes it is a feature to implement! Right now it is iterating over the SmartScraper instances so it is sequential but we could implement an async/await mechanism on the run method of the graphs

PeriniM avatar May 10 '24 09:05 PeriniM

This would be great.

ChrisDelClea avatar May 10 '24 09:05 ChrisDelClea

@PeriniM I might take care of it as soon as I'm done running tests for #147.

I would use a semaphore with a param specifying the number of concurrent graphs, not to exhaust too many resources.

DiTo97 avatar May 10 '24 10:05 DiTo97

@DiTo97 That would be sweet!

PeriniM avatar May 10 '24 12:05 PeriniM

@PeriniM opened a PR; code should be working, but better if you test it first hand before merging

DiTo97 avatar May 10 '24 22:05 DiTo97

Closed with #213

PeriniM avatar May 13 '24 21:05 PeriniM