Roman Dorosh
Guys, do you need help to speed up parsing? I can step in and try to help you.
> Parsing is not needed as the data is in JSON (python dictionary) but accessing what we need is needed. Have you worked with [hyperjson](https://github.com/mre/hyperjson) or [orjson](https://github.com/ijl/orjson)? Actually, didn't have...
Also, what kind of trees do you want to build from the JSON representations?
I can write a multiprocessing version of this, which should speed up matching. Just attach the full file with the code.
```python
import orjson as json
from collections.abc import Generator
from io import TextIOWrapper
from pathlib import Path
import pandas as pd
from zstandard import ZstdDecompressor, open as zopen
import asyncio...
```
> But I don't believe it parses the internal comments of the code pieces inside the function itself.
>
> It only parses the docstring. Also they had the limitation,...
```python
import aiohttp
import asyncio
from dotenv import load_dotenv
import os
import json
from pathlib import Path

load_dotenv()

GITHUB_REPOS_FILENAME = 'github_repos_names.txt'
GITHUB_ISSUES_FILENAME = 'github_issues.json'
VISITED_GITHUB_REPOS_FILENAME = 'visited_github_repos.txt'
API_LIMIT = 4000...
```
@GravermanDev So, that is basically it. The main problem is the API rate limit (hence `API_LIMIT`): GitHub only allows 5000 requests per hour per authenticated user. Also, after processing the repos, I've collected about 69k repos from the...
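One way to stay under that limit is to read the rate-limit headers GitHub returns with each REST API response and pause until the quota resets. The sketch below is only an illustration, not part of the collector script above: the helper name `seconds_until_reset` and its `min_remaining` parameter are hypothetical, and it assumes the response carries the `X-RateLimit-Remaining` and `X-RateLimit-Reset` headers (the latter as UTC epoch seconds).

```python
import time


def seconds_until_reset(headers, now=None, min_remaining=1):
    """Return how many seconds to wait before sending the next request.

    `headers` is any mapping of response header names to values.
    Hypothetical helper: the name and parameters are assumptions for this sketch.
    """
    # Normalise header names so a plain dict works as well as a
    # case-insensitive mapping.
    lowered = {str(key).lower(): value for key, value in headers.items()}

    # If the headers are missing, assume there is still quota left.
    remaining = int(lowered.get("x-ratelimit-remaining", min_remaining))
    reset_at = int(lowered.get("x-ratelimit-reset", 0))

    # Enough quota left: no need to wait.
    if remaining >= min_remaining:
        return 0.0

    # Quota exhausted: wait until the reset time, never a negative duration.
    now = time.time() if now is None else now
    return max(0.0, float(reset_at - now))
```

In an async collector, the returned value could be passed to `asyncio.sleep(...)` after each response, with `response.headers` as the input mapping.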
I created a repository for this: https://github.com/doroshroman/github_issues I'll continue collecting the other parts and will host the full data outside GitHub because of the Git LFS limits.
@ontocord [Here is the raw dataset](https://drive.google.com/file/d/1NOAgf2bhwQYnnjjrpiWWV_5Adl9xwzUJ/view?usp=share_link). The format is the following:
```json
[
  {
    "issue_url": "https://api.github.com/repos/paulirish/speedline/issues/92",
    "issue_title": "Create issues ",
    "comments": [
      {
        "url": "https://api.github.com/repos/paulirish/speedline/issues/comments/882629228",
        "html_url": "https://github.com/paulirish/speedline/issues/92#issuecomment-882629228",
        "issue_url": "https://api.github.com/repos/paulirish/speedline/issues/92",
        "id": 882629228,...
```