scraper
Error when attempting to access private repo
Attempting to run scraper on a GitHub org with private repos results in an error.
Output:
% scraper --config config.json
2019-04-23 17:29:12,536 - INFO: Connected to: https://github.com
2019-04-23 17:29:12,773 - INFO: Processing: GSA/private-test
Traceback (most recent call last):
File "/home/jf/.pyenv/versions/3.7.0/bin/scraper", line 11, in <module>
load_entry_point('llnl-scraper', 'console_scripts', 'scraper')()
File "/home/jf/gsa/scraper/scraper/gen_code_gov_json.py", line 76, in main
code_json = code_gov.process_config(config_json)
File "/home/jf/gsa/scraper/scraper/code_gov/__init__.py", line 58, in process_config
code_gov_project = Project.from_github3(repo, labor_hours=compute_labor_hours)
File "/home/jf/gsa/scraper/scraper/code_gov/models.py", line 217, in from_github3
elif date_parse(repository.created_at) < POLICY_START_DATE:
File "/home/jf/.pyenv/versions/3.7.0/lib/python3.7/site-packages/dateutil/parser/_parser.py", line 1356, in parse
return DEFAULTPARSER.parse(timestr, **kwargs)
File "/home/jf/.pyenv/versions/3.7.0/lib/python3.7/site-packages/dateutil/parser/_parser.py", line 645, in parse
res, skipped_tokens = self._parse(timestr, **kwargs)
File "/home/jf/.pyenv/versions/3.7.0/lib/python3.7/site-packages/dateutil/parser/_parser.py", line 721, in _parse
l = _timelex.split(timestr) # Splits the timestr into tokens
File "/home/jf/.pyenv/versions/3.7.0/lib/python3.7/site-packages/dateutil/parser/_parser.py", line 207, in split
return list(cls(s))
File "/home/jf/.pyenv/versions/3.7.0/lib/python3.7/site-packages/dateutil/parser/_parser.py", line 76, in __init__
'{itype}'.format(itype=instream.__class__.__name__))
TypeError: Parser must be a string or character stream, not datetime
Here is a simplified config.json as a test case. The GSA/private-test repo is private and contains a README.md file.
{
"agency": "GSA",
"contact_email": "[email protected]",
"GitHub": [
{
"public_only": false,
"repos": [
"GSA/private-test"
]
}
]
}
Example of a real config.json where we encountered the issue. It scans properly until it arrives at a private repo, at which point it crashes.
{
"agency": "GSA",
"contact_email": "[email protected]",
"GitHub": [
{
"public_only": false,
"orgs": [
"GSA",
"18F",
"presidential-innovation-fellows",
"USWDS"
],
}
]
}
Verified that my GitHub access token is valid and can view private repos by using the same token for a different script.
Interesting... Can you post the output of pip list? Specifically, I'm looking for the version of github3.py.
Here's pip list:
Package Version Location
----------------- ---------- --------------------
asn1crypto 0.24.0
certifi 2018.8.24
cffi 1.12.3
chardet 3.0.4
cryptography 2.6.1
decorator 4.3.0
github3.py 1.2.0
idna 2.7
isodate 0.6.0
jwcrypto 0.6.0
llnl-scraper 0.8.0.dev0 /home/jf/src/scraper
mock 2.0.0
msrest 0.6.6
oauthlib 3.0.1
pbr 4.2.0
pip 19.0.3
pycparser 2.19
python-dateutil 2.7.3
python-gitlab 1.6.0
requests 2.19.1
requests-oauthlib 1.2.0
setuptools 39.0.1
six 1.11.0
stashy 0.5
uritemplate 3.0.0
uritemplate.py 3.0.2
urllib3 1.23
virtualenv 16.1.0
vsts 0.1.25
Huh, now I'm super confused. I nuked my pyenv and started fresh. Now the repository.created_at property is a string and PR #32 no longer works for me.
I had this debugging output when I was working on the change: print("repository.created_at type: ", type(repository.created_at))
It previously output datetime and now it's str.
Here's my latest pip list:
Package Version Location
----------------- ---------- ---------------------
asn1crypto 0.24.0
astroid 2.2.5
certifi 2019.3.9
cffi 1.12.3
chardet 3.0.4
cryptography 2.6.1
decorator 4.4.0
github3.py 1.2.0
idna 2.8
isodate 0.6.0
isort 4.3.17
jwcrypto 0.6.0
lazy-object-proxy 1.3.1
llnl-scraper 0.8.0.dev0 /Users/jf/gsa/scraper
mccabe 0.6.1
mock 2.0.0
msrest 0.6.6
oauthlib 3.0.1
pbr 5.2.0
pip 19.1
pycparser 2.19
pylint 2.3.1
python-dateutil 2.8.0
python-gitlab 1.8.0
requests 2.21.0
requests-oauthlib 1.2.0
setuptools 40.8.0
six 1.12.0
stashy 0.6
typed-ast 1.3.5
uritemplate 3.0.0
urllib3 1.24.2
vsts 0.1.25
wrapt 1.11.1
Possibly user error due to a bad environment? No idea. I'm going to see if I can replicate it and if not, maybe we can close this.
Thanks for the additional information, @jfredrickson5.
FWIW, you're not crazy... I've seen very similar behavior. I think there is a package in the dependency chain that is changing its behavior... I've thought about adding some exception handling there to "do the right thing" but haven't finished that yet. If you're interested in adding that, I'd welcome the addition!
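For anyone picking this up, here is a rough sketch of one way to "do the right thing": normalize created_at whether it arrives as a str or as a datetime, then compare. This is not the change from PR #32. The helper names to_datetime and created_before_policy are hypothetical, the POLICY_START_DATE value below is a placeholder rather than the one defined in models.py, and the string branch assumes GitHub-style ISO 8601 timestamps such as "2019-04-23T17:29:12Z".

```python
from datetime import datetime, timezone

# Placeholder cutoff for illustration only; the real POLICY_START_DATE
# is defined in scraper/code_gov/models.py and may differ.
POLICY_START_DATE = datetime(2016, 8, 8, tzinfo=timezone.utc)


def to_datetime(value):
    """Return a timezone-aware datetime from a datetime or an ISO 8601 string."""
    if isinstance(value, datetime):
        parsed = value
    elif isinstance(value, str):
        # Assumes timestamps like "2019-04-23T17:29:12Z".
        # datetime.fromisoformat() rejects a trailing "Z" before Python 3.11,
        # so rewrite it as an explicit UTC offset first.
        if value.endswith("Z"):
            value = value[:-1] + "+00:00"
        parsed = datetime.fromisoformat(value)
    else:
        raise TypeError(
            "expected str or datetime, got {}".format(type(value).__name__)
        )
    # Treat naive values as UTC so comparing against an aware cutoff works.
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def created_before_policy(created_at):
    """True if the repository was created before the policy start date."""
    return to_datetime(created_at) < POLICY_START_DATE
```

With something like this, the check in from_github3 could call created_before_policy(repository.created_at) and would not depend on which type the installed dependencies happen to return.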
@jfredrickson5 - I see you closed the MR, are you thinking that this is resolved too? Or did we still need to fix something?
@IanLee1521 I'm not sure what notification GitHub sent you, but I'm in the process of merging my separate personal and work GitHub accounts into one, so I think that must have unintentionally triggered something; I haven't actually made changes to this issue.
It was the MR that got closed: https://github.com/LLNL/scraper/pull/32
but ah, I see it was auto-closed by deleting a reference.
Ah, that was my personal fork that disappeared then. It's been a while so I don't know if the change is still valid, but feel free to grab the change and use it.