Error when attempting to access private repo

Open jfredrickson5 opened this issue 5 years ago • 8 comments

Attempting to run scraper on a GitHub org with private repos results in an error.

Output:

% scraper --config config.json                     
2019-04-23 17:29:12,536 - INFO: Connected to: https://github.com                                     
2019-04-23 17:29:12,773 - INFO: Processing: GSA/private-test                                         
Traceback (most recent call last):
  File "/home/jf/.pyenv/versions/3.7.0/bin/scraper", line 11, in <module>                            
    load_entry_point('llnl-scraper', 'console_scripts', 'scraper')()                                 
  File "/home/jf/gsa/scraper/scraper/gen_code_gov_json.py", line 76, in main                         
    code_json = code_gov.process_config(config_json)                                                 
  File "/home/jf/gsa/scraper/scraper/code_gov/__init__.py", line 58, in process_config               
    code_gov_project = Project.from_github3(repo, labor_hours=compute_labor_hours)                   
  File "/home/jf/gsa/scraper/scraper/code_gov/models.py", line 217, in from_github3                  
    elif date_parse(repository.created_at) < POLICY_START_DATE:                                      
  File "/home/jf/.pyenv/versions/3.7.0/lib/python3.7/site-packages/dateutil/parser/_parser.py", line 1356, in parse
    return DEFAULTPARSER.parse(timestr, **kwargs)
  File "/home/jf/.pyenv/versions/3.7.0/lib/python3.7/site-packages/dateutil/parser/_parser.py", line 645, in parse
    res, skipped_tokens = self._parse(timestr, **kwargs)                                             
  File "/home/jf/.pyenv/versions/3.7.0/lib/python3.7/site-packages/dateutil/parser/_parser.py", line 721, in _parse
    l = _timelex.split(timestr)         # Splits the timestr into tokens                             
  File "/home/jf/.pyenv/versions/3.7.0/lib/python3.7/site-packages/dateutil/parser/_parser.py", line 207, in split
    return list(cls(s))
  File "/home/jf/.pyenv/versions/3.7.0/lib/python3.7/site-packages/dateutil/parser/_parser.py", line 76, in __init__
    '{itype}'.format(itype=instream.__class__.__name__))                                             
TypeError: Parser must be a string or character stream, not datetime

Here is a simplified config.json as a test case. The GSA/private-test repo is private and contains a README.md file.

{
  "agency": "GSA",
  "contact_email": "[email protected]",
  "GitHub": [
    {
      "public_only": false,
      "repos": [
        "GSA/private-test"
      ]
    }
  ]
}

Example of a real config.json where we encountered the issue. It scans properly until it arrives at a private repo, at which point it crashes.

{
  "agency": "GSA",
  "contact_email": "[email protected]",
  "GitHub": [
    {
      "public_only": false,
      "orgs": [
        "GSA",
        "18F",
        "presidential-innovation-fellows",
        "USWDS"
      ]
    }
  ]
}

Verified that my GitHub access token is valid and can view private repos by using the same token for a different script.

jfredrickson5 avatar Apr 23 '19 21:04 jfredrickson5

Interesting... Can you post the output of pip list? Specifically, I'm looking for the version of github3.py.

IanLee1521 avatar Apr 24 '19 02:04 IanLee1521

Here's pip list:

Package           Version    Location                                                                 
----------------- ---------- --------------------                                                     
asn1crypto        0.24.0                                                                              
certifi           2018.8.24                                                                           
cffi              1.12.3                                                                              
chardet           3.0.4                                                                               
cryptography      2.6.1                                                                               
decorator         4.3.0                                                                               
github3.py        1.2.0                                                                               
idna              2.7                                                                                 
isodate           0.6.0                                                                               
jwcrypto          0.6.0                                                                               
llnl-scraper      0.8.0.dev0 /home/jf/src/scraper                                                     
mock              2.0.0                                                                               
msrest            0.6.6                                                                               
oauthlib          3.0.1                                                                               
pbr               4.2.0                                                                               
pip               19.0.3                                                                              
pycparser         2.19                                                                                
python-dateutil   2.7.3                                                                               
python-gitlab     1.6.0                                                                               
requests          2.19.1                                                                              
requests-oauthlib 1.2.0                                                                               
setuptools        39.0.1                                                                              
six               1.11.0                                                                              
stashy            0.5
uritemplate       3.0.0
uritemplate.py    3.0.2
urllib3           1.23
virtualenv        16.1.0
vsts              0.1.25

jfredrickson5 avatar Apr 24 '19 14:04 jfredrickson5

Huh, now I'm super confused. I nuked my pyenv and started fresh. Now the repository.created_at property is a string and PR #32 no longer works for me.

I had this debugging output when I was working on the change: print("repository.created_at type: ", type(repository.created_at))

It previously output datetime and now it's str.

Here's my latest pip list:

Package           Version    Location
----------------- ---------- ---------------------
asn1crypto        0.24.0
astroid           2.2.5
certifi           2019.3.9
cffi              1.12.3
chardet           3.0.4
cryptography      2.6.1
decorator         4.4.0
github3.py        1.2.0
idna              2.8
isodate           0.6.0
isort             4.3.17
jwcrypto          0.6.0
lazy-object-proxy 1.3.1
llnl-scraper      0.8.0.dev0 /Users/jf/gsa/scraper
mccabe            0.6.1
mock              2.0.0
msrest            0.6.6
oauthlib          3.0.1
pbr               5.2.0
pip               19.1
pycparser         2.19
pylint            2.3.1
python-dateutil   2.8.0
python-gitlab     1.8.0
requests          2.21.0
requests-oauthlib 1.2.0
setuptools        40.8.0
six               1.12.0
stashy            0.6
typed-ast         1.3.5
uritemplate       3.0.0
urllib3           1.24.2
vsts              0.1.25
wrapt             1.11.1

Possibly user error due to a bad environment? No idea. I'm going to see if I can replicate it and if not, maybe we can close this.

jfredrickson5 avatar May 03 '19 18:05 jfredrickson5

Thanks for the additional information @jfredrickson5 .

FWIW, you're not crazy... I've seen very similar behavior. I think there is a package in the dependency chain that is changing its behavior... I've thought about adding some exception handling there to "do the right thing" but haven't gotten all the way there yet. If you're interested in adding that, I'd welcome the addition!
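
A minimal sketch of what that handling might look like, assuming the only two shapes repository.created_at takes are the str and datetime seen in this thread. The names to_datetime, created_before_cutoff and EXAMPLE_CUTOFF are hypothetical and are not part of scraper or github3.py, and the cutoff value is a placeholder. To stay dependency-free, the sketch swaps in the standard library's datetime.fromisoformat for the dateutil parser that appears in the traceback.

```python
from datetime import datetime, timezone

# Placeholder cutoff for illustration only; the project's real
# POLICY_START_DATE may be defined differently.
EXAMPLE_CUTOFF = datetime(2016, 8, 8, tzinfo=timezone.utc)


def to_datetime(value):
    """Normalize a str or datetime timestamp to a timezone-aware datetime.

    Naive values are assumed to be UTC (an assumption of this sketch).
    """
    if isinstance(value, datetime):
        parsed = value
    elif isinstance(value, str):
        # fromisoformat() only accepts a trailing "Z" from Python 3.11 on,
        # so spell the UTC offset out explicitly.
        if value.endswith("Z"):
            value = value[:-1] + "+00:00"
        parsed = datetime.fromisoformat(value)
    else:
        raise TypeError(
            "expected str or datetime, got {}".format(type(value).__name__)
        )
    if parsed.tzinfo is None:
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed


def created_before_cutoff(created_at, cutoff=EXAMPLE_CUTOFF):
    """Compare a timestamp against the cutoff regardless of its input type."""
    return to_datetime(created_at) < cutoff
```

With a helper like this, the comparison no longer depends on which type the installed dependency versions happen to return, so the same call works in both of the environments listed above.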

IanLee1521 avatar May 08 '19 16:05 IanLee1521

@jfredrickson5 - I see you closed the MR, are you thinking that this is resolved too? Or did we still need to fix something?

IanLee1521 avatar Mar 04 '24 18:03 IanLee1521

@IanLee1521 I'm not sure what notification GitHub sent you, but I'm in the process of merging my separate personal and work GitHub accounts into one, so I think that must have unintentionally triggered something; I haven't actually made changes to this issue.

jfredrickson5 avatar Mar 04 '24 19:03 jfredrickson5

It was the MR that got closed: https://github.com/LLNL/scraper/pull/32

but ah, I see it was auto-closed by deleting a reference.

IanLee1521 avatar Mar 04 '24 19:03 IanLee1521

Ah, that was my personal fork that disappeared then. It's been a while so I don't know if the change is still valid, but feel free to grab the change and use it.

jfredrickson5 avatar Mar 04 '24 19:03 jfredrickson5