Jeff McAffer

Results 60 comments of Jeff McAffer

Yes. I have since commented out that code but suspect it is actually the -1 that is the problem as I was actually getting a SQL error with the update()

Same here. This is preventing temporarily unzip'd folders from being deleted (they are not empty) because the files in them are left open. fs.unlink marks them for deletion but does...

Just stumbled upon these issues (#49 and #50). In [ClearlyDefined](https://clearlydefined.io) we are using SPDX identifiers like crazy (that's literally all we support) and have found there are four distinct cases:...

`OR NOASSERTION`, at the very least, comes up through typos and the like. For example, `GPL-3.0 OR MTI`. Here the MTI should have been MIT but was mistyped. If we...
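
One way such an expression could end up containing `NOASSERTION` is sketched below. This is only an illustration, not ClearlyDefined's actual logic: the `KNOWN_IDS` set is a tiny hypothetical stand-in for the real SPDX license list, `normalizeOrExpression` is a made-up helper, and only flat `OR` expressions are handled (no `AND`, `WITH`, or parentheses).

```javascript
// Tiny illustrative subset; a real tool would consult the full SPDX license list.
const KNOWN_IDS = new Set(['MIT', 'Apache-2.0', 'GPL-3.0']);

// Hypothetical helper: replace every unrecognized operand of a flat
// "A OR B OR C" expression with NOASSERTION, keeping recognized ones as-is.
function normalizeOrExpression(expression) {
  return expression
    .split(/\s+OR\s+/)
    .map((id) => id.trim())
    .map((id) => (KNOWN_IDS.has(id) ? id : 'NOASSERTION'))
    .join(' OR ');
}

// The mistyped MTI is not a known identifier, so it cannot be asserted.
const normalized = normalizeOrExpression('GPL-3.0 OR MTI');
console.log(normalized);
```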

Great @markwalkom. It should not be that bad. Basically the store API has about 5-10 methods like `upsert`, `get`, `list`, ... all basic point or list queries. What are you...
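
To give a feel for the size of such a store, here is a minimal in-memory sketch. Only the method names `upsert`, `get`, and `list` come from the comment above; the signatures, the `type`/`id` document shape, and the `InMemoryStore` class are assumptions for illustration, and a real provider would presumably be asynchronous and talk to a database.

```javascript
// Hypothetical in-memory store. Method names follow the comment above;
// signatures and document shape are assumed, not taken from GHCrawler.
class InMemoryStore {
  constructor() {
    this.documents = new Map(); // keyed by "type:id"
  }

  // Insert the document, or replace an existing one with the same type and id.
  upsert(document) {
    this.documents.set(`${document.type}:${document.id}`, document);
    return document;
  }

  // Point query: return one document, or null if it is not stored.
  get(type, id) {
    return this.documents.get(`${type}:${id}`) ?? null;
  }

  // List query: return all stored documents of the given type.
  list(type) {
    return [...this.documents.values()].filter((doc) => doc.type === type);
  }
}

const store = new InMemoryStore();
store.upsert({ type: 'repo', id: 1, name: 'ghcrawler' });
store.upsert({ type: 'repo', id: 1, name: 'ghcrawler', stars: 10 }); // replaces the first
store.upsert({ type: 'user', id: 7, login: 'example' });
```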

The user processing in GHCrawler could be enhanced to follow the events for the user via https://developer.github.com/v3/activity/events/#list-public-events-performed-by-a-user but in normal circumstances this would only be triggered (at best) when GHCrawler...

Hey @danisyellis, the basic point here is that the crawler is configurable with `providers`. There are providers for queuing, storage, ... We have providers for many different queuing technologies (Rabbit,...

You can, using the [visitor map](https://github.com/Microsoft/ghcrawler/wiki/Advanced-traversal-policies#visitor-map) concept. Essentially you create an object that has every node and edge you want to traverse. Take a look at the end of [visitorMap.js](https://github.com/Microsoft/ghcrawler/blob/develop/lib/visitorMap.js)....
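
The idea of "an object that has every node and edge you want to traverse" can be sketched roughly as follows. This is not the format used in `visitorMap.js`; the nested-object shape, the sample `repoNode`, and the `collectPaths` helper are all hypothetical and only show how a map can restrict which edges get followed.

```javascript
// Hypothetical map: each key is an edge to follow, and its value is the
// map to apply to the node at the other end of that edge.
const repoMap = {
  owner: {},
  issues: { user: {} },
};

// Toy node with more edges than the map asks for.
const repoNode = {
  owner: { repos: {} },
  issues: { user: {}, comments: {} },
  commits: {},
};

// Walk the node, following only the edges named in the map, and return
// the path of every edge that was traversed.
function collectPaths(node, map, prefix = []) {
  const paths = [];
  for (const [edge, subMap] of Object.entries(map)) {
    if (!(edge in node)) continue; // the map names an edge this node lacks
    const path = [...prefix, edge];
    paths.push(path.join('/'));
    paths.push(...collectPaths(node[edge], subMap, path));
  }
  return paths;
}

const visited = collectPaths(repoNode, repoMap);
console.log(visited); // commits, owner/repos and issues/comments are skipped
```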

Phew, I'm going to have to dig into the code on this one. The feature (being able to spec maps) is there but has seen relatively little use. I'm pretty...

Awesome. In the GitHub processor [there is a function that handles repos](https://github.com/Microsoft/ghcrawler/blob/develop/providers/fetcher/githubProcessor.js#L160). The input there is a `request` that has a `document`. The bulk of that function teases apart the...