django-DefectDojo
django-DefectDojo copied to clipboard
#6620 Create API importer for Bugcrowd
Implements https://github.com/DefectDojo/django-DefectDojo/issues/6620
Needs testing too
I have added a regex parsing for endpoints because I kept getting errors I have mentioned in https://github.com/DefectDojo/django-DefectDojo/issues/6580 ! Submissions can have random stuff in their bug_url, not just urls which is annoying...
@Gby56 This is a great start here! There a couple things left to do before we can approve
- Fix flake 8 errors
- Add some unit tests. This would be good starting place
@Maffooch all good :) I copied tests from Edgescan
@Gby56
- when I said use urlencode method instead, I was refering to something like this:
import urllib.request
import urllib.parse
params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
url = "http://www.musi-cal.com/cgi-bin/query?%s" % params
with urllib.request.urlopen(url) as f:
print(f.read().decode('utf-8'))
Which is easier to read and less bug prone (instead of pushing the string encoded directly in the code).
- We should convert this method
get_findings(self, program, target)
to be a generator function. If I understand well the data returned by GitLab API, the data are paginated. Which means that the function could return multiple time data. Perfect use case for a generator function.
Done ! The bugcrowd api uses an annoying query format for JSONAPI, with brackets everywhere, this definitely looks better now
Implemented a generator pattern for the api client function, no gain in performance since it stays sequential, removes the "accumulator" logic from the function that's all A better pattern would be to build out the work pool initially, looking at a given total number of elements if it's returned by the API, and then split the work across workers. Otherwise, when the total number of pages is unknown, it would be possible to split between 2 workers (or more) by just taking odd/even page numbers for example ?
from concurrent import futures
maxWorker = min(10,len(total_amount_of_pages)) ## how many thread you want to deal in parallel. Here 10 maximum, or the amount of pages requested.
urls = ['url'*n for n in total_amount_of_pages] ## here I create an iterable that the function will consume.
with futures.ThreadPoolExecutor(workers) as executor:
res = executor.map(requests.get,urls) ## it returns a generator
## it is consuming the function in the first argument and the iterable in the 2nd arguments, you can send more than 1 argument by adding new ones (as iterable).
myresult = list(res)
Code is functional, could be improved a bit with URI parsing and handling more use cases (bug urls in Bugcrowd are not always URL/URIs, sometimes just hosts, sometimes just paths without the host which is almost unusable) I sent an email to Bugcrowd support about their poor input validation on that field, as it's not very useful if it has noise.
@damiencarol @Maffooch @kiblik if I can get a last review, it should be all good 🤔
Not sure why one test failed, is that normal ?
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Conflicts have been resolved. A maintainer will review the pull request shortly.
@Gby56, sorry for the delay. I wrote my comments. I'm a little bit confused that some of them haven't been identified by Flake8
@kiblik We'd like to see this one merged for the next release if possible - are there outstanding issues?
Ouuuu, sorry @coheigea & @Gby56. 6 days ago, I wrote my comments but I forgot to click "Submit Review" so they haven't been visible to everybody and they stayed in "Pending" status.
@kiblik Hello ! I have implemented all the suggestions, and improved the endpoint issue. I check before saving them that they are not broken, and completely skip if they are. The reason behind that is because of Bugcrowd's API/web ui, lots of researchers submit weird stuff in that field and I'd prefer to have an actual clean endpoint, than 20 broken ones.
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Conflicts have been resolved. A maintainer will review the pull request shortly.
@kiblik yeah I did the upstream fetch on my fork and it pulled the master, but now that I merged dev into my branch it's not working either, anyways I copied over the exact files from dev and there are no conflicts here, it's weird how the diff is showing
I think it’s fine though ? It’s the same as the dev branch anyways
On Tue 13 Sep 2022 at 21:55, kiblik @.***> wrote:
@.**** commented on this pull request.
In components/package.json https://github.com/DefectDojo/django-DefectDojo/pull/6621#discussion_r970029742 :
@@ -1,6 +1,6 @@ { "name": "defectdojo",
- "version": "2.14.0-dev",
- "version": "2.15.0-dev",
Unfortunately, I'm not able to help you here. I have no idea. But I would try git rebase dev
— Reply to this email directly, view it on GitHub https://github.com/DefectDojo/django-DefectDojo/pull/6621#discussion_r970029742, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABTFKKCFPAAQSSBSEMMXMITV6DL4VANCNFSM545DRKNA . You are receiving this because you were mentioned.Message ID: @.***>
This pull request has conflicts, please resolve those before we can evaluate the pull request.
Conflicts have been resolved. A maintainer will review the pull request shortly.
the init file probably didn't get linted with black, the string delimiters were just different
@kiblik tell me if everything is ok for you, I have added the feedback from the http responses in case of a failure as requested
Yes @Gby56, it looks good to me. Thank you for the changes and your effort. Unfortunately, I do not have "approval" or "merge" rights here, so you will need to wait or ask somebody else.