PhishIntention
Question regarding the 9010 set
Sorry to bother you. I am currently trying to train a new CRP classifier that takes text input, and I would like to check whether I can use part of your training samples. Is the HTML available for each of the 9010 samples used for your CRP classifier?
If possible, I would also like to know how the 9010 samples were taken from the Phishpedia dataset. I tried to use the domain name (for example, 12tv) to match each sample page to a page in the original sets phish_sample_30k and benign_sample_30k. However, most of the CRP samples have no exact domain match.
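For reference, here is a minimal sketch of this kind of name-based matching. It assumes that each sample is identified by a folder name containing its domain, which is an assumption and not something documented for either dataset. The helpers `domain_key` and `match_samples` and the normalization rule (lowercase, drop a leading `www.`, keep the first label) are hypothetical, not part of PhishIntention or Phishpedia.

```python
from collections import defaultdict


def domain_key(name: str) -> str:
    """Reduce a sample name to a comparable key (hypothetical normalization)."""
    key = name.strip().lower()
    if key.startswith("www."):
        key = key[4:]
    # Keep only the first label, e.g. "360converter.com" -> "360converter".
    return key.split(".")[0]


def match_samples(crp_names, original_names):
    """Return (matched, unmatched) for CRP sample names against original names."""
    # Index the original samples by their normalized key.
    index = defaultdict(list)
    for name in original_names:
        index[domain_key(name)].append(name)

    matched = {}
    unmatched = []
    for name in crp_names:
        key = domain_key(name)
        if key in index:
            matched[name] = index[key]
        else:
            unmatched.append(name)
    return matched, unmatched
```

In practice the two name lists could come from listing the sample folders of the 9010 set and of phish_sample_30k or benign_sample_30k. A match on the name alone does not guarantee that the two screenshots show the same page.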
Among the sample pages that do have a domain match in the original Phishpedia dataset, the screenshot in the CRP set differs from the screenshot in the original set. One example is the domain 360converter. Its screenshot in the 9010 set indicates that the sample is not a CRP. However, its screenshot in the original set benign_sample_30k shows that the sample is a CRP.
In this case, how can the 9010 samples be matched to the original samples in the Phishpedia evaluation sets? I look forward to your reply.