capa
Dynamic call scope: add a feature for extracting repeated api calls
CAPE provides the number of times an api call was repeated. This might be useful for rule authors (detecting API hammering, profiling malware fa)
I propose either adding a `repeated` feature, or adding a `characteristic("api-hammering")` feature.
of the two I feel like the first would be better, since if we opt for a characteristic feature we'd need to define the minimal bound for the repeats, which I feel like would be variant across samples/families (not sure about this however).
Example rule:
rule:
  meta:
    name: API hammering
    scopes:
      dynamic: thread
    examples:
      - 83e921d6368baade50bb2a1031efd6bc  # Nymaim
  features:
    - call:
        - api: EnumDisplaySettingsA
        - repeated: 500 or more
Could we do this via `count`?
like @mr-tz said we do support `count(api(EnumDisplaySettingsA)): 3 or more` today; however, this relies on there being unique addresses for each of the things being counted. in the event data that @yelhamer mentions, a single event corresponds to many API calls. we don't have a way to tag the count directly on the API call today.
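To make the difference concrete, here is a minimal sketch of the two ways of counting. The event layout (`address`, `api`, `repeated`) is a made-up stand-in rather than capa's internal representation, and it assumes that `repeated: 499` means one recorded call plus 499 collapsed repeats.

```python
# Hypothetical event records: one event may stand for many calls via "repeated".
EVENTS = [
    {"address": 0x10, "api": "EnumDisplaySettingsA", "repeated": 499},
    {"address": 0x11, "api": "Sleep", "repeated": 0},
]


def count_by_address(events, api):
    """Count the distinct addresses at which `api` was seen."""
    return len({e["address"] for e in events if e["api"] == api})


def count_with_repeats(events, api):
    """Count the calls actually made: each event is one call plus its repeats."""
    return sum(1 + e["repeated"] for e in events if e["api"] == api)


print(count_by_address(EVENTS, "EnumDisplaySettingsA"))
print(count_with_repeats(EVENTS, "EnumDisplaySettingsA"))
```

Counting by address sees a single event here, so a rule like `count(api(EnumDisplaySettingsA)): 3 or more` would not match even though the API was called hundreds of times.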
i'm leaning towards introducing the characteristic, for a few reasons:
- we can provide a reasonable default value, since rule authors probably don't need to vary this number from rule to rule
- it's easy to add a new characteristic, much harder to add a new syntax
i think the following should be doable:
- call:
    - api: EnumDisplaySettingsA
    - characteristic: api-hammering
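On the extractor side, emitting such a characteristic could look roughly like the sketch below. The function name, the `(kind, value)` tuples, and the `repeated` key on the call record are assumptions for illustration, not capa's actual extractor API, and the threshold is only a placeholder.

```python
from typing import Iterator, Tuple

# Placeholder value for illustration only; not an agreed default.
API_HAMMERING_THRESHOLD = 400


def extract_call_features(call: dict) -> Iterator[Tuple[str, str]]:
    """Yield (kind, value) features for one call record (hypothetical layout)."""
    yield ("api", call["api"])
    # Assumption: the sandbox report stores the repeat count under "repeated".
    if call.get("repeated", 0) >= API_HAMMERING_THRESHOLD:
        yield ("characteristic", "api-hammering")


hammered = {"api": "EnumDisplaySettingsA", "repeated": 500}
print(list(extract_call_features(hammered)))
```

Because both features come from the same call, a `call` scope rule can require the API name and the characteristic together.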
@mr-tz The only way I could think of that could allow us to use `count` here is if we make it possible for a characteristic to have a `count` attribute or something similar, and then add extra logic to the `evaluate()` method of the `Range` statement to first check whether its contained feature has that `count` attribute. Not sure if that's worthwhile though, but if similar situations arise in the future then maybe we can add this?
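A rough sketch of that idea, using simplified stand-in classes rather than capa's real `Feature` and `Range` (the constructor arguments and the `features` mapping here are assumptions):

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass(frozen=True)
class Feature:
    """Simplified stand-in for a feature; `count` is ignored by equality and hashing."""
    name: str
    value: str
    count: Optional[int] = field(default=None, compare=False)


class Range:
    """Simplified stand-in for a statement matching `count(feature): N or more`."""

    def __init__(self, child: Feature, minimum: int = 0, maximum: Optional[int] = None):
        self.child = child
        self.minimum = minimum
        self.maximum = float("inf") if maximum is None else maximum

    def evaluate(self, features: Dict[Feature, Set[int]]) -> bool:
        total = 0
        for feature, addresses in features.items():
            if feature != self.child:
                continue
            # Prefer the explicit count when the extractor attached one,
            # otherwise fall back to counting the addresses.
            total += feature.count if feature.count is not None else len(addresses)
        return self.minimum <= total <= self.maximum


api = Feature("api", "EnumDisplaySettingsA")
tagged = {Feature("api", "EnumDisplaySettingsA", count=500): {0x10}}
print(Range(api, minimum=500).evaluate(tagged))
```

The extra branch is small, but every consumer of the feature mapping would need to know that a feature may carry its own count, which is part of why this may not be worth it yet.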
For the time being I am leaning towards @williballenthin's recommendation, but what threshold value should we use? should I research this?
> should I research this?
that would be great! maybe parse through all the Avast dataset and see the common repeated values?
Alright then. I'm on it...
I've selected a sample of ~2000 CAPE reports and then extracted only the calls that were repeated at least 100 times. Repeat counts of 300 and above are the rarest.
As an initial guess, 300 or 400 seems like a good threshold. But I think I should do some more research into the samples that had that number of calls.
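For anyone who wants to reproduce this kind of tally, a minimal sketch follows. It assumes each report is an already-parsed JSON dict with calls under `behavior.processes[].calls[]` and a per-call `repeated` field; that layout and the function name are assumptions here, so adjust the keys to whatever your reports actually contain.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def repeat_histogram(reports: Iterable[dict]) -> List[Tuple[int, int]]:
    """Return sorted (repeat_count, number_of_calls) pairs across all reports."""
    hist: Counter = Counter()
    for report in reports:
        for process in report.get("behavior", {}).get("processes", []):
            for call in process.get("calls", []):
                # A call without a "repeated" key is treated as not repeated.
                hist[call.get("repeated", 0)] += 1
    return sorted(hist.items())


# Tiny made-up report, just to show the output shape.
example = {
    "behavior": {
        "processes": [
            {"calls": [{"api": "Sleep", "repeated": 0},
                       {"api": "EnumDisplaySettingsA", "repeated": 500},
                       {"api": "CreateFileW"}]}
        ]
    }
}
print(repeat_histogram([example]))
```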
Here's the entire data in case needed
(repeat_count, unique_calls)
(0, 17699960)
(1, 603324)
(2, 67327)
(3, 57338)
(4, 14392)
(5, 8204)
(6, 4796)
(7, 10588)
(8, 3995)
(9, 2570)
(10, 968)
(11, 2063)
(12, 1869)
(13, 1201)
(14, 570)
(15, 730)
(16, 2196)
(17, 649)
(18, 682)
(19, 865)
(20, 514)
(21, 411)
(22, 414)
(23, 346)
(24, 366)
(25, 395)
(26, 309)
(27, 303)
(28, 297)
(29, 656)
(30, 298)
(31, 353)
(32, 522)
(33, 543)
(34, 584)
(35, 364)
(36, 252)
(37, 282)
(38, 283)
(39, 280)
(40, 212)
(41, 348)
(42, 279)
(43, 229)
(44, 202)
(45, 178)
(46, 212)
(47, 189)
(48, 253)
(49, 203)
(50, 298)
(51, 272)
(52, 206)
(53, 268)
(54, 208)
(55, 200)
(56, 192)
(57, 197)
(58, 217)
(59, 213)
(60, 193)
(61, 222)
(62, 383)
(63, 546)
(64, 228)
(65, 40)
(66, 4)
(67, 11)
(68, 6)
(69, 5)
(70, 5)
(71, 16)
(72, 14)
(73, 9)
(74, 11)
(75, 24)
(76, 6)
(77, 10)
(78, 5)
(79, 11)
(80, 7)
(81, 15)
(82, 7)
(83, 10)
(84, 9)
(85, 20)
(86, 6)
(87, 15)
(88, 5)
(89, 17)
(90, 3)
(91, 19)
(92, 1)
(93, 36)
(94, 2)
(95, 124)
(96, 140)
(97, 299)
(98, 156)
(99, 74)
(100, 3)
(101, 6)
(102, 5)
(103, 4)
(104, 3)
(105, 4)
(106, 6)
(107, 4)
(108, 5)
(110, 5)
(111, 5)
(112, 4)
(113, 9)
(114, 1)
(115, 3)
(116, 2)
(117, 2)
(118, 1)
(119, 5)
(120, 2)
(121, 6)
(122, 2)
(123, 1)
(125, 9)
(126, 8)
(127, 4)
(128, 9)
(129, 7)
(131, 2)
(132, 2)
(133, 2)
(134, 2)
(135, 1)
(137, 2)
(138, 4)
(140, 1)
(141, 3)
(142, 1)
(143, 10)
(144, 4)
(145, 2)
(146, 2)
(147, 4)
(148, 1)
(149, 3)
(150, 2)
(151, 29)
(153, 2)
(155, 18)
(157, 4)
(159, 5)
(163, 3)
(164, 2)
(166, 1)
(167, 1)
(168, 4)
(169, 2)
(172, 2)
(174, 3)
(175, 4)
(177, 5)
(178, 1)
(179, 14)
(180, 16)
(181, 22)
(182, 124)
(183, 1)
(184, 2)
(185, 8)
(186, 3)
(187, 6)
(188, 1)
(189, 2)
(190, 3)
(191, 1)
(192, 5)
(193, 2)
(194, 2)
(195, 6)
(196, 7)
(197, 2)
(198, 6)
(199, 5)
(200, 1)
(201, 3)
(203, 2)
(204, 4)
(206, 2)
(207, 2)
(208, 1)
(209, 3)
(211, 1)
(212, 4)
(216, 2)
(217, 1)
(220, 1)
(221, 1)
(227, 3)
(228, 2)
(229, 2)
(230, 2)
(232, 1)
(233, 2)
(234, 9)
(235, 3)
(236, 2)
(238, 2)
(240, 7)
(241, 1)
(242, 1)
(243, 2)
(244, 4)
(245, 6)
(246, 3)
(247, 2)
(248, 4)
(249, 1)
(250, 2)
(251, 1)
(252, 3)
(253, 4)
(254, 10)
(255, 26)
(256, 75)
(257, 28)
(259, 2)
(260, 1)
(263, 1)
(264, 2)
(265, 3)
(266, 4)
(267, 3)
(272, 4)
(275, 3)
(278, 2)
(283, 1)
(284, 1)
(286, 1)
(289, 3)
(292, 1)
(293, 1)
(294, 1)
(297, 1)
(299, 13)
(300, 1)
(304, 1)
(308, 1)
(312, 1)
(315, 1)
(316, 1)
(318, 1)
(319, 1)
(320, 2)
(323, 1)
(324, 2)
(325, 2)
(327, 2)
(328, 1)
(329, 1)
(330, 1)
(331, 2)
(332, 1)
(336, 1)
(338, 1)
(339, 1)
(341, 1)
(342, 1)
(346, 1)
(349, 3)
(350, 1)
(352, 2)
(353, 1)
(356, 1)
(357, 1)
(361, 2)
(362, 1)
(364, 1)
(366, 1)
(367, 1)
(368, 1)
(369, 1)
(370, 1)
(374, 1)
(375, 1)
(379, 2)
(384, 1)
(385, 2)
(389, 3)
(390, 1)
(391, 1)
(392, 1)
(393, 1)
(395, 1)
(396, 2)
(398, 1)
(401, 1)
(403, 1)
(407, 1)
(410, 1)
(411, 1)
(412, 1)
(413, 3)
(418, 1)
(421, 2)
(426, 1)
(428, 3)
(431, 1)
(441, 1)
(442, 2)
(444, 2)
(448, 2)
(454, 1)
(456, 2)
(462, 1)
(467, 2)
(469, 1)
(471, 1)
(472, 5)
(476, 1)
(477, 1)
(488, 1)
(491, 1)
(493, 1)
(498, 1)
(500, 2)
(502, 1)
(503, 3)
(504, 7)
(505, 4)
(506, 3)
(507, 3)
(508, 1)
(509, 1)
(511, 2)
(514, 2)
(515, 2)
(525, 1)
(539, 1)
(540, 2)
(543, 1)
(544, 2)
(546, 1)
(552, 1)
(553, 1)
(554, 2)
(557, 2)
(561, 1)
(563, 1)
(571, 1)
(575, 2)
(577, 1)
(578, 1)
(579, 3)
(583, 1)
(584, 1)
(590, 1)
(594, 1)
(601, 1)
(603, 1)
(610, 1)
(612, 1)
(618, 1)
(636, 2)
(641, 2)
(642, 1)
(644, 2)
(645, 1)
(650, 1)
(654, 1)
(657, 1)
(666, 1)
(671, 1)
(679, 1)
(717, 1)
(718, 2)
(721, 2)
(746, 1)
(747, 1)
(750, 1)
(752, 1)
(758, 1)
(761, 1)
(772, 1)
(781, 1)
(787, 1)
(816, 1)
(822, 1)
(835, 1)
(850, 1)
(851, 1)
(858, 1)
(868, 1)
(873, 1)
(915, 1)
(920, 1)
(929, 1)
(946, 1)
(972, 1)
(981, 1)
(991, 1)
(1003, 3)
(1004, 1)
(1006, 1)
(1007, 1)
(1008, 2)
(1009, 1)
(1010, 2)
(1011, 1)
(1012, 2)
(1032, 1)
(1051, 1)
(1066, 1)
(1069, 1)
(1080, 1)
(1086, 1)
(1089, 1)
(1106, 1)
(1109, 1)
(1171, 1)
(1172, 1)
(1271, 1)
(1344, 2)
(1345, 1)
(1367, 1)
(1439, 1)
(1440, 1)
(1454, 1)
(1468, 1)
(1511, 1)
(1512, 1)
(1521, 1)
(1559, 1)
(1572, 1)
(1590, 1)
(1593, 1)
(1603, 1)
(1648, 1)
(1675, 1)
(1806, 1)
(1962, 1)
(2015, 1)
(2024, 5)
(2096, 1)
(2122, 1)
(2483, 1)
(2501, 1)
(2672, 2)
(2786, 1)
(3138, 1)
(4059, 1)
(4295, 1)
(4710, 1)
(5046, 1)
(5700, 1)
(5735, 1)
(6675, 1)
(6809, 1)
(7113, 1)
(7448, 1)
(7451, 1)
(7995, 1)
(7997, 1)
(8984, 1)
(10391, 1)
(10585, 1)
(10775, 1)
(13097, 1)
(14394, 1)
(14589, 1)
(23997, 1)
(28601, 1)
(55521, 1)
(57121, 1)
(65534, 1)
(65535, 1)
neat plot! (and great use of `<details>`, i just learned about that myself.)
what do you think the data conveys and how does it inform our choice of the limit?
- samples above the 300-400 mark are pretty rare. ~17 million of the captured calls were not repeated, and 700,000 were repeated twice, while only some 180 calls were repeated more than 400 times. This suggests that we're unlikely to false-flag this characteristic with that threshold.
- once we go above the 300 mark, the repeat counts, as well as the distinct calls producing them, become fewer and sparser, which leads me to think that repeats in that range were likely chosen arbitrarily as opposed to being the result of some desired software behavior. I am still assuming that the behavior here is api-hammering, so perhaps I should look at the APIs being repeated.
I do feel shaky about this threshold however, so maybe I should take a more in-depth look at the samples making that number of repeated calls.
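One way to sanity-check a candidate threshold is to count how many calls sit at or above it in the `(repeat_count, unique_calls)` listing. A small sketch, with a hypothetical helper name and only a handful of rows copied from the listing (so the result is illustrative, not the real tail size):

```python
from typing import Iterable, Tuple


def calls_at_or_above(histogram: Iterable[Tuple[int, int]], threshold: int) -> int:
    """Sum the number of calls whose repeat count is >= threshold."""
    return sum(calls for repeats, calls in histogram if repeats >= threshold)


# A few rows copied from the listing above; NOT the full data set.
SAMPLE_ROWS = [
    (0, 17699960),
    (1, 603324),
    (2, 67327),
    (300, 1),
    (500, 2),
    (65535, 1),
]

for candidate in (300, 400, 500):
    print(candidate, calls_at_or_above(SAMPLE_ROWS, candidate))
```

Running the same helper over the full listing for a few candidate values would show how quickly the tail shrinks, and which samples are worth inspecting by hand.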