Bug: Implement Logic to Handle Result Size and Item Limits in Scripts
What happened?
The Search-UnifiedAuditLog cmdlet caps the number of results it returns: at most 5,000 records per call and, when paging with SessionId and SessionCommand, at most 50,000 records per session. When a query matches more records than the cap allows, retrieval is incomplete and only part of the data is collected. Scripts that retrieve large datasets, such as Search-HawkTenantEXOAuditLog.ps1 and Get-HawkUserMailboxAuditing.ps1, may hit this limit and fail to capture all relevant audit log entries. This undermines the accuracy and completeness of the data analysis provided by Hawk.
Steps to Reproduce
- Run the Search-HawkTenantEXOAuditLog.ps1 or Get-HawkUserMailboxAuditing.ps1 script in an environment with extensive audit logs.
- Observe that the script retrieves only a subset of the expected data.
- Note that the maximum result size of Search-UnifiedAuditLog has been reached.
- Data beyond the maximum limit is not retrieved, leading to incomplete results.
Hawk Version
3.1.0
Technical Analysis
- The Search-UnifiedAuditLog cmdlet returns at most 5,000 records per call, and at most 50,000 records per session when paging with SessionId and SessionCommand.
- Scripts that request large date ranges, or that run against tenants with high activity levels, may exceed this limit.
- Without logic to handle this limitation, the scripts retrieve only the first set of results up to the maximum limit.
- Users are not warned about incomplete data retrieval, leading to potential inaccuracies in analysis.
- Manually adjusting query parameters is not user-friendly and may not be practical.
Implementation Plan
Identify Affected Scripts:
- Focus on scripts that may retrieve large datasets, specifically:
  - Search-HawkTenantEXOAuditLog.ps1
  - Get-HawkUserMailboxAuditing.ps1
  - Any other scripts that use Search-UnifiedAuditLog and can return large result sets.
Implement Time Interval Breakdown:
- Modify the scripts to break large queries into smaller time intervals so that each query's result size stays below the maximum limit.
- Determine the optimal time interval (e.g., days, hours) based on the expected volume of data.
- Implement a loop that adjusts the StartDate and EndDate parameters incrementally to cover the entire desired date range.
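The interval loop described above could be sketched as follows. This is a minimal illustration, not Hawk's implementation: Get-DateWindow and its parameter names are hypothetical, and the 24-hour default is an arbitrary starting point rather than a tuned value.

```powershell
# Hypothetical helper (not part of Hawk): split a date range into fixed-size windows.
function Get-DateWindow {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)] [datetime] $StartDate,
        [Parameter(Mandatory)] [datetime] $EndDate,
        [ValidateRange(1, 2160)] [int] $IntervalHours = 24
    )

    if ($EndDate -le $StartDate) {
        throw 'EndDate must be later than StartDate.'
    }

    $windowStart = $StartDate
    while ($windowStart -lt $EndDate) {
        $windowEnd = $windowStart.AddHours($IntervalHours)
        if ($windowEnd -gt $EndDate) { $windowEnd = $EndDate }   # clamp the final window

        # Emit one window; the next window starts exactly where this one ends.
        [pscustomobject]@{ StartDate = $windowStart; EndDate = $windowEnd }
        $windowStart = $windowEnd
    }
}

# Usage sketch (requires an Exchange Online session, so it is left commented out):
# foreach ($w in Get-DateWindow -StartDate (Get-Date).AddDays(-7) -EndDate (Get-Date) -IntervalHours 12) {
#     Search-UnifiedAuditLog -StartDate $w.StartDate -EndDate $w.EndDate -ResultSize 5000
# }
```

Adjacent windows share a boundary timestamp, so a record stamped exactly on a boundary may come back from both queries. The combined output would still need de-duplication.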
Use Pagination:
- Utilize pagination parameters such as SessionCommand or NextPage, if supported.
- If Search-UnifiedAuditLog supports pagination, implement logic to retrieve all pages of results.
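A paging loop along these lines might look like the sketch below. To the best of my knowledge, Search-UnifiedAuditLog pages through the SessionId and SessionCommand parameters (with ReturnLargeSet as the command value) rather than a NextPage parameter, but this should be confirmed against the installed module. Get-PagedAuditLog is a hypothetical name, and the query is injected as a scriptblock only so the loop can be exercised without a tenant.

```powershell
# Hypothetical helper (not part of Hawk): keep requesting pages for one session until none are left.
function Get-PagedAuditLog {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)] [datetime] $StartDate,
        [Parameter(Mandatory)] [datetime] $EndDate,
        [int] $PageSize = 5000,
        # Default query calls the real cmdlet; tests can pass a stand-in scriptblock instead.
        [scriptblock] $Query = {
            param($Start, $End, $SessionId, $Size)
            Search-UnifiedAuditLog -StartDate $Start -EndDate $End -SessionId $SessionId -SessionCommand ReturnLargeSet -ResultSize $Size
        }
    )

    # Reusing the same session ID on every call is what advances to the next page.
    $sessionId = [guid]::NewGuid().ToString()
    $all = [System.Collections.Generic.List[object]]::new()

    do {
        $page = @(& $Query $StartDate $EndDate $sessionId $PageSize)
        if ($page.Count -gt 0) { $all.AddRange($page) }
    } while ($page.Count -gt 0)   # an empty page is treated as the end of the session

    $all
}
```

ReturnLargeSet is documented as returning unsorted data, so callers that need chronological order would sort the combined results afterwards. The per-session cap still applies, which is why paging alone may not replace splitting the date range.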
Implement Result Size Checks:
- After each query, check if the number of results is close to the maximum limit.
- If so, reduce the time interval and re-query to ensure no data is missed.
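One way to express the check-and-requery step is a recursive split, sketched below under stated assumptions. Get-AuditLogAdaptive is a hypothetical name, the query is passed in as a scriptblock taking a start and end date, and a window is treated as possibly truncated when it returns at least Limit records.

```powershell
# Hypothetical helper (not part of Hawk): halve any window whose result count reaches the limit.
function Get-AuditLogAdaptive {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)] [datetime] $StartDate,
        [Parameter(Mandatory)] [datetime] $EndDate,
        [Parameter(Mandatory)] [scriptblock] $Query,    # assumed shape: param($Start, $End) -> records
        [int] $Limit = 5000,
        [timespan] $MinimumInterval = [timespan]::FromMinutes(1)
    )

    $records = @(& $Query $StartDate $EndDate)
    $span = $EndDate - $StartDate

    # Accept the window if it is under the limit, or if it cannot sensibly be split further.
    if ($records.Count -lt $Limit -or $span -le $MinimumInterval) {
        if ($records.Count -ge $Limit) {
            Write-Warning "Window $StartDate to $EndDate returned $($records.Count) records; results may be truncated."
        }
        return $records
    }

    # The window reached the limit: discard its results and query each half instead.
    $midpoint = $StartDate.AddTicks([long][math]::Floor($span.Ticks / 2))
    Get-AuditLogAdaptive -StartDate $StartDate -EndDate $midpoint -Query $Query -Limit $Limit -MinimumInterval $MinimumInterval
    Get-AuditLogAdaptive -StartDate $midpoint -EndDate $EndDate -Query $Query -Limit $Limit -MinimumInterval $MinimumInterval
}
```

The trade-off is that every window that reaches the limit costs one wasted query. The MinimumInterval floor keeps a very dense burst of events from causing unbounded recursion.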
Optimize Performance:
- Avoid excessive API calls by calculating the appropriate time intervals based on previous query results.
- Implement asynchronous processing if possible to improve execution time.
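Sizing the next interval from the previous result could be as simple as proportional scaling. The sketch below assumes activity is roughly uniform from one window to the next, which will not hold during an incident spike. The function name, the target of 2,500 records, and the minimum and maximum bounds are all illustrative assumptions.

```powershell
# Hypothetical helper (not part of Hawk): scale the interval so the next query lands near a target count.
function Get-NextIntervalHours {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)] [double] $CurrentIntervalHours,
        [Parameter(Mandatory)] [int] $LastResultCount,
        [int] $TargetCount = 2500,      # assumed target: half of a 5,000-record cap
        [double] $MinHours = 0.25,
        [double] $MaxHours = 168
    )

    if ($LastResultCount -le 0) {
        # Nothing came back, so double the interval, up to the ceiling.
        return [math]::Min($CurrentIntervalHours * 2, $MaxHours)
    }

    # Proportional scaling: twice the target count means half the interval, and so on.
    $scaled = $CurrentIntervalHours * ($TargetCount / $LastResultCount)
    return [math]::Max($MinHours, [math]::Min($scaled, $MaxHours))
}
```

For example, a 24-hour window that returned 5,000 records would suggest 12 hours for the next window. Aiming below the cap leaves headroom so that most windows succeed on the first attempt.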
Maintain Data Integrity:
- Ensure that data from multiple queries is combined without duplication.
- Handle overlapping time intervals carefully to avoid missing or duplicating records.
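De-duplication across intervals could key on a per-record identifier. The sketch below assumes each record carries a unique Identity property, which matches the audit log output as I understand it but should be verified. Select-UniqueAuditRecord is a hypothetical name.

```powershell
# Hypothetical helper (not part of Hawk): pass through only the first record seen for each Identity.
function Select-UniqueAuditRecord {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, ValueFromPipeline)] [object] $Record
    )

    begin {
        $seen = [System.Collections.Generic.HashSet[string]]::new()
    }
    process {
        # HashSet.Add returns $false when the key is already present, so duplicates are dropped.
        if ($seen.Add([string]$Record.Identity)) { $Record }
    }
}

# Usage sketch: $combined | Select-UniqueAuditRecord
```

Because it streams through the pipeline and preserves first-seen order, this can sit directly after the code that concatenates per-interval results, at the cost of holding one string per record in memory.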
Update User Feedback:
- Provide progress updates to the user during execution.
- Inform users if large datasets are detected and how the script is handling them.
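For the feedback items above, a small wrapper around Write-Progress and Write-Warning may be enough. This is only a sketch: Write-AuditProgress and its parameters are hypothetical, and Hawk's own logging conventions would likely replace the plain warning.

```powershell
# Hypothetical helper (not part of Hawk): report per-window progress and flag windows that hit the limit.
function Write-AuditProgress {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)] [ValidateRange(1, [int]::MaxValue)] [int] $WindowIndex,   # 1-based
        [Parameter(Mandatory)] [ValidateRange(1, [int]::MaxValue)] [int] $WindowCount,
        [Parameter(Mandatory)] [int] $RecordCount,   # records returned for this window
        [int] $Limit = 5000
    )

    # Clamp to 100 in case more windows are processed than were originally planned.
    $percent = [math]::Min(100, [int](100 * $WindowIndex / $WindowCount))
    $progress = @{
        Activity        = 'Retrieving unified audit log'
        Status          = "Window $WindowIndex of $WindowCount ($RecordCount records)"
        PercentComplete = $percent
    }
    Write-Progress @progress

    if ($RecordCount -ge $Limit) {
        Write-Warning "Window $WindowIndex returned $RecordCount records (limit $Limit); it will be split into smaller intervals."
    }
}
```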
Update Unit Tests:
- Create tests that simulate large datasets to verify that the scripts handle result size limits correctly.
- Ensure that the scripts retrieve complete datasets without errors.
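Simulating a large dataset does not require a real tenant. A test double that generates evenly spaced fake records and truncates each call at a limit can stand in for the cmdlet. New-FakeAuditQuery below is a hypothetical helper, and the record shape (Identity and CreationDate only) is a simplification of real audit records.

```powershell
# Hypothetical test double (not part of Hawk): returns a scriptblock that behaves like a capped query.
function New-FakeAuditQuery {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)] [datetime] $FirstRecord,
        [Parameter(Mandatory)] [ValidateRange(1, [int]::MaxValue)] [int] $RecordCount,
        [int] $SpacingSeconds = 60,
        [int] $Limit = 5000
    )

    # Build the whole fake dataset up front, one record every $SpacingSeconds.
    $data = foreach ($i in 0..($RecordCount - 1)) {
        [pscustomobject]@{
            Identity     = "fake-$i"
            CreationDate = $FirstRecord.AddSeconds($i * $SpacingSeconds)
        }
    }

    # GetNewClosure captures $data and $Limit so the returned scriptblock is self-contained.
    return {
        param($Start, $End)
        $data |
            Where-Object { $_.CreationDate -ge $Start -and $_.CreationDate -lt $End } |
            Select-Object -First $Limit
    }.GetNewClosure()
}

# Usage sketch: $query = New-FakeAuditQuery -FirstRecord (Get-Date) -RecordCount 20000
#               & $query $start $end    # returns at most 5,000 records
```

In a Pester suite the same idea could be expressed by mocking Search-UnifiedAuditLog directly. The closure form simply keeps the double usable from any script that accepts a query scriptblock.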
Documentation:
- Update script comments and documentation to explain how the scripts handle large datasets.
- Provide guidance to users on expected execution times for large data volumes.
Acceptance Criteria
- Scripts retrieve complete datasets without hitting result size limits.
- Data integrity is maintained across multiple query intervals.
- Scripts perform efficiently without significant delays.
- Users are not required to manually adjust query parameters.
- Unit tests are updated to reflect changes and pass successfully.
Additional Notes:
Testing:
- Test the scripts in environments with varying sizes of audit logs.
- Simulate scenarios where the result size limit would be exceeded.
- Measure execution time and optimize where possible.
Dependencies:
- Confirm that Search-UnifiedAuditLog supports any pagination or session parameters used.
- Ensure compatibility with existing modules and versions.
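The parameter check in the dependencies above can be done at runtime by inspecting the command's metadata with Get-Command. The sketch below is generic, and Test-CommandParameter is a hypothetical name. Which parameters to probe (for example SessionId and SessionCommand) depends on the paging approach finally chosen.

```powershell
# Hypothetical helper (not part of Hawk): check that a command exists and exposes every named parameter.
function Test-CommandParameter {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)] [string] $CommandName,
        [Parameter(Mandatory)] [string[]] $ParameterName
    )

    $command = Get-Command -Name $CommandName -ErrorAction SilentlyContinue | Select-Object -First 1
    if (-not $command) { return $false }   # command is not available in this session

    foreach ($name in $ParameterName) {
        if (-not $command.Parameters.ContainsKey($name)) { return $false }
    }
    return $true
}

# Usage sketch (meaningful only after connecting to Exchange Online):
# Test-CommandParameter -CommandName Search-UnifiedAuditLog -ParameterName SessionId, SessionCommand
```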