[GCP Logging] Use Cloud logging for audit logs
Background
At the moment all user actions are logged to security logs, which are stored in Elasticsearch. To improve GCP integration we need to support storing these logs in GCP Cloud Logging.
There are two primary ways to integrate GCP Cloud Logging (Cloud Logging Overview):
Using Cloud Logging Libraries within the Application – This approach requires modifying the application code to send logs directly to Cloud Logging (see the sketch after this list).
Using the Ops Agent – This method does not require any code changes and allows logs to be sent to Cloud Logging as well as other logging services.
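For the first option, a minimal sketch of what direct library usage might look like, assuming the google-cloud-logging Java client; the log name, the JSON field names, and the SecurityAuditLogger class are illustrative only, not an agreed format:

import com.google.cloud.MonitoredResource;
import com.google.cloud.logging.LogEntry;
import com.google.cloud.logging.Logging;
import com.google.cloud.logging.LoggingOptions;
import com.google.cloud.logging.Payload.JsonPayload;
import com.google.cloud.logging.Severity;
import java.util.Collections;
import java.util.Map;

public class SecurityAuditLogger {

    // Writes a single audit event straight to Cloud Logging (option 1).
    // The client is created inline for brevity; a real integration would reuse it.
    public void writeAuditEvent(String user, String action) {
        try (Logging logging = LoggingOptions.getDefaultInstance().getService()) {
            Map<String, Object> json = Map.of(
                    "user", user,
                    "action", action,
                    "servicename", "pipeline-api"); // same marker the Ops Agent config injects below
            LogEntry entry = LogEntry.newBuilder(JsonPayload.of(json))
                    .setSeverity(Severity.INFO)
                    .setLogName("security_log")
                    .setResource(MonitoredResource.newBuilder("global").build())
                    .build();
            logging.write(Collections.singletonList(entry));
        } catch (Exception e) {
            throw new IllegalStateException("Failed to send audit log entry", e);
        }
    }
}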
- Installing the Ops Agent: Multiple installation methods are available (Installation Guide).
- Authorizing the Ops Agent: Ensure proper permissions are set (Authorization Guide).
- Configuring the Ops Agent: Customize log collection settings (Configuration Guide).
Injecting Fields into Logs Sent to GCP Cloud Logging
The Ops Agent allows logs to be modified before they are sent, for example by adding custom fields (e.g., a marker to identify the log source). It supports processors to transform logs and receivers to monitor log files.
Processors: These enable log transformation, such as parsing JSON and injecting fields. The parse_json processor extracts structured data, while modify_fields adds custom fields. See Ops Agent Configuration - Logging Processors.
Receivers: Configurable to monitor log files, including wildcards for dynamic file matching (e.g., security*.json). See Ops Agent Configuration - Logging Receivers.
Configuration example:
logging:
  receivers:
    security_log:
      type: files
      include_paths:
        - /home/logs/security*.json  # Wildcard to capture multiple files
      record_log_file_path: true     # Logs file path as metadata
  processors:
    parse_json_security:
      type: parse_json
      time_key: timestamp
      time_format: "%Y-%m-%dT%H:%M:%S.%L%z"  # Matches "2025-02-25T01:35:37.975+0000"
    add_service_name:
      type: modify_fields
      fields:
        servicename:  # Adds field at top level (not jsonPayload.servicename due to processor limits)
          static_value: "pipeline-api"  # Custom marker for source
  service:
    pipelines:
      default_pipeline:
        receivers: [other_logs]  # Other log sources
      security_pipeline:
        receivers: [security_log]
        processors:
          - parse_json_security  # Parse JSON logs
          - add_service_name     # Inject servicename
Accessing Logs in GCP
GCP provides multiple methods to access logs stored in Cloud Logging:
Logs Explorer:
- Description: A console-based interface for querying logs using the Logging Query Language (LQL).
- Programmatic Access: Available via the Cloud Logging API or SDKs (e.g., google-cloud-logging in Java).
- Use Case: Real-time log inspection and filtering.
Log Analytics:
- Description: An integrated feature in Cloud Logging that supports SQL-like queries (GoogleSQL) on logs in analytics-enabled buckets. Enable it via Log Analytics Setup.
- Limitation: As of April 2025, no public programmatic API exists; queries are console-only.
- Use Case: Ad-hoc SQL-based log analysis.
BigQuery:
- Description: A managed data warehouse for advanced analytics.
- Programmatic Access: Supported via the BigQuery API and SDKs (e.g., google-cloud-bigquery in Java).
- Use Case: Complex queries, aggregations, and long-term storage.
Options to Migrate from Elasticsearch to GCP Cloud Logging
Using Native GCP Logging API:
- Approach: Fetch logs programmatically using the Cloud Logging API with LQL.
- Pros: Real-time access, native integration.
- Cons: LQL lacks aggregation (e.g., GROUP BY), requiring client-side processing (see the aggregation sketch after the code example below).
Code Example (Java):
public List<String> retrieveLogs(String filter, int limit) {
    // "filter" is an LQL expression, e.g. jsonPayload.servicename="pipeline-api"
    List<String> logEntries = new ArrayList<>();
    Logging.EntryListOption[] options = {Logging.EntryListOption.filter(filter)};
    var entries = logging.listLogEntries(options);
    for (LogEntry entry : entries.iterateAll()) {
        String logLine = String.format("JSON Log [%s]: %s",
                entry.getTimestamp(), entry.getPayload().getData());
        logEntries.add(logLine);
        if (logEntries.size() >= limit) {
            break;
        }
    }
    return logEntries;
}
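Because LQL itself has no GROUP BY, any grouping (e.g. "events per service") has to be computed in the client after fetching the entries. A minimal sketch under that assumption, written in the same method-snippet style as above (it assumes the same initialized logging client and the injected servicename field from the Ops Agent config; all matched entries are assumed to carry JSON payloads):

// Client-side aggregation: count log entries per servicename, since LQL cannot aggregate.
public Map<String, Long> countByServiceName(String filter) {
    Map<String, Long> counts = new HashMap<>();
    Page<LogEntry> entries = logging.listLogEntries(Logging.EntryListOption.filter(filter));
    for (LogEntry entry : entries.iterateAll()) {
        Payload.JsonPayload payload = entry.getPayload();
        Object serviceName = payload.getDataAsMap().get("servicename");
        counts.merge(serviceName == null ? "unknown" : serviceName.toString(), 1L, Long::sum);
    }
    return counts;
}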
Using BigQuery:
- Approach: Export logs to BigQuery via a sink and query using GoogleSQL, which supports aggregation.
- Pros: Rich SQL functionality, joins, and scalability.
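The sink itself can be created once in the console or with gcloud; if it is done from code, a sketch using the google-cloud-logging client could look like the following (the sink name, dataset name, and filter are illustrative assumptions):

// Creates a log sink that exports matching entries to a BigQuery dataset.
// Sink name, dataset, and filter below are placeholders, not agreed values.
public Sink createSecurityLogSink(Logging logging, String datasetName) {
    SinkInfo sinkInfo = SinkInfo.newBuilder(
                    "security-logs-to-bq",
                    SinkInfo.Destination.DatasetDestination.of(datasetName))
            .setFilter("jsonPayload.servicename=\"pipeline-api\"")  // export only the injected security logs
            .build();
    return logging.create(sinkInfo);
}

Note that the sink's writer identity also needs write access (e.g. BigQuery Data Editor) on the target dataset before entries start flowing.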
Code Example (Java):
public List<String> getUniqueServiceNames() throws InterruptedException {
    String query = "SELECT DISTINCT JSON_VALUE(json_payload, '$.servicename') AS ss "
            + "FROM <big_query_bucket_name>._AllLogs "
            + "WHERE JSON_VALUE(json_payload, '$.servicename') IS NOT NULL";
    QueryJobConfiguration queryConfig = QueryJobConfiguration.newBuilder(query).build();
    TableResult result = bigQuery.query(queryConfig);
    List<String> serviceNames = new ArrayList<>();
    result.iterateAll().forEach(row -> serviceNames.add(row.get("ss").getStringValue()));
    return serviceNames;
}
Using Log Analytics (Console-Only):
- Approach: Enable Log Analytics on a bucket and query via the GCP Console with GoogleSQL.
- Pros: SQL-like queries; no additional storage cost.
- Cons: No programmatic access as of April 2025; limited to console use.
Query Example:
SELECT DISTINCT JSON_VALUE(json_payload, '$.servicename') AS ss
FROM `<project_id>.<region>.<log_bucket>._AllLogs`
WHERE JSON_VALUE(json_payload, '$.servicename') IS NOT NULL
High-Level Migration Steps
- Configure Ops Agent: Install on servers, set up receivers for log files, and add processors for custom fields.
- Choose Access Method: Use Logs Explorer/API for real-time monitoring; export to BigQuery for complex queries.
- Set Up BigQuery (if needed): Create a sink, define a dataset, and validate log export.
- Update Application Code: Replace Elasticsearch queries with Cloud Logging/BigQuery API calls.
@kbashpayev Let's go with the following approach:
- send logs to GCP Cloud Logging using Ops Agent
- set up BigQuery via a sink
- allow the logging approach to be configured via an application property (Elasticsearch vs GCP BigQuery)
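One possible shape for that property switch, sketched with purely illustrative names (the audit.log.provider key, the Backend enum, and the class name are assumptions, not agreed identifiers):

import java.util.Properties;

// Hypothetical selector for the audit-log backend; property and type names are illustrative only.
public final class AuditLogBackendSelector {

    public enum Backend { ELASTICSEARCH, GCP_BIGQUERY }

    // Falls back to Elasticsearch, matching the current behaviour.
    public static Backend resolve(Properties applicationProperties) {
        String value = applicationProperties.getProperty("audit.log.provider", "elasticsearch");
        return "gcp-bigquery".equalsIgnoreCase(value) ? Backend.GCP_BIGQUERY : Backend.ELASTICSEARCH;
    }

    private AuditLogBackendSelector() {
    }
}

Callers could then route audit-log reads either to the existing Elasticsearch client or to the Cloud Logging/BigQuery calls shown above, depending on the resolved backend.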