Ingestly Endpoint for Real-Time Analytics powered by Fastly & Google BigQuery
Ingestly Endpoint
- Japanese documentation available here
- Column Reference is here
What's Ingestly?
Ingestly is a simple tool for ingesting beacons into Google BigQuery. Digital marketers and front-end developers often want to measure users' activities on their service without limitations or sampling, in real time, with ownership of the data, at a reasonable cost. There is a huge variety of web analytics tools on the market, but those tools tend to be expensive, have a large footprint, offer little flexibility and a fixed UI, and force you to use SDKs that include legacy technologies like document.write.
Ingestly focuses on data ingestion from the front-end into Google BigQuery by leveraging Fastly's features. Also, Ingestly can be implemented seamlessly into your existing website within the same Fastly service, so you own your analytics solution and ITP does not matter.
Ingestly provides:
- Completely serverless. Fastly and Google manage all of the infrastructure for Ingestly; no maintenance resources are required.
- Near real-time data in Google BigQuery. You can query the latest data within seconds of a user's activity.
- Fast response time for beacons. The endpoint runs on Fastly's global edge nodes with no backend; the response is HTTP 204 and the SDK sends requests asynchronously.
- Direct ingestion into Google BigQuery. You don't need to configure any complicated integrations or batch export/import jobs.
- Easy to start. You can start using Ingestly within 2 minutes for free if you already have trial accounts on Fastly and GCP.
- WebKit ITP friendly. The endpoint issues first-party cookies with the Secure and httpOnly flags.
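To make the beacon flow above concrete, a beacon is essentially a GET request to the edge endpoint whose query string carries the event fields. The sketch below builds such a URL; the `/ingestly-ingest/<namespace>/` path matches the response conditions configured later in this guide, but the parameter names and namespace segment are illustrative assumptions, not the SDK's actual wire format.

```python
from urllib.parse import urlencode

def build_beacon_url(endpoint, namespace, params):
    """Build a hypothetical beacon URL:
    GET <endpoint>/ingestly-ingest/<namespace>/?<query>
    (path shape taken from the response conditions below; params are illustrative)."""
    return f"{endpoint}/ingestly-ingest/{namespace}/?{urlencode(params)}"

# The endpoint is your own domain served by Fastly, so the request stays first-party.
url = build_beacon_url("https://example.com", "v1",
                       {"action": "view", "category": "page"})
print(url)  # https://example.com/ingestly-ingest/v1/?action=view&category=page
```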
Setup
You can use BigQuery, Elasticsearch, or both as the database for logging. Fastly supports multiple log streaming endpoints in the same configuration.
BigQuery supports SQL and delivers faster queries over massive logs. Elasticsearch supports a highly flexible, schema-less data structure.
If you are going to use custom data (*_attr variables) frequently, or you wish to use Kibana's visualization features, Elasticsearch is the better choice.
If you will collect huge numbers of records from a large website, or you wish to use Data Studio, BigQuery gives you better performance at a reasonable cost.
Prerequisites
- A Google Cloud Platform account, and a project used for Ingestly.
- A Fastly account, and a service used for Ingestly.
- This endpoint may use cookies named `ingestlyId`, `ingestlySes` and `ingestlyConsent` under your specified domain name.
Note that you can create a GCP project and a Fastly service specifically for Ingestly, or use existing ones.
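Since the endpoint issues first-party cookies with the Secure and httpOnly flags, the headers it sets look roughly like the sketch below. The exact attribute set is an assumption for illustration (the real VCL may differ); the domain and lifetime values correspond to the `ingestly_metadata` dictionary configured in the Fastly section.

```python
def set_cookie_header(name, value, domain, max_age):
    """Compose a first-party Set-Cookie header like the ones the endpoint
    issues (attribute set is illustrative, not the actual VCL output)."""
    return (f"{name}={value}; Domain={domain}; Path=/; "
            f"Max-Age={max_age}; Secure; HttpOnly")

print(set_cookie_header("ingestlyId", "abc123", "example.com", 31536000))
# ingestlyId=abc123; Domain=example.com; Path=/; Max-Age=31536000; Secure; HttpOnly
```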
Google Cloud Platform
Create a service account for Fastly
- Go to the GCP console, then open `IAM & admin` > `Service accounts`.
- Create a service account like `ingestly` and grant it the `BigQuery` > `BigQuery Data Owner` permission.
- Create a key and download it in JSON format.
- Open the JSON you just downloaded and note `private_key` and `client_email`.
Create a table for the log data on BigQuery
- Go to the GCP console, then open `BigQuery`.
- Create a dataset like `Ingestly` if you don't already have one.
- Create a table with your preferred name like `access_log`, then enable `Edit as text` in the Schema section. (Note your table name.)
- Open the `BigQuery/table_schema` file in this repository, copy the content and paste it into the schema text box of the table creation modal.
- In the `Partition and cluster settings` section, select the `timestamp` column for partitioning.
- Specify `action,category` in the `Clustering order (optional)` field.
- Finish creating the table.
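The console steps above can also be done from the command line. As a sketch, the snippet below assembles an equivalent `bq mk` invocation (the dataset, table, and schema-file names are from this guide's examples; verify the flag names against your `bq` version before running):

```python
# Equivalent `bq` CLI invocation for the console steps above: create the
# table partitioned on `timestamp` and clustered by `action,category`.
cmd = [
    "bq", "mk", "--table",
    "--time_partitioning_field", "timestamp",   # partition by the timestamp column
    "--clustering_fields", "action,category",   # clustering order
    "Ingestly.access_log",                      # dataset.table from the examples above
    "BigQuery/table_schema",                    # schema file from this repository
]
print(" ".join(cmd))
```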
Elasticsearch
Create a user for Fastly
- Open the Kibana UI.
- Go to `Management > Security > Roles`.
- Click the top-right `Create role` button.
- Name the role `Ingestly`.
- Type `ingestly-#{%F}` into the `Index` field manually. (The index name is generated dynamically by strftime; in this case indices are daily, with a YYYY-MM-DD formatted date.)
- Select `create_index`, `create`, `index`, `read`, `write` and `monitor` in the `Privileges` field, then save.
- Go to `Management > Security > Users`.
- Click the top-right `Create user` button.
- Name the user `Ingestly` and fill in the other fields as you like.
- Select `Ingestly` from the role list, then save.
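The `ingestly-#{%F}` index pattern above means Fastly interpolates a strftime date into the index name, producing one index per day. The sketch below shows the equivalent expansion in Python (`%F` is strftime shorthand for `%Y-%m-%d`, which is used here for portability):

```python
from datetime import date

def index_for(day):
    """Daily index name matching the `ingestly-#{%F}` pattern,
    where %F is shorthand for %Y-%m-%d."""
    return "ingestly-" + day.strftime("%Y-%m-%d")

print(index_for(date(2020, 1, 31)))  # ingestly-2020-01-31
```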
Put a mapping template to Elasticsearch
- Go to `Dev Tools`.
- Type `PUT _template/ingestly` into the first line of the Dev Tools console.
- Open the `Elasticsearch/mapping_template.json` file and copy & paste its content starting at the second line of the Dev Tools console.
- Click the triangle icon on the first line to execute the command.
If you see a Custom Analyzer related error message when executing the above, choose one of the following options:
A. Add natural language analysis plugins to Elasticsearch; analysis-kuromoji and analysis-icu are recommended.
B. Remove the analysis section (lines 22 to 40) from Elasticsearch/mapping_template.json to deactivate the Analyzer.
Create an index pattern
- Go to `Management > Kibana > Index Patterns`.
- Click the top-right `Create index pattern` button.
- Fill `ingestly` into the `Index pattern` field, then click `Next step`.
- Select `timestamp` from the `Time Filter field name` pulldown, then click `Create index pattern`.
Fastly
Dictionaries
- Open `Dictionaries` under the `Data` menu on the CONFIGURE page of your service.
- Create a dictionary named `ingestly_apikeys` by clicking the `Create a dictionary` button.
- Add an item with key `2ee204330a7b2701a6bf413473fcc486` and value `true` via the `Add item` link for `ingestly_apikeys`.
- In the same way, create a dictionary named `ingestly_metadata` by clicking the `Create a dictionary` button.
- Add the following two items to the `ingestly_metadata` dictionary.
| key | value | description |
|---|---|---|
| cookie_domain | example.com | The domain name of cookies set by the endpoint. |
| cookie_lifetime | 31536000 | The lifetime, in seconds, of cookies set by the endpoint. |
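The example `cookie_lifetime` value is a duration in seconds; 31536000 corresponds to a 365-day year, as a quick check confirms:

```python
# cookie_lifetime is expressed in seconds; the example value is one
# 365-day year.
one_year = 365 * 24 * 60 * 60
print(one_year)  # 31536000
```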
Custom VCL
- Open `Custom VCL` on the CONFIGURE page.
- Click the `Upload a VCL file` button, set a preferred name like `Ingestly`, select `ingestly.vcl` and upload the file.
Integrate with Google BigQuery
- Open `Logging` on the CONFIGURE page.
- Click the `CREATE ENDPOINT` button and select `Google BigQuery`.
- Open the `attach a condition.` link near the highlighted `CONDITION` and select `CREATE A NEW RESPONSE CONDITION`.
- Enter a name like `Data Ingestion` and set `(resp.status == 204 && req.url ~ "^/ingestly-ingest/(.*?)/\?.*" || resp.status == 200 && req.url ~ "^/ingestly-sync/(.*?)/\?.*")` in the `Apply if…` field.
- Fill in the fields:
  - `Name`: anything you want.
  - `Log format`: copy and paste the content of the `BigQuery/log_format` file in this repository.
  - `Email`: the value of the `client_email` field of the GCP credential JSON file.
  - `Secret key`: the value of the `private_key` field of the GCP credential JSON file.
  - `Project ID`: your GCP project ID.
  - `Dataset`: the dataset name you created for Ingestly (e.g. `Ingestly`).
  - `Table`: the table name you created for Ingestly (e.g. `logs`).
  - `Template`: this field can be empty, but you can configure time-sliced tables by entering something like `%Y%m%d`.
- Click `CREATE` to finish the setup process.
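The `Apply if…` condition above matches the two beacon paths by regex. Fastly's VCL uses PCRE, and these particular patterns translate directly to Python, so you can sanity-check them against sample URLs:

```python
import re

# Python equivalents of the two req.url patterns used in the
# Apply if… response condition.
INGEST = re.compile(r"^/ingestly-ingest/(.*?)/\?.*")
SYNC = re.compile(r"^/ingestly-sync/(.*?)/\?.*")

assert INGEST.match("/ingestly-ingest/v1/?action=view")
assert SYNC.match("/ingestly-sync/v1/?id=abc")
assert not INGEST.match("/other/path/?x=1")
print("all sample URLs matched as expected")
```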
Integrate with Elasticsearch
- Open `Logging` on the CONFIGURE page.
- Click the `CREATE ENDPOINT` button and select `Elasticsearch`.
- Open the `attach a condition.` link near the highlighted `CONDITION` and select `CREATE A NEW RESPONSE CONDITION`.
- Enter a name like `Data Ingestion` and set `(resp.status == 204 && req.url ~ "^/ingestly-ingest/(.*?)/\?.*" || resp.status == 200 && req.url ~ "^/ingestly-sync/(.*?)/\?.*")` in the `Apply if…` field.
- Fill in the fields:
  - `Name`: anything you want.
  - `Log format`: copy and paste the content of the `Elasticsearch/log_format` file in this repository.
  - `URL`: the endpoint URL of your Elasticsearch cluster.
  - `Index`: the index name for Elasticsearch. Set `ingestly`.
  - `BasicAuth user`: the username for Elasticsearch authentication. Set `Ingestly`.
  - `BasicAuth password`: the password of the `Ingestly` user on the Elasticsearch cluster.
- Click `CREATE` to finish the setup process.
Integrate with Amazon S3
- Open `Logging` on the CONFIGURE page.
- Click the `CREATE ENDPOINT` button and select `Amazon S3`.
- Open the `attach a condition.` link near the highlighted `CONDITION` and select `CREATE A NEW RESPONSE CONDITION`.
- Enter a name like `Data Ingestion` and set `(resp.status == 204 && req.url ~ "^/ingestly-ingest/(.*?)/\?.*" || resp.status == 200 && req.url ~ "^/ingestly-sync/(.*?)/\?.*")` in the `Apply if…` field.
- Fill in the fields:
  - `Name`: anything you want.
  - `Log format`: copy and paste the content of the `S3/log_format` file in this repository. You can specify not only CSV but also JSON format here (`{ ... }` form).
  - `Timestamp format`: (not necessary)
  - `Bucket name`: the name of the bucket in which to store the logs.
  - `Access key`: an access key of the service account that can write into the bucket above.
  - `Secret key`: a secret key of the service account that can write into the bucket above.
  - `Period`: the log rotation interval in seconds, e.g. 600 means 10 minutes.
- Advanced options:
  - `Path`: the path within the bucket for placing files. You may use dynamic variables in strftime format. To use Athena's partitioning feature by date, the path name must include a `/date=%Y-%m-%d/` segment.
  - `Domain`: the endpoint domain of your S3 bucket's region (outside of the US Standard region), e.g. Tokyo is `s3.ap-northeast-1.amazonaws.com`.
  - `Select a log line format`: Blank. Otherwise the JSON format will be corrupted.
  - `Gzip level`: 9, the best compression to save storage.
- Click `CREATE` to finish the setup process.
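The `/date=%Y-%m-%d/` placeholder in the `Path` field is expanded by strftime at rotation time, which is what makes the Athena date partitioning work. A sketch of the expansion (the `logs` prefix is an arbitrary example):

```python
from datetime import datetime

def s3_log_path(prefix, when):
    """Expand a Fastly-style strftime path so Athena can partition
    the resulting objects by date."""
    return when.strftime(f"{prefix}/date=%Y-%m-%d/")

print(s3_log_path("logs", datetime(2020, 1, 31)))  # logs/date=2020-01-31/
```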
Next Step
- Now you are ready to receive beacons. You can install the Ingestly Client JavaScript on your website.