{"nbformat":4,"nbformat_minor":0,"metadata":{"colab":{"name":"Exploring Chronicle's Datalake (BigQuery)","provenance":[],"collapsed_sections":["jV0teq1GcKM8","QyRgTNT4VE0c","PskCEQhukEAk","U8_aus31cqdg"],"toc_visible":true},"kernelspec":{"name":"python3","display_name":"Python 3"},"language_info":{"name":"python"}},"cells":[{"cell_type":"markdown","source":["# Chronicle Data Lake"],"metadata":{"id":"QJm0mraVVBOX"}},{"cell_type":"markdown","source":["## Overview\n","This Notebook is an interactive tutorial focused on learning Chronicle's Data Lake. The Chronicle Data Lake is a Google managed instance of [GCP BigQuery](https://cloud.google.com/bigquery/) and commonly used for:\n","
    \n","
  1. Dashboards - such as Google Cloud Looker
  2. \n","
  3. SOAR - such as Google Cloud Siemplify or PAN XSOAR.
  4. \n","
  5. SQL - advanced queries of UDM event data
  6. \n","
  7. BQML - advanced Machine Learning capabilities
  8. \n","
\n","\n","This Notebook will provide an overview of the data available in the Chronicle Datalake, as well as focus on the SQL use case. However, it should be noted, this isn't a general purpose training for learning SQL, and some basic working knowledge of SQL is required."],"metadata":{"id":"zlNPgx7Gr-nU"}},{"cell_type":"markdown","source":["## What is BigQuery?\n","\n","> BigQuery is a fully managed enterprise data warehouse that helps you manage and analyze your data with built-in features like machine learning, geospatial analysis, and business intelligence. BigQuery's serverless architecture lets you use SQL queries to answer your organization's biggest questions with zero infrastructure management. BigQuery's scalable, distributed analysis engine lets you query terabytes in seconds and petabytes in minutes.\n","\n","Read more...\n","\n","Prefer not to read on? Click the short video below to learn more on BigQuery."],"metadata":{"id":"0AexTUsoatgo"}},{"cell_type":"code","source":["from IPython.display import HTML\n","\n","# Youtube\n","HTML('')\n"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":336},"id":"GRKn43sza7kx","outputId":"e7fb1215-2812-4665-af85-ff2fb6172777","executionInfo":{"status":"ok","timestamp":1653400902856,"user_tz":-120,"elapsed":7,"user":{"displayName":"Ed Carolan","userId":"06748390582029345284"}}},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[""],"text/html":[""]},"metadata":{},"execution_count":1}]},{"cell_type":"markdown","source":["## Using the Cloud Console\n","\n","The Google Cloud Console provides a graphical interface that you can use to create and manage BigQuery resources and run SQL queries. This notebook uses the BigQuery API, but you can run the below examples in the Chronicle Cloud Console if you wish.\n","\n","> For more information about working with BigQuery by using the Google Cloud Console, see Cloud Console quickstart.\n","\n","Read more..."],"metadata":{"id":"jV0teq1GcKM8"}},{"cell_type":"markdown","source":["## Accessing the Chronicle Data Lake\n","Access to the Chronicle Data Lake can be via either:\n"," \n","\n","If you don't have access credentials to your Chronicle Datalake, please contact Google Chronicle support. \n","\n","For the purposes of this Notebook JSON Service Account Credentials are used. \n","\n"],"metadata":{"id":"QyRgTNT4VE0c"}},{"cell_type":"markdown","source":["---\n","# Notebook Setup\n","\n","Complete the following sections in order to use this Notebook interactively. If you're unsure of any configurations or are missing credentials please contact your Chronicle account team or Chronicle Partner."],"metadata":{"id":"q7V42F9PHMtj"}},{"cell_type":"code","source":["# configure your Chronicle Data Lake project name, e.g., chronicle-\n","PROJECT_ID = 'chronicle-coe'"],"metadata":{"id":"VJKwgYWkuugb"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["# helper functions used throughout the Notebook\n","def run_query(query):\n"," import pandas as pd\n"," df = pd.read_gbq(query, project_id=PROJECT_ID, dialect='standard', credentials=CREDENTIALS)\n"," return df"],"metadata":{"id":"dzBvtHSQuzHD"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["from google.cloud import bigquery\n","from google.oauth2 import service_account\n","\n","# upload your Chronicle BigQuery Service Account credentials, and change the keypath accordingly\n","key_path = '/content/chronicle-coe-e3542591a1e7.json'\n","\n","CREDENTIALS = service_account.Credentials.from_service_account_file(\n"," key_path, scopes=[\"https://www.googleapis.com/auth/cloud-platform\"],\n",")\n","\n","client = bigquery.Client(credentials=CREDENTIALS, project=PROJECT_ID,)"],"metadata":{"id":"kAZi3aSvulmR"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["---"],"metadata":{"id":"172hnmHc7a0Q"}},{"cell_type":"markdown","source":["# BigQuery Best Practices\n","\n","Before we get started, here's a few SQL best practices that can help with performance (and costs):"],"metadata":{"id":"pyfM9sEG7cY2"}},{"cell_type":"markdown","source":["\n","\n","1. Don't use the `*` wildcard as a default in your queries, but rather only if needed. Returning more data than you can makes queries take longer.\n","2. Use `LIMIT` statements to not return more data than required, e.g., on a Top X aggregate query if you only need a top 10.\n","3. To improve query speed use `_PARTITIONTIME` in your `WHERE` clauses to reduce the range of data queried. More on this in using Date Functions later on.\n","\n","*References:*\n","\n","- [BigQuery Cost Best Practices](https://cloud.google.com/bigquery/docs/best-practices-costs)\n"],"metadata":{"id":"KWGqlYSp8KVz"}},{"cell_type":"markdown","source":["# Data Lake FAQs\n","\n","

\n","How long is data retained in the Chronicle Datalake?
\n","Six months.\n","

\n","\n","

\n","Can I keep data longer than six months?
\n","Yes, but you need to backup and export the data into your own BigQuery instance, e.g., a scheduled query / backup and restore.\n","

\n","\n","

\n","Can I Join my datasets against the Chronicle Datalake?
\n","Yes.\n","

\n","\n","

\n","Can I backup a specific Datalake table?\n","\n","Yes, you can follow instructions here on different ways to backup a Datalake table - [exporting-data](https://cloud.google.com/bigquery/docs/exporting-data)\n","

\n","\n","

\n","Can I backup the Data Lake dataset?\n","\n","Yes, you can follow instructions here - [copying datasets](https://cloud.google.com/bigquery/docs/copying-datasets)\n","

\n","\n"],"metadata":{"id":"9lqT_n8CCAGs"}},{"cell_type":"markdown","source":["---"],"metadata":{"id":"qkNnuoMDw32v"}},{"cell_type":"markdown","source":["# Data Lake Dataset & Tables Overview\n","\n","The Chronicle Data Lake provides a default BigQuery Dataset named `datalake`. Run the below SQL to list all the available Tables in the `datalake` Dataset :"],"metadata":{"id":"TUYuKagXw-fp"}},{"cell_type":"code","source":["sql_list_dataset_schema = \"\"\"\n","SELECT \n"," table_name,\n"," FORMAT_DATE(\"%F\",creation_time) AS creation_time\n","FROM \n"," datalake.INFORMATION_SCHEMA.TABLES\n","ORDER BY\n"," table_name ASC;\n","\"\"\""],"metadata":{"id":"KIjXRizzs0w1"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["run_query(sql_list_dataset_schema)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":363},"id":"VKFrSeoMtAGE","outputId":"87ec5e17-6aae-4fcb-8320-955a5b02fd5a"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" table_name creation_time\n","0 entity_enum_value_to_name_mapping 2022-02-03\n","1 entity_graph 2022-02-02\n","2 ingestion_metrics 2022-04-19\n","3 ingestion_stats 2021-10-05\n","4 ioc_matches 2021-10-05\n","5 job_metadata 2021-10-05\n","6 rule_detections 2021-10-05\n","7 udm_enum_value_to_name_mapping 2021-10-05\n","8 udm_events 2021-10-05\n","9 udm_events_aggregates 2021-10-15"],"text/html":["\n","
\n","
\n","
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
table_namecreation_time
0entity_enum_value_to_name_mapping2022-02-03
1entity_graph2022-02-02
2ingestion_metrics2022-04-19
3ingestion_stats2021-10-05
4ioc_matches2021-10-05
5job_metadata2021-10-05
6rule_detections2021-10-05
7udm_enum_value_to_name_mapping2021-10-05
8udm_events2021-10-05
9udm_events_aggregates2021-10-15
\n","
\n"," \n"," \n"," \n","\n"," \n","
\n","
\n"," "]},"metadata":{},"execution_count":8}]},{"cell_type":"markdown","source":["### Table Definitions\n","\n","What do these tables all do?\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","\n","
TableDescriptionPartitioned?
entity_graphExport of UDM context data, e.g., Assets, IOCs, Resources, Users. Yes
entity_enum_value_to_name_mappingLookup table for normalized values in the entity_graph table.No
ingestion_metricsTelemetry relating to log sources, both Chronicle SAS, Chronicle API, and Chronicle Forwarder related. This is the latest and recommended way for getting insights into your Chronicle log sources (over the prior ingestion_stats table).No
ingestion_statsTelemetry relating to log sources, both Chronicle SAS, Chronicle API, and Chronicle Forwarder related.No
ioc_matchesIOC matches for types of IP or Domain, for both Chronicle default customer (3rd party) CTI sources.No
job_metadataAudits export of UDM Event and Entity data to BigQuery. Not really very useful.No
rule_detectionsExport of Detection Engine detections. Both Live rules and Retro Hunts will populate this table.No
udm_enum_value_to_name_mappingLookup table for normalized values in the udm_events table.No
udm_eventsExport of UDM Event data. Note, this is not enabled by default and requires a support request in order to enable this table. Retains 6 months worth of UDM Event data.Yes
udm_events_aggregatesAn aggregate summary table for Authentication activity. Used by the embedded Looker dashboard 'User Signin Overview'.Yes
"],"metadata":{"id":"H-eQrIHBJCRw"}},{"cell_type":"markdown","source":["---"],"metadata":{"id":"PluWEbbrxFRe"}},{"cell_type":"markdown","source":["# Using Aggregate Functions\n","\n","Aggregate function perform calculations on multiple values and returns a single value."],"metadata":{"id":"m7nT4uW5FqM2"}},{"cell_type":"markdown","source":["## MIN & MAX (First & Last Observed)\n","\n","A common query requirement is determining the first and last observed timestamp for a set of actions, e.g., the first and last time a hash was observed.\n","\n","This can be achieved via BigQuery using the Aggregate functions MIN and MAX.\n","\n","The below query uses a variable called `__USER__` to perform a wildcard (regex) match for the first and last observed event times for a given user."],"metadata":{"id":"9AEn6v3iFvNf"}},{"cell_type":"code","source":["sql_first_last_observed = \"\"\"\n","DECLARE\n"," __USER__ STRING DEFAULT \"system\";\n","SELECT\n"," principal.user.userid,\n"," COUNT(1) AS count,\n"," TIMESTAMP_SECONDS(MIN(metadata.event_timestamp.seconds)) AS first_observed,\n"," TIMESTAMP_SECONDS(MAX(metadata.event_timestamp.seconds)) AS last_observed,\n","FROM\n"," `datalake.udm_events`\n","WHERE\n"," REGEXP_CONTAINS(principal.user.userid, __USER__)\n","GROUP BY\n"," 1\n","\"\"\"\n"],"metadata":{"id":"Os4J_pAWFyso"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["run_query(sql_first_last_observed)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":478},"id":"4jGGvv1pwZ4K","outputId":"104806bb-f128-46c1-d630-266533b5ac08"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stderr","text":["/usr/local/lib/python3.7/dist-packages/google/cloud/bigquery/table.py:1582: FutureWarning: Data is timezone-aware. Converting timezone-aware data to timezone-naive by passing dtype='datetime64[ns]' to DataFrame or Series is deprecated and will raise in a future version. Use `pd.Series(values).dt.tz_localize(None)` instead.\n"," bqstorage_client=bqstorage_client, dtypes=dtypes\n"]},{"output_type":"execute_result","data":{"text/plain":[" userid count \\\n","0 system:cluster-autoscaler 4577842 \n","1 system:snapshot-controller 1459099 \n","2 system:node:gk3-gke-p-cdf-03-default-pool-2130... 20838 \n","3 system:serviceaccount:kube-system:service-acco... 4 \n","4 system:gke-common-webhooks 690433 \n",".. ... ... \n","75 system:node:gk3-gke-p-cdf-02-default-pool-e7f2... 24993 \n","76 system:node:gk3-gke-p-cdf-03-default-pool-606d... 25020 \n","77 system:kube-controller-manager 3632760 \n","78 system:node:gk3-gke-p-cdf-02-default-pool-6cc5... 526 \n","79 system:managed-certificate-controller 6913565 \n","\n"," first_observed last_observed \n","0 2022-03-12 06:00:00+00:00 2022-04-20 14:59:58+00:00 \n","1 2021-10-22 17:00:02+00:00 2022-04-20 14:59:55+00:00 \n","2 2022-03-12 06:00:03+00:00 2022-03-14 17:04:59+00:00 \n","3 2022-03-28 19:24:03+00:00 2022-03-28 19:24:03+00:00 \n","4 2022-03-12 06:00:12+00:00 2022-04-20 14:59:54+00:00 \n",".. ... ... \n","75 2021-10-22 17:00:05+00:00 2021-10-25 15:59:50+00:00 \n","76 2021-10-22 17:00:00+00:00 2021-10-25 15:59:27+00:00 \n","77 2021-10-22 17:00:00+00:00 2022-04-20 14:59:57+00:00 \n","78 2022-03-28 17:12:25+00:00 2022-03-28 18:39:22+00:00 \n","79 2021-10-22 17:00:01+00:00 2022-04-20 14:59:55+00:00 \n","\n","[80 rows x 4 columns]"],"text/html":["\n","
\n","
\n","
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
useridcountfirst_observedlast_observed
0system:cluster-autoscaler45778422022-03-12 06:00:00+00:002022-04-20 14:59:58+00:00
1system:snapshot-controller14590992021-10-22 17:00:02+00:002022-04-20 14:59:55+00:00
2system:node:gk3-gke-p-cdf-03-default-pool-2130...208382022-03-12 06:00:03+00:002022-03-14 17:04:59+00:00
3system:serviceaccount:kube-system:service-acco...42022-03-28 19:24:03+00:002022-03-28 19:24:03+00:00
4system:gke-common-webhooks6904332022-03-12 06:00:12+00:002022-04-20 14:59:54+00:00
...............
75system:node:gk3-gke-p-cdf-02-default-pool-e7f2...249932021-10-22 17:00:05+00:002021-10-25 15:59:50+00:00
76system:node:gk3-gke-p-cdf-03-default-pool-606d...250202021-10-22 17:00:00+00:002021-10-25 15:59:27+00:00
77system:kube-controller-manager36327602021-10-22 17:00:00+00:002022-04-20 14:59:57+00:00
78system:node:gk3-gke-p-cdf-02-default-pool-6cc5...5262022-03-28 17:12:25+00:002022-03-28 18:39:22+00:00
79system:managed-certificate-controller69135652021-10-22 17:00:01+00:002022-04-20 14:59:55+00:00
\n","

80 rows × 4 columns

\n","
\n"," \n"," \n"," \n","\n"," \n","
\n","
\n"," "]},"metadata":{},"execution_count":16}]},{"cell_type":"markdown","source":["## COUNT (Tallying Events) \n","\n","A common requirement is understanding the total number of events that has happened for a given set of activities. The Aggregate function count can be used to achieve this.\n","\n","In the below example a count of all `USER_LOGIN` UDM events is calculated, and the `COUNT` function is called upon the `target.user.userid` UDM path.**bold text**"],"metadata":{"id":"nNpPhi7Zyabq"}},{"cell_type":"code","source":["sql_count_user_logon = \"\"\"\n","DECLARE\n"," __USER__ STRING DEFAULT \"admin\";\n","SELECT\n"," e.target.user.userid,\n"," l.enum_name as event_type,\n"," COUNT(1) AS count,\n"," TIMESTAMP_SECONDS(MIN(e.metadata.event_timestamp.seconds)) AS first_observed,\n"," TIMESTAMP_SECONDS(MAX(e.metadata.event_timestamp.seconds)) AS last_observed,\n","FROM\n"," `datalake.udm_events` e --event data\n","INNER JOIN `datalake.udm_enum_value_to_name_mapping` l --enum lookup\n","ON e.metadata.event_type = l.enum_value\n","WHERE\n"," REGEXP_CONTAINS(e.target.user.userid, __USER__)\n"," AND l.enum_name = \"USER_LOGIN\"\n","GROUP BY\n"," 1,2\n","ORDER BY \n"," 1,3 DESC\n","\"\"\""],"metadata":{"id":"RRZq3ETnyPa_"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["run_query(sql_count_user_logon)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":205},"id":"w7JQ6wLeYrm-","outputId":"548b640f-8cf1-4795-adb7-54346a507e95"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stderr","text":["/usr/local/lib/python3.7/dist-packages/google/cloud/bigquery/table.py:1582: FutureWarning: Data is timezone-aware. Converting timezone-aware data to timezone-naive by passing dtype='datetime64[ns]' to DataFrame or Series is deprecated and will raise in a future version. Use `pd.Series(values).dt.tz_localize(None)` instead.\n"," bqstorage_client=bqstorage_client, dtypes=dtypes\n"]},{"output_type":"execute_result","data":{"text/plain":[" userid event_type count \\\n","0 admin USER_LOGIN 189 \n","1 admin_1823127835827_altostrat_co USER_LOGIN 120 \n","\n"," first_observed last_observed \n","0 2021-10-02 10:18:03+00:00 2022-03-30 08:34:55+00:00 \n","1 2021-10-02 07:04:23+00:00 2022-03-30 12:15:53+00:00 "],"text/html":["\n","
\n","
\n","
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
useridevent_typecountfirst_observedlast_observed
0adminUSER_LOGIN1892021-10-02 10:18:03+00:002022-03-30 08:34:55+00:00
1admin_1823127835827_altostrat_coUSER_LOGIN1202021-10-02 07:04:23+00:002022-03-30 12:15:53+00:00
\n","
\n"," \n"," \n"," \n","\n"," \n","
\n","
\n"," "]},"metadata":{},"execution_count":8}]},{"cell_type":"markdown","source":["An extended example of the above query would be to calculate the count of `USER_LOGIN` events per day.\n","\n","The use of the `DATE_TRUNC` Datetime function can be used to group data daily.\n"],"metadata":{"id":"P8Iq523Xh2nd"}},{"cell_type":"code","source":["sql_count_user_logon_per_day = \"\"\"\n","DECLARE\n"," __USER__ STRING DEFAULT \"admin\";\n","SELECT\n"," e.target.user.userid,\n"," l.enum_name AS event_type,\n"," COUNT(1) AS count,\n"," DATE_TRUNC(DATE(TIMESTAMP_SECONDS(e.metadata.event_timestamp.seconds)), DAY) AS day\n","FROM\n"," `datalake.udm_events` e --event data\n","INNER JOIN\n"," `datalake.udm_enum_value_to_name_mapping` l --enum lookup\n","ON\n"," e.metadata.event_type = l.enum_value\n","WHERE\n"," l.enum_name = \"USER_LOGIN\"\n"," AND e.target.user.userid = __USER__\n","GROUP BY\n"," 1,\n"," 2,\n"," 4\n","ORDER BY\n"," day DESC\n","LIMIT 30\n","\"\"\""],"metadata":{"id":"SBdPQb_th5It"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["run_query(sql_count_user_logon_per_day)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":802},"id":"96B9OkNzigJs","outputId":"ad48a029-db4b-4e86-9ad4-812462d0236a"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" userid event_type count day\n","0 admin USER_LOGIN 10 2022-03-30\n","1 admin USER_LOGIN 22 2022-03-29\n","2 admin USER_LOGIN 24 2022-03-28\n","3 admin USER_LOGIN 6 2022-03-27\n","4 admin USER_LOGIN 6 2022-03-26\n","5 admin USER_LOGIN 6 2022-03-25\n","6 admin USER_LOGIN 6 2022-03-24\n","7 admin USER_LOGIN 12 2022-03-23\n","8 admin USER_LOGIN 6 2022-03-22\n","9 admin USER_LOGIN 6 2022-03-21\n","10 admin USER_LOGIN 6 2022-03-20\n","11 admin USER_LOGIN 6 2022-03-19\n","12 admin USER_LOGIN 6 2022-03-18\n","13 admin USER_LOGIN 6 2022-03-17\n","14 admin USER_LOGIN 6 2022-03-16\n","15 admin USER_LOGIN 6 2022-03-15\n","16 admin USER_LOGIN 6 2022-03-14\n","17 admin USER_LOGIN 6 2022-03-13\n","18 admin USER_LOGIN 6 2022-03-12\n","19 admin USER_LOGIN 6 2021-10-18\n","20 admin USER_LOGIN 1 2021-10-07\n","21 admin USER_LOGIN 15 2021-10-05\n","22 admin USER_LOGIN 8 2021-10-04\n","23 admin USER_LOGIN 1 2021-10-02"],"text/html":["\n","
\n","
\n","
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
useridevent_typecountday
0adminUSER_LOGIN102022-03-30
1adminUSER_LOGIN222022-03-29
2adminUSER_LOGIN242022-03-28
3adminUSER_LOGIN62022-03-27
4adminUSER_LOGIN62022-03-26
5adminUSER_LOGIN62022-03-25
6adminUSER_LOGIN62022-03-24
7adminUSER_LOGIN122022-03-23
8adminUSER_LOGIN62022-03-22
9adminUSER_LOGIN62022-03-21
10adminUSER_LOGIN62022-03-20
11adminUSER_LOGIN62022-03-19
12adminUSER_LOGIN62022-03-18
13adminUSER_LOGIN62022-03-17
14adminUSER_LOGIN62022-03-16
15adminUSER_LOGIN62022-03-15
16adminUSER_LOGIN62022-03-14
17adminUSER_LOGIN62022-03-13
18adminUSER_LOGIN62022-03-12
19adminUSER_LOGIN62021-10-18
20adminUSER_LOGIN12021-10-07
21adminUSER_LOGIN152021-10-05
22adminUSER_LOGIN82021-10-04
23adminUSER_LOGIN12021-10-02
\n","
\n"," \n"," \n"," \n","\n"," \n","
\n","
\n"," "]},"metadata":{},"execution_count":12}]},{"cell_type":"markdown","source":["## AVG\n","\n","The AVG Aggregate function can be used to calculate the average of non-null inputs.\n","\n","In the below example correlated subquery (or nested query) is used to calculate the `COUNT` of logins per day for a given user, then an Average is applied to calculate the average login activity."],"metadata":{"id":"PskCEQhukEAk"}},{"cell_type":"code","source":["sql_average_logins_per_day = \"\"\"\n","DECLARE\n"," __USER__ STRING DEFAULT \"admin\";\n","SELECT\n"," AVG(count) AS avg_logins_per_day\n","FROM (\n"," SELECT\n"," e.target.user.userid,\n"," l.enum_name AS event_type,\n"," COUNT(1) AS count,\n"," DATE_TRUNC(DATE(TIMESTAMP_SECONDS(e.metadata.event_timestamp.seconds)), DAY) AS day\n"," FROM\n"," `datalake.udm_events` e --event data\n"," INNER JOIN\n"," `datalake.udm_enum_value_to_name_mapping` l --enum lookup\n"," ON\n"," e.metadata.event_type = l.enum_value\n"," WHERE\n"," l.enum_name = \"USER_LOGIN\"\n"," AND e.target.user.userid = __USER__\n"," GROUP BY\n"," 1,\n"," 2,\n"," 4\n"," ORDER BY\n"," day ASC\n"," LIMIT\n"," 30 )\n","\"\"\""],"metadata":{"id":"pzOan4KYkFaJ"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["run_query(sql_average_logins_per_day)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":81},"id":"YydGJxLel2pY","outputId":"6e846840-4713-455c-c65d-187f6b25a34a"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" avg_logins_per_day\n","0 7.875"],"text/html":["\n","
\n","
\n","
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
avg_logins_per_day
07.875
\n","
\n"," \n"," \n"," \n","\n"," \n","
\n","
\n"," "]},"metadata":{},"execution_count":14}]},{"cell_type":"markdown","source":["## SUM\n","\n","Chronicle's UDM stores most metadata as type String, i.e., text; however, where applicable certain values are stored as Numeric, e.g., Bytes.\n","\n","You can use the BigQuery SUM aggregate function.\n","\n","In this example query network bytes sent and received are summed, grouped by sending device, i.e., `principal.ip`.\n","\n","While network usage from log data is normalized to Bytes in UDM, for the purposes of making it easier to read you can use additional arithmetic operators."],"metadata":{"id":"U8_aus31cqdg"}},{"cell_type":"code","source":["sql_sum_network_traffic = \"\"\"\n","DECLARE\n"," __UNIT__ INT64;\n","SET\n"," __UNIT__ = 1048576; --MB\n","SELECT\n"," TO_JSON_STRING(principal.ip) AS principal_ip,\n"," CEILING(SUM(network.sent_bytes / __UNIT__)) AS traffic_out_mb,\n"," CEILING(SUM(network.received_bytes / __UNIT__)) AS traffic_in_mb,\n","FROM\n"," `datalake.udm_events` e --event data\n","INNER JOIN\n"," `datalake.udm_enum_value_to_name_mapping` l --enum lookup\n","ON\n"," e.metadata.event_type = l.enum_value\n","WHERE\n"," l.enum_name = \"NETWORK_CONNECTION\"\n","GROUP BY\n"," 1\n","HAVING\n"," traffic_in_mb > 1\n","ORDER BY\n"," 2 DESC\n","\"\"\""],"metadata":{"id":"j1cy6D9bcsyS"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["run_query(sql_sum_network_traffic)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":332},"id":"ldYZlHRyquWp","outputId":"60a1991a-b8a7-4a1d-fbe2-04568331e8eb"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" principal_ip traffic_out_mb traffic_in_mb\n","0 [\"172.16.164.3\"] 6849.0 1202.0\n","1 [\"172.16.166.3\"] 3467.0 230.0\n","2 [\"10.166.0.3\"] 953.0 36.0\n","3 [\"10.164.0.9\"] 608.0 755.0\n","4 [\"10.166.0.11\"] 458.0 567.0\n","5 [\"10.164.0.26\"] 122.0 53.0\n","6 [\"10.164.0.28\"] 74.0 3.0\n","7 [\"172.16.164.4\"] 16.0 21.0\n","8 [\"172.16.166.4\"] 13.0 16.0"],"text/html":["\n","
\n","
\n","
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
principal_iptraffic_out_mbtraffic_in_mb
0[\"172.16.164.3\"]6849.01202.0
1[\"172.16.166.3\"]3467.0230.0
2[\"10.166.0.3\"]953.036.0
3[\"10.164.0.9\"]608.0755.0
4[\"10.166.0.11\"]458.0567.0
5[\"10.164.0.26\"]122.053.0
6[\"10.164.0.28\"]74.03.0
7[\"172.16.164.4\"]16.021.0
8[\"172.16.166.4\"]13.016.0
\n","
\n"," \n"," \n"," \n","\n"," \n","
\n","
\n"," "]},"metadata":{},"execution_count":19}]},{"cell_type":"markdown","source":["# Using Date & Time Functions"],"metadata":{"id":"lgUSz5a7xqjB"}},{"cell_type":"markdown","source":["## TIMESTAMP_SECONDS"],"metadata":{"id":"Xn2gGmdqGu0C"}},{"cell_type":"markdown","source":["Often there will be a requirement to perform precise data calculations within your queries, for example finding events between a specific set of times, or finding the first and last observed.\n","\n","Let's revisit the prior example for finding the first and last timestamp for a set of activities. This is acheived by using the [TIMESTAMP_SECONDS](https://cloud.google.com/bigquery/docs/reference/standard-sql/timestamp_functions) function to conver the UTC epoch timestamp, an integer, into a usable date field.\n","\n","Try this quick example "],"metadata":{"id":"Sax0C48a3c_f"}},{"cell_type":"code","source":["sql_timestamp_function_1 = \"\"\"\n","DECLARE\n"," __HASH__ STRING DEFAULT \"6b481d656414c50d8bd0bedcd615aeaf2f5f68576cb6732a9548e0da87729733\";\n","SELECT\n"," principal.hostname,\n"," COUNT(1) AS count,\n"," TIMESTAMP_SECONDS(MIN(metadata.event_timestamp.seconds)) AS first_observed,\n"," TIMESTAMP_SECONDS(MAX(metadata.event_timestamp.seconds)) AS last_observed,\n","FROM\n"," `datalake.udm_events`\n","WHERE\n"," target.process.file.sha256 = __HASH__\n","GROUP BY\n"," 1\n","\"\"\"\n"],"metadata":{"id":"XYapJJn4CmZ1"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["run_query(sql_timestamp_function_1)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":136},"id":"lAynP4qID89C","outputId":"3ed5a5e9-ba1f-4e22-cc48-9b5a72e9efba"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stderr","text":["/usr/local/lib/python3.7/dist-packages/google/cloud/bigquery/table.py:1582: FutureWarning: Data is timezone-aware. Converting timezone-aware data to timezone-naive by passing dtype='datetime64[ns]' to DataFrame or Series is deprecated and will raise in a future version. Use `pd.Series(values).dt.tz_localize(None)` instead.\n"," bqstorage_client=bqstorage_client, dtypes=dtypes\n"]},{"output_type":"execute_result","data":{"text/plain":[" hostname count first_observed \\\n","0 win-jb-01.ad.1823127835827.altostrat.com 24673 2022-03-12 06:05:45+00:00 \n","\n"," last_observed \n","0 2022-04-20 14:49:54+00:00 "],"text/html":["\n","
\n","
\n","
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
hostnamecountfirst_observedlast_observed
0win-jb-01.ad.1823127835827.altostrat.com246732022-03-12 06:05:45+00:002022-04-20 14:49:54+00:00
\n","
\n"," \n"," \n"," \n","\n"," \n","
\n","
\n"," "]},"metadata":{},"execution_count":28}]},{"cell_type":"markdown","source":["## DATEDIFF\n","\n","Building from the last query, you can use the [DATEDIFF](https://cloud.google.com/bigquery/docs/reference/standard-sql/datetime_functions#datetime_diff) function to find the number of Days between the first and last obvserved timestamps. The function can return the interval in from Microseconds to Years."],"metadata":{"id":"IU4GhZEfVP0Y"}},{"cell_type":"code","source":["sql_timestamp_datediff = \"\"\"\n","DECLARE\n"," __HASH__ STRING DEFAULT \"6b481d656414c50d8bd0bedcd615aeaf2f5f68576cb6732a9548e0da87729733\";\n","SELECT\n"," principal.hostname,\n"," COUNT(1) AS count,\n"," TIMESTAMP_SECONDS(MIN(metadata.event_timestamp.seconds)) AS first_observed,\n"," TIMESTAMP_SECONDS(MAX(metadata.event_timestamp.seconds)) AS last_observed,\n"," DATETIME_DIFF(DATETIME(TIMESTAMP_SECONDS(MIN(metadata.event_timestamp.seconds))), DATETIME(TIMESTAMP_SECONDS(MAX(metadata.event_timestamp.seconds))), DAY) AS interval_duration\n","FROM\n"," `datalake.udm_events`\n","WHERE\n"," target.process.file.sha256 = __HASH__\n","GROUP BY\n"," 1\n","\"\"\"\n"],"metadata":{"id":"_dgDutkgVP_c"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["run_query(sql_timestamp_datediff)"],"metadata":{"id":"sOEF71eyXs1f","outputId":"7d2e1437-e370-4a81-8648-1aeab91869c8","colab":{"base_uri":"https://localhost:8080/","height":136}},"execution_count":null,"outputs":[{"output_type":"stream","name":"stderr","text":["/usr/local/lib/python3.7/dist-packages/google/cloud/bigquery/table.py:1582: FutureWarning: Data is timezone-aware. Converting timezone-aware data to timezone-naive by passing dtype='datetime64[ns]' to DataFrame or Series is deprecated and will raise in a future version. Use `pd.Series(values).dt.tz_localize(None)` instead.\n"," bqstorage_client=bqstorage_client, dtypes=dtypes\n"]},{"output_type":"execute_result","data":{"text/plain":[" hostname count first_observed \\\n","0 win-jb-01.ad.1823127835827.altostrat.com 24722 2022-03-12 06:05:45+00:00 \n","\n"," last_observed interval_duration \n","0 2022-04-20 16:59:27+00:00 -39 "],"text/html":["\n","
\n","
\n","
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
hostnamecountfirst_observedlast_observedinterval_duration
0win-jb-01.ad.1823127835827.altostrat.com247222022-03-12 06:05:45+00:002022-04-20 16:59:27+00:00-39
\n","
\n"," \n"," \n"," \n","\n"," \n","
\n","
\n"," "]},"metadata":{},"execution_count":16}]},{"cell_type":"markdown","source":["## FORMAT_DATE\n","\n","For times where you may not wish to work with or display an epoch or ISO8601 formatted timestamp, you can use the [FORMAT_DATE](https://cloud.google.com/bigquery/docs/reference/standard-sql/date_functions#format_date) function to display a timestamp as required (full list of output formats [here](https://cloud.google.com/bigquery/docs/reference/standard-sql/format-elements#format_elements_date_time))."],"metadata":{"id":"ewIzN37dVTEU"}},{"cell_type":"code","source":["sql_timestamp_format_date = \"\"\"\n","DECLARE\n"," __HASH__ STRING DEFAULT \"6b481d656414c50d8bd0bedcd615aeaf2f5f68576cb6732a9548e0da87729733\";\n","SELECT\n"," principal.hostname,\n"," COUNT(1) AS count,\n"," FORMAT_DATE(\"%F\", TIMESTAMP_SECONDS(MIN(metadata.event_timestamp.seconds))) AS creation_time,\n"," FORMAT_DATE(\"%F\", TIMESTAMP_SECONDS(MAX(metadata.event_timestamp.seconds))) AS last_observed,\n"," DATETIME_DIFF(DATETIME(TIMESTAMP_SECONDS(MIN(metadata.event_timestamp.seconds))), DATETIME(TIMESTAMP_SECONDS(MAX(metadata.event_timestamp.seconds))), DAY) AS interval_duration\n","FROM\n"," `datalake.udm_events`\n","WHERE\n"," target.process.file.sha256 = __HASH__\n","GROUP BY\n"," 1\n","\"\"\""],"metadata":{"id":"nnqRSA75VTL9"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["run_query(sql_timestamp_format_date)"],"metadata":{"id":"pj5Y50xFbRXN","outputId":"fe0eace5-e370-4ec9-8920-642cd888c237","colab":{"base_uri":"https://localhost:8080/","height":81}},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" hostname count creation_time \\\n","0 win-jb-01.ad.1823127835827.altostrat.com 24722 2022-03-12 \n","\n"," last_observed interval_duration \n","0 2022-04-20 -39 "],"text/html":["\n","
\n","
\n","
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
hostnamecountcreation_timelast_observedinterval_duration
0win-jb-01.ad.1823127835827.altostrat.com247222022-03-122022-04-20-39
\n","
\n"," \n"," \n"," \n","\n"," \n","
\n","
\n"," "]},"metadata":{},"execution_count":19}]},{"cell_type":"markdown","source":["## Using PARTITIONS to optimize query time and narrow results.\n","\n","There are several ways to apply PARTITION filters to a SQL query\n","\n",""],"metadata":{"id":"6j9sGKpic9AL"}},{"cell_type":"code","source":["sql_using_partitions_example_1=\"\"\"\n","SELECT\n"," log_type,\n"," SUM(normalized_event_count) AS normalized_event_count,\n"," SUM(parsing_error_count) AS parsing_error_count\n","FROM\n"," `datalake.ingestion_stats`\n","WHERE\n"," DATE(_PARTITIONTIME) > DATE_SUB(CURRENT_DATE, INTERVAL 7 DAY)\n","GROUP BY\n"," log_type\n","ORDER BY\n"," 2 DESC\n","\"\"\""],"metadata":{"id":"DILbPPsZc9Gh"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["run_query(sql_using_partitions_example_1)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":771},"id":"m2gydc4QdQ7z","outputId":"58eea524-41b0-4d0e-eb4c-5cf24cbe8e58"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" log_type normalized_events error_events\n","0 GCP_CLOUDAUDIT 8312725 0\n","1 GCP_FIREWALL 3032316 0\n","2 GCP_DNS 2838899 0\n","3 WINEVTLOG 169318 0\n","4 WINDOWS_SYSMON 41456 0\n","5 GCP_CLOUD_NAT 38688 0\n","6 WORKSPACE_ACTIVITY 28544 0\n","7 NIX_SYSTEM 2718 0\n","8 POWERSHELL 2461 0\n","9 GCP_SECURITYCENTER 2138 0\n","10 WINDOWS_DEFENDER_AV 766 0\n","11 GCP_BIGQUERY_CONTEXT 707 0\n","12 GCP_COMPUTE_CONTEXT 399 0\n","13 GCP_LOADBALANCING 388 0\n","14 GCP_IAM_ANALYSIS 368 0\n","15 GCP_IAM_CONTEXT 247 0\n","16 GCP_STORAGE_CONTEXT 228 0\n","17 WORKSPACE_USERS 132 0\n","18 GCP_DLP_CONTEXT 108 0\n","19 WINDOWS_AD 58 0\n","20 UDM 17 0\n","21 CATCH_ALL 10 0\n","22 FORWARDER_HEARTBEAT 0 0"],"text/html":["\n","
\n","
\n","
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
log_typenormalized_eventserror_events
0GCP_CLOUDAUDIT83127250
1GCP_FIREWALL30323160
2GCP_DNS28388990
3WINEVTLOG1693180
4WINDOWS_SYSMON414560
5GCP_CLOUD_NAT386880
6WORKSPACE_ACTIVITY285440
7NIX_SYSTEM27180
8POWERSHELL24610
9GCP_SECURITYCENTER21380
10WINDOWS_DEFENDER_AV7660
11GCP_BIGQUERY_CONTEXT7070
12GCP_COMPUTE_CONTEXT3990
13GCP_LOADBALANCING3880
14GCP_IAM_ANALYSIS3680
15GCP_IAM_CONTEXT2470
16GCP_STORAGE_CONTEXT2280
17WORKSPACE_USERS1320
18GCP_DLP_CONTEXT1080
19WINDOWS_AD580
20UDM170
21CATCH_ALL100
22FORWARDER_HEARTBEAT00
\n","
\n"," \n"," \n"," \n","\n"," \n","
\n","
\n"," "]},"metadata":{},"execution_count":58}]},{"cell_type":"markdown","source":["# Working with Repeated Fields\n","\n","Repeated fields are a powerful concept within BigQuery, and beyond the scope of what we can cover in this tutorial; however, what we shall cover below is:\n","\n","
    \n","
  1. where Chronicle's Data Lake makes use of repeated fields in UDM
  2. \n","
  3. how to use Repeated fields in a SQL query
  4. \n","
\n","\n","For more details on Repeated fields see Working with Joins, Nested & Repeated Data."],"metadata":{"id":"zgrQhmcYyJPS"}},{"cell_type":"markdown","source":["Querying a repeated field is as simple as calling the field name, e.g., to return all target IPs run the below SQL:"],"metadata":{"id":"7Cv6SRjp413U"}},{"cell_type":"code","source":["sql_repeated_field_example_1 = \"\"\"\n","SELECT\n"," target.ip\n","FROM\n"," `datalake.udm_events` e --event data\n","INNER JOIN\n"," `datalake.udm_enum_value_to_name_mapping` l --enum lookup\n","ON\n"," e.metadata.event_type = l.enum_value\n","WHERE\n"," l.enum_name = \"NETWORK_DNS\"\n"," AND target.ip IS NOT NULL\n","LIMIT\n"," 5\n","\"\"\""],"metadata":{"id":"VQv4xQsmJaMY"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["run_query(sql_repeated_field_example_1)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":206},"id":"WGdMKl05Jeme","outputId":"2b4636ff-be0e-4204-f94c-138b42f6af71"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" ip\n","0 [10.99.1.3]\n","1 [10.99.1.67]\n","2 [10.99.1.67]\n","3 [10.99.1.3]\n","4 [10.99.1.67]"],"text/html":["\n","
\n","
\n","
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
ip
0[10.99.1.3]
1[10.99.1.67]
2[10.99.1.67]
3[10.99.1.3]
4[10.99.1.67]
\n","
\n"," \n"," \n"," \n","\n"," \n","
\n","
\n"," "]},"metadata":{},"execution_count":25}]},{"cell_type":"markdown","source":["However, when running a query with Aggregate functions you can encounter errors. \n","\n","Take the following query which, while this would look like it should work, running it as is will return the following error:\n","```\n","Grouping by expressions of type ARRAY is not allowed at [16:3]\n","```"],"metadata":{"id":"8bYXyfeAJd7v"}},{"cell_type":"code","source":["sql_repeated_field_example_2 = \"\"\"\n","SELECT\n"," principal.ip,\n"," target.ip,\n"," target.port,\n"," COUNT(*) AS count\n","FROM\n"," `datalake.udm_events` e --event data\n","INNER JOIN\n"," `datalake.udm_enum_value_to_name_mapping` l --enum lookup\n","ON\n"," e.metadata.event_type = l.enum_value\n","WHERE\n"," l.enum_name = \"NETWORK_DNS\"\n","GROUP BY\n"," 1,\n"," 2,\n"," 2\n","ORDER BY\n"," count DESC\n","LIMIT\n"," 10\n","\"\"\""],"metadata":{"id":"5_GaGrwJI_kh"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["run_query(sql_repeated_field_example_2)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":1000},"id":"fWe05ZHwJ96R","outputId":"fe13192a-8789-4b29-e1dc-8efde5001e27"},"execution_count":null,"outputs":[{"output_type":"error","ename":"GenericGBQException","evalue":"ignored","traceback":["\u001b[0;31m---------------------------------------------------------------------------\u001b[0m","\u001b[0;31mBadRequest\u001b[0m Traceback (most recent call last)","\u001b[0;32m/usr/local/lib/python3.7/dist-packages/pandas_gbq/gbq.py\u001b[0m in \u001b[0;36m_download_results\u001b[0;34m(self, query_job, max_results, progress_bar_type)\u001b[0m\n\u001b[1;32m 549\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 550\u001b[0;31m \u001b[0mquery_job\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mresult\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 551\u001b[0m \u001b[0;31m# Get the table schema, so that we can list rows.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n","\u001b[0;32m/usr/local/lib/python3.7/dist-packages/google/cloud/bigquery/job.py\u001b[0m in \u001b[0;36mresult\u001b[0;34m(self, timeout, page_size, retry, max_results)\u001b[0m\n\u001b[1;32m 2973\u001b[0m \u001b[0;32mtry\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m-> 2974\u001b[0;31m \u001b[0msuper\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mQueryJob\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mresult\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtimeout\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mtimeout\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2975\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n","\u001b[0;32m/usr/local/lib/python3.7/dist-packages/google/cloud/bigquery/job.py\u001b[0m in \u001b[0;36mresult\u001b[0;34m(self, timeout, retry)\u001b[0m\n\u001b[1;32m 767\u001b[0m \u001b[0;31m# TODO: modify PollingFuture so it can pass a retry argument to done().\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 768\u001b[0;31m \u001b[0;32mreturn\u001b[0m \u001b[0msuper\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0m_AsyncJob\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mresult\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mtimeout\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mtimeout\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 769\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n","\u001b[0;32m/usr/local/lib/python3.7/dist-packages/google/api_core/future/polling.py\u001b[0m in \u001b[0;36mresult\u001b[0;34m(self, timeout, retry)\u001b[0m\n\u001b[1;32m 133\u001b[0m \u001b[0;31m# Pylint doesn't recognize that this is valid in this case.\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 134\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_exception\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 135\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n","\u001b[0;31mBadRequest\u001b[0m: 400 Grouping by expressions of type ARRAY is not allowed at [16:3]\n\n(job ID: 4abeb449-0794-4384-9549-79432b02357a)\n\n -----Query Job SQL Follows----- \n\n | . | . | . | . | . |\n 1:\n 2:SELECT\n 3: principal.ip,\n 4: target.ip,\n 5: target.port,\n 6: COUNT(*) AS count\n 7:FROM\n 8: `datalake.udm_events` e --event data\n 9:INNER JOIN\n 10: `datalake.udm_enum_value_to_name_mapping` l --enum lookup\n 11:ON\n 12: e.metadata.event_type = l.enum_value\n 13:WHERE\n 14: l.enum_name = \"NETWORK_DNS\"\n 15:GROUP BY\n 16: 1,\n 17: 2,\n 18: 2\n 19:ORDER BY\n 20: count DESC\n 21:LIMIT\n 22: 10\n | . | . | . | . | . |","\nDuring handling of the above exception, another exception occurred:\n","\u001b[0;31mGenericGBQException\u001b[0m Traceback (most recent call last)","\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mrun_query\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0msql_repeated_field_example_2\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m","\u001b[0;32m\u001b[0m in \u001b[0;36mrun_query\u001b[0;34m(query)\u001b[0m\n\u001b[1;32m 2\u001b[0m \u001b[0;32mdef\u001b[0m \u001b[0mrun_query\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mquery\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0;32mimport\u001b[0m \u001b[0mpandas\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0mpd\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m----> 4\u001b[0;31m \u001b[0mdf\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mpd\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mread_gbq\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mquery\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mproject_id\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mPROJECT_ID\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mdialect\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0;34m'standard'\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mcredentials\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mCREDENTIALS\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 5\u001b[0m \u001b[0;32mreturn\u001b[0m \u001b[0mdf\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n","\u001b[0;32m/usr/local/lib/python3.7/dist-packages/pandas/io/gbq.py\u001b[0m in \u001b[0;36mread_gbq\u001b[0;34m(query, project_id, index_col, col_order, reauth, auth_local_webserver, dialect, location, configuration, credentials, use_bqstorage_api, max_results, progress_bar_type)\u001b[0m\n\u001b[1;32m 192\u001b[0m \u001b[0mconfiguration\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mconfiguration\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 193\u001b[0m \u001b[0mcredentials\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mcredentials\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 194\u001b[0;31m \u001b[0;34m**\u001b[0m\u001b[0mkwargs\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 195\u001b[0m )\n\u001b[1;32m 196\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n","\u001b[0;32m/usr/local/lib/python3.7/dist-packages/pandas_gbq/gbq.py\u001b[0m in \u001b[0;36mread_gbq\u001b[0;34m(query, project_id, index_col, col_order, reauth, auth_local_webserver, dialect, location, configuration, credentials, use_bqstorage_api, max_results, verbose, private_key, progress_bar_type)\u001b[0m\n\u001b[1;32m 965\u001b[0m \u001b[0mconfiguration\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mconfiguration\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 966\u001b[0m \u001b[0mmax_results\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mmax_results\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 967\u001b[0;31m \u001b[0mprogress_bar_type\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mprogress_bar_type\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 968\u001b[0m )\n\u001b[1;32m 969\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n","\u001b[0;32m/usr/local/lib/python3.7/dist-packages/pandas_gbq/gbq.py\u001b[0m in \u001b[0;36mrun_query\u001b[0;34m(self, query, max_results, progress_bar_type, **kwargs)\u001b[0m\n\u001b[1;32m 530\u001b[0m \u001b[0mquery_reply\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 531\u001b[0m \u001b[0mmax_results\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mmax_results\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 532\u001b[0;31m \u001b[0mprogress_bar_type\u001b[0m\u001b[0;34m=\u001b[0m\u001b[0mprogress_bar_type\u001b[0m\u001b[0;34m,\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 533\u001b[0m )\n\u001b[1;32m 534\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n","\u001b[0;32m/usr/local/lib/python3.7/dist-packages/pandas_gbq/gbq.py\u001b[0m in \u001b[0;36m_download_results\u001b[0;34m(self, query_job, max_results, progress_bar_type)\u001b[0m\n\u001b[1;32m 563\u001b[0m )\n\u001b[1;32m 564\u001b[0m \u001b[0;32mexcept\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mhttp_error\u001b[0m \u001b[0;32mas\u001b[0m \u001b[0mex\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 565\u001b[0;31m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mprocess_http_error\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mex\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 566\u001b[0m \u001b[0;32mfinally\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 567\u001b[0m \u001b[0;32mif\u001b[0m \u001b[0mbqstorage_client\u001b[0m\u001b[0;34m:\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n","\u001b[0;32m/usr/local/lib/python3.7/dist-packages/pandas_gbq/gbq.py\u001b[0m in \u001b[0;36mprocess_http_error\u001b[0;34m(ex)\u001b[0m\n\u001b[1;32m 431\u001b[0m \u001b[0;31m# `__\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 432\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 433\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mGenericGBQException\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Reason: {0}\"\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mformat\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mex\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 434\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 435\u001b[0m def run_query(\n","\u001b[0;31mGenericGBQException\u001b[0m: Reason: 400 Grouping by expressions of type ARRAY is not allowed at [16:3]\n\n(job ID: 4abeb449-0794-4384-9549-79432b02357a)\n\n -----Query Job SQL Follows----- \n\n | . | . | . | . | . |\n 1:\n 2:SELECT\n 3: principal.ip,\n 4: target.ip,\n 5: target.port,\n 6: COUNT(*) AS count\n 7:FROM\n 8: `datalake.udm_events` e --event data\n 9:INNER JOIN\n 10: `datalake.udm_enum_value_to_name_mapping` l --enum lookup\n 11:ON\n 12: e.metadata.event_type = l.enum_value\n 13:WHERE\n 14: l.enum_name = \"NETWORK_DNS\"\n 15:GROUP BY\n 16: 1,\n 17: 2,\n 18: 2\n 19:ORDER BY\n 20: count DESC\n 21:LIMIT\n 22: 10\n | . | . | . | . | . |"]}]},{"cell_type":"markdown","source":["In order to make the query run successfully it requires unnesting (UNNEST) the principal.ip field, a Structure (STRUCT). This is achieved by the following:\n","
    \n","
  • applying an UNNEST operation on the STRUCT type of principal.ip, `UNNEST(principal.ip) as principal_ip`
  • \n","
  • joining the unnested structure against the original dataset - `CROSS JOIN`
  • \n","
  • updating the SELECT statement to use the named UNNEST fields
  • \n","
"],"metadata":{"id":"fWGjulq7Mng0"}},{"cell_type":"code","source":["sql_repeated_field_example_3 = \"\"\"\n","SELECT\n"," principal_ip,\n"," target_ip,\n"," target.port AS target_port,\n"," COUNT(*) AS count\n","FROM\n"," `datalake.udm_events` e --event data\n","CROSS JOIN\n"," UNNEST(principal.ip) principal_ip\n","CROSS JOIN\n"," UNNEST(target.ip) target_ip\n","INNER JOIN\n"," `datalake.udm_enum_value_to_name_mapping` l --enum lookup\n","ON\n"," e.metadata.event_type = l.enum_value\n","WHERE\n"," l.enum_name = \"NETWORK_DNS\"\n","GROUP BY\n"," 1,\n"," 2,\n"," 3\n","ORDER BY\n"," count DESC\n","LIMIT\n"," 10\n","\"\"\""],"metadata":{"id":"Al5M5pExMn3A"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["run_query(sql_repeated_field_example_3)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":363},"id":"UhC9AqufQN8L","outputId":"7fb11be8-d21c-4582-fd11-fb31c78864d5"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" principal_ip target_ip port count\n","0 172.16.164.3 169.254.169.254 53 306836\n","1 172.16.166.3 169.254.169.254 53 46497\n","2 10.164.0.15 10.99.1.2 53 16423\n","3 10.164.0.15 10.99.0.194 53 16133\n","4 10.164.0.12 10.99.1.2 53 15660\n","5 10.164.0.14 10.99.0.194 53 15550\n","6 10.164.0.14 10.99.1.2 53 15248\n","7 10.164.0.12 10.99.0.194 53 15132\n","8 10.166.0.10 10.105.128.6 53 12167\n","9 10.166.0.8 10.105.128.6 53 12102"],"text/html":["\n","
\n","
\n","
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
principal_iptarget_ipportcount
0172.16.164.3169.254.169.25453306836
1172.16.166.3169.254.169.2545346497
210.164.0.1510.99.1.25316423
310.164.0.1510.99.0.1945316133
410.164.0.1210.99.1.25315660
510.164.0.1410.99.0.1945315550
610.164.0.1410.99.1.25315248
710.164.0.1210.99.0.1945315132
810.166.0.1010.105.128.65312167
910.166.0.810.105.128.65312102
\n","
\n"," \n"," \n"," \n","\n"," \n","
\n","
\n"," "]},"metadata":{},"execution_count":29}]},{"cell_type":"markdown","source":["An alternate approach to dealing with applying Aggregates is converting the STRUCT field to a JSON string."],"metadata":{"id":"QwjzyQ-4QsHY"}},{"cell_type":"code","source":["sql_repeated_field_example_4 = \"\"\"\n","SELECT\n"," TO_JSON_STRING(principal.ip) AS principal_ip,\n"," TO_JSON_STRING(target.ip) AS target_ip,\n"," target.port,\n"," COUNT(*) AS count\n","FROM\n"," `datalake.udm_events` e --event data\n","INNER JOIN\n"," `datalake.udm_enum_value_to_name_mapping` l --enum lookup\n","ON\n"," e.metadata.event_type = l.enum_value\n","WHERE\n"," l.enum_name = \"NETWORK_DNS\"\n"," AND target.ip IS NOT NULL\n","GROUP BY\n"," 1,\n"," 2,\n"," 3\n","ORDER BY\n"," count DESC\n","LIMIT\n"," 10\n","\"\"\""],"metadata":{"id":"4VUBXrvaQsQH"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["run_query(sql_repeated_field_example_4)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":363},"id":"E95gsF8wR4iF","outputId":"478f9b01-047d-4144-9e90-226bc456b667"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" principal_ip target_ip port count\n","0 [\"172.16.164.3\"] [\"169.254.169.254\"] 53 306836\n","1 [\"172.16.166.3\"] [\"169.254.169.254\"] 53 46497\n","2 [\"10.164.0.15\"] [\"10.99.1.2\"] 53 16423\n","3 [\"10.164.0.15\"] [\"10.99.0.194\"] 53 16133\n","4 [\"10.164.0.12\"] [\"10.99.1.2\"] 53 15660\n","5 [\"10.164.0.14\"] [\"10.99.0.194\"] 53 15550\n","6 [\"10.164.0.14\"] [\"10.99.1.2\"] 53 15248\n","7 [\"10.164.0.12\"] [\"10.99.0.194\"] 53 15132\n","8 [\"10.166.0.10\"] [\"10.105.128.6\"] 53 12167\n","9 [\"10.166.0.8\"] [\"10.105.128.6\"] 53 12102"],"text/html":["\n","
\n","
\n","
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
principal_iptarget_ipportcount
0[\"172.16.164.3\"][\"169.254.169.254\"]53306836
1[\"172.16.166.3\"][\"169.254.169.254\"]5346497
2[\"10.164.0.15\"][\"10.99.1.2\"]5316423
3[\"10.164.0.15\"][\"10.99.0.194\"]5316133
4[\"10.164.0.12\"][\"10.99.1.2\"]5315660
5[\"10.164.0.14\"][\"10.99.0.194\"]5315550
6[\"10.164.0.14\"][\"10.99.1.2\"]5315248
7[\"10.164.0.12\"][\"10.99.0.194\"]5315132
8[\"10.166.0.10\"][\"10.105.128.6\"]5312167
9[\"10.166.0.8\"][\"10.105.128.6\"]5312102
\n","
\n"," \n"," \n"," \n","\n"," \n","
\n","
\n"," "]},"metadata":{},"execution_count":32}]},{"cell_type":"markdown","source":["The final example in this section is on how to unnest and work with key value pairs (or kvps). Often in a Chronicle parser key value paris will be be used, e.g., `about.label.key` or `about.label.value`. The below example shows how to access these fields, and use multiple kvp in a WHERE clause to find specific events."],"metadata":{"id":"scW8mX_-G3PG"}},{"cell_type":"code","source":["sql_repeated_field_example_5 = \"\"\"\n","SELECT\n"," metadata.vendor_name,\n"," metadata.product_event_type,\n"," metadata.product_log_id,\n"," _labels.key,\n"," _labels.value,\n","FROM\n"," `datalake.udm_events`\n","CROSS JOIN\n"," UNNEST(about) _about\n","CROSS JOIN\n"," UNNEST(_about.labels) _labels\n","WHERE\n"," metadata.vendor_name = 'Google Workspace'\n"," AND ( (_labels.key = \"DEVICE_TYPE\"\n"," AND _labels.value = \"MAC\" )\n"," OR (_labels.key = \"DEVICE_MODEL\"\n"," AND _labels.value = \"MacBook Pro\" ) )\n","LIMIT 10\n","\"\"\""],"metadata":{"id":"u2b2idq2G3XC"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["run_query(sql_repeated_field_example_5)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":363},"id":"OQsj2ptMG-te","outputId":"c49da85a-6221-45c4-b147-e2210bd401f3"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" vendor_name product_event_type \\\n","0 Google Workspace DEVICE_SYNC_EVENT \n","1 Google Workspace DEVICE_SYNC_EVENT \n","2 Google Workspace DEVICE_SYNC_EVENT \n","3 Google Workspace DEVICE_SYNC_EVENT \n","4 Google Workspace DEVICE_SYNC_EVENT \n","5 Google Workspace DEVICE_SYNC_EVENT \n","6 Google Workspace DEVICE_SYNC_EVENT \n","7 Google Workspace DEVICE_SYNC_EVENT \n","8 Google Workspace DEVICE_SYNC_EVENT \n","9 Google Workspace DEVICE_SYNC_EVENT \n","\n"," product_log_id key \\\n","0 eue5R_TRjiPtdcKnerniVEtzCy10bcCIj2-WRhZP5BU/Is... DEVICE_TYPE \n","1 eue5R_TRjiPtdcKnerniVEtzCy10bcCIj2-WRhZP5BU/Is... DEVICE_MODEL \n","2 HEJqifBMU2gYmfdZWFOH0h_Rp9I6ZaonX6CbHXUVaDg/1g... DEVICE_TYPE \n","3 HEJqifBMU2gYmfdZWFOH0h_Rp9I6ZaonX6CbHXUVaDg/1g... DEVICE_MODEL \n","4 0Gsen51N5R4F9HqJd4orBtsS9tovxg64orHsRiegiVY/rq... DEVICE_TYPE \n","5 0Gsen51N5R4F9HqJd4orBtsS9tovxg64orHsRiegiVY/rq... DEVICE_MODEL \n","6 0Gsen51N5R4F9HqJd4orBtsS9tovxg64orHsRiegiVY/aE... DEVICE_TYPE \n","7 0Gsen51N5R4F9HqJd4orBtsS9tovxg64orHsRiegiVY/aE... DEVICE_MODEL \n","8 0Gsen51N5R4F9HqJd4orBtsS9tovxg64orHsRiegiVY/RY... DEVICE_TYPE \n","9 0Gsen51N5R4F9HqJd4orBtsS9tovxg64orHsRiegiVY/RY... DEVICE_MODEL \n","\n"," value \n","0 MAC \n","1 MacBook Pro \n","2 MAC \n","3 MacBook Pro \n","4 MAC \n","5 MacBook Pro \n","6 MAC \n","7 MacBook Pro \n","8 MAC \n","9 MacBook Pro "],"text/html":["\n","
\n","
\n","
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
vendor_nameproduct_event_typeproduct_log_idkeyvalue
0Google WorkspaceDEVICE_SYNC_EVENTeue5R_TRjiPtdcKnerniVEtzCy10bcCIj2-WRhZP5BU/Is...DEVICE_TYPEMAC
1Google WorkspaceDEVICE_SYNC_EVENTeue5R_TRjiPtdcKnerniVEtzCy10bcCIj2-WRhZP5BU/Is...DEVICE_MODELMacBook Pro
2Google WorkspaceDEVICE_SYNC_EVENTHEJqifBMU2gYmfdZWFOH0h_Rp9I6ZaonX6CbHXUVaDg/1g...DEVICE_TYPEMAC
3Google WorkspaceDEVICE_SYNC_EVENTHEJqifBMU2gYmfdZWFOH0h_Rp9I6ZaonX6CbHXUVaDg/1g...DEVICE_MODELMacBook Pro
4Google WorkspaceDEVICE_SYNC_EVENT0Gsen51N5R4F9HqJd4orBtsS9tovxg64orHsRiegiVY/rq...DEVICE_TYPEMAC
5Google WorkspaceDEVICE_SYNC_EVENT0Gsen51N5R4F9HqJd4orBtsS9tovxg64orHsRiegiVY/rq...DEVICE_MODELMacBook Pro
6Google WorkspaceDEVICE_SYNC_EVENT0Gsen51N5R4F9HqJd4orBtsS9tovxg64orHsRiegiVY/aE...DEVICE_TYPEMAC
7Google WorkspaceDEVICE_SYNC_EVENT0Gsen51N5R4F9HqJd4orBtsS9tovxg64orHsRiegiVY/aE...DEVICE_MODELMacBook Pro
8Google WorkspaceDEVICE_SYNC_EVENT0Gsen51N5R4F9HqJd4orBtsS9tovxg64orHsRiegiVY/RY...DEVICE_TYPEMAC
9Google WorkspaceDEVICE_SYNC_EVENT0Gsen51N5R4F9HqJd4orBtsS9tovxg64orHsRiegiVY/RY...DEVICE_MODELMacBook Pro
\n","
\n"," \n"," \n"," \n","\n"," \n","
\n","
\n"," "]},"metadata":{},"execution_count":32}]},{"cell_type":"markdown","source":["# Using Conditional Functions"],"metadata":{"id":"PAKWrjRDW8bP"}},{"cell_type":"markdown","source":["## COALESCE"],"metadata":{"id":"m9tHrFZlyqYg"}},{"cell_type":"markdown","source":["The Coalesce function enables specifying multiple fields, and using the first non-null value.\n","\n","Here's a practical example when you populate the impacted_asset field from a Struct that can contain one of several values, e.g., IP, Domain, MAC Address.\n","\n","The query is count the most impacted asset from the IOC Matches table and specifies the "],"metadata":{"id":"1a0Xswo-Vu_a"}},{"cell_type":"code","source":["sql_coalesce_example_1=\"\"\"\n","SELECT\n"," asset.hostname,\n"," asset.asset_ip_address,\n"," feed_log_type,\n"," ioc_type,\n"," category,\n"," COUNT(1) AS count\n","FROM\n"," `datalake.ioc_matches`\n","GROUP BY\n"," 1,\n"," 2,\n"," 3,\n"," 4,\n"," 5\n","ORDER BY count DESC\n","LIMIT\n"," 10\n","\"\"\""],"metadata":{"id":"fbCTLUtjVvR0"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["run_query(sql_coalesce_example_1)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":363},"id":"mhym_lTkXq8H","outputId":"a00c4214-1e45-41a1-e900-35dae8d0eb27"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" hostname asset_ip_address feed_log_type ioc_type \\\n","0 None 10.166.0.3 ET_PRO_IOC IOC_TYPE_IP \n","1 None 10.164.0.27 ET_PRO_IOC IOC_TYPE_IP \n","2 win-dc-01 None ET_PRO_IOC IOC_TYPE_IP \n","3 None 10.164.0.3 ET_PRO_IOC IOC_TYPE_IP \n","4 None 10.166.0.5 ET_PRO_IOC IOC_TYPE_IP \n","5 lin-jb-02 None ET_PRO_IOC IOC_TYPE_IP \n","6 win-dc-02 None ET_PRO_IOC IOC_TYPE_IP \n","7 None 10.164.0.26 ET_PRO_IOC IOC_TYPE_IP \n","8 lin-mgmt-01 None ET_PRO_IOC IOC_TYPE_IP \n","9 win-dc-01 None ET_PRO_IOC IOC_TYPE_DOMAIN \n","\n"," category count \n","0 Observed serving executables 16504 \n","1 Observed serving executables 13501 \n","2 Observed serving executables 12997 \n","3 Observed serving executables 12555 \n","4 Observed serving executables 12119 \n","5 Observed serving executables 8773 \n","6 Observed serving executables 8045 \n","7 Target of a DDoS 7755 \n","8 Observed serving executables 6642 \n","9 Malware Command and Control Server 5911 "],"text/html":["\n","
\n","
\n","
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
hostnameasset_ip_addressfeed_log_typeioc_typecategorycount
0None10.166.0.3ET_PRO_IOCIOC_TYPE_IPObserved serving executables16504
1None10.164.0.27ET_PRO_IOCIOC_TYPE_IPObserved serving executables13501
2win-dc-01NoneET_PRO_IOCIOC_TYPE_IPObserved serving executables12997
3None10.164.0.3ET_PRO_IOCIOC_TYPE_IPObserved serving executables12555
4None10.166.0.5ET_PRO_IOCIOC_TYPE_IPObserved serving executables12119
5lin-jb-02NoneET_PRO_IOCIOC_TYPE_IPObserved serving executables8773
6win-dc-02NoneET_PRO_IOCIOC_TYPE_IPObserved serving executables8045
7None10.164.0.26ET_PRO_IOCIOC_TYPE_IPTarget of a DDoS7755
8lin-mgmt-01NoneET_PRO_IOCIOC_TYPE_IPObserved serving executables6642
9win-dc-01NoneET_PRO_IOCIOC_TYPE_DOMAINMalware Command and Control Server5911
\n","
\n"," \n"," \n"," \n","\n"," \n","
\n","
\n"," "]},"metadata":{},"execution_count":51}]},{"cell_type":"markdown","source":["Hoever, say you wish to graph these values and required a single field rather than Hostname or IP address? This is where COALESCE can help by being able to specify several fields, in order, and use the first non-null result.\n"],"metadata":{"id":"MqtmrcBFV2Iq"}},{"cell_type":"code","source":["sql_coalesce_example_2=\"\"\"\n","SELECT\n"," COALESCE(asset.hostname,\n"," asset.asset_ip_address) AS asset,\n"," feed_log_type,\n"," ioc_type,\n"," category,\n"," COUNT(1) AS count\n","FROM\n"," `chronicle-coe.datalake.ioc_matches`\n","GROUP BY\n"," 1,\n"," 2,\n"," 3,\n"," 4\n","ORDER BY\n"," count DESC\n","LIMIT\n"," 10\n","\"\"\""],"metadata":{"id":"DrVMWFpvV2QF"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["run_query(sql_coalesce_example_2)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":363},"id":"X8noC5xOY5LS","outputId":"a8dab6b5-fb38-43ff-dd0b-5250b84d7973"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" asset feed_log_type ioc_type \\\n","0 10.166.0.3 ET_PRO_IOC IOC_TYPE_IP \n","1 10.164.0.27 ET_PRO_IOC IOC_TYPE_IP \n","2 win-dc-01 ET_PRO_IOC IOC_TYPE_IP \n","3 10.164.0.3 ET_PRO_IOC IOC_TYPE_IP \n","4 10.166.0.5 ET_PRO_IOC IOC_TYPE_IP \n","5 lin-jb-02 ET_PRO_IOC IOC_TYPE_IP \n","6 win-dc-02 ET_PRO_IOC IOC_TYPE_IP \n","7 10.164.0.26 ET_PRO_IOC IOC_TYPE_IP \n","8 lin-mgmt-01 ET_PRO_IOC IOC_TYPE_IP \n","9 win-dc-01 ET_PRO_IOC IOC_TYPE_DOMAIN \n","\n"," category count \n","0 Observed serving executables 16504 \n","1 Observed serving executables 13501 \n","2 Observed serving executables 12997 \n","3 Observed serving executables 12555 \n","4 Observed serving executables 12119 \n","5 Observed serving executables 8773 \n","6 Observed serving executables 8045 \n","7 Target of a DDoS 7755 \n","8 Observed serving executables 6642 \n","9 Malware Command and Control Server 5911 "],"text/html":["\n","
\n","
\n","
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
assetfeed_log_typeioc_typecategorycount
010.166.0.3ET_PRO_IOCIOC_TYPE_IPObserved serving executables16504
110.164.0.27ET_PRO_IOCIOC_TYPE_IPObserved serving executables13501
2win-dc-01ET_PRO_IOCIOC_TYPE_IPObserved serving executables12997
310.164.0.3ET_PRO_IOCIOC_TYPE_IPObserved serving executables12555
410.166.0.5ET_PRO_IOCIOC_TYPE_IPObserved serving executables12119
5lin-jb-02ET_PRO_IOCIOC_TYPE_IPObserved serving executables8773
6win-dc-02ET_PRO_IOCIOC_TYPE_IPObserved serving executables8045
710.164.0.26ET_PRO_IOCIOC_TYPE_IPTarget of a DDoS7755
8lin-mgmt-01ET_PRO_IOCIOC_TYPE_IPObserved serving executables6642
9win-dc-01ET_PRO_IOCIOC_TYPE_DOMAINMalware Command and Control Server5911
\n","
\n"," \n"," \n"," \n","\n"," \n","
\n","
\n"," "]},"metadata":{},"execution_count":49}]},{"cell_type":"markdown","source":["# Entity Graph & Entity Graph Enum\n","\n","The `entity_enum_value_to_name_mapping` table is a lookup table used to map integer (normalized) values from the `entity_graph` table. The below query is just for informational purposes to explore the dataset. Later examples in the Notebook will show practical examples of how to join this dataset against the `entity_graph` table."],"metadata":{"id":"xlwYTWfK8oSW"}},{"cell_type":"code","source":["sql_list_field_path = \"\"\"\n","SELECT\n"," COUNT(field_path) AS count,\n"," field_path\n","FROM\n"," `datalake.entity_enum_value_to_name_mapping`\n","GROUP BY\n"," 2\n","ORDER BY\n"," 1 DESC\n","\"\"\""],"metadata":{"id":"FykxBE2r_6h8"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["run_query(sql_list_field_path)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":646},"id":"zBq0kzop_waf","outputId":"e599489e-b978-4cc1-847f-49e0322f478f"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["\n","
\n","
\n","
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
countfield_path
020backstory.Resource.ResourceType
120backstory.SecurityResult.SecurityCategory
210backstory.Asset.AssetType
39backstory.EntityMetadata.EntityType
47backstory.Noun.Platform
57backstory.SecurityResult.ProductSeverity
66backstory.Id.Namespace
76backstory.SecurityResult.Action
85backstory.Vulnerability.Severity
95backstory.Permission.PermissionType
105backstory.Authentication.AuthenticationStatus
114backstory.Status
124backstory.Relation.Relationship
134backstory.Asset.DeploymentStatus
144backstory.Cloud.CloudEnvironment
154backstory.SecurityResult.ThreatStatus
164backstory.SecurityResult.ProductPriority
174backstory.SecurityResult.ProductConfidence
183backstory.Verdict
193backstory.Role.Type
203backstory.User.Role
213backstory.Reputation
223backstory.Relation.Directionality
233backstory.SecurityResult.AlertState
241google.protobuf.NullValue
\n","
\n"," \n"," \n"," \n","\n"," \n","
\n","
\n"," "],"text/plain":[" count field_path\n","0 20 backstory.Resource.ResourceType\n","1 20 backstory.SecurityResult.SecurityCategory\n","2 10 backstory.Asset.AssetType\n","3 9 backstory.EntityMetadata.EntityType\n","4 7 backstory.Noun.Platform\n","5 7 backstory.SecurityResult.ProductSeverity\n","6 6 backstory.Id.Namespace\n","7 6 backstory.SecurityResult.Action\n","8 5 backstory.Vulnerability.Severity\n","9 5 backstory.Permission.PermissionType\n","10 5 backstory.Authentication.AuthenticationStatus\n","11 4 backstory.Status\n","12 4 backstory.Relation.Relationship\n","13 4 backstory.Asset.DeploymentStatus\n","14 4 backstory.Cloud.CloudEnvironment\n","15 4 backstory.SecurityResult.ThreatStatus\n","16 4 backstory.SecurityResult.ProductPriority\n","17 4 backstory.SecurityResult.ProductConfidence\n","18 3 backstory.Verdict\n","19 3 backstory.Role.Type\n","20 3 backstory.User.Role\n","21 3 backstory.Reputation\n","22 3 backstory.Relation.Directionality\n","23 3 backstory.SecurityResult.AlertState\n","24 1 google.protobuf.NullValue"]},"metadata":{},"execution_count":10}]},{"cell_type":"markdown","source":["To query UDM Entity Event log sources in the last 7 days run the below query. "],"metadata":{"id":"rW83gXk8l0Yo"}},{"cell_type":"code","source":["sql_entity_graph_types_summary = \"\"\"\n","-- Add excluded vendor here\n","DECLARE __EXCLUDED_VENDOR_NAME__ ARRAY ;\n","SET __EXCLUDED_VENDOR_NAME__ = ['ET_PRO_IOC'];\n","\n","SELECT\n"," COUNT(entity.metadata.entity_type) AS count,\n"," entity.metadata.vendor_name,\n"," entity.metadata.product_name,\n"," mapping.enum_name\n","FROM\n"," `datalake.entity_graph` entity\n","JOIN\n"," `datalake.entity_enum_value_to_name_mapping` mapping\n","ON\n"," entity.metadata.entity_type = mapping.enum_value\n","WHERE\n"," DATE(_PARTITIONTIME) > DATE_SUB(CURRENT_DATE, INTERVAL 7 DAY)\n"," AND mapping.field_path = \"backstory.EntityMetadata.EntityType\"\n"," AND entity.metadata.vendor_name NOT IN UNNEST (__EXCLUDED_VENDOR_NAME__) \n","GROUP BY\n"," 2,\n"," 3,\n"," 4\n","ORDER BY\n"," 1 DESC\n","\"\"\""],"metadata":{"id":"B3zP-gLh8qkF"},"execution_count":null,"outputs":[]},{"cell_type":"markdown","source":["The query makes use of the following:\n","
    \n","
  • a default exclude filter declaring an array of string values
  • \n","
  • joins the `entity_graph` table against the `entity_enum_value_to_name_mapping` to return normalized `entity_type`
  • \n","
  • uses a `_PARTITIONTIME` filter to only query the last 7 days
  • \n","
"],"metadata":{"id":"gdRmU2Vyn4OU"}},{"cell_type":"code","source":["run_query(sql_entity_graph_types_summary)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":280},"id":"WYS6-qke9RzR","outputId":"fbd889f6-045c-47aa-80bf-fcf3d086d622"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["\n","
\n","
\n","
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
countvendor_nameproduct_nameenum_name
0204Google CloudGoogle Cloud BigQueryRESOURCE
1100Google Compute EngineGCP_COMPUTE_CONTEXTASSET
291GoogleCloud IdentityUSER
386Google Cloud IAMGoogle Cloud IAM ANALYSISUSER
448MicrosoftWindows Active DirectoryUSER
536GoogleAccess Context ManagerUSER
624Google Compute EngineGCP_STORAGE_CONTEXTRESOURCE
724Google CloudGCP_STORAGE_CONTEXTRESOURCE
\n","
\n"," \n"," \n"," \n","\n"," \n","
\n","
\n"," "],"text/plain":[" count vendor_name product_name enum_name\n","0 204 Google Cloud Google Cloud BigQuery RESOURCE\n","1 100 Google Compute Engine GCP_COMPUTE_CONTEXT ASSET\n","2 91 Google Cloud Identity USER\n","3 86 Google Cloud IAM Google Cloud IAM ANALYSIS USER\n","4 48 Microsoft Windows Active Directory USER\n","5 36 Google Access Context Manager USER\n","6 24 Google Compute Engine GCP_STORAGE_CONTEXT RESOURCE\n","7 24 Google Cloud GCP_STORAGE_CONTEXT RESOURCE"]},"metadata":{},"execution_count":8}]},{"cell_type":"markdown","source":["## Query Entity Graph for a specified user\n","\n","If you require picture of a specific user's context entity data you can run a query like below, which joins together users several of the BigQuery SQL features covered in this Notebook."],"metadata":{"id":"8N2R3Ugjf18H"}},{"cell_type":"code","source":["query_user_context_for_specific_user = \"\"\"\n","DECLARE\n"," __USER_ID__ STRING DEFAULT \"(?i)admin\"; --(?i) makes the regex case insensitive\n","DECLARE \n"," __USER_EMAIL__ STRING DEFAULT \"(?i)admin\";\n","DECLARE\n"," __PARTITION_START_DATE__ STRING DEFAULT \"2022-04-18\"; --query is equal to or greater than\n"," ---------------\n","SELECT\n"," m.enum_name AS context_entity_type,\n"," e.metadata.vendor_name AS context_vendor,\n"," e.metadata.product_name AS context_product,\n"," TIMESTAMP_SECONDS(e.metadata.collected_timestamp.seconds) AS collected_timestamp,\n"," TIMESTAMP_SECONDS(e.metadata.interval.start_time.seconds) AS interval_start_timestamp,\n"," TIMESTAMP_SECONDS(e.metadata.interval.end_time.seconds) AS interval_end_timestamp,\n"," DATETIME_DIFF(DATETIME(TIMESTAMP_SECONDS(e.metadata.INTERVAL.end_time.seconds)), DATETIME(TIMESTAMP_SECONDS(e.metadata.INTERVAL.start_time.seconds)), HOUR) AS interval_duration,\n"," e.entity.user.first_name AS user_first_name,\n"," e.entity.user.last_name AS user_last_name,\n"," e.entity.user.userid AS user_id,\n"," email AS user_email,\n"," e.metadata.product_entity_id, \n"," relation.entity.asset.hostname AS user_asset_hostname,\n"," relation.entity.asset.asset_id AS user_asset_id,\n"," relation.entity.asset.ip AS user_asset_ip, \n","FROM\n"," `datalake.entity_graph` e\n","LEFT OUTER JOIN --not always populated\n"," UNNEST(e.relations) relation\n","CROSS JOIN\n"," UNNEST(e.entity.user.email_addresses) email\n","JOIN\n"," `datalake.entity_enum_value_to_name_mapping` m\n","ON\n"," e.metadata.entity_type = m.enum_value\n","WHERE\n"," DATE(_PARTITIONTIME) >= CAST(__PARTITION_START_DATE__ AS DATE)\n"," AND ( REGEXP_CONTAINS(email, __USER_EMAIL__)\n"," OR REGEXP_CONTAINS(e.entity.user.userid, __USER_ID__) )\n"," AND m.field_path = \"backstory.EntityMetadata.EntityType\"\n","\"\"\""],"metadata":{"id":"a3sgqpivf2Kp"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["run_query(query_user_context_for_specific_user)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":830},"id":"4Np1koorHoWM","outputId":"206ea147-fe93-4e84-ffe5-053d96c450cc"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stderr","text":["/usr/local/lib/python3.7/dist-packages/google/cloud/bigquery/table.py:1582: FutureWarning: Data is timezone-aware. Converting timezone-aware data to timezone-naive by passing dtype='datetime64[ns]' to DataFrame or Series is deprecated and will raise in a future version. Use `pd.Series(values).dt.tz_localize(None)` instead.\n"," bqstorage_client=bqstorage_client, dtypes=dtypes\n"]},{"output_type":"execute_result","data":{"text/plain":[" context_entity_type context_vendor context_product \\\n","0 USER Google Cloud Platform GCP IAM ANALYSIS \n","1 USER Google Cloud Identity \n","2 USER Google Cloud Identity \n","3 USER Google Cloud Identity \n","4 USER Google Cloud Platform GCP IAM ANALYSIS \n","5 USER Google Cloud Platform GCP IAM ANALYSIS \n","6 USER Google Cloud Platform GCP IAM ANALYSIS \n","7 USER Google Cloud Identity \n","8 USER Google Cloud Identity \n","9 USER Google Cloud Identity \n","10 USER Google Cloud Platform GCP IAM ANALYSIS \n","11 USER Google Cloud Identity \n","12 USER Google Cloud Identity \n","13 USER Google Cloud Identity \n","\n"," collected_timestamp interval_start_timestamp \\\n","0 2022-04-18 03:34:52+00:00 2022-04-20 00:00:00+00:00 \n","1 2022-04-13 12:49:21+00:00 2022-04-20 00:00:00+00:00 \n","2 2022-04-13 12:49:21+00:00 2022-04-20 00:00:00+00:00 \n","3 2022-04-13 12:49:21+00:00 2022-04-20 00:00:00+00:00 \n","4 2022-04-17 03:51:42+00:00 2022-04-18 00:00:00+00:00 \n","5 2022-04-18 03:34:36+00:00 2022-04-18 03:34:36+00:00 \n","6 2022-04-18 03:34:52+00:00 2022-04-18 03:34:52+00:00 \n","7 2022-04-13 12:49:21+00:00 2022-04-18 00:00:00+00:00 \n","8 2022-04-13 12:49:21+00:00 2022-04-18 00:00:00+00:00 \n","9 2022-04-13 12:49:21+00:00 2022-04-18 00:00:00+00:00 \n","10 2022-04-18 03:34:52+00:00 2022-04-19 00:00:00+00:00 \n","11 2022-04-13 12:49:21+00:00 2022-04-19 00:00:00+00:00 \n","12 2022-04-13 12:49:21+00:00 2022-04-19 00:00:00+00:00 \n","13 2022-04-13 12:49:21+00:00 2022-04-19 00:00:00+00:00 \n","\n"," interval_end_timestamp interval_duration user_first_name \\\n","0 2022-04-21 00:00:00+00:00 24 None \n","1 2022-04-21 00:00:00+00:00 24 Super \n","2 2022-04-21 00:00:00+00:00 24 Super \n","3 2022-04-21 00:00:00+00:00 24 Super \n","4 2022-04-18 03:34:36+00:00 3 None \n","5 2022-04-18 03:34:52+00:00 0 None \n","6 2022-04-19 00:00:00+00:00 21 None \n","7 2022-04-19 00:00:00+00:00 24 Super \n","8 2022-04-19 00:00:00+00:00 24 Super \n","9 2022-04-19 00:00:00+00:00 24 Super \n","10 2022-04-20 00:00:00+00:00 24 None \n","11 2022-04-20 00:00:00+00:00 24 Super \n","12 2022-04-20 00:00:00+00:00 24 Super \n","13 2022-04-20 00:00:00+00:00 24 Super \n","\n"," user_last_name user_id \\\n","0 None 102520959283965001184 \n","1 User admin_1823127835827_altostrat_co \n","2 User admin_1823127835827_altostrat_co \n","3 User admin_1823127835827_altostrat_co \n","4 None 102520959283965001184 \n","5 None 102520959283965001184 \n","6 None 102520959283965001184 \n","7 User admin_1823127835827_altostrat_co \n","8 User admin_1823127835827_altostrat_co \n","9 User admin_1823127835827_altostrat_co \n","10 None 102520959283965001184 \n","11 User admin_1823127835827_altostrat_co \n","12 User admin_1823127835827_altostrat_co \n","13 User admin_1823127835827_altostrat_co \n","\n"," user_email product_entity_id \\\n","0 admin@1823127835827.altostrat.com None \n","1 admin@1823127835827.altostrat.com None \n","2 admin@1823127835827.demo.altostrat.com None \n","3 admin@1823127835827.altostrat.com.test-google-... None \n","4 admin@1823127835827.altostrat.com None \n","5 admin@1823127835827.altostrat.com None \n","6 admin@1823127835827.altostrat.com None \n","7 admin@1823127835827.altostrat.com None \n","8 admin@1823127835827.demo.altostrat.com None \n","9 admin@1823127835827.altostrat.com.test-google-... None \n","10 admin@1823127835827.altostrat.com None \n","11 admin@1823127835827.altostrat.com None \n","12 admin@1823127835827.demo.altostrat.com None \n","13 admin@1823127835827.altostrat.com.test-google-... None \n","\n"," user_asset_hostname user_asset_id user_asset_ip \n","0 None None [] \n","1 None None [] \n","2 None None [] \n","3 None None [] \n","4 None None [] \n","5 None None [] \n","6 None None [] \n","7 None None [] \n","8 None None [] \n","9 None None [] \n","10 None None [] \n","11 None None [] \n","12 None None [] \n","13 None None [] "],"text/html":["\n","
\n","
\n","
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
context_entity_typecontext_vendorcontext_productcollected_timestampinterval_start_timestampinterval_end_timestampinterval_durationuser_first_nameuser_last_nameuser_iduser_emailproduct_entity_iduser_asset_hostnameuser_asset_iduser_asset_ip
0USERGoogle Cloud PlatformGCP IAM ANALYSIS2022-04-18 03:34:52+00:002022-04-20 00:00:00+00:002022-04-21 00:00:00+00:0024NoneNone102520959283965001184admin@1823127835827.altostrat.comNoneNoneNone[]
1USERGoogleCloud Identity2022-04-13 12:49:21+00:002022-04-20 00:00:00+00:002022-04-21 00:00:00+00:0024SuperUseradmin_1823127835827_altostrat_coadmin@1823127835827.altostrat.comNoneNoneNone[]
2USERGoogleCloud Identity2022-04-13 12:49:21+00:002022-04-20 00:00:00+00:002022-04-21 00:00:00+00:0024SuperUseradmin_1823127835827_altostrat_coadmin@1823127835827.demo.altostrat.comNoneNoneNone[]
3USERGoogleCloud Identity2022-04-13 12:49:21+00:002022-04-20 00:00:00+00:002022-04-21 00:00:00+00:0024SuperUseradmin_1823127835827_altostrat_coadmin@1823127835827.altostrat.com.test-google-...NoneNoneNone[]
4USERGoogle Cloud PlatformGCP IAM ANALYSIS2022-04-17 03:51:42+00:002022-04-18 00:00:00+00:002022-04-18 03:34:36+00:003NoneNone102520959283965001184admin@1823127835827.altostrat.comNoneNoneNone[]
5USERGoogle Cloud PlatformGCP IAM ANALYSIS2022-04-18 03:34:36+00:002022-04-18 03:34:36+00:002022-04-18 03:34:52+00:000NoneNone102520959283965001184admin@1823127835827.altostrat.comNoneNoneNone[]
6USERGoogle Cloud PlatformGCP IAM ANALYSIS2022-04-18 03:34:52+00:002022-04-18 03:34:52+00:002022-04-19 00:00:00+00:0021NoneNone102520959283965001184admin@1823127835827.altostrat.comNoneNoneNone[]
7USERGoogleCloud Identity2022-04-13 12:49:21+00:002022-04-18 00:00:00+00:002022-04-19 00:00:00+00:0024SuperUseradmin_1823127835827_altostrat_coadmin@1823127835827.altostrat.comNoneNoneNone[]
8USERGoogleCloud Identity2022-04-13 12:49:21+00:002022-04-18 00:00:00+00:002022-04-19 00:00:00+00:0024SuperUseradmin_1823127835827_altostrat_coadmin@1823127835827.demo.altostrat.comNoneNoneNone[]
9USERGoogleCloud Identity2022-04-13 12:49:21+00:002022-04-18 00:00:00+00:002022-04-19 00:00:00+00:0024SuperUseradmin_1823127835827_altostrat_coadmin@1823127835827.altostrat.com.test-google-...NoneNoneNone[]
10USERGoogle Cloud PlatformGCP IAM ANALYSIS2022-04-18 03:34:52+00:002022-04-19 00:00:00+00:002022-04-20 00:00:00+00:0024NoneNone102520959283965001184admin@1823127835827.altostrat.comNoneNoneNone[]
11USERGoogleCloud Identity2022-04-13 12:49:21+00:002022-04-19 00:00:00+00:002022-04-20 00:00:00+00:0024SuperUseradmin_1823127835827_altostrat_coadmin@1823127835827.altostrat.comNoneNoneNone[]
12USERGoogleCloud Identity2022-04-13 12:49:21+00:002022-04-19 00:00:00+00:002022-04-20 00:00:00+00:0024SuperUseradmin_1823127835827_altostrat_coadmin@1823127835827.demo.altostrat.comNoneNoneNone[]
13USERGoogleCloud Identity2022-04-13 12:49:21+00:002022-04-19 00:00:00+00:002022-04-20 00:00:00+00:0024SuperUseradmin_1823127835827_altostrat_coadmin@1823127835827.altostrat.com.test-google-...NoneNoneNone[]
\n","
\n"," \n"," \n"," \n","\n"," \n","
\n","
\n"," "]},"metadata":{},"execution_count":43}]},{"cell_type":"markdown","source":["---"],"metadata":{"id":"hmt9i12x8m85"}},{"cell_type":"markdown","source":["# UDM Events\n","\n","The following section has some example UDM Events queries for example and reference."],"metadata":{"id":"3EMuDKwS-aIC"}},{"cell_type":"markdown","source":["## Network DNS\n","\n"],"metadata":{"id":"yLbg9j-c-cbU"}},{"cell_type":"markdown","source":["### Large DNS \n","\n","Find large DNS query that could represent suspicious activity."],"metadata":{"id":"B1ZLJ0f45mAv"}},{"cell_type":"code","source":["sql_dns_answers_greater_than_50_chars = \"\"\"\n","SELECT\n"," COUNT(metadata.event_timestamp.seconds) AS count,\n"," TIMESTAMP_SECONDS(MIN(metadata.event_timestamp.seconds)) AS first_observed,\n"," TIMESTAMP_SECONDS(MAX(metadata.event_timestamp.seconds)) AS last_observed,\n"," sip,\n"," dnsAnswers.name\n","FROM\n"," `datalake.udm_events`,\n"," UNNEST(network.dns.answers) AS dnsAnswers,\n"," UNNEST(principal.ip) AS sip\n","WHERE\n"," DATE(_PARTITIONTIME) > DATE_SUB(CURRENT_DATE, INTERVAL 7 DAY)\n"," AND LENGTH(dnsAnswers.name) > 50\n"," AND NOT REGEXP_CONTAINS(dnsAnswers.name, r'internal\\.$') \n","GROUP BY\n"," 4,\n"," 5\n","ORDER BY\n"," 1 DESC\n","LIMIT 10\n","\"\"\""],"metadata":{"id":"fYU8gpPEsiq0"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["run_query(sql_dns_answers_greater_than_50_chars)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":376},"id":"gg_Wdjnvxp1Q","outputId":"cb1c72f2-b28a-4579-9c01-9746c21f477a"},"execution_count":null,"outputs":[{"output_type":"stream","name":"stderr","text":["/usr/local/lib/python3.7/dist-packages/google/cloud/bigquery/table.py:1582: FutureWarning: Data is timezone-aware. Converting timezone-aware data to timezone-naive by passing dtype='datetime64[ns]' to DataFrame or Series is deprecated and will raise in a future version. Use `pd.Series(values).dt.tz_localize(None)` instead.\n"," bqstorage_client=bqstorage_client, dtypes=dtypes\n"]},{"output_type":"execute_result","data":{"text/plain":[" count first_observed last_observed sip \\\n","0 192 2022-03-24 00:35:12+00:00 2022-03-30 14:34:31+00:00 10.166.0.3 \n","1 191 2022-03-24 03:28:53+00:00 2022-03-30 10:22:13+00:00 10.166.0.3 \n","2 102 2022-03-24 03:38:11+00:00 2022-03-30 13:41:38+00:00 10.166.0.3 \n","3 102 2022-03-24 00:38:10+00:00 2022-03-30 12:39:26+00:00 10.166.0.3 \n","4 98 2022-03-24 02:38:11+00:00 2022-03-30 14:41:39+00:00 10.166.0.3 \n","5 88 2022-03-24 00:38:10+00:00 2022-03-30 09:41:36+00:00 10.166.0.3 \n","6 49 2022-03-24 01:39:36+00:00 2022-03-30 11:20:14+00:00 10.166.0.3 \n","7 22 2022-03-24 02:07:08+00:00 2022-03-30 10:17:56+00:00 10.166.0.3 \n","8 9 2022-03-24 10:08:40+00:00 2022-03-30 10:09:21+00:00 10.166.0.3 \n","9 6 2022-03-24 04:29:47+00:00 2022-03-29 09:31:04+00:00 10.166.0.3 \n","\n"," name \n","0 settings-prod-neu-1.northeurope.cloudapp.azure... \n","1 settings-prod-neu-2.northeurope.cloudapp.azure... \n","2 wd-prod-ss-eu-north-2-fe.northeurope.cloudapp.... \n","3 wd-prod-ss-eu-north-1-fe.northeurope.cloudapp.... \n","4 wd-prod-ss-eu-west-2-fe.westeurope.cloudapp.az... \n","5 wd-prod-ss-eu-west-1-fe.westeurope.cloudapp.az... \n","6 onedscolprdaus02.australiasoutheast.cloudapp.a... \n","7 onedscolprdaus00.australiasoutheast.cloudapp.a... \n","8 storecatalogrevocation.storequality.microsoft.... \n","9 wd-prod-cp-eu-north-1-fe.northeurope.cloudapp.... "],"text/html":["\n","
\n","
\n","
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
countfirst_observedlast_observedsipname
01922022-03-24 00:35:12+00:002022-03-30 14:34:31+00:0010.166.0.3settings-prod-neu-1.northeurope.cloudapp.azure...
11912022-03-24 03:28:53+00:002022-03-30 10:22:13+00:0010.166.0.3settings-prod-neu-2.northeurope.cloudapp.azure...
21022022-03-24 03:38:11+00:002022-03-30 13:41:38+00:0010.166.0.3wd-prod-ss-eu-north-2-fe.northeurope.cloudapp....
31022022-03-24 00:38:10+00:002022-03-30 12:39:26+00:0010.166.0.3wd-prod-ss-eu-north-1-fe.northeurope.cloudapp....
4982022-03-24 02:38:11+00:002022-03-30 14:41:39+00:0010.166.0.3wd-prod-ss-eu-west-2-fe.westeurope.cloudapp.az...
5882022-03-24 00:38:10+00:002022-03-30 09:41:36+00:0010.166.0.3wd-prod-ss-eu-west-1-fe.westeurope.cloudapp.az...
6492022-03-24 01:39:36+00:002022-03-30 11:20:14+00:0010.166.0.3onedscolprdaus02.australiasoutheast.cloudapp.a...
7222022-03-24 02:07:08+00:002022-03-30 10:17:56+00:0010.166.0.3onedscolprdaus00.australiasoutheast.cloudapp.a...
892022-03-24 10:08:40+00:002022-03-30 10:09:21+00:0010.166.0.3storecatalogrevocation.storequality.microsoft....
962022-03-24 04:29:47+00:002022-03-29 09:31:04+00:0010.166.0.3wd-prod-cp-eu-north-1-fe.northeurope.cloudapp....
\n","
\n"," \n"," \n"," \n","\n"," \n","
\n","
\n"," "]},"metadata":{},"execution_count":23}]},{"cell_type":"markdown","source":["### Unauthorized DNS Servers\n","\n","Use a String Array to find unauthorized DNS servers."],"metadata":{"id":"blUViSK8Bf65"}},{"cell_type":"code","source":["sql_dns_unauthorized_servers=\"\"\"\n","-- Add your DNS servers here\n","DECLARE __EXCLUDED_DNS_ADDRESSES__ ARRAY ;\n","SET __EXCLUDED_DNS_ADDRESSES__ = ['127.0.0.1','169.254.169.254','0:0:0:0:0:0:0:1'];\n","------\n","SELECT \n"," COUNT(target.ip) AS count,\n"," dip\n","FROM `datalake.udm_events`,\n","UNNEST(target.ip) as dip\n","\n","--include filters\n","WHERE target.port = 53\n","\n","--exclude filters\n","AND dip NOT IN UNNEST (__EXCLUDED_DNS_ADDRESSES__)\n","\n","GROUP BY 2\n","ORDER BY 1 DESC\n","\"\"\""],"metadata":{"id":"Nc5oLgm5BgCy"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["run_query(sql_dns_unauthorized_servers)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":237},"id":"3QFZDGoNB26x","outputId":"198295d8-8ee5-4219-949c-ef7b3ee38c90"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/html":["\n","
\n","
\n","
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
countdip
011229210.99.1.2
111144610.99.0.194
29723310.105.128.66
39665810.105.128.6
41410210.99.0.67
51384810.99.0.6
\n","
\n"," \n"," \n"," \n","\n"," \n","
\n","
\n"," "],"text/plain":[" count dip\n","0 112292 10.99.1.2\n","1 111446 10.99.0.194\n","2 97233 10.105.128.66\n","3 96658 10.105.128.6\n","4 14102 10.99.0.67\n","5 13848 10.99.0.6"]},"metadata":{},"execution_count":45}]},{"cell_type":"markdown","source":["## Process Launch"],"metadata":{"id":"HAA1LZtTZq_y"}},{"cell_type":"markdown","source":[""],"metadata":{"id":"R5_uUjmkQQRb"}},{"cell_type":"markdown","source":["## User Login"],"metadata":{"id":"EJh0XC23TziT"}},{"cell_type":"markdown","source":["### User Login count, per day, with User Coalesce\n","\n","The below query generates a daily count of User Login events querying both principal and target userId and Email address fields."],"metadata":{"id":"itMCx5B7PQZK"}},{"cell_type":"code","source":["sql_user_aggregates = \"\"\"\n","DECLARE\n"," __USER_ID__ STRING DEFAULT \"(?i)admin\"; --(?i) makes the regex case insensitive\n","SELECT \n","COUNT(*) AS total,\n","m.enum_name AS event_type,\n","DATE_TRUNC(DATE(TIMESTAMP_SECONDS(e.metadata.event_timestamp.seconds)), DAY) AS day,\n","metadata.vendor_name,\n","metadata.product_name,\n","metadata.product_event_type,\n","COALESCE(LOWER(principal.user.userid),LOWER(principal_email)) AS principal_user,\n","COALESCE(LOWER(principal.hostname),principal_ip,LOWER(src.hostname),src_ip) AS principal_host,\n","COALESCE(LOWER(target.user.userid),LOWER(target_email)) AS target_user,\n","COALESCE(LOWER(target.hostname),target_ip) AS target_host,\n","FROM `datalake.udm_events` e\n","LEFT OUTER JOIN\n"," UNNEST(principal.ip) principal_ip\n","LEFT OUTER JOIN\n"," UNNEST(target.ip) target_ip \n","LEFT OUTER JOIN\n"," UNNEST(src.ip) src_ip \n","LEFT OUTER JOIN\n"," UNNEST(e.principal.user.email_addresses) principal_email\n","LEFT OUTER JOIN\n"," UNNEST(e.target.user.email_addresses) target_email \n","JOIN\n"," `datalake.udm_enum_value_to_name_mapping` m\n","ON\n"," e.metadata.event_type = m.enum_value\n","WHERE DATE(_PARTITIONTIME) >= \"2022-03-01\" \n","AND (\n"," REGEXP_CONTAINS(principal.user.userid, __USER_ID__) \n"," OR REGEXP_CONTAINS(target.user.userid, __USER_ID__) \n"," OR REGEXP_CONTAINS(principal_email, __USER_ID__) \n"," OR REGEXP_CONTAINS(target_email, __USER_ID__) \n"," )\n","AND m.enum_name = \"USER_LOGIN\"\n","AND m.field_path = \"backstory.Metadata.EventType\"\n","GROUP BY 2,3,4,5,6,7,8,9,10\n","ORDER BY day ASC\n","LIMIT 10\n","\"\"\""],"metadata":{"id":"ME1HFP8CQQXZ"},"execution_count":null,"outputs":[]},{"cell_type":"code","source":["run_query(sql_user_aggregates)"],"metadata":{"colab":{"base_uri":"https://localhost:8080/","height":641},"id":"OOkX41e6NnEi","outputId":"438058f4-b22a-4035-92bc-1db9bdf7d991"},"execution_count":null,"outputs":[{"output_type":"execute_result","data":{"text/plain":[" total event_type day vendor_name \\\n","0 2 USER_LOGIN 2022-03-12 Microsoft \n","1 2 USER_LOGIN 2022-03-12 Microsoft \n","2 2 USER_LOGIN 2022-03-12 Microsoft \n","3 2 USER_LOGIN 2022-03-13 Microsoft \n","4 2 USER_LOGIN 2022-03-13 Microsoft \n",".. ... ... ... ... \n","179 14 USER_LOGIN 2022-04-19 Google Workspace \n","180 2 USER_LOGIN 2022-04-20 Microsoft \n","181 2 USER_LOGIN 2022-04-20 Google Workspace \n","182 2 USER_LOGIN 2022-04-20 Microsoft \n","183 2 USER_LOGIN 2022-04-20 Microsoft \n","\n"," product_name product_event_type principal_user \\\n","0 Microsoft-Windows-Security-Auditing 4624 win-dc-01$ \n","1 Microsoft-Windows-Security-Auditing 4648 win-dc-01$ \n","2 Microsoft-Windows-Security-Auditing 4672 None \n","3 Microsoft-Windows-Security-Auditing 4648 win-dc-01$ \n","4 Microsoft-Windows-Security-Auditing 4672 None \n",".. ... ... ... \n","179 login login_success None \n","180 Microsoft-Windows-Security-Auditing 4624 win-dc-01$ \n","181 saml login_success None \n","182 Microsoft-Windows-Security-Auditing 4672 None \n","183 Microsoft-Windows-Security-Auditing 4648 win-dc-01$ \n","\n"," principal_host target_user target_host \n","0 win-dc-01 admin None \n","1 win-dc-01 admin localhost \n","2 win-dc-01.ad.1823127835827.altostrat.com admin None \n","3 win-dc-01 admin localhost \n","4 win-dc-01.ad.1823127835827.altostrat.com admin None \n",".. ... ... ... \n","179 62.163.105.47 c03siqfu3 None \n","180 win-dc-01 admin None \n","181 62.163.105.47 c03siqfu3 None \n","182 win-dc-01.ad.1823127835827.altostrat.com admin None \n","183 win-dc-01 admin localhost \n","\n","[184 rows x 10 columns]"],"text/html":["\n","
\n","
\n","
\n","\n","\n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n"," \n","
totalevent_typedayvendor_nameproduct_nameproduct_event_typeprincipal_userprincipal_hosttarget_usertarget_host
02USER_LOGIN2022-03-12MicrosoftMicrosoft-Windows-Security-Auditing4624win-dc-01$win-dc-01adminNone
12USER_LOGIN2022-03-12MicrosoftMicrosoft-Windows-Security-Auditing4648win-dc-01$win-dc-01adminlocalhost
22USER_LOGIN2022-03-12MicrosoftMicrosoft-Windows-Security-Auditing4672Nonewin-dc-01.ad.1823127835827.altostrat.comadminNone
32USER_LOGIN2022-03-13MicrosoftMicrosoft-Windows-Security-Auditing4648win-dc-01$win-dc-01adminlocalhost
42USER_LOGIN2022-03-13MicrosoftMicrosoft-Windows-Security-Auditing4672Nonewin-dc-01.ad.1823127835827.altostrat.comadminNone
.................................
17914USER_LOGIN2022-04-19Google Workspaceloginlogin_successNone62.163.105.47c03siqfu3None
1802USER_LOGIN2022-04-20MicrosoftMicrosoft-Windows-Security-Auditing4624win-dc-01$win-dc-01adminNone
1812USER_LOGIN2022-04-20Google Workspacesamllogin_successNone62.163.105.47c03siqfu3None
1822USER_LOGIN2022-04-20MicrosoftMicrosoft-Windows-Security-Auditing4672Nonewin-dc-01.ad.1823127835827.altostrat.comadminNone
1832USER_LOGIN2022-04-20MicrosoftMicrosoft-Windows-Security-Auditing4648win-dc-01$win-dc-01adminlocalhost
\n","

184 rows × 10 columns

\n","
\n"," \n"," \n"," \n","\n"," \n","
\n","
\n"," "]},"metadata":{},"execution_count":49}]},{"cell_type":"markdown","source":["# Footnote & Erata\n","\n","

[1] UDM event data is not enabled by default.

"],"metadata":{"id":"eWa6AmfmT1SY"}}]}