Google Cloud Platform: Complete Service Guide

Every GCP service from Compute to AI. Zero fluff, all clarity.

Services covered: 5 Compute · 7 Storage & DB · 6 Data & Analytics · 3 AI/ML · 5 Security · 4 DevOps · 3 Ops · 3 Mgmt

☁️ Compute Services

1. Compute Engine

🖥️ Compute Engine, Virtual Machines

IaaS

Fully customizable VMs running on Google's infrastructure. Choose CPU, memory, GPU, disk, and OS.

Family	Series	vCPUs	Memory	Use Case
General	E2, N2, N2D, N1	2–224	1–896 GB	Web servers, small/mid DBs, dev/test
Compute	C2, C2D, C3	4–176	16–704 GB	HPC, gaming, single-threaded apps
Memory	M1, M2, M3	32–416	256 GB–12 TB	SAP HANA, in-memory DBs, analytics
Accelerator	A2, A3 (GPU)	12–96	170–1360 GB	ML training, rendering, HPC

⚙️ Key Features

Custom Types: Pick the exact vCPU + memory ratio
Spot/Preemptible: 60–91% cheaper, can be reclaimed at any time
Sole-tenant: Dedicated physical server (compliance)
Live Migration: Zero-downtime host maintenance
Instance Templates: Reusable VM configuration
MIG: Managed Instance Group, auto-scaling + healing
UIG: Unmanaged Instance Group, heterogeneous VMs
Persistent Disk: Network-attached (Standard, Balanced, SSD)
Local SSD: Physically attached, 375 GB per disk, ephemeral

🔄 Managed Instance Group (MIG) Flow

Instance Template → Managed Instance Group → Auto-scaling → Health Check → Auto-healing

Scaling signals: CPU, LB utilization, Pub/Sub, custom metrics
Rolling updates: Canary, max-surge, max-unavailable
Multi-zone MIG: Spread across zones for HA
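The scaling signals above feed a target-tracking calculation. The following is a minimal sketch, assuming a simple proportional rule (desired size = current size × observed utilization / target utilization, rounded up and clamped). The function name, parameters, and limits are illustrative only, and the real autoscaler also applies stabilization logic that is omitted here.

```python
def recommended_size(current: int, observed_pct: int, target_pct: int,
                     min_size: int = 1, max_size: int = 10) -> int:
    """Return a group size that would bring utilization back to the target.

    Utilization values are integer percentages so the ceiling division is exact.
    """
    if target_pct <= 0:
        raise ValueError("target_pct must be positive")
    # Ceiling division: smallest size whose per-instance load is <= target.
    desired = (current * observed_pct + target_pct - 1) // target_pct
    return max(min_size, min(max_size, desired))


print(recommended_size(4, observed_pct=90, target_pct=60))
```

Rounding up errs on the side of extra capacity; the min/max clamp mirrors the bounds a group is normally configured with.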

💰 Pricing Models

Model	Discount	Details
On-demand	0%	Pay per second, no commitment
Sustained Use (SUD)	Up to 30%	Auto-applied for consistent monthly usage
Committed Use (CUD 1yr)	Up to 37%	1-year commitment on vCPU + memory
Committed Use (CUD 3yr)	Up to 55%	3-year commitment, highest savings
Spot / Preemptible	60–91%	Can be reclaimed with 30s notice
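To compare the models in the table, the sketch below applies each best-case discount to a hypothetical on-demand rate. The $0.10/hour price and the 730-hour month are assumptions for illustration; actual discounts vary by machine series and region.

```python
ON_DEMAND_HOURLY = 0.10  # hypothetical on-demand price in USD per hour

# Best-case discounts from the table above, as fractions of the on-demand price.
MAX_DISCOUNT = {
    "on_demand": 0.00,
    "sud": 0.30,
    "cud_1yr": 0.37,
    "cud_3yr": 0.55,
    "spot": 0.91,
}


def monthly_cost(model: str, hours: float = 730.0) -> float:
    """Lowest possible monthly cost of one VM under the given pricing model."""
    return round(ON_DEMAND_HOURLY * hours * (1 - MAX_DISCOUNT[model]), 2)


for model in MAX_DISCOUNT:
    print(f"{model:10s} ${monthly_cost(model):.2f}")
```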

2. Cloud Functions

⚡ Serverless Functions

Serverless

Event-driven, pay-per-invocation compute. Write a function, pick a trigger, deploy.

Trigger	Source
HTTP	Direct HTTPS endpoint
Pub/Sub	Message published to topic
Cloud Storage	Object create/delete/archive
Firestore	Document write/update/delete
Firebase	Auth, Analytics, Remote Config
Cloud Scheduler	Cron-scheduled invocations
Eventarc	90+ event sources (Audit Logs, custom)

🆚 Gen 1 vs Gen 2

Gen 1

Max timeout: 9 min
1 concurrent request per instance
Limited event sources
Max 8 GB memory

Gen 2 (recommended)

Max timeout: 60 min for HTTP functions (9 min for event-driven functions)
Up to 1000 concurrent requests
Eventarc (90+ sources)
Up to 32 GB memory
Traffic splitting (revisions)
Built on Cloud Run

🛠️ Runtimes & Config

Node.js · Python · Go · Java · .NET · Ruby · PHP

Cold Start: ~100 ms–2 s depending on runtime + dependencies
Max Instances: Configurable per function (default 100)
Min Instances: Keep warm to avoid cold starts (Gen 2)
VPC Connector: Access private VPC resources
Secrets: Mount from Secret Manager
Concurrency: Gen 2 only, up to 1000 per instance

3. Cloud Run

🚀 Cloud Run, Serverless Containers

Serverless

Container Image → Cloud Run Service → Auto-scales 0 → N → HTTPS Endpoint

Cloud Run Service: Multiple revisions (v1, v2, v3…)
Revision v1: 10% of traffic
Revision v2: 20% of traffic
Revision v3 (latest): 70% of traffic
Autoscaler: 0 to 1000 container instances per revision
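The 10/20/70 split above can be pictured as weighted routing over revisions. This is a minimal sketch of that idea, not Cloud Run's actual routing code; `build_router` and `route` are hypothetical names.

```python
import bisect
import itertools
import random


def build_router(split: dict[str, int]):
    """Return a function mapping a point in [0, 100) to a revision name."""
    if sum(split.values()) != 100:
        raise ValueError("traffic percentages must sum to 100")
    names = list(split)
    # Cumulative upper bounds, e.g. [10, 30, 100] for a 10/20/70 split.
    bounds = list(itertools.accumulate(split.values()))

    def route(point: float) -> str:
        return names[bisect.bisect_right(bounds, point)]

    return route


route = build_router({"v1": 10, "v2": 20, "v3": 70})
# random.random() is always < 1, so the point stays inside [0, 100).
print(route(random.random() * 100))
```

Over many requests each revision receives roughly its configured share, which is what makes gradual rollouts and quick rollbacks possible.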

✨ Features

Scale to Zero: No requests = no cost
Any Language: If it fits in a container, it runs
Custom Domains: Map your own domain with managed TLS
Concurrency: Up to 1000 requests per container instance
Min Instances: Keep warm to eliminate cold starts
VPC Connector: Access private VPC resources
Cloud SQL: Built-in proxy connector
gRPC: Native gRPC + HTTP/2 support
Jobs: Run containers to completion (batch)
Volume Mounts: GCS buckets, NFS, in-memory
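Concurrency and min instances together determine how many container instances a service needs. A rough estimate, assuming steady traffic and Little's law (requests in flight = arrival rate × latency), might look like this; the function and its parameters are illustrative, not a Cloud Run API.

```python
import math


def instances_needed(requests_per_sec: float, avg_latency_sec: float,
                     concurrency: int, min_instances: int = 0) -> int:
    """Estimate container instances required to serve a steady request rate."""
    in_flight = requests_per_sec * avg_latency_sec  # Little's law
    return max(min_instances, math.ceil(in_flight / concurrency))


# 400 req/s at 0.5 s each means 200 requests in flight; at concurrency 80
# that is 2.5 instances, rounded up.
print(instances_needed(400, 0.5, concurrency=80))
```

With no traffic the estimate drops to `min_instances`, which is zero by default: that is scale-to-zero.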

📊 Compute Comparison, When to Use What

Criteria	Cloud Run	Cloud Functions	App Engine	GKE
Unit	Container	Function	App	Pod
Scale to Zero	✅	✅	✅ (Standard)	❌
Custom Runtime	✅ Any	Limited	Flex only	✅ Any
Pricing	Per request+CPU	Per invocation	Per instance-hr	Per node
Max Timeout	60 min	60 min (Gen2)	Unlimited	Unlimited
K8s Knowledge	None	None	None	Required
Best For	APIs, microservices	Event handlers	Full web apps	Complex platforms

4. App Engine

🌐 App Engine, Standard vs Flexible

PaaS

Feature	Standard	Flexible
Languages	Python, Java, Go, Node, PHP, Ruby	Any (custom Docker)
Scaling	0 → auto (rapid)	1+ instances (slower)
Startup	Seconds	Minutes
Pricing	Per instance-hour (scale to 0)	Per VM (always ≥ 1)
Custom Runtime	❌	✅ Dockerfile
VPC Access	Via connector	Native VPC
SSH Access	❌	✅
WebSockets	❌	✅

🧩 Built-in Services

Versions: Deploy multiple versions simultaneously
Traffic Splitting: Route a percentage of traffic to specific versions
Cron Jobs: Scheduled tasks via cron.yaml
Task Queues: Background task processing (Cloud Tasks)
Memcache: Built-in caching (Standard only)
Firewall: IP-based ingress rules
Identity-Aware Proxy: Auth without code changes
Custom Domains: Map domain with managed SSL

🔄 App Engine Deployment Flow

Code + app.yaml → gcloud app deploy → Version Created → Traffic Splitting → Auto-scaling

⚠️ One App Engine app per project. Cannot change region once set. Consider Cloud Run for new projects.

5. Google Kubernetes Engine (GKE)

☸️ GKE Cluster Architecture

CaaS

GKE Cluster

Control Plane (Google-managed): API Server · etcd · Scheduler · Controller Manager

Node Pools: Pool-1 (e2-standard-4 × 3) · Pool-2 (n2-standard-8 × 5) · GPU Pool (a2 × 2)

Node → Pod → Container

🆚 Standard vs Autopilot

Feature	Standard	Autopilot
Node Management	You manage	Google manages
Scaling	Cluster + node autoscaler	Auto per pod
Pricing	Pay per node (VM)	Pay per pod (CPU/mem)
Security	You harden	Hardened by default
GPU / TPU	✅	✅ (limited)
Privileged Pods	✅	❌
Best For	Full control, custom configs	Hands-off, cost efficiency

🔑 Key Concepts

Node Pools: Groups of nodes with the same config; multi-zone for HA
Workload Identity: Map K8s ServiceAccount ↔ GCP IAM SA (no keys!)
Binary Authorization: Only deploy signed/trusted container images
GKE Gateway API: Advanced L7 routing (replaces Ingress)
Config Sync: GitOps, sync cluster state from a Git repo
Policy Controller: OPA Gatekeeper, enforce policies on K8s resources
GKE Sandbox: gVisor-based container isolation
Release Channels: Rapid / Regular / Stable, auto-upgrade cadence

🌐 GKE Networking

VPC-native: Alias IPs for pods, native routing, no overlay
ClusterIP: Internal-only service (within cluster)
NodePort: Expose on each node's IP:port
LoadBalancer: Provision GCP L4 load balancer
Ingress: GCP L7 HTTP(S) Load Balancer
Network Policies: Pod-to-pod firewall rules (Calico / Dataplane V2)
Private Cluster: Nodes have internal IPs only
Dataplane V2: eBPF-based networking (Cilium), faster, built-in policies

💾 Storage & Databases

6. Cloud Storage (GCS)

🪣 Storage Classes

Class	Min Duration	Availability	Retrieval Cost	Use Case
Standard	None	99.99% (multi) / 99.9% (region)	Free	Hot data, frequently accessed
Nearline	30 days	99.95% / 99.0%	$0.01/GB	Backups accessed monthly
Coldline	90 days	99.95% / 99.0%	$0.02/GB	Disaster recovery, quarterly access
Archive	365 days	99.95% / 99.0%	$0.05/GB	Long-term archives, compliance

All classes share the same API and offer strong global consistency; they differ in pricing, minimum storage duration, and availability.

🔧 Features

Buckets: Globally unique names, regional or multi-regional
Objects: Immutable once written (overwrite = new version)
Versioning: Keep all versions of an object
Lifecycle Rules: Auto-delete or transition classes by age/date
Retention Policy: Minimum retention (compliance / WORM)
Object Holds: Event-based or temporary hold, prevent deletion
Signed URLs: Time-limited access without credentials
ACLs + IAM: Fine-grained (ACL) or bucket-level (IAM) access
Requester Pays: Data consumer pays for egress + operations

🔄 Lifecycle & Integration

Upload Object → Standard Bucket → 30d → Nearline → 90d → Coldline → 365d → Archive

BigQuery: Federated queries directly on GCS files
Dataflow: Source/sink for batch + streaming pipelines
GKE: CSI driver for volume mounts
Cloud Functions: Trigger on object create/delete/archive
Transfer Service: Move from AWS S3, Azure, on-prem
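The 30/90/365-day flow above maps naturally onto lifecycle rules. The sketch below builds a rule set in the JSON shape used for bucket lifecycle configuration (one `SetStorageClass` action per age threshold) and mirrors the same thresholds in a helper. Treat the exact schema as an assumption to verify against the Cloud Storage documentation; `class_for_age` is a hypothetical helper, not a GCS API.

```python
import json

# Age thresholds in days, taken from the flow above.
TRANSITIONS = [(30, "NEARLINE"), (90, "COLDLINE"), (365, "ARCHIVE")]


def lifecycle_config() -> dict:
    """Build one SetStorageClass rule per age threshold."""
    return {"rule": [
        {"action": {"type": "SetStorageClass", "storageClass": cls},
         "condition": {"age": days}}
        for days, cls in TRANSITIONS
    ]}


def class_for_age(age_days: int) -> str:
    """Storage class an object would be in after the rules have applied."""
    current = "STANDARD"
    for days, cls in TRANSITIONS:
        if age_days >= days:
            current = cls
    return current


print(json.dumps(lifecycle_config(), indent=2))
```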

7. Cloud SQL

🐬 Managed Relational Database

Managed

Engine	Versions	Max Storage
MySQL	5.7, 8.0	64 TB
PostgreSQL	12, 13, 14, 15, 16	64 TB
SQL Server	2017, 2019, 2022	64 TB

vCPUs: Up to 96
Memory: Up to 624 GB
IOPS: Up to 60,000 (SSD)

🛡️ Features

HA: Regional, synchronous replication + auto failover
Read Replicas: Same region, cross-region, or external
Backups: Automated daily + on-demand
PITR: Point-in-time recovery via binary logs (MySQL) or write-ahead logs (PostgreSQL)
Private IP: VPC-peered, no public exposure
Maintenance: Configurable maintenance windows
Query Insights: Slow query monitoring, query plans
IAM Auth: Log in using IAM instead of passwords

🏗️ HA Architecture

Primary Instance: Zone A, read + write
⟵ synchronous replication ⟶
Standby Instance: Zone B, hot standby
Automatic Failover: DNS-based, ~60s failover, same IP

Read replicas use asynchronous replication: good for read scaling, but not a substitute for HA.

8. Cloud Spanner

🌍 Globally Distributed Relational Database

Global

What: Relational DB with horizontal scaling + strong consistency
Consistency: External consistency (strongest) via TrueTime (atomic clocks + GPS)
SLA: 99.999% (multi-region), "five nines"
SQL: GoogleSQL (ANSI SQL compliant) + PostgreSQL interface
Sharding: Automatic, data split across nodes by primary key
Multi-region: nam3, nam6, nam-eur-asia1, global configs
Scale: Add/remove nodes on the fly, linear throughput scaling

🏗️ Architecture

Spanner Instance (Global): Contains one or more databases
Region A: Node 1, Node 2
Region B: Node 3, Node 4
Region C (witness): Voting only
Splits (shards): Auto-distributed by primary key range → Colossus storage

TrueTime: atomic clocks + GPS → globally consistent timestamps → external consistency without coordination lag.

📊 Spanner vs Cloud SQL

Feature	Cloud SQL	Cloud Spanner
Scale	Vertical (bigger VM)	Horizontal (add nodes)
Max Size	64 TB	Unlimited (petabyte+)
Multi-region	Read replicas only	Native multi-region writes
Consistency	Regional strong	Global external
SLA	99.95%	99.999%
Cost	$$ (from ~$7/mo)	$$$$ ($0.90/node-hr)
Best For	Traditional apps, small-mid scale	Global apps, financial, gaming

9. Firestore / Datastore

🔥 NoSQL Document Database

NoSQL

Native Mode

Real-time listeners
Offline support (mobile)
Firebase SDK
Security Rules
Strong consistency

Datastore Mode

No real-time/offline
Server-side only
Higher write throughput
IAM-based access
Eventually consistent reads available

🗂️ Data Model & Features

Collection users

Document user_123 (name, email, age)

Subcollection user_123/orders

Document order_456 (item, qty, price)

Real-time: onSnapshot listeners push changes instantly
Offline: Local cache, sync when online (mobile/web)
Transactions: ACID transactions (up to 500 docs)
Indexes: Auto single-field + manual composite indexes
TTL: Auto-delete documents by timestamp field
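In the data model above, collections and documents alternate along a path, always starting with a collection. A small sketch of that rule follows; `describe_path` is a hypothetical helper, not part of any Firestore SDK.

```python
def describe_path(path: str) -> str:
    """Classify a slash-separated Firestore-style path."""
    segments = [s for s in path.split("/") if s]
    if not segments:
        raise ValueError("empty path")
    # Odd number of segments ends on a collection, even on a document.
    return "document" if len(segments) % 2 == 0 else "collection"


for p in ("users", "users/user_123", "users/user_123/orders",
          "users/user_123/orders/order_456"):
    print(p, "->", describe_path(p))
```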

10. Cloud Bigtable

⚡ Wide-Column NoSQL

Petabyte

Scale: Petabyte-scale, billions of rows
Latency: Single-digit millisecond (consistent)
Compatibility: HBase API compatible
Use Cases: Time-series, IoT, analytics, ad-tech, finance
Throughput: Millions of reads/writes per second
Replication: Multi-cluster, eventual consistency

🏗️ Architecture & Data Model

Client (cbt CLI / HBase / gRPC)
Bigtable Cluster, Nodes (compute): Node count determines throughput
Colossus (storage): SSD or HDD, decoupled from compute

Concept	Description
Row Key	Unique identifier, lexicographically sorted
Column Family	Group of related columns (defined at table creation)
Column Qualifier	Individual column within a family
Cell	Value at row × column, timestamped (versioned)

Design row keys carefully: avoid hotspots, use reverse timestamps for time-series data.
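One way to apply the row-key advice is to lead with an entity ID (so writes spread across the key space) and append a reversed timestamp (so the newest row sorts first within each entity). This is a sketch under those assumptions; the `device#reversed_ts` layout and the 13-digit bound are illustrative choices, not a Bigtable requirement.

```python
MAX_TS_MS = 10**13 - 1  # assumed upper bound so every reversed value has 13 digits


def row_key(device_id: str, ts_ms: int) -> str:
    """Build a key that sorts newest-first within one device."""
    if not 0 <= ts_ms <= MAX_TS_MS:
        raise ValueError("timestamp out of range")
    # Zero-padding keeps lexicographic order identical to numeric order.
    return f"{device_id}#{MAX_TS_MS - ts_ms:013d}"


older = row_key("sensor-42", 1_700_000_000_000)
newer = row_key("sensor-42", 1_700_000_060_000)
keys = sorted([older, newer])  # Bigtable stores rows in this lexicographic order
print(keys)
```

A key that starts with the raw timestamp would instead send every new write to the same end of the table, which is the hotspot the guidance warns about.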

11. Memorystore

💨 Managed In-Memory Stores

Feature	Redis	Memcached
Persistence	✅ RDB/AOF	❌
Data Types	Strings, Lists, Sets, Hashes, Sorted Sets, Streams	Strings only
Clustering	✅ (Redis Cluster)	✅ (distributed)
Pub/Sub	✅	❌
Replication	✅ Read replicas + HA	❌
Max Size	300 GB	5 TB (distributed)
Lua Scripting	✅	❌

🎯 Use Cases & Config

Session Caching: Store user sessions for web apps
Leaderboards: Sorted sets for real-time rankings
Rate Limiting: Token bucket / sliding window
Real-time Analytics: Counters, HyperLogLog
HA: Auto-failover (Redis Standard/HA tier)
Network: VPC-peered, private access only
Auth: IAM-based (Redis 7.0+) or AUTH string
Monitoring: Cloud Monitoring metrics (hits, misses, memory)
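The token-bucket rate limiter mentioned above can be sketched in a few lines. This version keeps state in process memory purely for illustration; in a real deployment the counters would live in Redis and be updated atomically (for example with a Lua script), which is not shown here.

```python
class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `refill_per_sec` tokens/second."""

    def __init__(self, capacity: int, refill_per_sec: float, now: float = 0.0):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.last = now

    def allow(self, now: float) -> bool:
        # Add tokens for the time elapsed since the last call, up to capacity.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_sec)
        self.last = max(self.last, now)
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(capacity=2, refill_per_sec=1.0)
print([bucket.allow(t) for t in (0.0, 0.0, 0.0, 1.0)])
```

Passing the clock in as an argument keeps the logic deterministic and easy to test.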

12. AlloyDB

🐘 AlloyDB for PostgreSQL

PostgreSQL

Performance: Up to 4× faster than standard PostgreSQL (OLTP)
Analytics: Up to 100× faster analytical queries (columnar engine)
Compatibility: 100% PostgreSQL compatible (wire protocol)
AI Built-in: pgvector for vector search / embeddings
HA: 99.99% SLA, auto-failover in < 30s
Scale: Up to 128 vCPUs, 864 GB RAM, 64 TB storage

🏗️ Architecture, Disaggregated Storage

Primary Instance (Compute): Handles reads + writes
Ultra-fast Log Processing: WAL processed before storage write, low-latency commits
Disaggregated Storage (Google Colossus): Shared across primary + read pools, automatic replication
Read Pool Instance 1 · Read Pool Instance 2

HTAP: handle both OLTP and OLAP workloads in a single database with the columnar engine.

📊 AlloyDB vs Cloud SQL vs Spanner

Feature	Cloud SQL	AlloyDB	Cloud Spanner
Engine	MySQL, PostgreSQL, SQL Server	PostgreSQL only	GoogleSQL / PG interface
Scale	Vertical	Vertical + read pools	Horizontal (unlimited)
Multi-region	Cross-region read replicas	Cross-region read replicas	Native multi-region writes
OLTP Speed	Baseline	4× faster	Comparable
Analytics	Limited	100× (columnar engine)	Good (SQL)
SLA	99.95%	99.99%	99.999%
Cost	$$	$$$	$$$$
Best For	Standard workloads	High-performance PG apps	Global-scale apps

📊 Data & Analytics

13. BigQuery

🔍 Serverless Data Warehouse

Serverless

Scale: Petabyte-scale, query TBs in seconds
Storage: Columnar (Capacitor format) on Colossus
Compute: Dremel engine, massively parallel SQL execution via slots
Streaming: Real-time inserts via streaming API
BQML: Train ML models in SQL (linear regression, XGBoost, DNN, ARIMA+)
BI Engine: In-memory acceleration for sub-second dashboards
Federated: Query external data in GCS, Cloud SQL, Sheets, Bigtable
BigLake: Unified data lake, query across GCS + BQ with one interface

🏗️ Architecture

Data Sources: GCS, Pub/Sub, Cloud SQL, Sheets, APIs
BigQuery: Dremel Engine (compute) + Colossus (storage)
SQL Query · BQML · BI Engine
Results → Looker Studio / Sheets / Export

💰 Pricing

Component	Model	Price
Queries (On-demand)	Per TB scanned	$5/TB (first 1 TB/mo free)
Queries (Capacity)	Slot-based	~$0.04/slot-hour (autoscale)
Active Storage	Per GB/month	$0.02/GB
Long-term Storage	>90 days unmodified	$0.01/GB
Streaming Inserts	Per 200 MB	$0.01

Cost optimization: use partitioning + clustering to minimize bytes scanned.
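To see why pruning matters, the sketch below applies the on-demand rate from the pricing table to a hypothetical workload: a 2 TB table queried once a day for 30 days, either scanned in full or restricted to one of 365 equally sized daily partitions. The table size and partition layout are assumptions for illustration.

```python
PRICE_PER_TB = 5.00        # on-demand rate from the table above
FREE_TB_PER_MONTH = 1.0    # free tier from the table above


def monthly_query_cost(tb_scanned: float) -> float:
    """On-demand query cost in USD for one month of scanning."""
    billable = max(0.0, tb_scanned - FREE_TB_PER_MONTH)
    return billable * PRICE_PER_TB


TABLE_TB = 2.0             # hypothetical table size
QUERIES_PER_MONTH = 30     # hypothetical: one query per day

full_scan = monthly_query_cost(QUERIES_PER_MONTH * TABLE_TB)
pruned = monthly_query_cost(QUERIES_PER_MONTH * TABLE_TB / 365)
print(f"full scan: ${full_scan:.2f}, one partition per query: ${pruned:.2f}")
```

In this example the pruned workload stays inside the free tier, while the full scans are billed for 59 TB.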

🧩 Key Features

Feature	What It Does
Partitioning	Split table by date/int range/ingestion time, prune scans
Clustering	Sort data within partitions by columns, co-locate related rows
Materialized Views	Pre-computed aggregates, auto-refreshed
Scheduled Queries	Cron-based SQL execution
Data Transfers	Import from SaaS (GA, Ads, YouTube, S3)
BigLake	Fine-grained access on data lake files
Analytics Hub	Share datasets across orgs (marketplace)
Change Data Capture	Datastream → real-time CDC into BigQuery

14. Pub/Sub

📨 Global Messaging Service

Publisher: App, Cloud Functions, IoT, gcloud
Topic: Named channel for messages
Subscription (Push): HTTP endpoint delivery
Subscription (Pull): Client polls for messages
Subscriber: Cloud Run, Dataflow, GKE, Functions

⚙️ Features

Global: Multi-region by default, publish anywhere
Serverless: No infra to manage, auto-scales
Delivery: At-least-once (default), exactly-once (configurable)
Ordering: Message ordering by ordering key
Dead Letter: Route failed messages after N retries
Schema: Avro / Protocol Buffer schema validation
Retention: 7 days (configurable, up to 31 days)
Filtering: Attribute-based subscription filters
Throughput: Millions of messages/sec
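At-least-once delivery means a subscriber can see the same message more than once, so handlers are usually made idempotent, and messages that keep failing are routed to a dead-letter destination. The simulation below illustrates both ideas without the Pub/Sub client library; `consume`, the delivery tuples, and the attempt limit are all hypothetical.

```python
def consume(deliveries, handler, max_attempts=5):
    """Process (message_id, payload) deliveries idempotently.

    Returns (processed_ids, dead_lettered_ids).
    """
    done, attempts, dead = set(), {}, []
    for msg_id, payload in deliveries:
        if msg_id in done or msg_id in dead:
            continue  # duplicate redelivery: acknowledge without reprocessing
        try:
            handler(payload)
            done.add(msg_id)
        except Exception:
            attempts[msg_id] = attempts.get(msg_id, 0) + 1
            if attempts[msg_id] >= max_attempts:
                dead.append(msg_id)  # give up and route to the dead-letter list
    return done, dead


calls = []


def handler(payload):
    if payload == "bad":
        raise ValueError("cannot process")
    calls.append(payload)


stream = [("1", "ok"), ("1", "ok"), ("2", "bad"), ("2", "bad"), ("3", "ok")]
done, dead = consume(stream, handler, max_attempts=2)
print(sorted(done), dead, calls)
```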

🎯 Use Cases

Event-driven: Decouple services, publish events, react asynchronously
Streaming ETL: Pub/Sub → Dataflow → BigQuery (real-time pipeline)
Fan-out: 1 topic → N subscriptions (broadcast to multiple consumers)
Microservices: Async communication between services
IoT Ingestion: Millions of devices publishing sensor data
Log Aggregation: Centralize logs from distributed systems

15. Dataflow

🌊 Managed Apache Beam Runner

Unified

Source → PCollection → Transforms (ParDo, GroupByKey, CoGroupByKey, Flatten) → PCollection → Sink

Sources: Pub/Sub · GCS · BigQuery · Kafka · JDBC

Sinks: BigQuery · GCS · Bigtable · Pub/Sub · Spanner

⚙️ Features

Unified: Same code for batch + stream (Apache Beam SDK)
Auto-scaling: Workers scale up/down based on backlog
Exactly-once: Guaranteed processing semantics
Streaming SQL: Write streaming pipelines in SQL
Templates: Pre-built (Google-provided) + custom reusable jobs
Flex Templates: Containerized, custom deps, private repos
Languages: Java, Python, Go (via Apache Beam SDK)

🆚 Batch vs Streaming

Aspect	Batch	Streaming
Input	Bounded (files, tables)	Unbounded (Pub/Sub, Kafka)
Latency	Minutes–hours	Seconds
Windowing	Global window	Fixed / Sliding / Session
Workers	Scale to 0 after job	Always running
Use Case	ETL, backfills, reports	Real-time dashboards, alerts
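The windowing row is easier to follow with the arithmetic written out. The sketch below assigns an event timestamp to fixed and sliding windows using plain integer math; it assumes integer timestamps in seconds and does not use the Apache Beam SDK. Session windows, which depend on gaps between events, are left out.

```python
def fixed_window(ts: int, size: int) -> tuple[int, int]:
    """Return the [start, end) fixed window containing ts."""
    start = ts - ts % size
    return (start, start + size)


def sliding_windows(ts: int, size: int, period: int) -> list[tuple[int, int]]:
    """Return every [start, start + size) window, started each `period`, containing ts."""
    last_start = ts - ts % period
    # Walk back one period at a time while the window still covers ts.
    starts = range(last_start, ts - size, -period)
    return [(s, s + size) for s in reversed(starts)]


print(fixed_window(125, size=60))
print(sliding_windows(65, size=60, period=30))
```

A fixed window places each element in exactly one window, while a sliding window with size 60 and period 30 places it in two overlapping ones.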

16. Dataproc

🔥 Managed Spark / Hadoop

Cluster Creation: ~90 seconds, fast spin-up
Auto-scaling: Scale workers based on YARN metrics
Component Gateway: Jupyter, Spark UI, Zeppelin, HDFS web UI
Init Actions: Custom setup scripts at cluster start
Preemptible Workers: 60–91% cheaper secondary workers
Ecosystem: Spark, Hadoop, Hive, Pig, Presto, Flink
Serverless: Dataproc Serverless, submit jobs, no cluster management

🆚 Dataproc vs Dataflow

Criteria	Dataproc (Spark)	Dataflow (Beam)
Engine	Apache Spark	Apache Beam
Management	Clusters (or serverless)	Fully serverless
Best For	Existing Spark jobs, ML (Spark ML), interactive analysis	New pipelines, streaming-first, unified batch/stream
Latency	Micro-batch (~500ms)	True streaming (per-element)
Cost Model	Cluster VMs (per second)	Worker VMs (auto-managed)
Portability	Run on any Spark cluster	Beam runs on Flink, Spark, Dataflow

Rule of thumb: existing Spark → Dataproc. New streaming → Dataflow.

17. Cloud Composer

🎼 Managed Apache Airflow

What: DAG-based workflow orchestration, fully managed
DAGs: Python-defined directed acyclic graphs
Operators: BigQuery, Dataflow, GCS, GKE, Cloud SQL, Dataproc…
Sensors: Wait for file, time, external task, HTTP
Composer 2: Auto-scaling workers, faster scheduling, lower cost
Secrets: Integration with Secret Manager
Monitoring: Airflow UI + Cloud Monitoring + Cloud Logging

🔄 Orchestration Flow

DAG Definition (Python) → Airflow Scheduler → Worker Nodes

Tasks: BigQuery job · Dataflow pipeline · GCS copy · GKE workload · Email notification

Composer environments run on GKE; you can customize machine types and node counts.

18. Looker & Looker Studio

📊 Enterprise BI vs Free Dashboards

Feature	Looker	Looker Studio (free)
Cost	Enterprise license	Free
Modeling	LookML (semantic layer)	No modeling layer
Data Governance	Centralized metrics, row-level security	Basic sharing
Embedded Analytics	✅ iframes, SSO	✅ embeddable reports
Custom Viz	Custom components (React)	Community visualizations
API	Full REST API	Limited
Best For	Enterprise data teams	Quick dashboards, individuals

📈 Looker Studio

Cost: Completely free
Connectors: BigQuery, Sheets, Cloud SQL, 800+ connectors
Sharing: Like Google Docs, view/edit permissions
Templates: Pre-built report templates (GA4, Ads, etc.)
Interactivity: Date ranges, filters, drill-downs
Scheduling: Automated email delivery of reports
Calculated Fields: Custom metrics + dimensions (formula editor)

🤖 AI / Machine Learning

19. Vertex AI

🧠 Unified ML Platform

MLOps

Data Sources: BigQuery, GCS, Pub/Sub
Feature Store: Centralized feature engineering + serving
AutoML: No-code training
Custom Training: Your containers / TF / PyTorch
Model Registry: Version, manage, compare models
Online Prediction: Low-latency endpoints
Batch Prediction: High-throughput, async
Model Monitoring: Drift detection, skew, feature attribution

🧩 Components

Component	Description
AutoML	Train models without code, image, text, tabular, video
Custom Training	Bring your own container / pre-built (TF, PyTorch, XGBoost)
Pipelines	Kubeflow / TFX-based ML workflow orchestration
Feature Store	Centralized feature management + online/offline serving
Model Registry	Version control, metadata, lineage
Endpoints	Deploy models for real-time or batch serving
Experiments	Track + compare training runs
TensorBoard	Managed training visualization
Vector Search	Nearest-neighbor search (embeddings at scale)

🪄 AutoML Capabilities

Domain	Tasks
Vision	Image classification, object detection, segmentation
NLP	Sentiment analysis, entity extraction, classification
Tables	Structured data, regression, classification, forecasting
Video	Classification, object tracking, action recognition

AutoML handles data preprocessing, architecture search, hyperparameter tuning, and deployment automatically.

20. Pre-trained AI APIs

🔌 Ready-to-Use AI, REST API Calls, No ML Expertise Needed

API	Input	Output	Use Case
Vision AI	Image	Labels, OCR, faces, landmarks, objects	Image tagging, document scanning, moderation
Natural Language	Text	Sentiment, entities, syntax, categories	Review analysis, content classification
Speech-to-Text	Audio	Transcription (125+ languages)	Subtitles, voice commands, call center
Text-to-Speech	Text	Audio (380+ voices, 50+ languages)	Accessibility, IVR, audiobooks
Translation	Text	Translated text (130+ languages)	Localization, real-time translation
Video Intelligence	Video	Labels, shots, objects, text, faces	Media analysis, content moderation
Document AI	Document (PDF/image)	Structured data, entities, tables	Invoice processing, form parsing, ID verification
Dialogflow	Text / Audio	Intent, entities, response	Chatbots, IVR, virtual agents

All APIs: REST + client libraries (Python, Java, Go, Node.js). Pay per request, no infrastructure to manage.

21. Gemini

✨ Gemini, Multimodal Foundation Model

GenAI

Model	Capability	Context	Best For
Gemini Ultra	Most capable, complex reasoning	1M+ tokens	Advanced research, multi-step reasoning
Gemini Pro	Balanced performance/cost	1M tokens	General tasks, enterprise applications
Gemini Flash	Fastest, most cost-efficient	1M tokens	High-volume, latency-sensitive tasks

Multimodal: Text + Image + Audio + Video + Code in a single prompt
Context Window: 1M+ tokens, process entire codebases, long documents
Function Calling: Structured tool use, connect to APIs and databases
Grounding: Google Search grounding, real-time factual responses
Code Generation: Generate, explain, debug code in 20+ languages
Reasoning: Chain-of-thought, multi-step problem solving

🔗 Integrations

Vertex AI: Gemini API on GCP, enterprise security, VPC, audit logs
AI Studio: Web-based prompt IDE, prototype rapidly
Gemini API: Direct REST/SDK access for developers
Workspace: Gemini in Docs, Sheets, Gmail, Meet, Slides
Code Assist: IDE integration, code completion, explanation, generation
Cloud Console: Natural language queries for GCP resources

🎯 Use Cases

Chatbots: Customer support, internal assistants
Code Review: Automated PR reviews, bug detection
Document Analysis: Summarize contracts, extract key terms
Image Understanding: Describe images, extract text, answer questions
Summarization: Meeting notes, article summaries, email triage
Translation: Context-aware, nuanced translation
Data Analysis: Natural language to SQL, chart interpretation

🔒 Security

22. IAM, Identity & Access Management

🔐 IAM Policy Model

WHO: Member (identity)
WHAT: Role (permissions)
WHERE: Resource (scope)
Policy Binding: e.g. [email protected] + roles/storage.admin + bucket-xyz

IAM allow policies are additive: if any allow policy in the hierarchy grants a permission, access is allowed. The exception is a deny policy, which explicitly blocks a permission and overrides any allow.

👤 Member Types

Type	Format	Description
Google Account	user:[email protected]	Individual person
Service Account	serviceAccount:[email protected]…	Identity for apps/services
Google Group	group:[email protected]	Collection of accounts
Workspace Domain	domain:company.com	All accounts in domain
allAuthenticatedUsers	allAuthenticatedUsers (no prefix)	Any logged-in Google account
allUsers	allUsers (no prefix)	Anyone on the internet (public)

🎭 Role Types & Inheritance

Type	Example	Details
Basic (avoid)	Owner, Editor, Viewer	1000s of permissions, too broad
Predefined (recommended)	roles/storage.objectViewer	Per-service, fine-grained
Custom	roles/myCustomRole	You define exact permissions

Organization: Policies inherited by all below
Folder: Dept / environment grouping
Project: Isolation + billing boundary
Resource: VM, bucket, dataset, etc.
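Inheritance down the hierarchy above, combined with additive allows and overriding denies, can be modelled in a few lines. This is a simplified sketch: it checks raw permissions rather than roles, ignores IAM Conditions, and uses hypothetical names (`Node`, `is_allowed`, `group:devs`).

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One level of the hierarchy: organization, folder, project, or resource."""
    name: str
    parent: "Node | None" = None
    allow: set = field(default_factory=set)  # (member, permission) pairs
    deny: set = field(default_factory=set)


def is_allowed(node: Node, member: str, permission: str) -> bool:
    grant = (member, permission)
    chain = []
    while node is not None:  # collect the node and all of its ancestors
        chain.append(node)
        node = node.parent
    if any(grant in n.deny for n in chain):
        return False  # an explicit deny wins over any allow
    return any(grant in n.allow for n in chain)  # allows are additive and inherited


org = Node("org")
folder = Node("prod-folder", parent=org)
project = Node("project-a", parent=folder)
org.allow.add(("group:devs", "storage.objects.get"))
folder.allow.add(("group:devs", "storage.objects.delete"))
project.deny.add(("group:devs", "storage.objects.delete"))

print(is_allowed(project, "group:devs", "storage.objects.get"))
print(is_allowed(project, "group:devs", "storage.objects.delete"))
```

The read permission granted at the organization reaches the project, while the delete permission granted at the folder is blocked by the project-level deny.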

✅ Best Practices

Least Privilege: Grant only what's needed, nothing more
Use Groups: Assign roles to groups, not individuals
Avoid Basic Roles: Editor/Owner are dangerously broad
Use Predefined: Google maintains per-service roles
Policy Analyzer: Audit who has access to what
IAM Conditions: Conditional access by time, resource type, IP
IAM Recommender: Auto-suggests role downgrades for unused permissions
Deny Policies: Explicit deny, overrides any allow (new feature)

23. Service Accounts

🤖 Identity for Services

Type	Created By	Example
Default	Auto (GCE, App Engine)	PROJECT_NUM-compute@…
User-managed	You	[email protected]
Google-managed	Google	Internal agents (cloud services)

Service accounts are both an identity (authenticate as SA) and a resource (grant others access to impersonate it).

🔑 Authentication Methods

JSON Keys: Downloadable key file, avoid if possible
Workload Identity Federation: No keys! Federate from AWS, Azure, GitHub, OIDC
Workload Identity (GKE): Map K8s SA ↔ GCP SA, no keys
Impersonation: User/SA acts as another SA, audit trail
Short-lived Credentials: Temporary tokens (1hr default) via STS
Attached SA: VM/Cloud Run/Functions, auto-injected credentials

✅ Best Practices

Don't Use Default SA: Create dedicated SAs per service
Don't Download Keys: Use Workload Identity or attached SAs
Workload Identity: For GKE, GitHub Actions, AWS, Azure workloads
IAM Conditions: Restrict by time, resource, IP
Disable Unused: Disable SAs not used for 90+ days
Key Rotation: If you must use keys, rotate regularly
Audit: Policy Analyzer + Activity Analyzer for SA usage

24. KMS & Secret Manager

🔑 Cloud KMS, Key Management

Key Ring: Logical grouping (per region)
Crypto Key: Encryption key (AES, RSA, EC)
Key Version: v1, v2, v3… (rotate without re-encrypting)

Encryption Level	Who Manages Key	Details
Google Default	Google	Auto, AES-256, no config needed
CMEK	Customer (in KMS)	Your key, Google's HSM
CSEK	Customer (external)	You supply key per-request
EKM	External KMS	Key never leaves your premises

🤫 Secret Manager

What: Store API keys, passwords, certificates, tokens
Versioning: Immutable versions, enable/disable/destroy
IAM: Fine-grained access per secret
Auto-rotation: Cloud Functions trigger on rotation schedule
Regional/Global: Choose replication policy
Cloud Run: Mount as env var or volume
Cloud Functions: Direct reference in config
GKE: Secret Store CSI Driver integration

🛡️ Encryption Layers

Encryption at Rest: AES-256, all data on disk automatically encrypted
Encryption in Transit: TLS 1.3, all data between services + to clients
Encryption in Use: Confidential VMs, data encrypted in memory (AMD SEV)

Confidential Computing: data stays encrypted even during processing, inside trusted execution environments (TEEs).

25. Security Command Center

🛡️ Security Command Center, Unified Security Dashboard

Standard Tier (Free)

Security Health Analytics
Web Security Scanner (basic)
Anomaly detection
Asset inventory

Premium Tier

All Standard features
Event Threat Detection
Container Threat Detection
VM Threat Detection
Compliance (CIS, PCI, NIST, ISO)
Attack path simulation

🔍 Capabilities

Security Health Analytics: Detect misconfigs (public buckets, open firewalls, etc.)
Event Threat Detection: Log analysis, brute force, crypto mining, data exfil
Container Threat Detection: Detect malicious binaries, reverse shells in GKE
Web Security Scanner: XSS, mixed content, outdated libraries
Compliance: Map findings to CIS Benchmarks, PCI DSS, NIST 800-53
Attack Path: Simulate attack paths to high-value resources

🔄 Findings Pipeline

GCP Resources → SCC Scanners → Findings → Notifications (Pub/Sub) → SIEM / SOAR

Export: BigQuery for custom analytics / dashboarding
Pub/Sub: Real-time alerting + SIEM integration
Chronicle: Google's SIEM, native SCC integration
Mute Rules: Suppress known-good findings (reduce noise)

26. Organization Policy & VPC Service Controls

📋 Organization Policies

Constraints applied at the org, folder, or project level: guardrails for the entire cloud.

Policy	Effect
Restrict VM external IPs	VMs can't have public IPs
Restrict resource locations	Only allow us-central1, europe-west1
Disable serial port access	Block VM serial console login
Disable SA key creation	No downloadable service account keys
Restrict shared VPC projects	Control who can attach to shared VPC
Uniform bucket-level access	Force IAM-only (no ACLs) on GCS

🏰 VPC Service Controls

API Request: User or service calling GCP API
Access Level Check: IP range, device policy, identity, geo
Service Perimeter: Boundary around GCP projects + services
GCP API: BigQuery, GCS, Pub/Sub, etc.; data stays inside the perimeter

Prevents: Data exfiltration, even if IAM is misconfigured
Perimeter Bridges: Allow controlled sharing between perimeters
Dry Run: Test policies before enforcement
Ingress/Egress: Fine-grained rules for cross-perimeter access

🔧 DevOps

27. Cloud Build

🏗️ CI/CD Pipeline

Source (GitHub / CSR / Bitbucket) → Trigger (push / PR / tag) → Build Steps → Artifact Registry → Deploy (Cloud Run / GKE)

cloudbuild.yaml: steps: [{name: 'gcr.io/cloud-builders/docker', args: ['build', '-t', '...']}]

Step 1: Build (docker build)
Step 2: Test (go test / npm test)
Step 3: Push (docker push)
Step 4: Deploy (gcloud run deploy)
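The four steps above can be assembled programmatically and written out as a JSON build config, which Cloud Build accepts alongside YAML. This is a sketch only: the image path, service name, and region are placeholders, and the test step assumes the npm builder image and an already installed dependency tree. Steps run in list order by default.

```python
import json

# Hypothetical image path; $COMMIT_SHA is a built-in Cloud Build substitution.
IMAGE = "us-docker.pkg.dev/my-project/my-repo/app:$COMMIT_SHA"

config = {
    "steps": [
        {"id": "build", "name": "gcr.io/cloud-builders/docker",
         "args": ["build", "-t", IMAGE, "."]},
        {"id": "test", "name": "gcr.io/cloud-builders/npm",
         "args": ["test"]},
        {"id": "push", "name": "gcr.io/cloud-builders/docker",
         "args": ["push", IMAGE]},
        {"id": "deploy", "name": "gcr.io/cloud-builders/gcloud",
         "args": ["run", "deploy", "app", "--image", IMAGE,
                  "--region", "us-central1"]},
    ]
}

# Emit the config; saved as cloudbuild.json it could be passed to a build.
print(json.dumps(config, indent=2))
```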

⚙️ Features

Triggers: Push, PR, tag, manual, Pub/Sub, webhook
Config: YAML (cloudbuild.yaml) or Dockerfile
Parallel Steps: Run steps concurrently with waitFor
Worker Pools: Private, run in your VPC
Builders: docker, gcloud, kubectl, terraform, maven, gradle, npm
Substitutions: $BRANCH_NAME, $COMMIT_SHA, custom vars
Approval: Manual approval gates for production deploys

💰 Pricing & Integration

Free Tier: 120 build-minutes/day (e2-standard-1)
Machine Types: e2-standard-1, e2-highcpu-8, e2-highcpu-32
Artifact Registry: Push images/packages directly
Cloud Deploy: Continuous delivery to GKE/Cloud Run
Binary Authorization: Attest builds for trusted deployment
SLSA: Supply chain security levels (provenance)

28. Artifact Registry

📦 Multi-Format Repository

Format	Ecosystem	Example
Docker	Containers	us-docker.pkg.dev/proj/repo/img:tag
Maven / Gradle	Java	Java libraries and apps
npm	Node.js	JavaScript packages
pip (PyPI)	Python	Python packages
Go	Go modules	Go dependencies
Apt / Yum	OS packages	Debian / RPM packages

🔧 Features

Vulnerability Scanning: Automatic CVE detection for container images
IAM Access: Fine-grained per-repo permissions
Regional / Multi-regional: Store close to your compute
Cleanup Policies: Auto-delete old tags/versions by age
Immutable Tags: Prevent tag overwriting (production safety)
Virtual Repos: Proxy upstream + private repos in one endpoint
Remote Repos: Cache Docker Hub, Maven Central, npm registry
SBOM: Software bill of materials generation

29. Cloud Deploy

🚢 Managed Continuous Delivery

Artifact (image) → Release → Dev → Staging → Prod

Delivery Pipeline (YAML): Defines ordered targets + strategy
Dev Target: Auto-promote
Staging Target: Approval gate
Prod Target: Canary → Full rollout

⚙️ Features

Pipeline as Code: YAML-defined delivery pipelines + targets
Rollback: One-click rollback to previous release
Canary Deploys: Progressive, 10% → 50% → 100%
Targets: GKE clusters, Cloud Run services
Approval: Manual approval workflows per target
Deploy Hooks: Pre/post deploy actions (verify, test)
Automation Rules: Auto-promote on success, auto-rollback on failure
Parallel Deploys: Deploy to multiple targets simultaneously
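The canary progression and the auto-rollback rule combine into a simple loop: advance one phase at a time and drop back to zero on the first failed check. The sketch below simulates that control flow only; `run_canary` and the health-check callback are hypothetical and do not call Cloud Deploy.

```python
def run_canary(phases, is_healthy):
    """Advance through traffic percentages; return (final_percent, history)."""
    history = []
    for percent in phases:
        history.append(percent)
        if not is_healthy(percent):
            history.append(0)  # roll back: send no traffic to the new release
            return 0, history
    return phases[-1], history


# A release that is healthy at every phase reaches full rollout.
print(run_canary([10, 50, 100], lambda percent: True))
# A release that fails its check at 50% is rolled back.
print(run_canary([10, 50, 100], lambda percent: percent < 50))
```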

30. Infrastructure as Code

📝 Terraform vs Deployment Manager

Feature	Terraform (recommended)	Deployment Manager
Language	HCL (HashiCorp)	YAML + Jinja/Python
Multi-cloud	✅ (AWS, Azure, GCP, +1000 providers)	GCP only
State	Remote (GCS bucket) or Terraform Cloud	Google-managed
Modules	Rich registry + custom modules	Templates
Community	Massive ecosystem	Limited
Plan/Preview	terraform plan (diff before apply)	Preview API
Status	Actively developed	Maintenance mode

🔧 Terraform on GCP

Provider: google + google-beta (hashicorp/google)
Auth: gcloud auth application-default login
State Backend: GCS bucket (built-in state locking; enable object versioning for state recovery)
Modules: terraform-google-modules (Google official)
Workspaces: Isolate dev/staging/prod state
Import: Import existing GCP resources into state
Terraform Cloud: Remote execution, policy-as-code (Sentinel)
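
Terraform also accepts configuration in JSON syntax (`.tf.json` files), so a GCS state backend block can be generated from a script. The sketch below builds one. The bucket name `example-tf-state`, the prefix layout, and the `gcs_backend_config` helper are assumptions for illustration; using one state prefix per environment is just one possible way to keep state separate.

```python
import json


def gcs_backend_config(bucket: str, prefix: str) -> dict:
    """Build a Terraform JSON-syntax block that selects the GCS state backend."""
    return {
        "terraform": {
            "backend": {
                "gcs": {
                    "bucket": bucket,  # bucket that stores the state objects
                    "prefix": prefix,  # path inside the bucket for this state
                }
            }
        }
    }


# Hypothetical bucket and prefix; write the output to a file such as backend.tf.json.
config = gcs_backend_config("example-tf-state", "env/dev")
print(json.dumps(config, indent=2))
```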

🔀 Other IaC Tools

Config Connector: K8s CRDs for GCP resources, GitOps-native IaC
Pulumi: IaC in general-purpose languages (TypeScript, Python, Go, C#)
Crossplane: K8s-native universal control plane for any cloud
CDK for Terraform: Use TypeScript/Python to generate Terraform

Config Connector: ideal if you already run GKE and prefer GitOps workflows.

📡 Operations (Observability)

31. Cloud Monitoring

📊 Observability Flow

GCP Resources → Metrics → Cloud Monitoring → Dashboards + Alerts → Notifications

Notification channels: Email, Slack, PagerDuty, Pub/Sub, Webhooks, SMS

⚙️ Features

Built-in Metrics: 1500+ metrics for all GCP services
Custom Metrics: Write your own via API or OpenTelemetry
Uptime Checks: HTTP, TCP, HTTPS from global locations
SLO Monitoring: Define SLIs + SLOs, track error budgets
Alerting Policies: Conditions + notification channels + documentation
Dashboards: Custom dashboards with charts, gauges, tables
Metrics Explorer: Ad-hoc queries and visualization
PromQL: Query metrics using Prometheus syntax
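
The error-budget arithmetic behind SLO monitoring is easy to check by hand. The sketch below assumes a request-based SLI (good requests divided by total requests) over a single window; the function name and the sample numbers are illustrative, not taken from Cloud Monitoring.

```python
def error_budget(slo_target: float, total: int, failed: int) -> dict:
    """Return error-budget figures for a request-based SLO over one window."""
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be between 0 and 1")
    if total <= 0:
        raise ValueError("total must be positive")
    allowed = (1 - slo_target) * total  # failures the SLO tolerates
    consumed = failed / allowed         # fraction of the budget already spent
    return {
        "sli": 1 - failed / total,
        "allowed_failures": allowed,
        "budget_consumed": consumed,
        "budget_remaining": 1 - consumed,
    }


# Hypothetical window: 99.9% target, 1,000,000 requests, 250 failures.
for key, value in error_budget(0.999, 1_000_000, 250).items():
    print(f"{key}: {value:.5f}")
```

With a 99.9% target, roughly one request in a thousand may fail, so 250 failures in a million requests spends about a quarter of the budget.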

🔗 Advanced

MQL: Monitoring Query Language, powerful filtering + aggregation
Multi-project: Metrics scopes, single pane across projects
Prometheus: Managed Prometheus (GMP), scrape + PromQL
Grafana: Use Grafana with Cloud Monitoring datasource
Service Monitoring: Auto-detect App Engine, GKE, Cloud Run services
Ops Agent: Unified agent for metrics + logs on GCE VMs

32. Cloud Logging

📝 Logging Pipeline

Sources: GCE, GKE, Cloud Run, Cloud Functions, App Engine, Custom Apps
→ Cloud Logging: Ingest, index, analyze
→ Log Router (Sinks): Include/exclude filters → route to destinations

Destinations:
Cloud Storage: Long-term archive
BigQuery: SQL analytics
Pub/Sub: Streaming / SIEM
Splunk / Chronicle: External SIEM

⚙️ Features

Log Explorer: Search, filter, analyze logs in real-time
Log-based Metrics: Create custom metrics from log patterns
Log-based Alerts: Alert when specific log entries appear
Retention: 30 days default (_Default bucket), configurable up to 3650 days
Exclusion Filters: Drop noisy logs before ingestion (save cost)
Log Router: Route logs to different sinks with filters
Log Buckets: Custom storage with per-bucket retention
Log Analytics: SQL-like queries on log data (BigQuery-powered)
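
Log Router behavior can be sketched as each sink evaluating an entry independently: the entry is delivered to every sink whose inclusion filter matches and whose exclusion filter does not. In the model below, Python predicates stand in for real Logging filter expressions, and the sink names and destinations are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

LogEntry = dict


def never(entry: LogEntry) -> bool:
    """Default exclusion filter: exclude nothing."""
    return False


@dataclass
class Sink:
    name: str
    destination: str
    include: Callable[[LogEntry], bool]
    exclude: Callable[[LogEntry], bool] = never


def route(entry: LogEntry, sinks: list[Sink]) -> list[str]:
    """Return the destinations that receive this entry."""
    return [s.destination for s in sinks if s.include(entry) and not s.exclude(entry)]


sinks = [
    Sink("audit-to-bq", "bigquery",
         include=lambda e: "cloudaudit.googleapis.com" in e["logName"]),
    Sink("errors-to-pubsub", "pubsub",
         include=lambda e: e["severity"] in ("ERROR", "CRITICAL")),
    Sink("archive", "cloud-storage",
         include=lambda e: True,
         exclude=lambda e: e["severity"] == "DEBUG"),
]

entry = {"logName": "projects/p/logs/stdout", "severity": "ERROR"}
print(route(entry, sinks))
```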

📋 Log Types

Type	Source	Details
Platform Logs	GCP services	Auto-generated (GCE, GKE, Cloud SQL…)
User Logs	Your applications	Stdout/stderr, logging client libraries
Audit Logs	Admin + Data Access	Who did what, when, where
Access Transparency	Google staff	When Google accesses your data

Admin Activity audit logs: always on, free, 400-day retention. Data Access audit logs: off by default for most services, must be enabled, and are chargeable.
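
For user logs written to stdout, managed runtimes such as Cloud Run, Cloud Functions, and GKE parse one JSON object per line as a structured entry and treat the `severity` and `message` fields specially. The sketch below emits such lines using only the standard library. The `log` helper and the `order_id` field are hypothetical.

```python
import json
import sys


def log(severity: str, message: str, **fields) -> str:
    """Write one structured log line to stdout and return it."""
    entry = {"severity": severity, "message": message, **fields}
    line = json.dumps(entry)
    print(line, file=sys.stdout, flush=True)
    return line


log("INFO", "order received", order_id=1234)
log("ERROR", "payment failed", order_id=1234)
```

Extra keyword arguments become additional fields on the entry, which makes them filterable in Log Explorer.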

33. Trace, Profiler & Error Reporting

🔍 Cloud Trace, Distributed Tracing

What: Distributed tracing, track requests across services
Protocol: OpenTelemetry (recommended), Zipkin, Cloud Trace API
Auto-instrumented: App Engine, Cloud Run, Cloud Functions
Analysis: Latency distribution, bottleneck identification
Trace Explorer: Search traces by latency, service, status
Integration: Link traces ↔ logs ↔ metrics for full context
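
Tracking a request across services depends on every hop sharing one trace ID. OpenTelemetry propagates it by default in the W3C `traceparent` header, formatted as version, trace ID, span ID, and flags. The sketch below generates and propagates that header by hand to show the mechanics; in practice an OpenTelemetry SDK does this, and the helper names here are hypothetical.

```python
import re
import secrets

# version "00", 32-hex trace ID, 16-hex span ID, 2-hex flags
TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace: random trace ID and span ID."""
    flags = "01" if sampled else "00"
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-{flags}"


def child_of(parent: str) -> str:
    """Create the header for a downstream call: same trace ID, new span ID."""
    match = TRACEPARENT.match(parent)
    if match is None:
        raise ValueError(f"invalid traceparent: {parent!r}")
    trace_id, _parent_span_id, flags = match.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


incoming = new_traceparent()
outgoing = child_of(incoming)
print(incoming)
print(outgoing)
```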

🔥 Cloud Profiler

What: Continuous CPU + memory profiling in production
Overhead: < 0.5%, safe for production
Visualization: Interactive flame graphs
Languages: Java, Go, Python, Node.js
Compare: Side-by-side profiles across versions/time
Cost: Free

🚨 Error Reporting

What: Aggregate + display errors across GCP services
Grouping: Auto-group similar errors by stack trace
Stack Traces: Full stack traces with source context
Notifications: Email / mobile alerts on new errors
Integration: Cloud Logging, errors auto-detected from logs
Languages: Java, Python, Go, Node.js, .NET, Ruby, PHP
Resolution: Mark errors as acknowledged / resolved / muted

🏛️ Management

34. Resource Hierarchy

🏗️ GCP Resource Hierarchy

Organization: company.com, top of hierarchy (Workspace/Cloud Identity domain)
Folders (department): Engineering, Finance
Folders (environment): Dev, Staging, Prod
Projects: web-app-dev, api-staging, api-prod
Resources: GCE VMs, GCS Buckets, BigQuery Datasets, Cloud SQL Instances

IAM policies + org policies inherit downward. A policy set at the org level applies to every resource below it.
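
Downward inheritance means a resource's effective IAM bindings are the union of the bindings on the resource and on every ancestor. The sketch below walks up a small hierarchy to compute that union. The parent links, member names, and role assignments are hypothetical examples, not a real organization.

```python
# child -> parent; the organization is the root (hypothetical layout)
PARENT = {
    "org/company.com": None,
    "folder/engineering": "org/company.com",
    "folder/prod": "folder/engineering",
    "project/api-prod": "folder/prod",
}

# (member, role) bindings set directly on each node (hypothetical)
BINDINGS = {
    "org/company.com": {("group:security@company.com", "roles/viewer")},
    "folder/engineering": {("group:eng@company.com", "roles/editor")},
    "project/api-prod": {("serviceAccount:deployer", "roles/run.admin")},
}


def effective_bindings(resource: str) -> set[tuple[str, str]]:
    """Union the bindings on a resource with those of all its ancestors."""
    result = set()
    node = resource
    while node is not None:
        result |= BINDINGS.get(node, set())
        node = PARENT[node]
    return result


for member, role in sorted(effective_bindings("project/api-prod")):
    print(member, role)
```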

📚 Key Concepts

Organization: Root node, linked to Workspace/Cloud Identity domain
Folders: Optional grouping, up to 10 nesting levels
Projects: Isolation boundary: own IAM, billing, APIs, quotas
Project ID: Globally unique, immutable once created
Project Number: Auto-assigned, used internally
Labels: Key-value metadata on resources (for billing, filtering)
Resource Manager: API to manage org/folders/projects programmatically

✅ Best Practices

One Org: Single org for all company resources
Folder by Dept + Env: Engineering/Finance → Dev/Staging/Prod
Project per Service: Separate projects for each app/microservice
Labels: team, env, cost-center, app (for cost tracking)
Shared VPC: Centralize networking in a host project
Policy Inheritance: Set org-wide policies at the top; org policy constraints can be overridden lower down, but IAM allow grants are additive and cannot be revoked at a lower level

35. Billing & Cost Management

💳 Billing Structure

Billing Account: Payment method + invoicing
→ Linked projects: Project A, Project B, Project C

Budgets: Set thresholds + email/Pub/Sub alerts
Cost Breakdown: By project, service, SKU, label
Billing Export: Export to BigQuery for custom analysis
CUDs: Committed use discounts (1yr/3yr)
SUDs: Sustained use discounts (auto-applied)
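
Budget alerts fire when spend crosses a percentage of the budget amount. The check itself is simple arithmetic, sketched below. The 50% / 90% / 100% thresholds and the dollar figures are illustrative assumptions, and `crossed_thresholds` is a hypothetical helper rather than a Billing API call.

```python
def crossed_thresholds(budget: float, spend: float,
                       thresholds: tuple[float, ...] = (0.5, 0.9, 1.0)) -> list[float]:
    """Return every threshold (as a fraction of budget) that spend has reached."""
    return [t for t in thresholds if spend >= budget * t]


# Hypothetical month: $1,000 budget, $920 spent so far.
for threshold in crossed_thresholds(1000.0, 920.0):
    print(f"alert: spend reached {threshold:.0%} of budget")
```

Note that a budget only notifies; it does not cap spend unless you wire the Pub/Sub alert to automation that acts on it.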

💡 Cost Optimization

Right-size VMs: Use Recommender to downsize underutilized VMs
Spot / Preemptible: Up to 91% savings for fault-tolerant workloads
CUDs: Commit for stable workloads (37–55% off)
Autoscaling: Scale down during low traffic
Storage Lifecycle: Auto-transition to colder storage classes
BQ Capacity Pricing: Slot-based editions (formerly flat-rate) for predictable BQ costs
Network Egress: Keep traffic intra-region when possible
Idle Resources: Delete unused IPs, disks, LBs, snapshots
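
To see what the discount figures mean in practice, the sketch below applies them to a hypothetical on-demand bill of $1,000 per month. The rates are the upper bounds quoted in this guide, not live prices, and actual discounts vary by machine type and region.

```python
# Upper-bound discounts as quoted in this guide (not live pricing).
DISCOUNTS = {
    "on_demand": 0.0,
    "cud_1yr": 0.37,
    "cud_3yr": 0.55,
    "spot_max": 0.91,
}


def monthly_cost(on_demand_cost: float, model: str) -> float:
    """Apply a pricing model's discount to an on-demand monthly cost."""
    return round(on_demand_cost * (1 - DISCOUNTS[model]), 2)


for model in DISCOUNTS:
    print(f"{model:>10}: ${monthly_cost(1000.0, model):,.2f}")
```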

🛠️ Tools

Cost Management: Dashboard with spend trends + forecasts
Recommender API: Right-sizing, idle resource cleanup suggestions
Active Assist: Umbrella for all recommendation engines
FinOps Hub: Centralized cost visibility + governance
Pricing Calculator: Estimate costs before deploying
Committed Use Analysis: Analyze CUD coverage + utilization

36. GCP Quick Reference, Master Table

📖 Every GCP Service at a Glance

Category	Service	Type	Use Case
Compute	Compute Engine	IaaS (VMs)	Custom VMs, lift-and-shift, HPC
	Cloud Functions	FaaS (Serverless)	Event-driven functions, webhooks
	Cloud Run	CaaS (Serverless)	Containerized APIs, microservices
	App Engine	PaaS	Full web apps, rapid deployment
	GKE	CaaS (Managed K8s)	Complex platforms, multi-service orchestration
Storage & DB	Cloud Storage (GCS)	Object Storage	Files, backups, data lake, static hosting
	Cloud SQL	Managed RDBMS	MySQL, PostgreSQL, SQL Server workloads
	Cloud Spanner	Global RDBMS	Global apps, finance, 99.999% SLA
	Firestore	NoSQL Document	Mobile/web apps, real-time sync
	Cloud Bigtable	NoSQL Wide-column	IoT, time-series, analytics (petabyte-scale)
	Memorystore	In-memory	Caching, sessions, leaderboards
	AlloyDB	Managed PostgreSQL	High-perf OLTP + OLAP, AI workloads
Data & Analytics	BigQuery	Data Warehouse	SQL analytics, ML, petabyte-scale queries
	Pub/Sub	Messaging	Event streaming, decoupling, fan-out
	Dataflow	Stream/Batch (Beam)	ETL pipelines, real-time processing
	Dataproc	Managed Spark/Hadoop	Existing Spark jobs, big data processing
	Cloud Composer	Workflow (Airflow)	DAG orchestration, batch scheduling
	Looker / Looker Studio	BI / Dashboards	Enterprise BI, self-service dashboards
AI / ML	Vertex AI	ML Platform	Train, deploy, manage ML models (AutoML + custom)
	Pre-trained AI APIs	AI APIs	Vision, NLP, Speech, Translation, no ML skills
	Gemini	Foundation Model	Multimodal GenAI, chat, code, analysis
Security	IAM	Access Control	Who can do what on which resource
	Service Accounts	Machine Identity	Identity for apps, VMs, CI/CD
	KMS + Secret Manager	Key / Secret Mgmt	Encryption keys, API secrets, certs
	Security Command Center	Security Posture	Vulnerabilities, threats, compliance
	Org Policy + VPC-SC	Governance	Guardrails, data exfiltration prevention
DevOps	Cloud Build	CI/CD	Build, test, deploy automation
	Artifact Registry	Package Repository	Docker images, npm, pip, Maven packages
	Cloud Deploy	Continuous Delivery	Progressive rollout to GKE / Cloud Run
	Terraform / Config Connector	IaC	Provision infrastructure as code
Operations	Cloud Monitoring	Metrics + Alerts	Dashboards, SLOs, uptime checks, alerts
	Cloud Logging	Log Management	Ingest, search, route, analyze logs
	Trace / Profiler / Error Reporting	APM	Distributed tracing, profiling, error tracking
Networking	VPC	Virtual Network	Isolated network, subnets, firewall rules
	Cloud Load Balancing	Load Balancer	Global/regional L4/L7 traffic distribution
	Cloud CDN	CDN	Cache content at Google edge locations
	Cloud DNS	DNS	Managed authoritative DNS
	Cloud Interconnect / VPN	Hybrid Connectivity	Connect on-prem to GCP (dedicated or VPN)
Management	Resource Hierarchy	Organization	Org → Folders → Projects → Resources
	Billing	Cost Management	Budgets, alerts, cost optimization
	Active Assist / Recommender	Optimization	Right-sizing, idle cleanup, security fixes