Google Cloud Certification Renewal

My GCP certification comes up for renewal in one month, so I’m putting my notes here. My primary source of study is Linux Academy, where I have a yearly subscription.

Databases

Cloud SQL

  • Classic MySQL, Postgres type of database
  • Only in one region, with read replicas (same zone) or failover replicas (in a different zone from the primary DB instance’s zone)
  • If there’s a read replica and a failover: upon failover, a new read replica will be created in the same zone as the failover replica. The failover is promoted to primary DB
  • There’s failover replication lag
  • Vertical scaling (i.e. increasing the instance’s CPU or RAM) requires a restart (see the sketch after this list)
  • Storage scales automatically
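
A minimal sketch of vertical scaling, assuming a made-up instance name and target tier; patching the tier is what forces the restart:

    # Changing the tier (machine size) restarts the instance
    gcloud sql instances patch my-instance --tier=db-n1-standard-2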

Cloud Spanner

  • Expensive horizontally scalable SQL
  • Cross region
  • Still need to choose the number of nodes (see the sketch below)
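
A sketch of creating an instance with made-up names and a regional config; you size Spanner in nodes rather than machine types:

    # 3-node regional Spanner instance (names are placeholders)
    gcloud spanner instances create my-spanner \
        --config=regional-us-central1 \
        --description="example instance" \
        --nodes=3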

Datastore

  • NoSQL
  • No need to spec any instance (fully managed)
  • Up to ~Terabytes of data
  • Web/mobile apps

Bigtable

  • NoSQL
  • Terabytes to Petabytes
  • Expensive, more performant than Datastore
  • More analytics oriented
  • Requires node management

BigQuery

  • No management needed
  • Uses SQL as a query language
  • Load in datasets – i.e. not an operational DB. Project -> Dataset -> Table
  • Data warehouse for exploring, analysing data
  • IAM is on the project and dataset level, not table
  • Can list all queries done by users for auditing
  • Scenario: run queries against data you can read (BigQuery Data Viewer role) while charging the cost to a particular billing project (BigQuery User role on that project)
  • Partitioning lowers query times and costs when we know we only want data from a certain time period. Partition by ingest time or by a timestamp/date column (see the sketch after this list)
  • Sharded tables are an alternative where you literally separate tables by date
  • Table data can be set to expire. If a table is not edited for 90 days, its storage is automatically converted to long-term storage
  • Relevant to the Mountkirk Games and TerramEarth case studies
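
A sketch of ingest-time partitioning with the bq CLI; the dataset, table and schema are made up. Filtering on _PARTITIONTIME keeps the scan (and the bill) to one day:

    # Day-partitioned table (names and schema are placeholders)
    bq mk --table --time_partitioning_type=DAY mydataset.events name:STRING,ts:TIMESTAMP
    # Only the 2019-01-01 partition is scanned and billed
    bq query --use_legacy_sql=false \
        'SELECT COUNT(*) FROM mydataset.events
         WHERE _PARTITIONTIME = TIMESTAMP("2019-01-01")'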

MemoryStore

  • Redis as a service
  • Regional in nature

VPC

  • Subnets are regional and can span multiple zones within a region (see the sketch after this list)
  • VPCs are Project based and can be shared across projects within organization
  • Incoming (ingress) traffic is free. Egress is not
  • IAM: Compute Admin, Compute Network Admin
  • Firewalls: implied deny all ingress, allow all egress
  • Host project = hosts the shared VPC; Service project = can connect to the host’s VPC
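
A sketch of a custom-mode VPC with one regional subnet; the names and IP range are placeholders:

    gcloud compute networks create my-vpc --subnet-mode=custom
    # Subnets are regional: pick the region, the range spans its zones
    gcloud compute networks subnets create my-subnet \
        --network=my-vpc --region=us-central1 --range=10.0.0.0/24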

Hybrid Connectivity

Cloud Interconnect

  • Physical connection
  • Partner (telco edge location) vs direct interconnect
  • Doesn’t touch public internet, i.e. private network
  • No VPN tunnels
  • 10 Gbps per link, up to 8 links = 80 Gbps max

Cloud VPN

  • IPsec over the public internet
  • 1.5 Gbps per tunnel, up to 8 = ~12 Gbps max
  • Site to site, not site to client
  • Supports IKEv1 and IKEv2
  • Static vs dynamic routing
  • Static: you need to manage the routing table manually and add a route for every subnet
  • Dynamic routing (uses Cloud Router) uses BGP. The client side’s VPN gateway needs to support BGP for this to work. Auto-discovers new subnet routes
  • Dynamic routing may require enabling the “Global” dynamic-routing mode at the VPC level for both networks; the default is “Regional”
  • Both sides need to set up VPN gateways, and each side needs a VPN tunnel pointing at the other (see the sketch after this list)
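
A sketch of the GCP side of a classic VPN with static routing; names, peer address and secret are placeholders, and the ESP/UDP 500/UDP 4500 forwarding rules to the gateway’s IP are omitted for brevity:

    gcloud compute target-vpn-gateways create my-gw \
        --network=my-vpc --region=us-central1
    gcloud compute vpn-tunnels create my-tunnel \
        --region=us-central1 --target-vpn-gateway=my-gw \
        --peer-address=203.0.113.1 --shared-secret=MY_SECRET \
        --local-traffic-selector=0.0.0.0/0
    # Static routing: every remote subnet needs a manual route like this
    gcloud compute routes create to-on-prem --network=my-vpc \
        --destination-range=192.168.0.0/24 \
        --next-hop-vpn-tunnel=my-tunnel \
        --next-hop-vpn-tunnel-region=us-central1

The on-prem side needs the mirror-image gateway, tunnel and routes.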

Peering

  • Peering to Google services (e.g. G Suite), not to Google Cloud
  • Direct vs. carrier peering
  • Often appears as a distractor option on the exam

VPC Network Peering

  • Connects two GCP VPCs (even in different organizations; see the sketch after this list)
  • Still internal private network
  • Less expensive, lower latency
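
A sketch of one side of a peering, with made-up project and network names; the other project has to create the matching peering before traffic flows:

    gcloud compute networks peerings create my-peering \
        --network=my-vpc \
        --peer-project=other-project --peer-network=other-vpc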

Compute Engine

Disks

  • Two options: Persistent Disk (SSD or standard, network-attached) and Local SSD (physically attached to the VM)
  • Persistent disk is the ONLY bootable option
    • Auto RAID, networked
    • Can detach and move
    • Resize while running (see the sketch after this list)
    • Can attach to multiple instances if in read only mode
    • Performance improves as disk size increases
    • Highly reliable
    • Encryption at rest, including self-managed keys
    • Can use as a file server
    • Up to 64 TB
    • Only accessible within a ZONE. Not cross ZONE inside a region yet
  • Local SSD
    • Can’t be used as boot disk
    • Physically attached, highest performance
    • Must be attached on CREATION of instance
    • Not accessible to other instances in the same zone
    • 375 GB, up to 8 of them (3TB)
    • Less reliable as disk can fail, no data replication
    • Can encrypt but not using own keys
    • Two interface types: SCSI and NVMe (faster)
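
A sketch of the persistent-disk lifecycle with placeholder names: create, attach to a running instance, then grow it live:

    gcloud compute disks create my-data-disk --size=200GB \
        --type=pd-ssd --zone=us-central1-a
    gcloud compute instances attach-disk my-vm \
        --disk=my-data-disk --zone=us-central1-a
    # Resizing works while the instance is running
    gcloud compute disks resize my-data-disk --size=300GB --zone=us-central1-a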

Images

  • Can be used across zones, regions and projects and shared across projects
  • Create new instances or create instance templates
  • Better to create when the instance is shut down
  • Has deprecation states: active, deprecated (still works, with a warning), obsolete (no new access), deleted (no access at all); see the sketch below
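
A sketch of moving an image through the deprecation states, assuming made-up image names:

    # Old image still works but warns, and points at its replacement
    gcloud compute images deprecate my-image-v1 \
        --state=DEPRECATED --replacement=my-image-v2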

Snapshots

  • For backups, can also use to create instances
  • Incremental backup – first snapshot is big, subsequent only contain the diff
  • If you delete a snapshot, the changes get merged to the next existing snapshot
  • Can create while running
  • Project scope but can be shared across projects
  • Restoring a snapshot with the gcloud command is a two-step process (sketched below):
    • First create a disk using the snapshot
    • Then create the instance using the disk name
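
The two steps, with placeholder names:

    # Step 1: materialize a disk from the snapshot
    gcloud compute disks create restored-disk \
        --source-snapshot=my-snapshot --zone=us-central1-a
    # Step 2: boot a new instance from that disk
    gcloud compute instances create restored-vm --zone=us-central1-a \
        --disk=name=restored-disk,boot=yes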

Startup Scripts

  • Copy and paste the script on instance creation, or point to a Cloud Storage URL holding a script of commands to run (see the sketch after this list): key = startup-script-url; value = the script’s gs:// path
  • Using the startup script in bucket lets you change the script on a whim
  • Runs as root
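
A sketch using a made-up bucket and script path:

    gcloud compute instances create my-vm --zone=us-central1-a \
        --metadata=startup-script-url=gs://my-bucket/startup.sh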

Pre-emptible VMs

  • 24 hrs max
  • Google can shut down your instance at any time w/ a 30 sec warning
  • Used for fault tolerant BATCH PROCESSING workloads, e.g. rendering
  • Up to 80% cheaper (see the sketch below)
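
One flag is all it takes; the instance name and zone are placeholders:

    gcloud compute instances create batch-worker-1 \
        --zone=us-central1-a --preemptible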

Scaling

Load Balancers

  • L4 vs L7
  • HTTP(S)=L7 vs TCP or UDP=L4
  • 5 types of LB (an HTTP(S) setup is sketched after this list):
    • HTTP(S) – Global scope, external
    • SSL Proxy – Global scope, external
    • TCP Proxy – Global scope, external
    • Internal TCP/UDP – NLB, regional in scope
    • External TCP/UDP – Network Load balancer, regional in scope
  • Global (HTTP(S), SSL PROXY, TCP Proxy) vs regional (Internal TCP/UDP, Network TCP/UDP)
  • External ( HTTP(S), SSL PROXY, TCP Proxy, Network TCP/UDP) vs internal traffic (Internal TCP/UDP)
  • Firewall rules are not applied to the LB; they are applied to whatever is behind it
  • Bucket can be the backend target of LB
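
A sketch of wiring up a global HTTP LB, assuming a made-up managed instance group as the backend; the chain is health check -> backend service -> URL map -> proxy -> forwarding rule:

    gcloud compute health-checks create http my-hc --port=80
    gcloud compute backend-services create my-backend \
        --protocol=HTTP --health-checks=my-hc --global
    gcloud compute backend-services add-backend my-backend --global \
        --instance-group=my-group --instance-group-zone=us-central1-a
    gcloud compute url-maps create my-map --default-service=my-backend
    gcloud compute target-http-proxies create my-proxy --url-map=my-map
    gcloud compute forwarding-rules create my-rule --global \
        --target-http-proxy=my-proxy --ports=80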

Instance Groups

  • Manage a group of instances together
  • Managed vs unmanaged
  • Unmanaged: a collection of instances that are not identical; not an exam focus
  • 2 steps to create a MANAGED instance group: create an instance template, then create the group from that template (sketched after this list)
  • Instance Templates are global and can be reused for multiple groups
  • Instance groups are zonal or regional in scope
  • Managed instance groups are pretty much always paired with LBs
  • Managed instance groups have a rolling updater to migrate to new machines with no downtime
  • Can’t use snapshot to create instance template, but can use images
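
The two steps, with placeholder names:

    # Step 1: a template describing what each instance looks like
    gcloud compute instance-templates create web-template \
        --machine-type=n1-standard-1 --tags=http-server
    # Step 2: a managed group stamped out from that template
    gcloud compute instance-groups managed create web-group \
        --template=web-template --size=3 --zone=us-central1-a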

Autoscaling

  • Health-check traffic needs to be allowed by the instances’ firewall rules (see the sketch below)
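
A sketch of attaching an autoscaler to the group from the previous section, scaling on CPU:

    gcloud compute instance-groups managed set-autoscaling web-group \
        --zone=us-central1-a --min-num-replicas=2 --max-num-replicas=10 \
        --target-cpu-utilization=0.6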

Compute Options

  • Compute Engine – full control, most administrative work, for lift and shift
  • Kubernetes Engine – containers, patches os for you
  • App Engine – PaaS, HTTP only
  • Cloud Functions – Respond to Events

App Engine

  • Standard Env vs. Flexible Environment
  • Standard has more constraints: Python, Java, PHP, Go, Node
  • Faster scaling, cheaper
  • Flexible has more languages, and can use Docker containers. Scales more slowly, has VPC access which allows for VPN
  • App Engine can make use of memcache to speed up DB queries
  • Memcache has 2 levels, Shared (free, on by default) and Dedicated (pick GB of memory to dedicate)
  • Know that Dedicated memcache can improve SQL query times and app performance
  • App Engine has Version Management, allows for canary testing. Split traffic to V1 40%, V2 60%.
  • Can deploy a version but direct ZERO traffic to it for testing using the ‘--no-promote’ flag (see the sketch below)
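
A sketch of the canary flow, assuming version names v1/v2 on the default service:

    # Deploy v2 without routing any traffic to it
    gcloud app deploy --version=v2 --no-promote
    # Then split: 40% to v1, 60% to v2
    gcloud app services set-traffic default --splits=v1=0.4,v2=0.6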

Kubernetes Engine

  • Portability/compatibility; reduces OS and version dependencies
  • Use with microservices
  • Pod = smallest deployable unit. Pods = one or more containers bundled together
  • Node = 1 VM. Multiple pods/containers per node
  • Node pool = group of nodes
  • Cluster = group of node pools
  • gcloud vs kubectl commands: kubectl manages pods and other objects on the cluster; gcloud manages GCP resources (the cluster itself, node pools)
  • Use Alpine Linux for Dockerfiles; install dependencies first, then copy source code (IN THAT ORDER, so the dependency layer stays cached; see the sketch after this list)
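
A sketch of that ordering for a made-up Python app; Docker caches layers top-down, so dependencies that rarely change go before source that changes often:

    FROM alpine:3.9
    # Dependency layer: cached until the package list changes
    RUN apk add --no-cache python3
    # Source layer: edits here don't invalidate the layer above
    COPY src/ /app/
    CMD ["python3", "/app/main.py"]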

Big Data & Machine Learning

  • Cloud Dataproc = Managed Apache Hadoop & Spark. Lift and shift Hadoop & Spark workloads. Mostly used for Hadoop compatibility
  • Cloud Dataflow = batch & streaming data processing using Apache Beam. Serverless
  • Streaming ingest = Pub/Sub, which feeds the data processing
  • Dataproc (Hadoop compatibility) vs Dataflow (preferred)
  • Cloud Dataprep = prepare your data; built on Dataflow
  • Cloud Pub/Sub = async messaging. Global scope, serverless. Data ingest (see the sketch after this list)
  • Machine Learning services
  • Datalab – visualize data (GCP), based on Jupyter
  • Data Studio – visualize data (G Suite)
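
A minimal Pub/Sub round trip with placeholder names, the kind of ingest path the case studies lean on:

    gcloud pubsub topics create telemetry
    gcloud pubsub subscriptions create telemetry-sub --topic=telemetry
    gcloud pubsub topics publish telemetry --message="hello"
    gcloud pubsub subscriptions pull telemetry-sub --auto-ack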

Data Lifecycle

  • Ingest – Pub/Sub
  • Store – databases, Cloud Storage
  • Process and analyze – the section above
  • Explore and visualize – Datalab, Data Studio, Google Sheets

Case Studies

Mountkirk Games

  • Want to put new game on GCP
  • Game backend – GCE, needs a custom Linux distro (managed instance groups + custom images)
  • NoSQL – Datastore
  • Need analytics storage
  • Management cares about:
    • Scaling in case their game takes off
    • Measure performance (Stackdriver Monitoring/Logging)
    • No downtime
    • Managed services
    • Analytics – usage patterns
    • Establish a global footprint – multiple regional instance-group backends + global HTTP LB; Pub/Sub (can buffer data), Datastore, BigQuery, Cloud Storage, Dataflow
  • Game Backend – managed instance groups + stackdriver to drive autoscaling
  • Transactional DB service, managed nosql = Cloud Datastore (Firestore)
  • Analytics = BigQuery (probably: fully managed, ~10 TB of data) or Bigtable (has admin overhead); Pub/Sub -> Dataflow -> BigQuery
  • Batch data to Cloud Storage -> Dataflow

Dress4Win

  • Hosted on-prem; needs future-proofing
  • POC: move dev and test to GCP
  • Set up DR on GCP (hybrid network: VPN or Interconnect)
  • If successful, then full migration to GCP
  • prefer managed services
  • Worry about costs, scaling down during off-peak times
  • Security, customer supplied key, IAM, firewalls
  • Global footprint NOT a priority ATM
  • Want to recreate existing infra in cloud, not redesign their applications
  • Dev/Test should be separate projects
  • Automate infra creation using gcloud, deployment manager
  • Stackdriver monitoring, logging and debug
  • Continuous deployment CI/CD, Cloud Build
  • Replicate MySQL to Cloud SQL -> DNS cutover for DR; single region ok
  • Redis cluster -> Memorystore
  • Managed instance groups allows lift and shift of app servers
  • Apache Hadoop servers -> Cloud Dataproc + Cloud Storage
  • 3 RabbitMQ servers -> Pub/Sub
  • Storage/SAN -> Persistent disks (block level storage)
  • NAS -> Cloud Storage (potentially persistent disk as well)

TerramEarth

  • Tractors, bulldozers w/ sensors
  • Data collected is used for analytics, tuning vehicles, and pre-emptively stocking replacement parts by detecting likely part failures
  • 20 million offline tractors, 200k IoT-connected tractors. Most data is only accessible if the tractor is brought to a service centre for data upload
  • They want to collect and act on data faster (900TB/day)
  • Global footprint
  • Solution 1: convert everything to IoT
    • Tractors <-> Cloud IoT Core <-> Pub/Sub <-> Dataflow <-> BigQuery -> Machine Learning to tune tractor params
  • Share dashboards with dealer networks with Data Studio
  • They need multi-regional/global services
  • They need a backup strategy -> BigQuery -> Cloud Storage

Cost Optimization

  • Sustained use discount is automatic depending on how long your instance is up and running (Compute Engine and Cloud SQL). Up to 30%
  • Custom machine types – choose your own CPU and RAM combo (see the sketch after this list)
  • Preemptible VMs
  • Nearline and Coldline Cloud Storage – same performance, but retrieval costs apply
  • Committed use discounts – 1- or 3-year terms for a set pool of CPU and RAM
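
A sketch of a custom machine type; the name, zone and sizes are placeholders, and memory has to stay within the allowed per-vCPU ratio:

    gcloud compute instances create my-vm --zone=us-central1-a \
        --custom-cpu=4 --custom-memory=8GB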

Storage Transfer

  • Import from AWS S3, HTTP/HTTPS, another Cloud Storage Bucket
  • Not applicable to on-prem – in that case use gsutil (see the sketch after this list)
  • The only destination is a GCP Storage Bucket
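
For on-prem data, a gsutil sketch with a made-up local path and bucket:

    # -m parallelizes; rsync -r mirrors the local tree into the bucket
    gsutil -m rsync -r /data/exports gs://my-bucket/exports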

Disaster Recovery

  • GCE Instances = disk snapshot (incremental). Done using cronjob or snapshot scheduler
  • Cloud Storage: object versioning + lifecycle management; revert to an older object generation (see the sketch after this list)
  • Application rollback: on Compute Engine, do a rolling update applying an old instance-group template. Snapshots are irrelevant here
  • App Engine has versioning control with traffic % split (canary update)
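
A sketch of versioning plus a revert; the bucket, object and generation number are placeholders. Copying an archived generation over the live object is the rollback:

    gsutil versioning set on gs://my-bucket
    gsutil cp gs://my-bucket/config.json#1556000000000000 \
        gs://my-bucket/config.json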

Security

  • Separate projects for dev, test, prod
  • Principle of least privilege at Organization and Project levels (primarily)
  • Google Cloud Storage: IAM (lowest scope is the bucket level), ACLs and signed URLs (object level; see the sketch after this list)
  • Securing communications: public key infrastructure
  • IoT Core uses Message Queuing Telemetry Transport (MQTT)
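
A signed-URL sketch, assuming a service-account key file; anyone holding the URL can read the object for 10 minutes, no Google account needed:

    gsutil signurl -d 10m sa-key.json gs://my-bucket/report.pdf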

Network Security

  • Projects (including IAM)
  • VPC
  • Firewall
  • Organization -> Projects -> VPC -> Regions -> Subnets
  • IAM cannot limit access to individual VPCs within the same project. So if someone shouldn’t have access to a VPC, that VPC must live in a different project
  • Projects separate who has access
  • VPCs separate resources
  • Firewalls restrict network traffic by port, tag, service account, IP range, or subnet
  • Firewall applies to resources behind a load balancer. NOT at the load balancer
  • Health checks require firewall rules (see the sketch below)
  • Bastion hosts can be used to avoid giving instances a public IP address
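
A sketch of the health-check firewall rule mentioned above, with a made-up network; 130.211.0.0/22 and 35.191.0.0/16 are Google’s health-check source ranges:

    gcloud compute firewall-rules create allow-health-checks \
        --network=my-vpc --allow=tcp:80 \
        --source-ranges=130.211.0.0/22,35.191.0.0/16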