Thursday, December 24, 2020

Microservices deployment

History of deployment options:

  • Physical machines: 1990s. Fast deployment, best performance. Configuring/reconfiguring cumbersome.
  • Virtual machines (VMs): 2000s. AWS EC2 released in 2006. AWS Elastic Beanstalk is an easy way to deploy. Can create a base image & add new instances. But virtualizing an entire machine adds overhead.
  • Containers: 2013-initial Docker release (competitor: Solaris Zones). Containers virtualize only OS. Quicker. Need to administer container orchestration solution (eg: Kubernetes or Docker Swarm) or go with hosted solution like Google Container Engine or AWS ECS. Sample load balancer: AWS Elastic Load Balancer (ELB).
  • 'Serverless': 2014-AWS Lambda. Managing OS security patches also abstracted out. Competitors: Google Cloud Functions, Microsoft Azure Functions. Open source: Apache OpenWhisk & Fission for Kubernetes. Underlying server infrastructure is hidden & abstracted away from specific programming languages. Usage based pricing. But can take time to start up & service the 1st request (long-tail latency) & not designed for long-running services.
Docker:
  • Dockerfile
  • Push to registry
Kubernetes:
  • Cluster resource management: Cluster of machines as pool of CPU, memory, storage.
  • Scheduling & service management
Kubernetes Architecture:
  • API server: REST API
  • Etcd: NoSQL db
  • Scheduler
  • Controller manager
Kubernetes Node:
  • Kubelet: creates/manages pods on node
  • Kube-proxy: networking, load balancing
  • Pods: App services
Kubernetes Concepts:
  • Pod: Single container or sidecar containers that implement supporting functions.
  • Deployment: # of instances, versioning with rolling upgrades & rollbacks, giving 'zero-downtime' deployments.
  • Service: IP, DNS, load balancing
  • ConfigMap: External config. Sensitive values such as passwords are stored in a separate 'Secret' object.
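
A minimal sketch of the Deployment concept above, assuming a hypothetical service called order-service & a made-up image name. It builds the manifest as a Python dict & prints JSON, which kubectl accepts as well as YAML. Setting maxUnavailable to 0 is one way to keep full capacity while pods are replaced during a rolling upgrade.

```python
import json


def deployment_manifest(name: str, image: str, replicas: int) -> dict:
    """Build an apps/v1 Deployment that uses a rolling-update strategy."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,  # number of pod instances
            "selector": {"matchLabels": labels},
            "strategy": {
                "type": "RollingUpdate",
                # Never drop below the desired count; add one new pod at a time.
                "rollingUpdate": {"maxUnavailable": 0, "maxSurge": 1},
            },
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{"name": name, "image": image}]},
            },
        },
    }


# Hypothetical service name & image, for illustration only.
manifest = deployment_manifest(
    "order-service", "registry.example.com/order-service:1.0.0", 3
)
print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file & applied with kubectl apply -f.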
Source: Microservices Patterns by Chris Richardson

Microservices security & tracking

Security:

  • AAA: Authentication, Authorization, Accounting/Auditing
  • Secure interprocess communication (TLS)
Security frameworks:
  • PassportJS: NodeJS security framework on Authentication
  • Spring Security/Apache Shiro: Java frameworks for Authentication/Authorization
Authentication security context:
  • In-memory: Can be used within the same process.
  • Centralized session: Session stored externally such as in a database. Eg: API token for use with an API gateway with an Authentication service.
Authorization:
  • Opaque tokens such as UUIDs. Reduce performance, availability & increase latency.
  • Transparent token. Eg: JWT (JSON Web Token), a popular standard. Because a JWT is self-contained, it cannot be revoked once issued, so it needs short expiration times & frequent reissue.
  • OAuth 2.0: Has an Authorization Server for an access token & refresh token. Eg framework: Spring OAuth internally using JWTs.
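
To make the transparent token idea concrete, here is a rough sketch of issuing & verifying an HS256-signed JWT using only the Python standard library. The function names (issue_token, verify_token) & the claim values are assumptions for illustration; a production service would normally rely on a maintained JWT library instead.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url text, restoring the stripped padding first."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: bytes) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def issue_token(claims: dict, secret: bytes, ttl_seconds: int,
                now: Optional[float] = None) -> str:
    """Return header.payload.signature with a short-lived 'exp' claim."""
    now = time.time() if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    payload = dict(claims, exp=int(now) + ttl_seconds)
    signing_input = (_b64url(json.dumps(header).encode("utf-8")) + "."
                     + _b64url(json.dumps(payload).encode("utf-8")))
    return signing_input + "." + _b64url(_sign(signing_input, secret))


def verify_token(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Check signature & expiry locally; no call to an auth service is needed."""
    now = time.time() if now is None else now
    signing_input, _, signature = token.rpartition(".")
    expected = _sign(signing_input, secret)
    if not hmac.compare_digest(_b64url_decode(signature), expected):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(signing_input.split(".")[1]))
    if now >= payload["exp"]:
        raise ValueError("token expired")
    return payload
```

Because verification only needs the shared secret, any service can check the token itself, which is also why the only practical way to end its validity is to let it expire.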
Externalized configuration:
  • Push model: Push config props to service. Eg: Spring Boot.
  • Pull model: Service reads from config server. Eg: Databases, version control systems or configuration servers.
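
As a small sketch of the push model, the deployment environment can hand config to the service as environment variables, which the service reads once at startup. The variable names DB_URL & DB_POOL_SIZE & the DbConfig class are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class DbConfig:
    url: str
    pool_size: int


def load_config(env: Mapping[str, str] = os.environ) -> DbConfig:
    """Read config pushed in by the deployment environment.

    DB_URL is required; DB_POOL_SIZE falls back to a default of 10.
    """
    return DbConfig(
        url=env["DB_URL"],
        pool_size=int(env.get("DB_POOL_SIZE", "10")),
    )


# Passing a plain dict stands in for the real environment here.
config = load_config({"DB_URL": "postgres://db.example.internal/orders"})
print(config)
```

In a pull model the same load_config function would instead fetch these values from a config server at startup.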

Storing sensitive data with credentials using configuration servers:
Centralized config, transparent decryption, dynamic reconfig.
  • Hashicorp Vault
  • AWS Parameter Store
  • Spring Cloud Config Server
Observing & Tracking:
  • Health check API
  • Log aggregation: Centralized logging system such as ELK (ElasticSearch, Logstash, Kibana), Fluentd, Apache Flume, AWS CloudWatch.
  • Distributed tracing: trace id that flows between services. Common standard for trace id: Zipkin B3 propagation standard. Aspect Oriented Programming libraries that auto-log such as Spring Cloud Sleuth. Distributed tracing servers such as Twitter's Zipkin (receives traces over http or a message broker & stores them in a database) or AWS X-Ray.
  • Exception tracking. Eg: Exception tracking services such as Honeybadger (cloud-based), Sentry.io (open-source & deploy in-house).
  • Application metrics: Eg: Micrometer Metrics for collection. AWS CloudWatch metrics is a push model service. Prometheus (open-source) is a pull model service with data visualization tool: Grafana.
  • Audit logging
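
The trace id flow in distributed tracing can be sketched with the B3 multi-header format: the trace id stays the same across services, while each hop gets a new span id & records the caller's span as its parent. The helper names below are assumptions; libraries such as Spring Cloud Sleuth do this automatically.

```python
import secrets


def new_trace_headers() -> dict:
    """Start a trace at the edge (for example, in the API gateway)."""
    return {
        "X-B3-TraceId": secrets.token_hex(16),  # 128-bit id, 32 hex chars
        "X-B3-SpanId": secrets.token_hex(8),    # 64-bit id, 16 hex chars
        "X-B3-Sampled": "1",
    }


def child_headers(incoming: dict) -> dict:
    """Headers for an outbound call made while handling `incoming`.

    Assumes header names arrive in the canonical case shown here; a real
    http framework would look them up case-insensitively.
    """
    headers = {
        "X-B3-TraceId": incoming["X-B3-TraceId"],      # unchanged across hops
        "X-B3-ParentSpanId": incoming["X-B3-SpanId"],  # caller becomes parent
        "X-B3-SpanId": secrets.token_hex(8),           # fresh id for this hop
    }
    if "X-B3-Sampled" in incoming:
        headers["X-B3-Sampled"] = incoming["X-B3-Sampled"]
    return headers


root = new_trace_headers()
downstream = child_headers(root)
print(root["X-B3-TraceId"] == downstream["X-B3-TraceId"])
```

Logging the trace id with every log line is what lets a log aggregator stitch together one request's path across services.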
Robustness:
  • Handle failure with network timeouts, limit requests & a circuit breaker (fail all requests if many requests start failing).
  • Frameworks: Netflix Hystrix (JVM), Polly (.NET).
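
A minimal circuit breaker sketch, not how Hystrix or Polly implement it: after a threshold of consecutive failures the breaker opens & fails requests immediately, then lets a trial call through once a reset timeout has passed. The class & parameter names are assumptions.

```python
import time


class CircuitOpenError(RuntimeError):
    """Raised when the breaker rejects a call without trying it."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self._failure_threshold = failure_threshold
        self._reset_timeout = reset_timeout
        self._clock = clock          # injectable so tests can control time
        self._failures = 0           # consecutive failures seen so far
        self._opened_at = None       # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self._reset_timeout:
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: half-open, so let this one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._failures += 1
            # A failed trial call, or too many failures, (re)opens the circuit.
            if (self._opened_at is not None
                    or self._failures >= self._failure_threshold):
                self._opened_at = self._clock()
            raise
        # Success closes the circuit & clears the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A caller would wrap each remote invocation, for example breaker.call(fetch_order, order_id), & treat CircuitOpenError as a signal to return a fallback or cached value.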
Chassis/Mesh:
  • Microservice chassis: Framework or set of frameworks to address common requirements. Eg: Spring Boot, Spring Cloud, Go Kit. But language specific.
  • Service mesh: Networking infrastructure mediator that simplifies Chassis. Eg: Linkerd, Istio, Conduit.
Istio Service Mesh features:
  • Traffic management: Service discovery, load balancing, routing rules, circuit breakers.
  • Security: TLS
  • Telemetry: Network traffic metrics, distributed tracing
  • Policy enforcement: quotas & rate limits
Service Mesh Control Plane:
  • Pilot: Configures Envoy proxies & data plane based off deployed services. Envoy proxy is performant & supports multiple protocols (tcp, http, https, MongoDB, Redis, DynamoDB), TLS & other interservice features like auto-retries, rate limiting & circuit breakers. Envoy is a sidecar container within the service's pod.
  • Mixer: Collects telemetry from Envoy proxies & enforces policies.

Source: Microservices Patterns by Chris Richardson

Isolation & Locks

The CAP theorem states that at most two out of three of Consistency, Availability & Partition Tolerance can be achieved at the same time.

RDBMSs allow for ACID: Atomicity (all-or-nothing transactions), Consistency (referential integrity handled by local dbs), Isolation (concurrent execution gives the same result as sequential execution) & Durability (handled by local dbs).

Issues without Isolation:

  • Lost updates (Update without realizing a prior update)
  • Dirty reads (reading before a prior operation has fully succeeded)
  • Nonrepeatable/fuzzy reads (Subsequent reads in the same operation return different data)
Locking strategies:
  • Semantic lock: app level lock
  • Commutative updates: Update executable in any order
  • Pessimistic lock/view: Reorder steps
  • Reread value: Avoid dirty writes by reading data prior to write
  • Version file: Record updates
  • By value: Each request chooses concurrency mechanism as required
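
The 'Reread value' idea can be sketched as a version check before each write: the writer states which version it read, & the write is rejected if the record has changed since. This is an in-memory, single-threaded illustration with hypothetical names; a real store would do the check & update atomically (for example with a conditional UPDATE on a version column).

```python
class StaleWriteError(Exception):
    """The record changed after the caller read it."""


class VersionedStore:
    def __init__(self):
        self._rows = {}  # key -> (value, version)

    def read(self, key):
        """Return (value, version); unknown keys are at version 0."""
        return self._rows.get(key, (None, 0))

    def write(self, key, value, expected_version):
        """Write only if nobody else has written since `expected_version`."""
        _, current_version = self.read(key)
        if current_version != expected_version:
            raise StaleWriteError(
                f"{key}: expected v{expected_version}, found v{current_version}"
            )
        self._rows[key] = (value, current_version + 1)


store = VersionedStore()
store.write("order-1", "PENDING", expected_version=0)

# Two writers read the same version...
_, seen_by_a = store.read("order-1")
_, seen_by_b = store.read("order-1")

store.write("order-1", "APPROVED", expected_version=seen_by_a)
try:
    # ...so the second write is rejected instead of silently overwriting A.
    store.write("order-1", "CANCELLED", expected_version=seen_by_b)
except StaleWriteError as err:
    print("rejected:", err)
```

The rejected writer can reread the record & decide whether its update still makes sense, which avoids the lost update listed above.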

Source: Microservices Patterns by Chris Richardson

Monday, December 21, 2020

Messaging architecture

Message formats:

  • Text, such as JSON/XML. Readable & easier for debugging.
  • Binary. Eg: Protocol buffers (with self-defined tagged fields), Avro (requiring consumer to know the schema) & Thrift.

Message protocols:

  • http. Eg: REST, SOAP. REST requires mapping calls to REST verbs.
  • IPC, such as gRPC, a client-server framework using binary Protocol buffers.

Service discovery:
  • Self registration with a service registry.
  • Client-side service discovery. Eg: Netflix Eureka (an HA service registry), the Eureka client & Ribbon (an http client). Pivotal Spring Cloud provides a Spring Java client that works with Eureka.
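
A toy sketch of self registration plus client-side discovery, assuming an in-memory registry & round-robin load balancing. The classes are hypothetical & do not reflect the Eureka or Ribbon APIs; a real registry would also expire instances that stop sending heartbeats.

```python
class ServiceRegistry:
    """Maps a service name to the network addresses of its live instances."""

    def __init__(self):
        self._instances = {}

    def register(self, name: str, address: str) -> None:
        # Each service instance calls this on startup (self registration).
        self._instances.setdefault(name, []).append(address)

    def deregister(self, name: str, address: str) -> None:
        self._instances.get(name, []).remove(address)

    def lookup(self, name: str) -> list:
        return list(self._instances.get(name, []))


class RoundRobinClient:
    """Client-side discovery: query the registry, then pick an instance."""

    def __init__(self, registry: ServiceRegistry):
        self._registry = registry
        self._counters = {}

    def choose(self, name: str) -> str:
        instances = self._registry.lookup(name)
        if not instances:
            raise LookupError(f"no instances registered for {name}")
        index = self._counters.get(name, 0)
        self._counters[name] = index + 1
        return instances[index % len(instances)]


registry = ServiceRegistry()
registry.register("order-service", "10.0.0.1:8080")
registry.register("order-service", "10.0.0.2:8080")

client = RoundRobinClient(registry)
print([client.choose("order-service") for _ in range(3)])
```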
Message types:
  • Document
  • Command
  • Event
Message channels:
  • Point-to-Point channel: 1-1 interaction.
  • Pub-Sub (Publish-Subscribe): 1:many interactions.
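
The difference between the two channel types can be shown with a small in-memory sketch: a point-to-point message is taken by exactly one consumer, while a pub-sub message is delivered to every subscriber. The class names are illustrative & no real broker is involved.

```python
from collections import deque


class PointToPointChannel:
    """Each message is delivered to exactly one of the competing consumers."""

    def __init__(self):
        self._queue = deque()

    def send(self, message) -> None:
        self._queue.append(message)

    def receive(self):
        # Whichever consumer calls first takes the message off the queue.
        return self._queue.popleft() if self._queue else None


class PubSubChannel:
    """Each message is delivered to every current subscriber."""

    def __init__(self):
        self._subscribers = []

    def subscribe(self, handler) -> None:
        self._subscribers.append(handler)

    def publish(self, message) -> None:
        for handler in self._subscribers:
            handler(message)


commands = PointToPointChannel()
commands.send("create-order")
print(commands.receive(), commands.receive())

events = PubSubChannel()
inbox_a, inbox_b = [], []
events.subscribe(inbox_a.append)
events.subscribe(inbox_b.append)
events.publish("order-created")
print(inbox_a, inbox_b)
```

Commands usually travel on point-to-point channels, since only one handler should act on them, whereas events suit pub-sub.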
Message notifications:
  • One-way notification: no reply required.
  • Pub-Sub: Async responses.
Message broker?
  • Brokerless messaging: Peer-to-peer (P2P) like ZeroMQ. Simple & performant, but guaranteed delivery is complex & availability is reduced.
  • Message broker: Helps two services communicate. Popular.
Message broker:
  • Standards: AMQP, STOMP.
  • Open source: ActiveMQ (queues/topics), RabbitMQ (exchanges/queues), Apache Kafka (Topics).
  • Cloud: AWS Kinesis (streams), AWS SQS (queues). Uses sharding to improve availability.
Database for idempotency:
Another abstraction layer. Choose a database & store unique messages based off an id (say message id) to implement idempotency. Push to message broker from db.
  • Debezium / Eventuate Tram: Db to Kafka.
  • LinkedIn Databus: Oracle transaction log published as events.
  • DynamoDB streams: Time-ordered DynamoDB log & publish as events.
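
The idempotency idea above can be sketched with SQLite: the message id is inserted into a table with a primary key in the same transaction as the business update, so a redelivered message fails the insert & is skipped. The table & column names (processed_messages, account) are hypothetical.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE processed_messages (message_id TEXT PRIMARY KEY)")
    conn.execute(
        "CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
    )
    conn.execute("INSERT INTO account VALUES ('a1', 0)")
    conn.commit()
    return conn


def handle_credit(conn, message_id: str, account_id: str, amount: int) -> bool:
    """Apply a credit at most once per message id.

    Returns True if the message was applied, False if it was a duplicate.
    """
    try:
        with conn:  # one transaction: commit on success, roll back on error
            conn.execute(
                "INSERT INTO processed_messages (message_id) VALUES (?)",
                (message_id,),
            )
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, account_id),
            )
        return True
    except sqlite3.IntegrityError:
        # Primary key violation: this message id was already processed.
        return False


conn = make_db()
print(handle_credit(conn, "msg-1", "a1", 50))  # first delivery
print(handle_credit(conn, "msg-1", "a1", 50))  # broker redelivers the message
print(conn.execute("SELECT balance FROM account WHERE id = 'a1'").fetchone()[0])
```

Recording the id & applying the update in one transaction is the design point: either both happen or neither does, so a crash between the two cannot cause a double credit.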
Source: Microservices Patterns by Chris Richardson

Saturday, December 12, 2020

Microservices API Gateway

Benefits: Instead of specific services, clients talk to the API gateway, which provides a client-specific API.

Drawbacks: Needs to be HA & managed. New services & APIs need the gateway to be updated.
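
One thing a gateway can do behind a client-specific API is compose several backend calls into a single response. A minimal sketch, assuming two hypothetical backend services that are stubbed out here instead of being called over http:

```python
import asyncio


async def fetch_order(order_id: str) -> dict:
    """Stub standing in for an http call to a hypothetical order service."""
    await asyncio.sleep(0)
    return {"orderId": order_id, "status": "APPROVED"}


async def fetch_delivery(order_id: str) -> dict:
    """Stub standing in for an http call to a hypothetical delivery service."""
    await asyncio.sleep(0)
    return {"orderId": order_id, "eta": "18:30"}


async def get_order_details(order_id: str) -> dict:
    """Gateway handler: call both services concurrently & merge the results."""
    order, delivery = await asyncio.gather(
        fetch_order(order_id), fetch_delivery(order_id)
    )
    return {
        "orderId": order_id,
        "status": order["status"],
        "eta": delivery["eta"],
    }


print(asyncio.run(get_order_details("o-42")))
```

The client makes one request to the gateway instead of one per service, which matters most for mobile clients on slow networks.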

Off the shelf:

  • Commercial:
    • AWS API Gateway: Configure each request to a BE service that is an AWS Lambda function, an app http service or AWS service.
    • AWS App load balancer: Basic routing to BE services. 
  • Open-source product:
    • Kong: nginx http based. Has plugins.
    • Traefik: Go based. Integrates with service registries.
  • Open-source framework:
    • Netflix Zuul
    • Spring Cloud Gateway
  • Graph based technologies:
    • Netflix Falcor: A Graph query implementation. Netflix started with Groovy API scripts in a monolithic architecture. Moved to API modules using NodeJS & Docker, where scripts invoke an API gateway using Netflix Falcor. Falcor does declarative dynamic API composition & can invoke multiple services in a single request.
    • Facebook GraphQL: a standard with clients & servers available in multiple languages, such as NodeJS, Java & Scala.
    • Apollo GraphQL: JS/NodeJS implementation with useful extensions to GraphQL with a server & client.
Source: Microservices Patterns by Chris Richardson

Wednesday, December 2, 2020

Big Data architecture technology family tree

Big Data Analytics Catalog 

  • Integration
    • Messaging
      • Data Collector
        • Apache Flume
        • Logstash
        • Fluentd
      • Distributed Message Broker
        • Apache Kafka
        • RabbitMQ
        • Amazon SQS
        • Apache ActiveMQ
    • ETL/ELT
      • ETL/Data Integration Engine
        • StreamSets
        • Talend
        • Informatica
  • Data Storage
    • Distributed File System
      • HDFS
      • CassandraFS
    • NoSQL Database
      • Key-Value
        • Riak
        • Redis
        • Berkeley DB
      • Document-Oriented
        • MongoDB
        • CouchDB
      • Column-Family
        • HBase
        • Cassandra
      • Graph-Oriented
        • Neo4J
        • OrientDB
    • Analytic RDBMS
      • MPP Analytic RDBMS
        • HP Vertica
        • Teradata
        • Microsoft SQL Server Parallel Data Warehouse (MS PDW)
        • Amazon Redshift
      • Traditional Analytic RDBMS
        • MS SQL Server
        • Oracle RDBMS
        • IBM DB2
  • Processing & Analytics
    • Visualization & Reporting
      • BI Platform
        • QlikView
        • Microstrategy
        • Tableau
        • Tibco JasperSoft
        • Pentaho
      • Interactive Dashboard
        • Splunk
        • Kibana
        • Zoomdata
      • Graphic Library
        • D3.js
        • GoJS
        • Highcharts
    • Search & Query
      • Interactive Query Engine
        • Impala
        • Apache Hive (Stinger)
        • Spark SQL
      • Distributed Search Engine
        • Splunk
        • Elasticsearch
        • Apache Solr
    • Processing
      • Distributed Computing Engine
        • Hadoop MapReduce
        • Apache Spark
        • Apache Tez
      • Event Stream Processor
        • Apache Storm
        • Spark Streaming
        • Apache Samza
        • Amazon Kinesis
      • Data Processing Framework
        • Cascading
        • Apache Crunch
        • Apache Hive
        • Apache Pig

Reference:

Designing Software Architectures: A Practical Approach by Humberto Cervantes & Rick Kazman

Why is Go fast?

Why is Go fast? Go has become popular for microprocesses & for scaling. What are the design decisions that make Go fast? Summary: 1. Cle...