Saturday, April 15, 2023

Why is Go fast?

Go has become popular for microservices & for scaling. What are the design decisions that make Go fast?

Summary:

1. Clever usage of the stack in preference to the heap

2. Lightweight Goroutines in a process avoiding OS calls to switch between threads & processes

Details:

1. The Stack

Throughout the history of computing, the fastest way to read memory has been sequential. The stack is a contiguous memory block with fast & simple allocation/deallocation: moving the stack pointer creates or releases a stack frame. Heap memory is reached through pointers to separately allocated chunks of memory. https://www.ardanlabs.com/blog/2017/05/language-mechanics-on-stacks-and-pointers.html

Heap memory is hard to manage. It has to be managed either by hand, as in C/C++, or by a garbage collector that tracks data that is no longer pointed to. Garbage collectors are designed for high throughput (finding the most unused memory in a scan) or the more popular low latency (scan quickly). Go uses a low latency garbage collector. https://research.google/pubs/pub40801/

Sometimes, depending on the code, data can't be stored on the stack & the compiler places it on the heap instead. The compile-time analysis that makes this decision is called 'escape analysis'. This can get complex: if data that needs to be on the heap were left on the stack, it could cause memory corruption, so the compiler has to be conservative. The better the escape analysis, the more data can remain on the stack & the less data needs to move to the heap, which can improve performance. https://segment.com/blog/allocation-efficiency-in-high-performance-go-services/
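A minimal sketch of the two cases, using a hypothetical `point` type and function names that are not from any library. Which values actually escape is decided by the compiler & can change with inlining, so the comments describe the typical outcome rather than a guarantee.

```go
package main

import "fmt"

// point is a small struct used to illustrate where values may live.
type point struct{ x, y int }

// sumOnStack builds a point and returns only an int, so the point does not
// outlive the call and the compiler is free to keep it on the stack.
func sumOnStack(x, y int) int {
	p := point{x, y}
	return p.x + p.y
}

// newPoint returns a pointer to a local value. The value outlives the call,
// so escape analysis typically places it on the heap.
func newPoint(x, y int) *point {
	p := point{x, y}
	return &p
}

func main() {
	fmt.Println(sumOnStack(1, 2))
	fmt.Println(newPoint(3, 4).x)
}
```

Building with `go build -gcflags=-m` prints the compiler's escape analysis decisions for each function.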

Even though RAM is "Random Access memory", the fastest way to read memory is still sequential. Accessing random heap data via pointers even in RAM is two orders of magnitude slower than sequential stack data. https://www.forrestthewoods.com/blog/memory-bandwidth-napkin-math/

In Java, objects are stored on the heap; only the pointer to an object is on the stack. Lists in Java, though they look linear & sequential, are stored as an array of pointers. The actual data is elsewhere in the heap. Python, Ruby & Javascript have similar behaviors. Java has clever, complex, tunable garbage collectors, with both high throughput & low latency GCs available in the JVM. The VMs for Python, Ruby & Javascript are less optimized than Java's.

In Go, structs & primitive types are plain values that can live on the stack. Pointers are used sparingly. The garbage collector is optimized for low latency to return quickly. The design decision of favoring & encouraging the stack results in better performance.
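The difference in memory layout can be sketched with a slice of values versus a slice of pointers. The `reading` type & function names are made up for illustration; the sketch shows only the layout & makes no timing claim.

```go
package main

import "fmt"

type reading struct{ value int }

// sumValues walks a slice whose elements are stored back to back in one
// contiguous block of memory.
func sumValues(rs []reading) int {
	total := 0
	for _, r := range rs {
		total += r.value
	}
	return total
}

// sumPointers walks a slice of pointers; each element is a separate
// allocation that must be followed to reach the data.
func sumPointers(rs []*reading) int {
	total := 0
	for _, r := range rs {
		total += r.value
	}
	return total
}

func main() {
	values := []reading{{1}, {2}, {3}}
	pointers := []*reading{{1}, {2}, {3}}
	fmt.Println(sumValues(values), sumPointers(pointers))
}
```

Both functions return the same sum. The first reads memory sequentially, while the second resembles the Java list layout described above.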

2. Concurrency: Processes vs Threads vs Goroutines

Concurrency has to be used for the right use-cases. Concurrency is not parallelism & can increase code complexity. Amdahl's Law provides a formula to determine if concurrency is useful depending on the split between sequential & parallel work. Concurrency is best used for slow work such as I/O or network calls rather than most in-memory work. https://www.oreilly.com/library/view/the-art-of/9780596802424/
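As a worked example, Amdahl's Law gives the theoretical speedup as 1 / ((1 - p) + p/n), where p is the fraction of the work that can run in parallel & n is the number of workers. The function name below is made up for illustration.

```go
package main

import "fmt"

// amdahlSpeedup returns the theoretical speedup when a fraction p of the work
// (0 <= p <= 1) can run in parallel on n workers: 1 / ((1 - p) + p/n).
func amdahlSpeedup(p float64, n int) float64 {
	return 1 / ((1 - p) + p/float64(n))
}

func main() {
	for _, p := range []float64{0.5, 0.9, 0.99} {
		fmt.Printf("parallel fraction %.2f on 8 workers: %.2fx\n", p, amdahlSpeedup(p, 8))
	}
}
```

With only half the work parallelizable, 8 workers give well under a 2x speedup, which is why the sequential portion matters more than the worker count.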

Traditionally, languages provide concurrency either by creating threads within a process through OS calls, with locking to share data, or through multiple processes. The OS schedules the threads of a process onto CPU cores.

A Go program creates multiple threads in a Go process on launch, paying the cost upfront, with its own scheduler. Go provides Goroutines, which can be thought of as lightweight processes or threads managed by Go rather than the OS. Creating a Goroutine & switching between Goroutines are quick since they happen within the Go process & don't make an OS call. 

Go's scheduler is part of the Go process, which makes it quick compared to the OS scheduler. It automatically balances the workload across the threads & works with the GC. The scheduler is optimized; for example, it can unschedule a goroutine when it is blocking on I/O. https://youtu.be/YHRO5WQGh0k

Goroutine stacks start out smaller than OS thread stacks, consuming less memory (with an ability to grow as needed, paying a performance penalty at that time). Consequently, Go programs can spawn tens of thousands of simultaneous Goroutines, while a similar approach with native OS threading in other languages will slow to a crawl.
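A small sketch of spawning many Goroutines at once. `runWorkers` is a hypothetical helper & the count of 50,000 is an arbitrary choice for the demonstration.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// runWorkers starts n goroutines, each incrementing a shared counter once,
// and returns the final count after all of them finish.
func runWorkers(n int) int64 {
	var count int64
	var wg sync.WaitGroup
	wg.Add(n)
	for i := 0; i < n; i++ {
		go func() {
			defer wg.Done()
			atomic.AddInt64(&count, 1)
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&count)
}

func main() {
	fmt.Println(runWorkers(50000)) // prints 50000
}
```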

Go uses the CSP (Communicating Sequential Processes) concurrency model by default, using unbuffered & buffered channels to communicate data. This pattern enhances code clarity but is not a performance improvement; the performance improvement comes from the earlier design decisions. Mutexes, locks & atomics are available, if needed.
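A minimal sketch of communicating over channels instead of sharing memory. The function names are illustrative. The only difference between the unbuffered & buffered case is the capacity passed to `make`.

```go
package main

import "fmt"

// squares sends the square of each input on a channel and then closes it.
// bufSize 0 gives an unbuffered channel; a larger value lets the sender
// run ahead of the receiver by that many items.
func squares(nums []int, bufSize int) <-chan int {
	out := make(chan int, bufSize)
	go func() {
		defer close(out)
		for _, n := range nums {
			out <- n * n
		}
	}()
	return out
}

// sum drains the channel and adds up what it receives.
func sum(in <-chan int) int {
	total := 0
	for v := range in {
		total += v
	}
	return total
}

func main() {
	nums := []int{1, 2, 3, 4}
	fmt.Println(sum(squares(nums, 0))) // unbuffered: prints 30
	fmt.Println(sum(squares(nums, 2))) // buffered: prints 30
}
```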

3. Compiler

Compiler builds of large projects can be time consuming. The Go compiler does not support circular dependencies when building code. This adds a development burden to organize the packages but results in quicker builds, compared to competing compilers that support circular dependencies.

Conclusion:

Some simple design decisions have made Go a performant language, resulting in high adoption & usage for scalable applications, despite its relatively young age.

Reference: The excellent book "Learning Go, an idiomatic approach to Real-World Go Programming" by Jon Bodner 

Sunday, January 29, 2023

Cloud native architecture-an overview

Any infrastructure has two main components: compute & storage. Software needs compute to run & storage to read/write.

You can have your own local server farm to provide compute & storage. Or use a cloud provider & simply decide how much compute & storage you want, allowing them to manage the server farm. If your needs are variable, you may overprovision or underprovision your local server farm. Or outsource the planning to cloud providers who can optimize with economies of scale. Use cloud elasticity to quickly provision & deprovision as needed.

A cloud has two main requirements. Agility & Resiliency. 

Agility: The ability to keep iterating & changing rapidly. A popular way to achieve it is using Micro-services. Micro-services can allow isolated parts of a system to be updated quickly. Have continuous integration & delivery. 

Resiliency: Mistakes are not completely avoidable, because avoiding errors completely would require a heavy process & slow down agility. Make mistakes cheap. Mean-time between failures (MTBF) used to be the measure. Now, the goal is a quick mean-time to recovery (MTTR). Good isolation can enable a system to stay up even if parts of it go down. The failure should be restricted to its domain. The design goal is loose coupling & high cohesion: changes to one sub-system shouldn't require changes to another sub-system (loose coupling) & related functionality should be in the same module (cohesion). In practice, this can be hard to achieve. The way to achieve resiliency is observability. With observability, we can have canary hosts, such as some production servers, to monitor for errors.

Agility & Resiliency are opposing forces & need to be balanced. 

Micro-services are popularly deployed on containers, such as with Kubernetes. They need a trace-id across micro-services to track calls, along with metrics & error handling. Containers are light-weight & faster to deploy than VMs, unless the use-case requires VMs. Rarely, bare-metal physical servers would be required. The other extreme is using a compute lambda function as a service on 'server-less' services where all infrastructure is taken care of by the provider. Similarly, for storage, you can manage your own database or use a 'server-less' fully managed cloud database.

Scaling can be scaling-out (multiple copies of the app with a load balancer), scaling by functionality (functional decomposition, including the data) or scaling by data partitioning (vertically, by storing columns separately; or horizontally, by sharding the rows on a key such as a name).
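Partitioning data by a key such as a name can be sketched as hashing the key to pick a shard. This is one common approach, not the only one; `shardFor` & the shard count of 4 are assumptions made for the example.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// shardFor maps a key to one of n shards by hashing it, so the same key
// always lands on the same shard. n must be positive.
func shardFor(key string, n int) int {
	h := fnv.New32a()
	h.Write([]byte(key)) // Write on a hash.Hash never returns an error
	return int(h.Sum32() % uint32(n))
}

func main() {
	for _, name := range []string{"alice", "bob", "carol"} {
		fmt.Printf("%s -> shard %d\n", name, shardFor(name, 4))
	}
}
```

A plain modulo like this reshuffles most keys when the shard count changes, which is a known trade-off of the simple scheme.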

Cloud services can be Infrastructure-as-a-service (IaaS: provide only the compute/storage), Platform-as-a-service (PaaS: provide some software on top of compute/storage) or Function-as-a-service (FaaS-Serverless lambda functions: provide a full software stack on top of compute/storage).

Co-authored with Srinivasan Varadharajan

Sunday, July 11, 2021

Cybersecurity overview

Secrecy: Also called confidentiality. Only authorized people should be able to access sensitive data. Eg: data breach revealing credit card info.

Integrity: Only authorized people should be able to modify data. Eg: Hackers who have your password & impersonate you in sending emails.

Availability: Authorized people should always have access to their systems & data. Eg: DoS, DDoS.

Defense:

Threat Model: Expected attack vector: Capabilities, goals & means of attack of the expected attacker. Defend against specific threats rather than an amorphous generic security that is not defined.

Who are you? Authentication. What you know (eg: password, PIN, secret; defense: making it more complex to avoid brute force), what you have (requiring a physical key) or what you are (fingerprint/iris scanner). Two-factor or multi-factor authentication reduces risk.

What can you access? Authorization. Access Control Lists (ACL) can determine access. Eg: US DoD's Bell-LaPadula model: No read-up, No write-down (Secret access can't read Top Secret; Top Secret can't update Secret files), Chinese Wall model, Biba model.
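The two Bell-LaPadula rules can be sketched as a pair of comparisons on clearance levels. The level names & function names are illustrative only.

```go
package main

import "fmt"

// level is a clearance or classification; higher values are more sensitive.
type level int

const (
	unclassified level = iota
	secret
	topSecret
)

// canRead enforces "no read-up": a subject may read only at or below its level.
func canRead(subject, object level) bool { return subject >= object }

// canWrite enforces "no write-down": a subject may write only at or above its level.
func canWrite(subject, object level) bool { return subject <= object }

func main() {
	fmt.Println(canRead(secret, topSecret))  // false: no read-up
	fmt.Println(canWrite(topSecret, secret)) // false: no write-down
	fmt.Println(canRead(topSecret, secret))  // true
}
```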

Past access for auditing: Accounting

This is called AAA.

Cryptography: secret writing.

Encryption/Decryption.

Substitution Ciphers. Eg: Caesar Cipher: shift every letter by 3. Simple ciphers can be decrypted by Cryptanalysts. The cipher used in the plot by Mary, Queen of Scots, to assassinate Queen Elizabeth was cracked, leading to Mary's execution in 1587.
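A short sketch of the Caesar Cipher. The `caesar` function is written for this example & handles only ASCII letters.

```go
package main

import "fmt"

// caesar shifts ASCII letters by shift positions, wrapping around the
// alphabet; other characters are left unchanged.
func caesar(text string, shift int) string {
	shift = (shift%26 + 26) % 26 // normalise negative shifts
	out := []rune(text)
	for i, r := range out {
		switch {
		case r >= 'a' && r <= 'z':
			out[i] = 'a' + (r-'a'+rune(shift))%26
		case r >= 'A' && r <= 'Z':
			out[i] = 'A' + (r-'A'+rune(shift))%26
		}
	}
	return string(out)
}

func main() {
	secret := caesar("Attack at dawn", 3)
	fmt.Println(secret)             // Dwwdfn dw gdzq
	fmt.Println(caesar(secret, -3)) // Attack at dawn
}
```

With only 25 useful shifts, trying every key by hand is enough to break it, which is why such ciphers are considered simple.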

Permutation Ciphers. Eg: Columnar Transposition Cipher: Ordering direction & grid size is the key. The famous German Enigma Cipher was cracked by Alan Turing's machine during WW-II.

Software Encryption.

Data Encryption Standard (DES): Developed by IBM with the NSA & published as a standard in 1977. 56-bit key. It became crackable by brute force as computing power increased.

Advanced Encryption Standard (AES): Published in 2001. 128, 192 or 256-bit keys. Chops data into 16-byte (128-bit) blocks & applies substitution & permutation on them based off a key for 10 or more rounds (10, 12 or 14 depending on the key size; not more for performance reasons).
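A minimal sketch of AES using Go's standard library. The GCM mode, the helper names & the all-zero demo key are assumptions made for the example; the text above does not prescribe a mode.

```go
package main

import (
	"crypto/aes"
	"crypto/cipher"
	"crypto/rand"
	"errors"
	"fmt"
)

// newGCM wraps an AES block cipher in GCM mode.
// The key must be 16, 24 or 32 bytes (AES-128, AES-192 or AES-256).
func newGCM(key []byte) (cipher.AEAD, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	return cipher.NewGCM(block)
}

// encrypt seals plaintext and prepends the random nonce to the result.
func encrypt(key, plaintext []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	nonce := make([]byte, gcm.NonceSize())
	if _, err := rand.Read(nonce); err != nil {
		return nil, err
	}
	return gcm.Seal(nonce, nonce, plaintext, nil), nil
}

// decrypt splits off the nonce and opens the remaining ciphertext.
func decrypt(key, data []byte) ([]byte, error) {
	gcm, err := newGCM(key)
	if err != nil {
		return nil, err
	}
	if len(data) < gcm.NonceSize() {
		return nil, errors.New("ciphertext too short")
	}
	nonce, sealed := data[:gcm.NonceSize()], data[gcm.NonceSize():]
	return gcm.Open(nil, nonce, sealed, nil)
}

func main() {
	key := make([]byte, 32) // all-zero demo key; never use a fixed key in practice
	sealed, err := encrypt(key, []byte("hello"))
	if err != nil {
		panic(err)
	}
	plain, err := decrypt(key, sealed)
	if err != nil {
		panic(err)
	}
	fmt.Println(aes.BlockSize, string(plain)) // 16 hello
}
```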

Symmetric keys: the same key encrypts & decrypts. Eg: Caesar Cipher, Enigma, AES. Mathematical one-way functions allow a symmetric key to be exchanged. Eg: Diffie-Hellman Key Exchange using modular exponentiation: (b^y mod m)^x mod m = (b^x mod m)^y mod m = b^(xy) mod m.

Asymmetric encryption with a public/private key pair. Eg: RSA, named after its inventors (Rivest, Shamir, Adleman).
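The modular exponentiation identity can be checked with toy numbers. The values m = 23, b = 5 & the private exponents 6 & 15 are chosen only for illustration; real exchanges use very large primes.

```go
package main

import "fmt"

// modPow computes base^exp mod m by repeated squaring.
func modPow(base, exp, m uint64) uint64 {
	result := uint64(1) % m
	base %= m
	for exp > 0 {
		if exp&1 == 1 {
			result = result * base % m
		}
		base = base * base % m
		exp >>= 1
	}
	return result
}

func main() {
	const m, b = 23, 5 // public modulus and base (toy values)
	const x, y = 6, 15 // private exponents of the two parties

	sentByX := modPow(b, x, m) // b^x mod m, sent in the clear
	sentByY := modPow(b, y, m) // b^y mod m, sent in the clear

	// Each side raises what it received to its own private exponent.
	fmt.Println(modPow(sentByY, x, m), modPow(sentByX, y, m)) // 2 2
}
```

Both parties arrive at the same shared value without ever sending their private exponents.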

Data can be in-use, at rest or in-motion. Data loss prevention (DLP) monitors, detects & blocks sensitive data at any of these points.



Networking layers summary

OSI: Open Systems Interconnection created by ISO

Acronyms:
All People Seem To Need Data Processing (7 to 1)

Please Do Not Touch Samy's Pet Alligator (1 to 7)

Layers:

  1. Physical (Binary):  Cable, Radio frequency, voltages, pins, electric signals
  2. Data Link (Frame): MAC (Media Access Control) to id device, Logical Link Control (LLC). Eg: PPP, HDLC, ATM, Frame Relay, SLIP and Ethernet. Ethernet with exponential backoff (exponential time + random time to retry if network congestion). Switches operate at this layer to group computers & reduce congestion.
  3. Network (Packet): IP at this layer. An IP Packet has the IP header & payload. Packet forwarding, routing (to decide the best path), routers with ICMP (Internet Control Message Protocol), BGP (Border Gateway Protocol). Network hops counted. High hops indicate issues. Hop limits set. traceroute can help figure out the route.
  4. Transport (Segment): TCP, UDP, SPX on top of IP. UDP: User Datagram Protocol: simple header with port & checksum, no mechanism to retry. TCP: sequence number of packets in headers with ACKs (that doubles the network traffic) to ensure receipt.
  5. Session (Data): Session, timeouts. Eg: NFS, SQL, RDBMS, ASP, SIP.
  6. Presentation (Data): Eg: jpg, gif, ascii, ansi, utf8.
  7. Application (Data): Eg: http, https, smtp, snmp, ftp, dns, browsers, Skype, Outlook. DNS: Tree structure with Top Level Domains (eg: .com), Second Level Domains (eg: google.com) & Sub-Domain of Parent (eg: drive.google.com), distributed across many trusted servers.
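
The exponential backoff mentioned under layer 2 (exponential time + random time before a retry) can be sketched as below. The function name, base delay & cap are assumptions for the example & do not reproduce the exact Ethernet algorithm.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// backoff returns the wait before retry number attempt (0-based): the base
// delay doubled per attempt, capped at max, plus random jitter of up to one
// base delay so that competing senders do not retry in lockstep.
// base must be positive.
func backoff(attempt int, base, max time.Duration) time.Duration {
	d := base
	for i := 0; i < attempt && d < max; i++ {
		d *= 2
	}
	if d > max {
		d = max
	}
	return d + time.Duration(rand.Int63n(int64(base)))
}

func main() {
	for attempt := 0; attempt < 5; attempt++ {
		fmt.Println(attempt, backoff(attempt, 100*time.Millisecond, time.Second))
	}
}
```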

Thursday, December 24, 2020

Microservices deployment

History of deployment options:

  • Physical machines: 1990s. Fast deployment, best performance. Configuring/reconfiguring cumbersome.
  • Virtual machines (VMs): 2000s. AWS EC2 released in 2006. AWS Elastic Beanstalk is an easy way to deploy. Can create a base image & add new instances. But virtualizing entire VM adds overhead.
  • Containers: 2013-initial Docker release (competitor: Solaris Zones). Containers virtualize only OS. Quicker. Need to administer container orchestration solution (eg: Kubernetes or Docker Swarm) or go with hosted solution like Google Container Engine or AWS ECS. Sample load balancer: AWS Elastic Load Balancer (ELB).
  • 'Serverless': 2014-AWS Lambda. Managing OS security patches also abstracted out. Competitors: Google Cloud with functions, Microsoft Azure with functions. Open source: Apache Openwhisk & Fission for Kubernetes. Underlying server infrastructure is hidden & abstracted away from specific programming languages. Usage based pricing. But can take time to start up & service the 1st request (long-tail latency) & not designed for long-running services.
Docker:
  • Dockerfile
  • Push to registry
Kubernetes:
  • Cluster resource management: Cluster of machines as pool of CPU, memory, storage.
  • Scheduling & service management
Kubernetes Architecture:
  • API server: REST API
  • Etcd: NoSQL db
  • Scheduler
  • Controller manager
Kubernetes Node:
  • Kubelet: creates/manages pods on node
  • Kube-proxy: networking, load balancing
  • Pods: App services
Kubernetes Concepts:
  • Pod: Single container or sidecar containers that implement supporting functions.
  • Deployment: # of instances, versioning with rolling upgrades & rollbacks, giving 'zero-downtime' deployments.
  • Service: IP, DNS, load balancing
  • ConfigMap: External config. Passwords & other sensitive values are stored separately as a 'Secret'.
Source: Microservices Patterns by Chris Richardson

Microservices security & tracking

Security:

  • AAA: Authentication, Authorization, Accounting/Auditing
  • Secure interprocess communication (TLS)
Security frameworks:
  • PassportJS: NodeJS security framework on Authentication
  • Spring Security/Apache Shiro: Java frameworks for Authentication/Authorization
Authentication security context:
  • In-memory: Can be used within the same process.
  • Centralized session: Session stored externally such as in a database. Eg: API token for use with an API gateway with an Authentication service.
Authorization:
  • Opaque tokens such as UUIDs. Reduce performance, availability & increase latency.
  • Transparent token. Eg: JWT: JSON Web Token is a popular standard. Since it is self-contained, it cannot be revoked, hence it needs short expiration times & frequent reissuing.
  • OAuth 2.0: Has an Authorization Server for an access token & refresh token. Eg framework: Spring OAuth internally using JWTs.
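
Why a transparent token is self-contained can be sketched with the standard library alone: the signature is recomputed from the token itself, with no lookup against a central store. This is a simplified HS256-style illustration with made-up helper names. It does not validate the header or the expiry claim, so a maintained JWT library is the better choice in real services.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/base64"
	"fmt"
	"strings"
)

var b64 = base64.RawURLEncoding // JWTs use unpadded base64url

// mac returns the base64url HMAC-SHA256 of the signing input.
func mac(signingInput string, secret []byte) string {
	h := hmac.New(sha256.New, secret)
	h.Write([]byte(signingInput))
	return b64.EncodeToString(h.Sum(nil))
}

// sign builds header.payload.signature for a JSON payload.
func sign(payloadJSON string, secret []byte) string {
	header := b64.EncodeToString([]byte(`{"alg":"HS256","typ":"JWT"}`))
	payload := b64.EncodeToString([]byte(payloadJSON))
	return header + "." + payload + "." + mac(header+"."+payload, secret)
}

// verify recomputes the signature and returns the decoded payload if it matches.
func verify(token string, secret []byte) (string, bool) {
	parts := strings.Split(token, ".")
	if len(parts) != 3 {
		return "", false
	}
	want := mac(parts[0]+"."+parts[1], secret)
	if !hmac.Equal([]byte(want), []byte(parts[2])) {
		return "", false
	}
	payload, err := b64.DecodeString(parts[1])
	if err != nil {
		return "", false
	}
	return string(payload), true
}

func main() {
	secret := []byte("demo-secret") // hypothetical shared secret
	token := sign(`{"sub":"user-42","exp":1700000000}`, secret)
	payload, ok := verify(token, secret)
	fmt.Println(ok, payload)
}
```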
Externalized configuration:
  • Push model: Push config props to service. Eg: Spring Boot.
  • Pull model: Service reads from config server. Eg: Databases, version control systems or configuration servers.

Storing sensitive data with credentials using configuration servers:
Centralized config, transparent decryption, dynamic reconfig.
  • Hashicorp Vault
  • AWS Parameter Store
  • Spring Cloud Config Server
Observing & Tracking:
  • Health check API
  • Log aggregation: Centralized logging system such as ELK (ElasticSearch, Logstash, Kibana), Fluentd, Apache Flume, AWS CloudWatch.
  • Distributed tracing: trace id that flows between services. Common standard for trace id: Zipkin B3 propagation standard. Aspect Oriented Programming libraries that auto-log such as Spring Cloud Sleuth. Distributed tracing servers such as Twitter's Zipkin (using a database supporting http or a message broker) or AWS X-ray.
  • Exception tracking. Eg: Exception tracking services such as Honeybadger (cloud-based), Sentry.io (open-source & deploy in-house).
  • Application metrics: Eg: Micrometer Metrics for collection. AWS Cloudwatch metrics is a push model service. Prometheus (open-source) is a pull model service with data visualization tool: Grafana.
  • Audit logging
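
A trace id that flows between services can be sketched as reading the id from the inbound headers, or creating one, & copying it onto the outbound call. `X-B3-TraceId` is the B3 header name; the helper names are made up for the example.

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"fmt"
	"net/http"
)

const traceHeader = "X-B3-TraceId" // B3 propagation header name

// ensureTraceID returns the incoming trace id, or a new random 64-bit id
// (16 lower-case hex characters) when the caller did not send one.
func ensureTraceID(incoming string) (string, error) {
	if incoming != "" {
		return incoming, nil
	}
	buf := make([]byte, 8)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return hex.EncodeToString(buf), nil
}

// propagate copies the trace id from an inbound request's headers onto the
// headers of an outbound call, so that logs from both services can be correlated.
func propagate(inbound, outbound http.Header) (string, error) {
	id, err := ensureTraceID(inbound.Get(traceHeader))
	if err != nil {
		return "", err
	}
	outbound.Set(traceHeader, id)
	return id, nil
}

func main() {
	inbound, outbound := http.Header{}, http.Header{}
	id, err := propagate(inbound, outbound)
	if err != nil {
		panic(err)
	}
	fmt.Println(id == outbound.Get(traceHeader)) // true
}
```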
Robustness:
  • Handle failure with network timeouts, limit requests & a circuit breaker (fail all requests if many requests start failing).
  • Frameworks: Netflix Hystrix (JVM), Polly (.NET).
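
A circuit breaker can be sketched in a few lines. This is a simplified illustration with made-up names & thresholds, not how Hystrix or Polly are implemented.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
	"time"
)

var (
	errOpen    = errors.New("circuit open: failing fast")
	errBackend = errors.New("backend unavailable") // stand-in failure for the demo
)

// breaker trips after maxFailures consecutive failures and rejects calls
// until cooldown has passed, after which it lets a trial call through.
type breaker struct {
	mu          sync.Mutex
	failures    int
	maxFailures int
	cooldown    time.Duration
	openedAt    time.Time
}

// call runs fn unless the circuit is open, and records the outcome.
func (b *breaker) call(fn func() error) error {
	b.mu.Lock()
	if b.failures >= b.maxFailures && time.Since(b.openedAt) < b.cooldown {
		b.mu.Unlock()
		return errOpen
	}
	b.mu.Unlock()

	err := fn()

	b.mu.Lock()
	defer b.mu.Unlock()
	if err != nil {
		b.failures++
		if b.failures >= b.maxFailures {
			b.openedAt = time.Now()
		}
		return err
	}
	b.failures = 0 // a success closes the circuit again
	return nil
}

func main() {
	b := &breaker{maxFailures: 3, cooldown: 5 * time.Second}
	failing := func() error { return errBackend }
	for i := 0; i < 5; i++ {
		fmt.Println(b.call(failing))
	}
}
```

The first three calls reach the failing backend; the last two are rejected immediately without calling it.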
Chassis/Mesh:
  • Microservice chassis: Framework or set of frameworks to address common requirements. Eg: Spring Boot, Spring Cloud, Go Kit. But language specific.
  • Service mesh: Networking infrastructure mediator that simplifies Chassis. Eg: Linkerd, Istio, Conduit.
Istio Service Mesh features:
  • Traffic management: Service discovery, load balancing, routing rules, circuit breakers.
  • Security: TLS
  • Telemetry: Network traffic metrics, distributed tracing
  • Policy enforcement: quotas & rate limits
Service Mesh Control Plane:
  • Pilot: Configures Envoy proxies & data plane based off deployed services. Envoy proxy is performant & supports multiple protocols (tcp, http, https, MongoDB, Redis, DynamoDB), TLS & other interservice features like auto-retries, rate limiting & circuit breakers. Envoy is a sidecar container within the service's pod.
  • Mixer: Collects telemetry from Envoy proxies & enforces policies.

Source: Microservices Patterns by Chris Richardson

Isolation & Locks

The CAP theorem states that at most two out of three of Consistency, Availability & Partition Tolerance may be achieved.

RDBMS systems allow for ACID: Atomicity (support for all-or-none transactions), Consistency (Referential integrity handled by local dbs), Isolation (concurrent/sequential won't matter) & Durability (handled by local dbs).

Issues without Isolation:

  • Lost updates (Update without realizing a prior update)
  • Dirty reads (reading before a prior operation has fully succeeded)
  • Nonrepeatable/fuzzy reads (Subsequent reads in the same operation returns different data)
Locking strategies:
  • Semantic lock: app level lock
  • Commutative updates: Update executable in any order
  • Pessimistic lock/view: Reorder steps
  • Reread value: Avoid dirty writes by reading data prior to write
  • Version file: Record updates
  • By value: Each request chooses concurrency mechanism as required
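
One way to read the 'Reread value' strategy is a version check before each write, so that an update based on stale data is rejected instead of silently overwriting a newer one (a lost update). The `record` type below is a hypothetical in-memory stand-in for a database row.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errStale = errors.New("stale version: record changed since it was read")

// record is a value guarded by a version number that increases on every write.
type record struct {
	mu      sync.Mutex
	value   int
	version int
}

// read returns the current value together with its version.
func (r *record) read() (value, version int) {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.value, r.version
}

// write applies the update only if nobody else has written since readVersion
// was obtained; otherwise the caller must reread and retry.
func (r *record) write(newValue, readVersion int) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.version != readVersion {
		return errStale
	}
	r.value = newValue
	r.version++
	return nil
}

func main() {
	r := &record{value: 100}
	_, v1 := r.read() // first writer reads
	_, v2 := r.read() // second writer reads the same version

	fmt.Println(r.write(150, v1)) // <nil>
	fmt.Println(r.write(90, v2))  // rejected: the first write bumped the version
}
```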

Source: Microservices Patterns by Chris Richardson
