Mutex vs Semaphore

Mutex

In literal sense, mutex as the name suggests means mutual exclusion. It is used to take a lock on shared data such as an array or a variable etc.
At any given time, a mutex only allows a single thread to access a resource.

Semaphore

On the other hand, a semaphore is used to give access to a collection of resources to multiple threads. Think of semaphores as the number of parking spots in a parking lot. If the parking lot is full, then a thread requesting a parking spot (access to a resource) remains blocked until an earlier thread takes out the vehicle and empties a parking spot (releases access to a resource).

A semaphore with a single parking spot(single access pass) is called a binary semaphore and is often confused with a mutex. Semaphore can also be used to signal other threads that “Hey I’m leaving now, you can take up the space”. A mutex on the other hand leaves the parking spot without letting the other person know, and only ensures that one parking spot (resource) is accessed by one thread (himself) at any given time.

Semaphore vs Mutex

A semaphore can act as a mutex when it has only one pass to give. (also called binary semaphore). There is a great misconception that a binary semaphore is same as mutex, but
Strictly speaking, a mutex is a locking mechanism used to provide exclusive access to a resource. It implies that only the owner who has the mutex can release the mutex and only then the next guy is allowed to use the resource, or simply put there is ownership given to the thread over the resource via a mutex.

While on the other hand, a semaphore is a kind of signalling mechanism. (“Okay. I’m done, you may carry on now like a waiting queue outside a toilet”)

Summary

  1. Mutex means mutual exclusion and gives exclusive access to a single thread on a resource, whereas semaphore can also be used as mutex along with providing feature of signalling amongst threads.
  2. The thread with mutex has ownership over the resource while there is no concept of ownership in semaphore.
  3. Mutex if locked, has to be unlocked by the same thread. A semaphore however can be acted upon by different threads.

SOAP vs REST

One of the most debatable topics when building communication channel between service components is whether to use SOAP or REST. Let’s first understand what each one of them are:

SOAP (or Simple Object Access Protocol) is a protocol that was designed to ensure that applications built on different platforms and in different programming languages can exchange data easily.

REST on the other hand is an architectural pattern. A service component that follows REST principles is called a Restful service. A Restful service uses the normal HTTP verbs of GET, POST, PUT and DELETE when working with the required components.

Since REST is an architecture, REST defines following architectural constraints:

  1. Uniform Interface
    A resource in the REST world, should have only one logical URI (or uniform resource identifier) that should provide a way to fetch information related to the resource. The resource URI should follow a consistent pattern, so that once a client becomes familiar with APIs of one the resource, it should be able to use similar approach for other APIs as well.
  2. Client-Server
    In the REST world, it essentially means that the client and server can scale/evolve independently without dependency on each other as long as the API interface is kept same.
  3. Statelessness
    RESTful services are stateless meaning server will not store anything about the last HTTP request client made. It will treat each request as new. When interacting with a RESTful service, a client has to be responsible for managing the state of the application.
  4. Cacheable
    In REST, caching can be applied to resources whenever neccessary, and the resources have to declare themselves as cacheable. Caching helps bringing down the number of trips made to server bringing performance improvement to the client.
  5. Layered System
    REST allows us to have a layered architecture. For instance the APIs can be served from Service A, authentication done from Service B, and data stored in Service C. This way REST provides a facade of Service A to client, abstracting the client from downstream dependencies.

When to use REST over SOAP?

REST stands outs over SOAP in following cases:

  • Limited bandwidth
    SOAP messages being heavier in content, requires a lot o bandwidth and should be avoided, if network bandwidth is a constraint, for example: an application on a mobile that is using a 2G/3G network will experience higher latency and slow performance when interacting with Server via SOAP protocol. In such cases, REST should be preferred.
  • Statelessness
    If there is no requirement to maintain state from one request to another, REST should be used.
  • Caching
    If there is a need to cache a lot of requests, REST comes handy. This is because a SOAP request is sent via HTTP POST when using HTTP protocol. Since POST requests are non idempotent in general, it is not cached at the HTTP level. REST can be cached provided that the requests are idempotent (GET, PUT, DELETE). POST requests still won’t be cached.

When to use SOAP over REST?

SOAP should be used in following cases:

  1. Stateful operations
    SOAP has built-in stateful operations. REST is naturally stateless, but SOAP is designed to support conversational state management.
  2. Formal Contracts
    SOAP is good for applications that require formal contracts between the API and consumer since it can enforce the use of formal contracts by using WSDL (Web Services Description Language).
  3. Async processing and reliability
    If there is a requirement that the client needs a guaranteed level of reliability and security then the new SOAP standard of SOAP 1.2 provides a lot of additional features, especially when it comes to security.

At the end of the day, there is no one clear winner but the best protocol is the one that fits the requirements, supports the the types of clients that are needed, and what is needed in terms of flexibility.

Monolithic/ SOA / Microservices

These days there is a lot of discussion about microservices in workplace. Other architectural designs are Monoliths or Service Oriented Architecture. But as an architect when designing new system what would you do? Build microservices without giving a second thought since it’s hot in the market or sit with your tech/business team to evaluate what works best for them in the current time. For this lets first understand what these three are.

Monolithic literally means “formed of a large block/container”. A monolithic system is one big container wherein all the software components of an application are combined and packed tightly.

A Service Oriented Architecture (or SOA) is basically a group of services that communicate with each other to achieve a goal. A SOA system typically has an event bus (or messaging middleware) for components to interact. This event bus is critical as all systems communicate through it and can become a single point of failure.

Microservices (or MSA) are a collection of even small services modelled around a business domain. These are services with single purpose which is mainly business driven. Microservices typically have an API layer for communication whereas a service component in SOA communicates using messaging middleware.

 

monolith vs soa vs microservices
Image: Dependencies from a developer’s perspective

On a higher level, Monolithic system are like huge mess of chunk, tightly coupled (like noodles). A small change in view might require a deployment in the whole system which could have unanticipated affects on the backend.

A SOA however is coarse grained like a pizza (looser coupling than noodles). In SOA each service component may still have multiple responsibilities. Components in SOA must still co-ordinate with each other to make the system work holistically.

A Microservices architecture however has very fine grained service components each with limited tenets or restricted responsibilities in a single domain. These are even more  decoupled than SOA components, and making changes in one MSA component is typically independent of other components.

Should one build Monolith at all?

It’s probably a good idea to go with Monolith when starting out a system, if the domain requirements are simple and team size is small. But even doing so, try to modularize the code (use Modules if using Java9) so that when domain requirements become complex, the same system can be easily split up into smaller service components.

SOA vs Microservices

Below are some key differences between SOA and Microservices architecture.

Service Oriented Architecture Microservices
Service components can range in size from small to large enterprise services Single purpose services
Maximizes component sharing Minimizes sharing through bounded context
Has messaging middleware for communication between components Has API layer for communication between services
Support multiple messaging protocols like AMQP, MSMQ and SOAP Uses lightweight protocols such as HTTP/REST for communication
Heterogeneous protocols used across messaging middleware Homogeneous protocol followed in MSA
Focuses on reusing services as much as possible Focuses on decoupling service components

Though it’s clearly visible that MSA provides more flexibility by having loose coupling, allowing CI/CD, and a homogeneous messaging protocol, it could happen that a SOA might be more suitable for a system with the requirements at hand. For example: if we know that we would have to communicate to different legacy systems with different protocols, we could chose SOA over MSA due to its common messaging middleware.

The point here is that MSA is not always the right choice, and one shouldn’t blindly go for MSA from Day1. It could cost a fortune to run those independent microservices when there is little or sparse traffic. One should rather evaluate the current domain requirements, team size and expertise before choosing one over other.

Distributed Systems 101

What is a distributed system?

In layman terms, a distributed system is a group of computers working together as to appear as a single computer to the end-user. These machines have a shared state, operate concurrently and can fail independently without affecting the whole system’s uptime.

Need for distributed systems

Suppose if we’ve a database server on a single machine, and we’re about to receive more traffic on the machine, we need to scale our server. We can either scale horizontally or vertically. Scaling vertically is well and good while you can, but after a certain point you will see that even the best hardware is not sufficient for enough traffic, not to mention impractical to host.

What a distributed system enables us to do is to scale horizontally. Scaling horizontally simply means adding more machines/computers rather than upgrading the hardware of a single machine.

Horizontal scaling is cheaper than vertical scaling after a certain threshold but that is not its main case for preference. While vertical scaling can only boost performance up to the latest hardware capabilities, the best thing about horizontal scaling is that there is no cap to scale. Whenever performance degrades or you want to support more traffic, you just add more machines and can scale infinitely.

A single high performing machine can become a single point of failure, but a distributed system (which comprises of multiple nodes) provides fault tolerance. A cluster of 10 machines across multiple data centers is more fault tolerant than a single machine. Even is one machine/data-center is down, the other machines/datacenter keep the system up and running.

If we host our application/service on a single host, the roundtrip time for a request to complete from different part of the world could get really high. However with distributed systems, we could keep a node closest to our users in different parts of the world, and when a user from NewYork makes a request to the server, the request is served by the nearest node/data center to NewYork. This way distributed systems also provide overall lower latency to the users.

Overall distributed systems enables us following things:

  1. Horizontal scaling
  2. Fault tolerance
  3. Lower Latencies

 

Database Partitioning and CAP theorem

As the database grows in size, maintaining acceptable performance becomes more complex. Partitioning is one method used to manage large and busy databases.

So what is database partitioning?

Simply put in layman terms, it is splitting data across multiple database servers

  • It is done by a consistent method so you always know where the data is
  • There can be several different methods of partitioning, for instance:
    • ranges A-L, M-Q, R-Z.
    • lists: building a database of books can be categorised in textbooks, sci-fi, cookbooks
    • hashes: function returning a value to determine membership

Why partition databases?

  • Storage limitations: entire dataset may not fit on one server
  • Performance: splitting loads between partitions can make faster writes
  • Availability: ensures you can get the data even if one server gets busy

When not to partition?

  • when your dataset is small

Partitioning in relational databases vs NoSQL

  • Relational databases can be partitioned horizontally or vertically
  • Horizontal partitioning (also known as sharding) puts different rows on different partitions
  • Vertical partitions puts different columns on different partitions

Partitioning non relational databases depend on the database type

  • Key-value and document databases are typically partitioned horizontally
  • Tabular databases can be horizontally and vertically partitioned
Understanding how partitioning works is key to making informed decisions about database performance.

What is CAP theorem?

Finding the ideal database for your application is largely a choice between trade-offs.
The CAP theorem is one concept that can help you understand the trade-offs between different databases.
  • originally conceptualised around network shared data
  • often used to generalize tradeoffs between different databases

CAP theorem states that it is impossible for a distributed software system to simultaneously provide more than two out of three of the following guarantees (CAP): Consistency, Availability, and Partition tolerance. When we design a distributed system, trading off among CAP is almost the first thing we want to consider. CAP theorem says while designing a distributed system we can pick only two of the following three options:

Consistency: All nodes see the same data at the same time. Consistency is achieved by updating several nodes before allowing further reads.

Availability: Every request gets a response on success/failure. Availability is achieved by replicating the data across different servers.

Partition tolerance: The system continues to work despite message loss or partial failure. A system that is partition-tolerant can sustain any amount of network failure that doesn’t result in a failure of the entire network. Data is sufficiently replicated across combinations of nodes and networks to keep the system up through intermittent outages.

visualization-of-cap-theorem

CAP theorem and SQL vs NoSQL

  • Relational databases tend towards consistency and availability. Partition tolerance is something that relational databases typically don’t handle very well.
  • NoSQL databases trend towards partition tolerance. They are designed with the idea in mind that you’re going to be adding more nodes to your database as it grows.

Does CAP matter to you?

In some instances CAP theorem may not apply to your application.
  • Depending on the size of your application, CAP tradeoffs may be irrelevant
  • small or low traffic website may not require partitions
  • consistency tradeoffs may not be usable in some cases. for instance: votes on a comment may not show right away for all users
CAP theorem can be used a guide for categorising the tradeoffs between different databases.
CAP theorem can help you prioritize on what properties you may want in your database.

Introduction to NoSQL database

One of the most fundamental choices to make when developing an application is whether to use a SQL or NoSQL database to store the data. Conventional relational databases are the product of decades of technology evolution, good practice, and real-world stress testing. They are designed for reliable transactions and ad hoc queries, the staples of line of business applications. But they also come burdened with restrictions—such as rigid schema—that make them less suitable for other kinds of apps.

NoSQL databases arose in response to those limitations. NoSQL systems store and manage data in ways that allow for high operational speed and great flexibility on the part of the developers. Unlike SQL databases, many NoSQL databases can be scaled horizontally across hundreds or thousands of servers.

So what is NoSQL?

In essence, NoSQL database is a database that doesn’t use SQL. SQL was designed to be a query language for relational databases and relational databases are usually table-based, much like what you see in a spreadsheet.

In a relational  database, records are stored in rows and then the columns represent fields in each row. Then SQL allows you to query within and between tables in that relational database.

On the other hand, NoSQL databases are more flexible.
  • Many NoSQL databases allow you to define fields as you create a record.
  • Nested values are common in NoSQL databases. (you can have hashes and arrays and objects and then nest more objects, arrays and hashes within those In that way you can have nested values in your NoSQL databases records)
  • Also fields are not standardized between records. You can have a different structure for every record in your NoSQL database.
NoSQL databases have tradeoffs and limitations
  • won’t solve website scalability issues out of the box
  • benefits and drawbacks must be weighed
  • offer flexibility unavailable to relational databases

Different types of NoSQL databases

Document store

  • Documents are usually stored in a structured format (XML, JSON etc)
  • Usually organized into “collections” or “databases”
  • Individual documents can have unique structures
  • Each document usually has a specific key. This allows to retrieve the document quickly and normally its possible to query a document by specific fields contained inside
  • It is possible to query a document by fields
  • eg: CouchDB, MongoDB, DocumentDB

Key-value store

  • One can query by the key to get the value
  • Some-key value stores let you define more than one key
  • Sometimes used alongside relational databases for caching and session management in web applications
  • Drawback: usually can’t query by anything other than the key
  • eg: MemcacheDB, Riak, Redis, BerkeleyDB, Aerospike, Amazon DynamoDB, Azure Table storage

BigTable/tabular/wide-coulmn stores

  • Named after Google’s propietary “BigTable” implementation
  • Each row can have a different set of columns
  • Designed for large number of columns
  • Rows are versioned
  • eg: Google BigTable, Cassandra, HBase, Amazon SimpleDB

Graph database

  • designed for data represented as interconnected nodes
  • example: a series of road intersections
  • eg: IBM Graph, Neo4J, Titan

Object database

  • Tightly integrated with OOP language
  • Act as a persistence layer: store objects directly
  • Link objects together through pointers
  • eg: VersantDB, Objectivity, Gemstone

The basic NoSQL database classifications are only guides. Over time, vendors have mixed and matched elements from different NoSQL database family trees to achieve more generally useful systems. Cassandra has combined key-value elements with a wide-column store and a graph database. Sometimes NoSQL elements are mixed with SQL elements, creating a variety of databases that are referred to as multimodel databases

Python Workshop

I am quite excited to conduct workshop on python on March 22nd on my Institute’s technical fest Abhikalpan’15. 🙂

The presentations are here:

pyWorkshop_part1

pyWorkshop_part2

UPD: part1 is updated, typos fixed and string slicing added.

Python cheatsheet for beginners:

  1. Python cheatsheet by DaveChild (HTML)
    This is a simple code to code cheat-sheet for major python stuck-ups. Very handy and helpful, must have if you are a beginner!
  2. Python for Dummies (HTML)
    Again an awesome cheat-sheet for newbies! Whether you’re working with string methods or built-in functions in Python, this Cheat Sheet helps you program the correct order for the operation so you achieve the correct result.
  3. Python Cheatsheet by coffeeghost.com (PNG)
    This Python Cheatsheet from coffeeghost.net may come very handy as a quick script explanation for programmers.
  4. Python Languages and Syntax Cheatsheet (PDF)
    This is a slick and quick cheat-sheet for Python Language & Syntax by Institute of Informatics of Ludwig Maximilian University of Munich.

Reference: http://efytimes.com/e1/fullnews.asp?edid=116332