Architecture

This guide provides an overview of the MigratoryData server and explains its concepts and features.

The Web Messaging Problem

Traditionally, web servers deliver data using a request/response model, as follows:

  1. The user opens a web page in the browser

  2. The web browser makes an HTTP request to the web server and displays the content of the web page as it is available on the web server at the moment of the request

  3. If the content of the page changes after the web page is displayed, the user will not see the changes until they manually refresh the web page in the browser

A technique named HTTP polling is often used to provide users with up-to-date content without the manual refresh of Step 3 above, thus achieving a form of web messaging. The technique consists of including some JavaScript code in the web page, which the browser executes in the background for as long as the page is displayed. The script periodically sends HTTP requests to the web server to check whether the content of the web page has changed. If nothing has changed, the script does nothing; otherwise, it displays the new content.
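The polling loop described above can be sketched as follows. This is a minimal simulation written in Python rather than browser JavaScript; the `fetch` callable and the sample responses are assumptions that stand in for real HTTP requests.

```python
from typing import Callable, List, Optional


def poll(fetch: Callable[[], str], rounds: int) -> List[str]:
    """Poll `fetch` a fixed number of times and return the content that gets displayed.

    Only changed content is displayed; identical responses are ignored.
    """
    displayed: List[str] = []
    last_seen: Optional[str] = None
    for _ in range(rounds):
        content = fetch()          # one full HTTP request in a real web page
        if content != last_seen:   # the content changed since the previous poll
            displayed.append(content)
            last_seen = content
        # A real script would wait for the polling interval here.
    return displayed


# Hypothetical server responses for six consecutive polls.
responses = iter(["140", "140", "141", "141", "141", "142"])
shown = poll(lambda: next(responses), rounds=6)
print(shown)  # ['140', '141', '142']
```

In this run six requests are sent but only three carry new content, which hints at the inefficiency discussed next.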

While HTTP polling can achieve a rudimentary form of web messaging for certain applications, it becomes highly inefficient for many other web applications, especially those with a large number of users that expect low, near-real-time data latency.

Indeed, even if the polling script is configured to poll the web server as frequently as the server can handle, the latency of data (i.e. the time required to propagate the data from the server side to the user) can still be too high for web applications such as a financial portal delivering real-time market data or a sports betting website, where each millisecond counts.

Also, web servers have substantial limitations in handling a large number of concurrent users that perform many HTTP requests per second, so a real-time web application with a large number of users, say on the order of millions, would need many web servers to achieve web messaging through HTTP polling. Moreover, each HTTP request sent by the polling script, just to check whether there is any fresh data on the server, includes hundreds of bytes of HTTP headers that the browser adds automatically. When many users poll at a high frequency, this redundant overhead consumes significant bandwidth. The need for many server machines and the high network utilization caused by the HTTP headers significantly increase the total cost of ownership of applications that use HTTP polling to deliver real-time data to many concurrent users.
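To get a feel for the header overhead, the back-of-the-envelope calculation below multiplies users, polling frequency, and header size. All three input figures are assumptions chosen for illustration (the text only says "hundreds of bytes" and "millions" of users).

```python
def polling_header_overhead_bps(users: int, polls_per_second: float, header_bytes: int) -> float:
    """Return the bandwidth, in bits per second, spent on HTTP request headers alone."""
    return users * polls_per_second * header_bytes * 8


# Assumed figures: one million users, one poll per second, 500 bytes of headers.
overhead = polling_header_overhead_bps(users=1_000_000, polls_per_second=1, header_bytes=500)
print(f"{overhead / 1e9:.1f} Gbps")  # 4.0 Gbps
```

Under these assumptions, about 4 Gbps of inbound traffic is spent on request headers before any useful data is delivered.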

MigratoryData’s Solution

The MigratoryData server delivers data according to a publish/subscribe model as follows:

  1. The user sends an HTTP/WebSocket subscription request to the server

  2. A persistent TCP connection is created between the user’s browser and the MigratoryData server

  3. The MigratoryData server uses this persistent TCP connection to deliver data available at the moment of the request, as well as any subsequent data as soon as such data is available, and without any additional requests from the user

MigratoryData implements the WebSocket protocol (see the RFC 6455 standard), which all major browsers currently support. MigratoryData creates the persistent TCP connection described in Step 2 above using WebSockets. For old browsers without WebSocket support, MigratoryData implements techniques that still use a single TCP connection for streaming, as WebSockets do behind the scenes. These techniques are 100% JavaScript-based and require no browser plug-in.

Important

Unlike classical web servers or other WebSocket implementations, MigratoryData Server was designed to scale to a huge number of concurrent users. A new standard in scalability was set when MigratoryData solved the C10M problem, that is, handling 10 million concurrent users on a single 1U server (with substantial messaging traffic, on the order of 1 Gbps).

The solution MigratoryData proposes is not only a very scalable messaging solution for adding real-time functionality to web applications; it also achieves high scalability without sacrificing reliability, as detailed in the paper Reliable Messaging to Millions of Users with MigratoryData (the final version of this paper is available in the proceedings of the 18th ACM/IFIP/USENIX Middleware International Conference from ACM).

The MigratoryData Client API is available not only for JavaScript, for building very scalable and reliable real-time web applications, but also for other technologies. You can use the same MigratoryData Client API to build real-time mobile applications for iOS and Android, as well as other real-time Internet applications written in Java, .NET, C++, Node.js, Python, PHP, and Ruby.

Last but not least, MigratoryData is a mature and proven web messaging solution that has been used for over a decade to successfully deliver real-time data to millions of users every day.

Product Overview

This section provides a brief introduction to the MigratoryData solution which consists of:

  • MigratoryData Server

  • MigratoryData Client SDKs

  • MigratoryData Plugin SDKs

  • MigratoryData Integrations

MigratoryData Server

The key features of the MigratoryData server are described below.

Universal API for Messaging

MigratoryData exposes a simple publish/subscribe client API with libraries for the most popular platforms and programming languages, including iOS, Android, JavaScript, Java, C#, C++, Node.js, Python, PHP, and more. Please refer to the Client SDKs section below for more details.

Massive Scalability

The MigratoryData server running on a single 1U server can handle 10 million concurrent users (with substantial outgoing messaging traffic, on the order of 1 Gbps). This is 1000x the scale of the well-known C10k problem, which was for a long time a scalability challenge in the realm of web servers. Besides its unparalleled vertical scalability, MigratoryData scales horizontally through built-in clustering to cost-efficiently handle any number of users.

Active/Active Clustering

Multiple instances of the MigratoryData server can be deployed as a fault-tolerant active/active cluster with no single point of failure. Please refer to the clustering section below for more details.

Guaranteed Message Delivery & Message Ordering

MigratoryData clustering can be configured to achieve guaranteed message delivery and message ordering even when unexpected events occur, such as hardware failures or network disconnections. Please refer to the Guaranteed Message Delivery section below for more details.

Efficient Network Utilization

As detailed in the introduction, MigratoryData implements the publish/subscribe model using a compact message protocol over WebSockets, thus achieving a bandwidth improvement of up to several orders of magnitude compared to the traditional HTTP polling approach.

High Messaging Volumes

MigratoryData is designed to scale up to the entire capacity of the allocated resources, including the network. For example, MigratoryData deployed on a single 1U server can deliver a messaging volume of 8.88 Gbps on 10 Gigabit Ethernet, consisting of almost 2 million messages per second.
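As a rough sanity check on these figures, dividing the quoted throughput by the quoted message rate gives the average number of bytes on the wire per message. The calculation assumes exactly 2 million messages per second (the text says "almost") and that the 8.88 Gbps figure covers all traffic, so the result is only an estimate.

```python
def average_bytes_per_message(throughput_bps: float, messages_per_second: float) -> float:
    """Return the average number of bytes transferred per message."""
    return throughput_bps / 8 / messages_per_second


# 8.88 Gbps and 2 million messages per second are the figures quoted above.
avg_size = average_bytes_per_message(8.88e9, 2_000_000)
print(f"{avg_size:.0f} bytes per message")  # 555 bytes per message
```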

Milliseconds Latency

Fresh data available on the server is delivered to users in milliseconds. For example, MigratoryData running on a single 1U server can achieve almost 1 Gbps messaging to 10 million concurrent clients with a consistent end-to-end latency of under 15 milliseconds.

Enterprise-grade Security

MigratoryData uses industry standards such as TLS/SSL for data encryption, as well as IP-based protections, dual firewalls, and token-based authorization/entitlement. Please refer to the Security section below for more details.

Advanced Monitoring

Encrypted and password-protected monitoring is available through the industry standards JMX and HTTP, which can be integrated with most enterprise management systems. See the Monitoring section below for more details.

Extensible with Plugins

MigratoryData currently provides several Extension APIs for building plugins for audit, authorization, and user presence. Please refer to the Plugin SDKs section below for more details.

Enterprise Integrations

MigratoryData currently provides integrations with Elastic for log management and monitoring, as well as with Firebase for push notifications to offline mobile users.

MigratoryData Client SDKs

MigratoryData offers a common client API with libraries for various languages and technologies as listed below.

| Application Type        | Client API | Functions             |
|-------------------------|------------|-----------------------|
| Web Applications        | JavaScript | publish and subscribe |
| Mobile Applications     | iOS        | publish and subscribe |
|                         | Android    | publish and subscribe |
| Enterprise Applications | Java       | publish and subscribe |
|                         | .NET       | publish and subscribe |
|                         | C++        | publish and subscribe |
|                         | Python     | publish and subscribe |
|                         | PHP        | publish               |
|                         | Ruby       | publish               |

MigratoryData Plugin SDKs

Several extension APIs for building plugins for the MigratoryData server are available.

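| Extension API | Description                                               |
|---------------|-----------------------------------------------------------|
| Entitlement   | Define data access rules                                  |
| Audit         | Provide various audit info (access, messages, stats, etc) |
| Presence      | Provide information about user presence                   |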

MigratoryData Integrations

Currently MigratoryData provides the following integrations.

| Integration | Description                                        |
|-------------|----------------------------------------------------|
| Firebase    | Deliver push notifications to offline mobile users |
| Elastic     | Log management and monitoring                      |

Concepts

This chapter describes the core concepts of the MigratoryData server:

  • Message

  • Publish/Subscribe Model

Messages

A MigratoryData message consists of several pieces of information as follows:

| Property     | Description                                                                                          |
|--------------|------------------------------------------------------------------------------------------------------|
| subject      | the subject of the message                                                                           |
| content      | the content of the message                                                                           |
| retained     | indicate whether or not the message should be/was retained by the cluster                            |
| qos          | indicate the quality of the service of the message, which can be either QOS_STANDARD or QOS_GUARANTEED |
| type         | the type of the message, which can be one of the following: snapshot, update, historical, recovered  |
| replySubject | the subject to reply to the received request message                                                 |

The MigratoryData Client API provides methods to create messages from application-specific data, publish messages, and retrieve the application-specific data from messages when received by the client.

Subjects

A message subject can be used by the subscriber clients to listen for messages with that subject. Also, it can be used by the publisher clients to publish messages with that subject.

A subject is a string of characters that follows a syntax similar to Unix absolute paths. It consists of an initial slash (/) character followed by two or more character strings, called segments, separated by single slash (/) characters. The slash (/) character is reserved and cannot appear within a segment.

For example, the following character string, composed of the segments Stocks, NYSE, and IBM, is a valid subject for the MigratoryData server: /Stocks/NYSE/IBM.

Here are some examples of invalid subjects:

| Invalid Subject  | Reason                                                                             |
|------------------|------------------------------------------------------------------------------------|
| /Stocks//IBM/BID | The second segment is empty (the slash (/) character is not allowed in a segment)  |
| Stocks/IBM/BID   | The subject does not start with a slash (/) character                              |
| /Stocks/IBM/BID/ | The last segment is empty                                                          |
| /                | The subject is formed from a single empty segment                                  |
| /Stocks          | The subject has only one segment (two or more are required)                        |
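The syntax rules above can be expressed as a small validation function. This is an illustrative sketch, not part of the MigratoryData Client API; the function name `is_valid_subject` is hypothetical, and the server may apply additional rules not covered here.

```python
def is_valid_subject(subject: str) -> bool:
    """Check a subject against the syntax rules described above."""
    if not subject.startswith("/"):
        return False                      # must start with a slash
    segments = subject[1:].split("/")     # segments are separated by single slashes
    if len(segments) < 2:
        return False                      # two or more segments are required
    return all(segment != "" for segment in segments)  # no empty segments


for candidate in ["/Stocks/NYSE/IBM", "/Stocks//IBM/BID", "Stocks/IBM/BID",
                  "/Stocks/IBM/BID/", "/", "/Stocks"]:
    print(candidate, is_valid_subject(candidate))
```

Only the first candidate is reported as valid; the others are the invalid examples listed above.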

Snapshot Messages

For each subject X, the MigratoryData server maintains a snapshot message, which is the most recent message received by the MigratoryData server for that subject with the retained property set to true.

The following table shows the snapshot message of the subject /Stocks/NYSE/IBM as new messages are received by the MigratoryData server.

| Time                  | Received Message                                      | Snapshot Message                                        |
|-----------------------|-------------------------------------------------------|---------------------------------------------------------|
| 10:12 (first message) | subject=/Stocks/NYSE/IBM, content=140, retained=false | No snapshot available for /Stocks/NYSE/IBM at this time |
| 10:15                 | subject=/Stocks/NYSE/IBM, content=141, retained=true  | subject=/Stocks/NYSE/IBM, content=141, retained=true    |
| 10:25                 | subject=/Stocks/NYSE/IBM, content=144, retained=false | subject=/Stocks/NYSE/IBM, content=141, retained=true    |
| 10:40                 | subject=/Stocks/NYSE/IBM, content=142, retained=true  | subject=/Stocks/NYSE/IBM, content=142, retained=true    |

When a client subscribes to a subject, the MigratoryData server first sends that client the snapshot message of the subject (if available), then sends the subsequent real-time messages for that subject as it receives them from the publisher clients.
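The snapshot rule can be illustrated with a small in-memory model that replays the four messages discussed above for /Stocks/NYSE/IBM. The `Message` and `SnapshotStore` classes are hypothetical and only model the rule; they do not reflect how the server is implemented.

```python
from typing import Dict, List, NamedTuple, Optional


class Message(NamedTuple):
    subject: str
    content: str
    retained: bool


class SnapshotStore:
    """Keep, per subject, the most recent message received with retained=True."""

    def __init__(self) -> None:
        self._snapshots: Dict[str, Message] = {}

    def on_message(self, message: Message) -> None:
        if message.retained:              # non-retained messages never replace the snapshot
            self._snapshots[message.subject] = message

    def snapshot(self, subject: str) -> Optional[Message]:
        return self._snapshots.get(subject)


store = SnapshotStore()
subject = "/Stocks/NYSE/IBM"
history: List[Optional[str]] = []
for content, retained in [("140", False), ("141", True), ("144", False), ("142", True)]:
    store.on_message(Message(subject, content, retained))
    current = store.snapshot(subject)
    history.append(current.content if current else None)

print(history)  # [None, '141', '141', '142']
```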

Tip

Retrieve Snapshots via HTTP Requests

You can also get the snapshot message of a subject from the MigratoryData server via a simple HTTP request. Supposing the MigratoryData server is deployed at https://push.example.com, then you can retrieve the snapshot message of a particular subject X via the following HTTP request provided that no entitlement rule is configured:

https://push.example.com/snapshot?subject=X

Otherwise, if the entitlement/authorization feature is enabled for the MigratoryData server, you should also include the entitlement token in the HTTP request. Supposing the entitlement token of the user is U and the user identified by the token U is allowed to subscribe to the subject X, then, to retrieve the snapshot of the subject X, use:

https://push.example.com/snapshot?subject=X&token=U
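The two request forms above can be built programmatically. The sketch below only constructs the URL and sends nothing; the helper name `snapshot_url` is hypothetical, and leaving the slashes of the subject unescaped (`safe="/"`) is an assumption made to match the URL form shown above.

```python
from typing import Dict, Optional
from urllib.parse import urlencode


def snapshot_url(server: str, subject: str, token: Optional[str] = None) -> str:
    """Build the snapshot request URL, adding the entitlement token if one is given."""
    params: Dict[str, str] = {"subject": subject}
    if token is not None:
        params["token"] = token
    # safe="/" keeps the slashes of the subject unescaped (an assumption, see above).
    return f"{server.rstrip('/')}/snapshot?{urlencode(params, safe='/')}"


print(snapshot_url("https://push.example.com", "/Stocks/NYSE/IBM"))
# https://push.example.com/snapshot?subject=/Stocks/NYSE/IBM
print(snapshot_url("https://push.example.com", "/Stocks/NYSE/IBM", token="U"))
# https://push.example.com/snapshot?subject=/Stocks/NYSE/IBM&token=U
```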

Batching

Batching is the process of collecting several messages together for a period of time or until a total size is reached before sending them in a single I/O operation to a client.

To enable batching, configure a maximum time period and/or a maximum size on the MigratoryData server using the parameters MaxBatchingTime and MaxBatchingSpace.

Once batching is enabled, MigratoryData no longer sends every message to the client individually. Instead, it sends messages in batches, performing a single network I/O operation per batch (which contains a number of messages).

Batching can optimize network I/O, especially if subjects are systematically updated at a high frequency (multiple messages per second).

The following diagram shows the circulation of messages without batching.
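The idea of flushing on either a time limit or a size limit can be sketched as follows. The `Batcher` class is hypothetical; its two constructor arguments merely play the roles of MaxBatchingTime and MaxBatchingSpace, and the exact semantics and units of those server parameters are not specified here. Time is passed in explicitly to keep the example deterministic.

```python
from typing import List, Optional


class Batcher:
    """Collect messages and flush them as one batch when a time or size limit is hit."""

    def __init__(self, max_time_ms: int, max_bytes: int) -> None:
        self.max_time_ms = max_time_ms
        self.max_bytes = max_bytes
        self._pending: List[bytes] = []
        self._pending_bytes = 0
        self._first_at_ms: Optional[int] = None
        self.sent_batches: List[List[bytes]] = []  # each entry stands for one I/O operation

    def add(self, message: bytes, now_ms: int) -> None:
        if self._first_at_ms is None:
            self._first_at_ms = now_ms        # the time limit starts with the first message
        self._pending.append(message)
        self._pending_bytes += len(message)
        if self._pending_bytes >= self.max_bytes:
            self.flush()

    def tick(self, now_ms: int) -> None:
        """Called periodically; flushes if the oldest pending message waited long enough."""
        if self._first_at_ms is not None and now_ms - self._first_at_ms >= self.max_time_ms:
            self.flush()

    def flush(self) -> None:
        if self._pending:
            self.sent_batches.append(self._pending)
        self._pending = []
        self._pending_bytes = 0
        self._first_at_ms = None


batcher = Batcher(max_time_ms=100, max_bytes=10)
batcher.add(b"aaaa", now_ms=0)
batcher.add(b"bbbb", now_ms=10)
batcher.tick(now_ms=50)            # neither limit reached yet
batcher.add(b"cccc", now_ms=60)    # 12 bytes pending: size limit reached, flush
batcher.add(b"dddd", now_ms=70)
batcher.tick(now_ms=170)           # 100 ms since the first pending message: flush
print(len(batcher.sent_batches))   # 2
```

Four messages are delivered with two I/O operations instead of four.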

Circulation of Messages without Batching

The following diagram shows the circulation of messages with batching enabled.

Circulation of Messages with Batching

Character Encoding

The UTF-8 character encoding is used for all components of MigratoryData messages, including the message content, message subject, field names, and field values. Thus, MigratoryData Server is able to handle messages in any international character set, including ASCII.

Publish-Subscribe Model

The Publish-Subscribe Model is defined as follows. A client connects to a MigratoryData server and subscribes to a subject X. Depending on whether the subject X is already subscribed to by other clients, one of the following two situations will occur:

  • If X is already subscribed to, the MigratoryData server will send that client the snapshot message of the subject X (if available). It will also send the client any subsequent message with the subject X received from the publishers

  • If X is not subscribed to by any other client, the MigratoryData server will send the client any message with the subject X once received from the publishers

When the client is no longer interested in the messages with the subject X, it can unsubscribe from the subject X.

The following diagram shows an example of publish-subscribe interaction. Note that Subscriber 1 which subscribes to the subject A receives only messages with the subject A. It does not receive messages with subject B as it does not subscribe to the subject B.

MigratoryData Publish-Subscribe Model
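The routing behaviour in this interaction can be modelled with a small in-memory broker. This is a conceptual sketch, not the MigratoryData Client API: the `Broker` class, the callbacks, and the subjects /demo/A and /demo/B are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List, Tuple

Listener = Callable[[str, str], None]


class Broker:
    """Deliver each published message only to the listeners subscribed to its subject."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Listener]] = defaultdict(list)

    def subscribe(self, subject: str, listener: Listener) -> None:
        self._subscribers[subject].append(listener)

    def unsubscribe(self, subject: str, listener: Listener) -> None:
        if listener in self._subscribers[subject]:
            self._subscribers[subject].remove(listener)

    def publish(self, subject: str, content: str) -> None:
        for listener in list(self._subscribers[subject]):
            listener(subject, content)


inbox_1: List[Tuple[str, str]] = []
inbox_2: List[Tuple[str, str]] = []


def subscriber_1(subject: str, content: str) -> None:
    inbox_1.append((subject, content))


def subscriber_2(subject: str, content: str) -> None:
    inbox_2.append((subject, content))


broker = Broker()
broker.subscribe("/demo/A", subscriber_1)      # Subscriber 1 listens to A only
broker.subscribe("/demo/A", subscriber_2)      # Subscriber 2 listens to A and B
broker.subscribe("/demo/B", subscriber_2)

broker.publish("/demo/A", "a1")
broker.publish("/demo/B", "b1")
broker.unsubscribe("/demo/A", subscriber_1)    # Subscriber 1 is no longer interested in A
broker.publish("/demo/A", "a2")

print(inbox_1)  # [('/demo/A', 'a1')]
print(inbox_2)  # [('/demo/A', 'a1'), ('/demo/B', 'b1'), ('/demo/A', 'a2')]
```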

Features

This section provides more details about some features of the MigratoryData server.

Security

The security of the MigratoryData server is assured by:

  • Industry-standard TLS/SSL for encrypted communication with the clients

  • TLS/SSL encryption and authentication for JMX and HTTP monitoring

  • A configurable list of TLS/SSL ciphers

  • Password-protected inter-cluster communication

  • The ability to run as a normal non-privileged user

  • Deployment using a dual firewall and DMZ policy

  • Entitlement for data access protection

  • Message publication restricted to a configurable list of IP addresses

Communication Ports

MigratoryData can listen for client connections on one or more ports. If the machine hosting the MigratoryData server is multi-homed (i.e. it has multiple IP addresses associated either with multiple network interfaces or with a single network interface but using multiple IP aliases), then the MigratoryData server can be configured to listen on one or more ports of one or more IP addresses of the machine.

Moreover, the ports can be configured to accept encrypted connections via HTTP Secure (https) or WebSocket Secure (wss). Note that MigratoryData Server can be configured to accept normal connections, encrypted connections, or both encrypted and normal connections.

Both the HTTP and WebSocket protocols use the same standard port numbers: 80 for normal connections and 443 for encrypted connections.

Important

For production deployment, the recommendation is to configure the MigratoryData server to use encrypted connections. In this way, your data will be securely delivered. Encrypted connections might also help to avoid interference from certain security solutions.

For example, when using normal connections, certain antivirus software might block the data streaming between the MigratoryData server and a client because it wrongly interprets the data streaming as a potential security attack. With encrypted connections, the antivirus software is unable to inspect the data.

MigratoryData Server should typically be configured to accept client connections on a public address, say push.example.com. Ideally, it should be configured to accept only encrypted client connections via the standard https/wss port 443. Thus, its configuration should be as follows:

ListenEncrypted = push.example.com:443

The following diagram shows how both subscribers and publishers securely connect to a single open port of the MigratoryData server.

MigratoryData Using a Single Communication Port

While it’s perfectly valid and beneficial to use a single network address and a single port to accept all clients, there are setups where the MigratoryData server is deployed in the DMZ and the publisher clients are deployed behind the second firewall of the DMZ to integrate with the backend servers. In such a setup, the publishers are typically not allowed to access Internet addresses, so they cannot connect to push.example.com:443. To handle this, a secondary LAN address, say 192.168.1.1, should be configured on the machine hosting the MigratoryData server. For this local address, you can configure any available port to accept connections from publishers, provided that the port is allowed by the firewall. As with the client port, you can configure the publisher port to accept either normal or encrypted connections.

The following diagram shows the ports used by the MigratoryData server to communicate with its clients and publishers.

Note

A new port (not shown in the diagram) should be opened if you enable the JMX or HTTP monitoring feature, and up to three other ports (not shown in the diagram) are used for inter-cluster communication.

MigratoryData Communication Ports

DMZ Deployment

The following diagram shows a secure dual firewall DMZ deployment of MigratoryData Server.

MigratoryData Secure Dual Firewall DMZ Deployment

Entitlement

The goal of the Entitlement feature is to offer a data control mechanism such that every client can access only messages with the subjects to which it is authorized to subscribe and can publish messages only on the subjects on which it is authorized to publish.

Entitlement is enabled as follows:

  • Configure the parameter Entitlement of the MigratoryData server

  • Use the API call setEntitlementToken() to assign an entitlement token to the client

  • If the Entitlement is Custom, use the MigratoryData Extension API for Entitlement to build an extension plugin which implements your entitlement rules
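Conceptually, an entitlement rule maps a token and an operation to the set of subjects allowed. The sketch below models such a check in plain Python; it is not the MigratoryData Extension API, and the tokens, the `RULES` table, and the `is_allowed` function are hypothetical.

```python
from typing import Dict, Set

# Hypothetical rules: entitlement token -> subjects allowed for each operation.
RULES: Dict[str, Dict[str, Set[str]]] = {
    "token-reader": {"subscribe": {"/Stocks/NYSE/IBM"}, "publish": set()},
    "token-feed": {"subscribe": set(), "publish": {"/Stocks/NYSE/IBM"}},
}


def is_allowed(token: str, operation: str, subject: str) -> bool:
    """Return True only if the token is known and grants `operation` on `subject`."""
    permissions = RULES.get(token)
    if permissions is None:
        return False                      # unknown tokens are denied by default
    return subject in permissions.get(operation, set())


print(is_allowed("token-reader", "subscribe", "/Stocks/NYSE/IBM"))  # True
print(is_allowed("token-reader", "publish", "/Stocks/NYSE/IBM"))    # False
print(is_allowed("unknown", "subscribe", "/Stocks/NYSE/IBM"))       # False
```

Denying by default when a token or operation is unknown is a design choice of this sketch.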

Monitoring

You can monitor the MigratoryData server using the Java Management Extensions (JMX) technology and/or HTTP requests. You can access the monitoring services, either JMX or HTTP, with or without password authentication, via normal or encrypted connections depending on the configuration of the MigratoryData server.

The indicators which can be monitored are:

  • The number of users connected to the MigratoryData server

  • The number of connections per second established with the MigratoryData server

  • The number of disconnections per second from the MigratoryData server

  • The number of incoming messages per second received from the publishers

  • The number of outgoing messages per second sent to the subscribers

  • The number of incoming bytes per second received from the publishers

  • The number of outgoing bytes per second sent to the subscribers

The following statistics are computed for the indicators above:

  • Maximum

  • Average

  • Standard Deviation

At each moment, the values of the statistics above are available for the following periods of time:

  • Last minute, last 5 minutes, and last 15 minutes

  • Last hour, last 5 hours, and last 15 hours

  • Last day, last 5 days, and last 15 days

  • Last month, last 5 months, and last 15 months
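To illustrate how the three statistics relate to a time window, the sketch below computes them over timestamped samples of one indicator. It is not how the server computes its statistics: the function name, the sample data, and the use of the population standard deviation are assumptions.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def window_stats(samples: List[Tuple[float, float]], now_s: float, window_s: float) -> Dict[str, float]:
    """Compute max, average and standard deviation of the samples inside a time window.

    `samples` holds (timestamp_in_seconds, value) pairs; a sample belongs to the
    window if it is at most `window_s` seconds older than `now_s`.
    """
    values = [value for timestamp, value in samples if 0 <= now_s - timestamp <= window_s]
    if not values:
        return {"max": 0.0, "avg": 0.0, "stddev": 0.0}
    return {"max": max(values), "avg": mean(values), "stddev": pstdev(values)}


# Hypothetical samples of "outgoing messages per second", taken at various times.
samples = [(0.0, 10.0), (200.0, 20.0), (280.0, 40.0), (290.0, 60.0)]
last_minute = window_stats(samples, now_s=300.0, window_s=60.0)
last_5_minutes = window_stats(samples, now_s=300.0, window_s=300.0)
print(last_minute)
print(last_5_minutes)
```

With these samples, the last-minute window contains only the two most recent values, while the five-minute window contains all four.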

Clustering

You can deploy multiple instances of MigratoryData Server as an active/active cluster to achieve:

  • Fault tolerance with no single point of failure

  • Horizontal scalability through load balancing

Therefore, the MigratoryData solution does not require specialized appliances or sophisticated application delivery network services to achieve load balancing and fault tolerance; these capabilities are built in.

There are two clustering modes offering different qualities of service for message delivery:

| Clustering Mode             | Server configuration             |
|-----------------------------|----------------------------------|
| Standard Message Delivery   | ClusterDeliveryMode = Standard   |
| Guaranteed Message Delivery | ClusterDeliveryMode = Guaranteed |

To enable Standard Message Delivery, you will need to deploy a cluster of at least two MigratoryData servers. To enable Guaranteed Message Delivery, you will need at least three MigratoryData servers for your cluster.

Both clustering modes offer reliable message delivery and fault tolerance, including automatic client reconnection if the connection between a client and a cluster member goes down or if a cluster member goes down. However, Guaranteed Message Delivery offers a better quality of service as detailed in the Guaranteed Message Delivery section below.

Guaranteed Message Delivery

All entities which communicate with the MigratoryData server use the TCP protocol at the transport layer, which is a reliable protocol. Also, with either of the two clustering modes, the client will automatically reconnect to the cluster and will continue to function even if a sudden failure occurs. This self-healing feature and the use of the reliable TCP protocol offer reliable message delivery for both clustering modes. However, Guaranteed Message Delivery goes further, as follows:

  • With Standard Message Delivery, when a client reconnects to another cluster member after a failure, it will get only the snapshot messages available for its subscribed subjects (as well as any subsequent message for its subscribed subjects)

  • With Guaranteed Message Delivery, when a client reconnects to another cluster member after a failure, it will get not only the snapshot messages available for its subscribed subjects, but also the messages received by the cluster for its subscribed subjects during the fail-over period (as well as any subsequent message for its subscribed subjects)

The following diagram shows an example of fail-over recovery with Standard Message Delivery enabled. Note that when the client reconnects to the server B at 10:12:20, it will only get the snapshot message available for the subject X at that time (which is the latest message for the subject X with the retained flag set to true). Therefore, it will not get one of the two messages which occurred during the fail-over recovery period, i.e. the message at 10:12:05.

Example of Data Recovery With Standard Message Delivery Enabled

The following diagram shows an example of fail-over recovery with Guaranteed Message Delivery enabled. Note that when the client reconnects to the server B at 10:12:20, it will get not only the snapshot message available for the subject X at that time, but also all messages received during the fail-over recovery period, including the message at 10:12:05.

Example of Data Recovery With Guaranteed Message Delivery Enabled
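The difference between the two recovery behaviours can be modelled in a few lines. This is a conceptual sketch only: the per-subject sequence numbers, the message contents, and every timestamp other than 10:12:05 are assumptions made for illustration, and the code does not reflect the server's actual recovery protocol.

```python
from typing import List, NamedTuple


class Message(NamedTuple):
    seq: int        # hypothetical per-subject sequence number
    time: str
    content: str
    retained: bool


def recover_standard(cache: List[Message]) -> List[Message]:
    """Standard delivery: only the snapshot (latest retained message) is sent."""
    retained = [m for m in cache if m.retained]
    return retained[-1:]


def recover_guaranteed(cache: List[Message], last_seq_seen: int) -> List[Message]:
    """Guaranteed delivery: every message the client missed is sent, in order."""
    return [m for m in cache if m.seq > last_seq_seen]


# Messages for subject X as cached by the cluster member the client reconnects to.
cache = [
    Message(1, "10:12:00", "x1", True),   # received by the client before the failure
    Message(2, "10:12:05", "x2", True),   # published during the fail-over period
    Message(3, "10:12:15", "x3", True),   # published during the fail-over period (assumed time)
]

standard = recover_standard(cache)
guaranteed = recover_guaranteed(cache, last_seq_seen=1)
print([m.content for m in standard])    # ['x3']
print([m.content for m in guaranteed])  # ['x2', 'x3']
```

In this model the message at 10:12:05 is lost with Standard Message Delivery and recovered with Guaranteed Message Delivery; the last recovered message is also the current snapshot.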

Internals

The paper Reliable Messaging to Millions of Users with MigratoryData explains the internals of the MigratoryData server and how concepts such as replication, in-memory distributed caching, coordinators, subscriber partitioning, subject groups, sequence numbers, epoch numbers, a two-layer architecture (I/O and workers), and message ordering are used to achieve high vertical scalability, horizontal scalability, clustering, and guaranteed message delivery at scale. The final version of this paper is available in the proceedings of the 18th ACM/IFIP/USENIX Middleware International Conference from ACM.