Service Discovery

Overview

With Service Interconnection Fabric, services may scale in and out across VM and container environments; communicating instances may run in the same private data center or a thousand miles apart in different clouds; instances may be instantiated in overlapping, or even different (IPv4 and IPv6), address spaces; and each instance-to-instance interaction may have its own policy to support enterprise security or compliance requirements.

In such a dynamic and heterogeneous environment, the DNS service must be intelligent, secure, and scalable. An intelligent DNS reflects infrastructure fluidity at the application level without overburdening applications. A secure DNS ensures that only desired communication happens between application services. A scalable DNS has no single point of failure and is not locked to a particular infrastructure or execution environment.

The DNS service built into Service Interconnection Fabric offers these features without heavy lift-and-shift migration, with practically zero-touch configuration, and with no maintenance required.

Architecture

Intelligent DNS is a fully distributed system without centralized DNS record management. Each workload node receives its own personalized DNS resolver as a part of Policy Agent functionality.

When the policy controller assigns a communication role to a service instance on a workload node, the workload node starts sending service discovery messages to opposite-role instances to populate their DNS record databases with new entries. The discovery messages, signed by the policy controller, contain an instance Relative Distinguished Name (RDN) and an instance Host Identifier (HID), among other fields.

The RDN uniquely identifies an instance service endpoint in the service interconnection fabric. The policy agent automatically builds the RDN from three components: the host name and the host location identifier of the workload node on which the service instance is deployed, plus the instance role identifier.

The HID uniquely identifies an instance network endpoint in the service interconnection fabric. The HID is a cryptographically generated address that decouples the application transport layer from the internetworking layer (IP).

The policy agent creates a name resolution record in its database when it receives the service discovery message. The agent builds the record by appending the local zone name to the RDN, which yields a fully qualified distinguished name (FQDN), and by assigning a virtual IP (VIP) to the newly discovered service instance. Additionally, the policy agent may create two service records: one for the service in a given location and another, more general record for the service itself.
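The naming step can be illustrated with a minimal sketch. It assumes the components are dot-joined in the order host, location, role, zone (the order suggested by the affinity query forms listed under Instance Affinity) and that no component contains a dot. The function name, dictionary keys, and sample values are hypothetical and are not the policy agent's actual interface.

```python
def build_record_names(host: str, location: str, role: str, zone: str) -> dict:
    """Derive the instance FQDN and the two service record names.

    Assumption: names are dot-joined as host.location.role.zone.
    """
    rdn = ".".join((host, location, role))
    return {
        "instance": f"{rdn}.{zone}",                      # one specific instance
        "location_service": f"{location}.{role}.{zone}",  # the service in one location
        "service": f"{role}.{zone}",                      # the service in general
    }


# Hypothetical sample values, chosen only for illustration.
names = build_record_names("vm1", "aws-east", "db", "example.internal")
for kind, name in names.items():
    print(kind, name)
```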

The policy agent assigns credits to each name resolution record. Records with more credits have a higher priority than records with fewer credits. This mechanism allows the policy agent to redirect a service record to a newly discovered service instance if the new instance has more credits than the existing instance.
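The redirect rule can be sketched as follows. This is an illustration under stated assumptions, not the agent's implementation: the class and field names are hypothetical, credits are treated as plain integers, and a tie leaves the existing record in place because the text only mentions redirecting when the new instance has more credits.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InstanceRecord:
    fqdn: str
    vip: str
    credits: int


class ServiceRecords:
    """Maps a service name to the instance record with the most credits seen so far."""

    def __init__(self) -> None:
        self._best = {}  # service name -> InstanceRecord

    def offer(self, service: str, candidate: InstanceRecord) -> bool:
        """Redirect the service record only if the candidate has strictly more credits.

        Returns True when the service record was created or redirected.
        """
        current = self._best.get(service)
        if current is None or candidate.credits > current.credits:
            self._best[service] = candidate
            return True
        return False

    def resolve(self, service: str) -> Optional[str]:
        """Return the VIP the service record currently points to, if any."""
        record = self._best.get(service)
        return record.vip if record else None
```

For example, offering an instance with 5 credits and then one with 3 credits leaves the service record on the first instance; a later instance with 9 credits takes it over.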

To support zero-touch configuration, the service discovery procedure embeds both keep-alive and shutdown routines. For example, once a service instance shuts down, the associated records are removed fabric-wide almost immediately.
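A minimal sketch of how keep-alive and shutdown handling might keep a record table current is shown below. The class, its methods, and the 30-second timeout are assumptions made for illustration; the actual message formats and timer values are not described here. Time is passed in explicitly so the behavior is easy to follow.

```python
class RecordTable:
    """Tracks when each discovered instance was last heard from."""

    def __init__(self, timeout_s: float) -> None:
        self.timeout_s = timeout_s
        self._last_seen = {}  # FQDN -> time of the most recent keep-alive

    def keepalive(self, fqdn: str, now: float) -> None:
        """Create or refresh a record when a keep-alive arrives."""
        self._last_seen[fqdn] = now

    def shutdown(self, fqdn: str) -> None:
        """Remove a record at once when the instance announces its shutdown."""
        self._last_seen.pop(fqdn, None)

    def expire(self, now: float) -> list:
        """Remove and return records whose keep-alives stopped arriving."""
        stale = [name for name, seen in self._last_seen.items()
                 if now - seen > self.timeout_s]
        for name in stale:
            del self._last_seen[name]
        return stale

    def names(self) -> list:
        return sorted(self._last_seen)


table = RecordTable(timeout_s=30.0)  # hypothetical timeout
for name in ("a.db.zone", "b.db.zone", "c.db.zone"):
    table.keepalive(name, now=0.0)
table.keepalive("a.db.zone", now=25.0)  # a keeps reporting
table.shutdown("c.db.zone")             # c shuts down cleanly
expired = table.expire(now=40.0)        # b went silent and times out
```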

Intelligent DNS works in private data centers and public clouds, for both VMs and containers, offering seamless and secure application connectivity across security, technological, geographical, and administrative boundaries.

Fig. 6 Intelligent DNS Architecture

Specification

Design Principles

Distributed DNS

Secure peer-to-peer service discovery combined with service instance authentication and flow-level microsegmentation interconnects workload node DNS resolvers into a distributed, intelligent DNS system without centralized record management.

Benefit: Secure, responsive, adjustable, and scalable DNS service with minimal overhead

Host Identity

Services communicate with each other by means of virtual IPs (VIPs) that are automatically translated into cryptographically generated host identifiers/addresses (CGAs). Communication is built on host identity, not on host locator.

Benefit: DNS service resolves VIPs to CGAs for authenticated hosts only

Secure Service Discovery

Workload nodes exchange authorized messages so that each service instance discovers its opposite-role instance.

Benefit: Only authorized service instances may discover each other

Personalized Name Space

Each workload node places the discovered remote instance names in a zone that is meaningful only to that node, e.g., different zones for private data centers, public clouds, and Kubernetes clusters.

Benefit: Flexible DNS request routing without environment interdependencies

Personalized VIP Space

Each workload node receives its own personalized address space for VIP allocation.

Benefit: Communicating services may be in overlapping IP-address spaces
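The sketch below illustrates why node-local VIP spaces tolerate overlap: each node allocates VIPs from its own pool and keeps its own VIP-to-identifier mapping, so the same VIP can stand for different remote instances on different nodes. The class name, the pool range, and the identifier strings are hypothetical placeholders; real host identifiers are cryptographically generated addresses.

```python
import ipaddress


class VipAllocator:
    """Hands out VIPs from a node-local pool and remembers the VIP-to-HID mapping."""

    def __init__(self, pool_cidr: str) -> None:
        # iter() makes this work whether hosts() returns a generator or a list.
        self._free = iter(ipaddress.ip_network(pool_cidr).hosts())
        self._vip_by_hid = {}
        self._hid_by_vip = {}

    def assign(self, hid: str) -> str:
        """Return the VIP for a host identifier, allocating one on first sight."""
        if hid in self._vip_by_hid:
            return self._vip_by_hid[hid]
        vip = str(next(self._free))  # raises StopIteration if the pool is exhausted
        self._vip_by_hid[hid] = vip
        self._hid_by_vip[vip] = hid
        return vip

    def lookup(self, vip: str) -> str:
        """Translate a VIP back to the host identifier it stands for on this node."""
        return self._hid_by_vip[vip]


# Two nodes use the same pool without conflict; mappings are local to each node.
node_a = VipAllocator("192.168.100.0/29")
node_b = VipAllocator("192.168.100.0/29")
```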

Personalized Protocol-agnostic Name Resolution

Each workload node may switch between IPv4 DNS and IPv6 DNS independently of other nodes (planned for an upcoming version).

Benefit: Automatic translation between IPv4 and IPv6

Interface to Applications

Automatic FQDN Builder

Each service instance receives an FQDN that is unique within the service interconnection fabric and is built automatically from the host name, the location identifier, and the instance role configuration.

Benefit: Zero FQDN provisioning

Alternative Names

A service instance can have multiple FQDNs, each corresponding to a different communication role.

Benefit: Supports flow-level communication policy

Role-based Access Control

Each service instance may propagate only its own role instance names and resolve only opposite-role instance names.

Benefit: Protection from unsanctioned service discovery
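The resolution rule can be pictured as a simple check against role pairings. How roles are actually paired is defined by the communication policy and is not specified here, so the pair list, the class, and the role names below are assumptions for illustration only.

```python
class RolePolicy:
    """Answers whether one role may resolve names of another role."""

    def __init__(self, role_pairs) -> None:
        # Each pair names two roles that are opposite to each other.
        self._opposite = {}
        for first, second in role_pairs:
            self._opposite.setdefault(first, set()).add(second)
            self._opposite.setdefault(second, set()).add(first)

    def may_resolve(self, own_role: str, target_role: str) -> bool:
        """Only opposite-role instance names are resolvable."""
        return target_role in self._opposite.get(own_role, set())


# Hypothetical three-tier policy: web talks to app, app talks to db.
policy = RolePolicy([("web", "app"), ("app", "db")])
print(policy.may_resolve("web", "app"))
print(policy.may_resolve("web", "db"))
```

In this example an instance in the web role can resolve app instances but not db instances, because no pairing links web and db.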

Instance Affinity

Service instances can query DNS using opposite-role instance affinity: role.namespace, location.role.namespace, host.location.role.namespace.

Benefit: Automatic request re-routing
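The three query forms can be sketched as an index that files every discovered instance under each level of specificity. The class, the sample names, and the VIPs are hypothetical. The sketch returns every matching candidate and leaves out the choice among them.

```python
from collections import defaultdict


class AffinityIndex:
    """Indexes instances under role, location.role, and host.location.role names."""

    def __init__(self, namespace: str) -> None:
        self.namespace = namespace
        self._index = defaultdict(list)

    def add(self, host: str, location: str, role: str, vip: str) -> None:
        for name in (
            f"{role}.{self.namespace}",                    # any instance of the role
            f"{location}.{role}.{self.namespace}",         # instances in one location
            f"{host}.{location}.{role}.{self.namespace}",  # one specific instance
        ):
            self._index[name].append(vip)

    def query(self, name: str) -> list:
        """Return the VIPs of all instances matching the queried name."""
        return list(self._index.get(name, []))


index = AffinityIndex("example.internal")
index.add("vm1", "east", "db", "10.0.0.1")
index.add("vm2", "west", "db", "10.0.0.2")
```

A query for db.example.internal matches both instances, west.db.example.internal narrows the result to the western location, and vm1.east.db.example.internal selects a single instance.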

Instance Proximity

DNS records are prioritized based on the proximity of each opposite-role instance to the local service instance. Proximity is defined as the cost of the path between two workload nodes. A cost can be dynamically assigned to each link in the service interconnection fabric by an external system and can reflect interconnection quality, instance load, traffic cost, etc.

Benefit: Infrastructure-aware request routing
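As a rough illustration of proximity ranking, the sketch below computes path costs with Dijkstra's algorithm and orders candidate nodes by cost. It assumes that links are bidirectional, that link costs are non-negative, and that a path's cost is the sum of its link costs; none of these details are stated above. The node names and cost values are hypothetical.

```python
import heapq


def path_costs(links: dict, source: str) -> dict:
    """Cheapest path cost from source to every reachable node (Dijkstra)."""
    graph = {}
    for (a, b), cost in links.items():
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))
    best = {source: 0}
    queue = [(0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry; a cheaper path was already found
        for neighbor, link_cost in graph.get(node, []):
            new_cost = cost + link_cost
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return best


def rank_by_proximity(candidates: list, costs: dict) -> list:
    """Order candidate nodes from nearest to farthest; unreachable nodes go last."""
    return sorted(candidates, key=lambda node: costs.get(node, float("inf")))


links = {
    ("local", "p1"): 1,
    ("p1", "far"): 5,
    ("local", "p2"): 10,
    ("p2", "near"): 1,
    ("p1", "near"): 2,
}
costs = path_costs(links, "local")
ranked = rank_by_proximity(["far", "near"], costs)
```

When an external system changes a link cost, recomputing the costs and re-ranking is enough to change which instance is preferred.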

Responsiveness to Connectivity Failure

A heartbeat allows detection of remote instance connectivity failure.

Benefit: Minimize application downtime

Responsiveness to Instance Failure

On instance shutdown, corresponding DNS records are removed fabric-wide almost immediately.

Benefit: Minimize application downtime

REST API for Retrieving Local DNS Records

The policy agent RESTful API allows for retrieving all records from the DNS resolver database on each workload node.

Benefit: Open for integration with workload orchestrators

Deployment and Maintenance

Cloud-agnostic

The DNS service is embedded in an infrastructure-agnostic service interconnection fabric so it does not require services specific to any cloud or private data center.

Benefit: Not locked to any cloud infrastructure

Supports Both VM and Kubernetes Environments

On each workload node, the DNS service is exposed to service instances as a libnss library (via nsswitch.conf on VMs) or DNS resolver (via CoreDNS on a Kubernetes worker node).

Benefit: Consistent DNS service across VM and container environments
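Because the VM integration goes through nsswitch.conf, an application that uses the standard system resolver should need no fabric-specific client code. The sketch below shows such an ordinary lookup in Python, whose socket.getaddrinfo delegates to the operating system's resolver. The helper name and the example FQDN in the comment are hypothetical.

```python
import socket


def resolve_ipv4(name: str) -> list:
    """Resolve a name to its IPv4 addresses through the system resolver."""
    infos = socket.getaddrinfo(
        name, None, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


# On a workload node with the fabric resolver configured, a call such as
# resolve_ipv4("db.example.internal") would be expected to return a VIP.
# The name here is a made-up example, so the call is left as a comment.
```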

Transparent to Applications

The DNS service requires no changes to applications apart from updating URLs and then forwarding requests to the service interconnection fabric DNS.

Benefit: Minimal changes to applications

DNS Resolver Policy Agent Component

DNS resolver installation is part of policy agent deployment.

Benefit: Minimal footprint, no additional maintenance

CLI for DNS Service Adjustment

The policy agent CLI allows for setting up name and VIP spaces on each workload node.

Benefit: Minimal or no configuration

Seamless Migration

The DNS service does not interrupt or degrade performance of existing DNS services on workload nodes, allowing for a gradual and automatic migration as service instance communication is switched from traditional networking to the service interconnection fabric.

Benefit: No lift-and-shift migration required