Architecting Multi-Domain Support in AWS
At Flightcontrol, I led a significant redesign of our custom domain implementation. Our platform initially allowed users to attach only a single domain to each service, which created substantial limitations for customers managing multiple applications or brands. This article explores how I architected a new domain management system to support multiple domains per service while providing users with a transparent, guided setup process.
The Challenge: Limitations of Single Domain Replacement
The original domain management system was built around a simple premise: one service could have one custom domain. This created several concrete problems for our users:
No Multi-Domain Support: Users who needed to serve their application under multiple domains had to create duplicate services, which increased both costs and management complexity.
Limited Certificate Configuration: Each service used a single certificate, making it impossible to use different certificate configurations for different groups of domains.
Black Box Process: The domain setup workflow provided little visibility into the current state. Users would submit their domain and then wait, with minimal feedback about what was happening behind the scenes.
Unhelpful Error Messaging: When domain attachments failed, users received generic error messages without actionable guidance, leading to frustration and increased support volume.
To solve these problems, I needed to fundamentally rethink how domains were managed in Flightcontrol.
Solution: Domain Groups and Transparent Deployment Stages
Working with input from our engineering team, I designed a solution that introduced two key architectural concepts:
- Domain Groups: Collections of domains that share a certificate and CloudFront distribution
- Domain Deployment: A staged process with clear status indicators and user guidance
Domain Groups: The Foundation for Multi-Domain Support
I created the domain group as a central abstraction for managing multiple domains. Each group contains:
- A primary domain and optional alternative domains
- A link to a single CloudFront distribution
- A shared SSL/TLS certificate covering all domains in the group
Users can create multiple domain groups for each service, allowing for different certificate configurations if needed. For example, a user might have one domain group for their primary brand domains and another for legacy domains.
This model provides significant advantages:
- Resource Efficiency: Multiple domains share infrastructure resources, reducing costs
- Simplified Management: Related domains can be managed together
- Flexible Configuration: Different domain groups can use different certificates and CloudFront settings
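As a rough illustration of the domain group abstraction, the sketch below models a group as a plain data structure. The field names, the example IDs, and the rule that a domain belongs to at most one group per service are assumptions made for the example, not Flightcontrol's actual schema.

```typescript
// Hypothetical shape of a domain group; field names are illustrative only.
interface DomainGroup {
  id: string;
  serviceId: string;
  primaryDomain: string;
  alternativeDomains: string[];
  cloudFrontDistributionId: string | null; // null until a distribution is linked
  certificateArn: string | null; // null until the shared certificate is issued
}

// Every domain that the group's shared certificate has to cover.
function domainsInGroup(group: DomainGroup): string[] {
  return [group.primaryDomain].concat(group.alternativeDomains);
}

// Assumption: a domain may appear in only one group of a service.
// Returns the lowercased domains that appear more than once.
function findDuplicateDomains(groups: DomainGroup[]): string[] {
  const seen = new Set<string>();
  const duplicates = new Set<string>();
  for (const group of groups) {
    for (const domain of domainsInGroup(group)) {
      const key = domain.toLowerCase();
      if (seen.has(key)) duplicates.add(key);
      seen.add(key);
    }
  }
  return Array.from(duplicates);
}

// One group for the primary brand, one for legacy domains.
const brandGroup: DomainGroup = {
  id: "dg_brand",
  serviceId: "svc_1",
  primaryDomain: "example.com",
  alternativeDomains: ["www.example.com"],
  cloudFrontDistributionId: null,
  certificateArn: null,
};
const legacyGroup: DomainGroup = {
  id: "dg_legacy",
  serviceId: "svc_1",
  primaryDomain: "old-example.com",
  alternativeDomains: ["www.old-example.com"],
  cloudFrontDistributionId: null,
  certificateArn: null,
};
```

Keeping the certificate and distribution references on the group rather than on each domain is what lets related domains share infrastructure.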
Domain Deployment: Bringing Transparency to the Setup Process
To demystify the domain setup process, I introduced two complementary status types that work together to provide detailed visibility into the deployment process:
Domain Group Deployment Status
This status shows the overall progress of a domain group deployment:
- QUEUED: Initial deployment has been queued
- ANALYZING: Analyzing domains and determining required actions
- REQUESTING_CERTIFICATE: Certificate request is being created in AWS ACM
- AWAITING_CERTIFICATE_ISSUE: Certificate request submitted, waiting for DNS verification and ACM to issue the certificate
- APPLYING_CERTIFICATE: Certificate has been issued and is being attached to CloudFront
- CERTIFICATE_ATTACHED: Certificate has been successfully attached to CloudFront
- ERROR: Deployment encountered an error (with detailed error information)
- CANCELLED: Deployment was cancelled by the user
Domain Attachment Status
Each individual domain within a group has its own attachment status, providing granular information about what's happening with that specific domain:
- PENDING: Domain has been added but processing has not started
- CREATING: Domain is being created in our system
- AWAITING_VERIFICATION_RECORDS_TO_BE_SET: DNS verification records need to be added by the user
- PENDING_CERTIFICATE_FOR_ATTACHMENT: Waiting for the certificate to be issued before attachment
- ATTACHING: Domain is being attached to CloudFront
- ATTACHED: Domain is successfully attached and operational
- ERROR_ATTACHING: An error occurred during the attachment process
- DETACHING: Domain is being removed from CloudFront
- PENDING_CERTIFICATE_FOR_DETACHMENT: Waiting for certificate update before completing detachment
- ERROR_DETACHING: An error occurred during the detachment process
- DETACHED: Domain has been successfully removed
This dual-status approach is necessary because domain management is a dynamic process where users can add or remove domains at any point. For example, while a certificate is in the APPLYING_CERTIFICATE stage (which takes about 6 minutes on average), a user might add a new domain to the group. In this case, the overall deployment status continues to show APPLYING_CERTIFICATE, while the newly added domain would show an individual status of PENDING or CREATING.
By tracking both statuses independently, the system can accurately represent asynchronous changes to domain configurations while still providing a clear picture of the overall certificate deployment process. This gives users maximum flexibility to modify their domain configuration at any time without disrupting ongoing processes.
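The two status types can be sketched as string unions, using the status names listed above. The `GroupState` shape and the helper functions are hypothetical; they only illustrate that adding a domain changes that domain's attachment status and leaves the group-level deployment status alone.

```typescript
type DomainGroupDeploymentStatus =
  | "QUEUED" | "ANALYZING" | "REQUESTING_CERTIFICATE" | "AWAITING_CERTIFICATE_ISSUE"
  | "APPLYING_CERTIFICATE" | "CERTIFICATE_ATTACHED" | "ERROR" | "CANCELLED";

type DomainAttachmentStatus =
  | "PENDING" | "CREATING" | "AWAITING_VERIFICATION_RECORDS_TO_BE_SET"
  | "PENDING_CERTIFICATE_FOR_ATTACHMENT" | "ATTACHING" | "ATTACHED" | "ERROR_ATTACHING"
  | "DETACHING" | "PENDING_CERTIFICATE_FOR_DETACHMENT" | "ERROR_DETACHING" | "DETACHED";

// Hypothetical in-memory view of one domain group.
interface GroupState {
  deploymentStatus: DomainGroupDeploymentStatus;
  domains: Record<string, DomainAttachmentStatus>;
}

// Adding a domain only creates a per-domain status; the group-level
// deployment status is copied through unchanged.
function addDomain(state: GroupState, domain: string): GroupState {
  const domains: Record<string, DomainAttachmentStatus> = {};
  for (const name of Object.keys(state.domains)) {
    const status = state.domains[name];
    if (status !== undefined) domains[name] = status;
  }
  domains[domain] = "PENDING";
  return { deploymentStatus: state.deploymentStatus, domains };
}

// Domains for which the user still has to add DNS verification records.
function domainsAwaitingUserAction(state: GroupState): string[] {
  return Object.keys(state.domains).filter(
    (name) => state.domains[name] === "AWAITING_VERIFICATION_RECORDS_TO_BE_SET",
  );
}
```

With a group in `APPLYING_CERTIFICATE`, calling `addDomain(state, "shop.example.com")` yields a state whose deployment status is still `APPLYING_CERTIFICATE` while the new domain reads `PENDING`.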
Each stage has clear user guidance, explaining:
- What is happening behind the scenes
- What actions the user needs to take (if any)
- What to expect next
This staged approach transforms what was previously a black box into a transparent, guided process.
The Architecture Behind Domain Management
Let's examine the architecture that powers my multi-domain support implementation, focusing on the key components and workflows.
Domain Group Orchestration Workflow
At the heart of my implementation is a Temporal workflow that orchestrates the entire domain configuration process. This workflow manages the lifecycle of a domain group, from initial creation through certificate issuance to final attachment.
The workflow handles:
- Domain verification
- Certificate provisioning
- CloudFront configuration
- Error handling
- Status updates
Let's break down the key architectural components:
1. Certificate Request and CloudFront Provisioning
The first major step in my architecture is requesting an ACM certificate and creating a CloudFront distribution if needed:
When a user creates a domain group, the system first determines whether to use an existing CloudFront distribution (the default for the service) or create a new one.
If a new CloudFront distribution is needed, I provision it with appropriate settings based on the service type.
The system then creates a certificate request in AWS Certificate Manager (ACM) for all domains in the group.
The certificate request generates DNS verification records that the user will need to add to their DNS configuration.
This process leverages AWS ACM's ability to issue certificates covering multiple domains, minimizing the number of certificates required.
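To make the multi-domain certificate request concrete, here is a minimal sketch that builds the input for ACM's `RequestCertificate` operation. `DomainName`, `SubjectAlternativeNames`, and `ValidationMethod` are real parameters of that operation; the helper function and its deduplication rules are assumptions for illustration.

```typescript
// Subset of the ACM RequestCertificate input used in this sketch.
interface CertificateRequestInput {
  DomainName: string;
  SubjectAlternativeNames?: string[];
  ValidationMethod: "DNS";
}

// Hypothetical helper: one certificate request covering every domain in a group.
function buildCertificateRequest(
  primaryDomain: string,
  alternativeDomains: string[],
): CertificateRequestInput {
  const primary = primaryDomain.trim().toLowerCase();
  const seen = new Set<string>([primary]);
  const alternativeNames: string[] = [];
  for (const domain of alternativeDomains) {
    const name = domain.trim().toLowerCase();
    if (!seen.has(name)) {
      seen.add(name);
      alternativeNames.push(name);
    }
  }
  const input: CertificateRequestInput = { DomainName: primary, ValidationMethod: "DNS" };
  // ACM rejects an empty SubjectAlternativeNames list, so set it only when needed.
  if (alternativeNames.length > 0) input.SubjectAlternativeNames = alternativeNames;
  return input;
}
```

The resulting object could be passed to an ACM client. Certificates used with CloudFront must be requested in the us-east-1 region.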
2. DNS Record Management and User Guidance
Once the certificate request is created, I guide the user through the verification process:
The system retrieves the verification records generated by ACM.
These records are stored in our database and presented to the user through the UI and API.
The system also checks for Certificate Authority Authorization (CAA) records, which can prevent certificate issuance if not properly configured.
If needed, I guide users to add CAA records that allow AWS to issue certificates for their domains.
This component ensures users have all the information they need to successfully complete the domain verification process.
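A sketch of how the verification records might be prepared for display. The input mirrors the `DomainValidationOptions` entries that ACM's `DescribeCertificate` returns; the output shape and the function name are assumptions. Grouping by record name avoids showing the same record twice when several names on the certificate share one validation record, as a domain and its wildcard do.

```typescript
// Subset of an ACM DomainValidationOptions entry.
interface DomainValidation {
  DomainName: string;
  ValidationStatus?: "PENDING_VALIDATION" | "SUCCESS" | "FAILED";
  // May be absent for a short time right after the certificate is requested.
  ResourceRecord?: { Name: string; Type: string; Value: string };
}

// Hypothetical shape presented to the user through the UI or API.
interface VerificationRecord {
  name: string;
  type: string;
  value: string;
  domains: string[]; // the domains this record verifies
}

// Collect the records the user still has to add, one entry per DNS name.
function pendingVerificationRecords(validations: DomainValidation[]): VerificationRecord[] {
  const byName = new Map<string, VerificationRecord>();
  for (const validation of validations) {
    if (validation.ValidationStatus === "SUCCESS" || !validation.ResourceRecord) continue;
    const record = validation.ResourceRecord;
    const existing = byName.get(record.Name);
    if (existing) {
      existing.domains.push(validation.DomainName);
    } else {
      byName.set(record.Name, {
        name: record.Name,
        type: record.Type,
        value: record.Value,
        domains: [validation.DomainName],
      });
    }
  }
  return Array.from(byName.values());
}
```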
3. Temporal Workflows for Asynchronous Monitoring
Domain setup involves several asynchronous processes, particularly waiting for DNS propagation and certificate issuance. I designed Temporal workflows to manage these processes:
A dedicated child workflow monitors DNS propagation, checking whether verification records have been properly configured.
Another workflow waits for the ACM certificate to be issued, which can take anywhere from minutes to hours depending on DNS propagation.
These workflows update the domain group's status as it progresses through the various stages, ensuring users always have visibility into the current state.
Temporal's durable execution model is well suited to these long-running processes because it preserves workflow state through service restarts and deployments.
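The monitoring logic can be sketched independently of Temporal. In the example below, `getCertificateStatus` and `sleep` are injected, hypothetical dependencies; in a Temporal workflow they would typically be an activity and the SDK's durable `sleep` timer. The certificate status values are ACM's own, while the decision type and the poll interval are assumptions.

```typescript
// Subset of the ACM certificate statuses relevant while waiting for issuance.
type AcmCertificateStatus = "PENDING_VALIDATION" | "ISSUED" | "VALIDATION_TIMED_OUT" | "FAILED";

type MonitorDecision =
  | { action: "wait" }
  | { action: "proceed"; nextStatus: "APPLYING_CERTIFICATE" }
  | { action: "fail"; nextStatus: "ERROR"; reason: string };

// Pure decision step: what should the workflow do for a given ACM status?
function decideNextStep(status: AcmCertificateStatus): MonitorDecision {
  switch (status) {
    case "PENDING_VALIDATION":
      return { action: "wait" };
    case "ISSUED":
      return { action: "proceed", nextStatus: "APPLYING_CERTIFICATE" };
    case "VALIDATION_TIMED_OUT":
    case "FAILED":
      return { action: "fail", nextStatus: "ERROR", reason: `ACM reported ${status}` };
  }
}

// Hypothetical dependencies, injected so the loop stays testable.
interface MonitorDeps {
  getCertificateStatus(certificateArn: string): Promise<AcmCertificateStatus>;
  sleep(ms: number): Promise<void>;
}

// Poll until ACM issues the certificate or gives up. ACM times out DNS
// validation after 72 hours, so the loop ends even if records are never added.
async function waitForCertificate(
  certificateArn: string,
  deps: MonitorDeps,
  pollIntervalMs: number = 30000,
): Promise<MonitorDecision> {
  for (;;) {
    const decision = decideNextStep(await deps.getCertificateStatus(certificateArn));
    if (decision.action !== "wait") return decision;
    await deps.sleep(pollIntervalMs);
  }
}
```

Separating the decision from the waiting keeps the status transitions easy to unit test without running a workflow.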
4. Certificate Attachment and CloudFront Configuration
Once the certificate is issued, I attach it to the CloudFront distribution:
The workflow fetches the issued certificate from ACM.
It attaches the certificate to the CloudFront distribution, enabling HTTPS for all domains in the group.
If this is a certificate rotation (replacing an existing certificate), the system ensures the new certificate is fully configured before removing the old one.
Once the CloudFront distribution is updated, the system provides the final DNS records needed to route traffic to the service.
This component ensures zero-downtime certificate rotation and seamless traffic routing.
Error Handling and Domain Conflict Detection
One of the most significant improvements in my architecture is comprehensive error handling, particularly for domain conflicts:
When a domain fails to attach, the system detects the specific error condition.
For domain conflicts (when a domain is already in use elsewhere), I implemented a special detection mechanism:
- The workflow temporarily modifies the CloudFront configuration to trigger conflict detection
- The system captures detailed information about where the domain is already in use
- It provides this specific information to the user, enabling them to resolve the conflict
For certificate issuance failures, the system analyzes the root cause (e.g., DNS misconfiguration, CAA restrictions) and provides targeted guidance.
This error handling replaces generic failure messages with guidance specific to the failure, so users can see what went wrong and how to fix it.
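One way to turn detected error conditions into targeted guidance is a discriminated union with one message per case. The error kinds and the wording below are hypothetical and only illustrate the pattern.

```typescript
// Hypothetical error conditions the workflow might detect.
type DomainSetupError =
  | { kind: "DOMAIN_CONFLICT"; domain: string; conflictingDistributionId: string | null }
  | { kind: "DNS_MISCONFIGURED"; domain: string; expectedRecordName: string }
  | { kind: "CAA_RESTRICTION"; domain: string }
  | { kind: "UNKNOWN"; domain: string; message: string };

// Map each condition to a message that tells the user what to do next.
function guidanceFor(error: DomainSetupError): string {
  switch (error.kind) {
    case "DOMAIN_CONFLICT":
      return error.conflictingDistributionId
        ? `${error.domain} is already attached to CloudFront distribution ${error.conflictingDistributionId}. Remove it there, then retry.`
        : `${error.domain} is already in use by another CloudFront distribution. Remove it there, then retry.`;
    case "DNS_MISCONFIGURED":
      return `The verification record ${error.expectedRecordName} for ${error.domain} was not found. Check the record name and value with your DNS provider.`;
    case "CAA_RESTRICTION":
      return `The CAA records for ${error.domain} do not allow Amazon to issue certificates. Add a CAA record that authorizes amazon.com.`;
    case "UNKNOWN":
      return `Attaching ${error.domain} failed: ${error.message}`;
  }
}
```

Because the switch is exhaustive over `kind`, adding a new error condition without a matching message becomes a compile-time error.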
User Experience: A Guided Journey
From the user's perspective, the domain setup process now follows a clear, guided flow:
- Create a Domain Group: User adds domains to a new or existing domain group
- Add Verification Records: System provides the necessary DNS records to verify domain ownership
- Monitor Status: User can track the progress of domain setup through clear status indicators
- Configure Traffic Routing: Once verification is complete, user adds the final DNS records to route traffic
Throughout this process, the system provides detailed guidance and status updates, explaining what's happening behind the scenes and what actions the user needs to take.
Challenges and Solutions
Implementing this system presented several challenges that required creative solutions:
DNS Propagation Unpredictability
DNS propagation remains one of the most unpredictable aspects of domain setup. My solution includes:
- Intelligent retry strategies that balance quick detection with resource efficiency
- Status updates that show the current state of DNS records
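The article does not spell out the retry strategy, so the following is only one plausible choice: capped exponential backoff, which checks frequently at first and backs off as propagation drags on. The base delay and cap are arbitrary example values.

```typescript
// Delay before the next DNS check: doubles with each attempt, up to a cap.
function retryDelayMs(attempt: number, baseMs: number = 5000, capMs: number = 300000): number {
  return Math.min(capMs, baseMs * Math.pow(2, attempt));
}

// Total time spent waiting across the first `attempts` retries.
function totalWaitMs(attempts: number, baseMs: number = 5000, capMs: number = 300000): number {
  let total = 0;
  for (let attempt = 0; attempt < attempts; attempt++) {
    total += retryDelayMs(attempt, baseMs, capMs);
  }
  return total;
}
```

With these defaults the first checks happen after 5, 10, and 20 seconds, and later checks settle at one every 5 minutes.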
Certificate Authority Authorization (CAA) Records
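CAA records restrict which certificate authorities may issue certificates for a domain. If a domain has CAA records that do not authorize Amazon's certificate authority, ACM cannot issue the certificate, which is confusing for users who do not know the records exist. My architecture: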
- Proactively checks for CAA records before requesting certificates
- Analyzes the domain hierarchy to find the most specific applicable CAA records
- Provides guidance for adding appropriate CAA records when needed
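The hierarchy walk can be sketched as follows. The CAA set closest to the requested name applies, so the lookup climbs from the full domain toward the root and stops at the first name that has CAA records. The `lookup` function is an injected, hypothetical stand-in for a DNS query (a real resolver would be asynchronous). This simplified check ignores `issuewild` and critical flags.

```typescript
interface CaaRecord {
  flags: number;
  tag: string; // "issue", "issuewild", "iodef", ...
  value: string; // e.g. "amazon.com" or "letsencrypt.org; validationmethods=dns-01"
}

// CA domain names that ACM accepts in a CAA "issue" record.
const AMAZON_CA_DOMAINS = ["amazon.com", "amazontrust.com", "awstrust.com", "amazonaws.com"];

// Names to query, most specific first: a.example.com, example.com, com.
function caaLookupOrder(domain: string): string[] {
  const labels = domain.toLowerCase().replace(/^\*\./, "").replace(/\.$/, "").split(".");
  const names: string[] = [];
  for (let i = 0; i < labels.length; i++) names.push(labels.slice(i).join("."));
  return names;
}

// The first name in the hierarchy that has any CAA records wins.
function findApplicableCaa(
  domain: string,
  lookup: (name: string) => CaaRecord[],
): { name: string; records: CaaRecord[] } | null {
  for (const name of caaLookupOrder(domain)) {
    const records = lookup(name);
    if (records.length > 0) return { name, records };
  }
  return null;
}

function amazonMayIssue(domain: string, lookup: (name: string) => CaaRecord[]): boolean {
  const applicable = findApplicableCaa(domain, lookup);
  if (!applicable) return true; // no CAA records anywhere: issuance is unrestricted
  const issueRecords = applicable.records.filter((r) => r.tag.toLowerCase() === "issue");
  if (issueRecords.length === 0) return true; // e.g. only iodef records
  return issueRecords.some((r) => {
    const issuer = (r.value.split(";")[0] || "").trim().toLowerCase();
    return AMAZON_CA_DOMAINS.indexOf(issuer) !== -1;
  });
}

// Example zone data standing in for DNS.
const exampleZone: Record<string, CaaRecord[]> = {
  "example.com": [{ flags: 0, tag: "issue", value: "letsencrypt.org" }],
  "shop.example.com": [{ flags: 0, tag: "issue", value: "amazon.com" }],
};
const exampleLookup = (name: string): CaaRecord[] => exampleZone[name] || [];
```

In this example, `app.example.com` inherits the restrictive record from `example.com`, while `shop.example.com` is governed by its own, more specific record.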
Domain Conflict Detection
Detecting when a domain is already in use by another AWS resource required a particularly creative approach, due to limitations in AWS's APIs.
AWS does offer an API that identifies where a domain is being used, but there is a catch: you need a valid ACM certificate, and that certificate must be attached to a CloudFront distribution before you can call the endpoint. For a distribution that has no certificate attached yet, this creates a circular dependency: the certificate has to be in place before we can find out why the domain cannot be attached.
To solve this, I implemented a workaround for distributions that don't yet have domains attached. When we encounter a "domain in use" error, we temporarily attach the issued certificate to CloudFront to trigger AWS's conflict detection mechanisms. This allows us to capture the conflict details and relay them to the user without requiring a pre-existing valid certificate.
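The workaround might look roughly like the sketch below. `CloudFrontOps` is a hypothetical wrapper, not the AWS SDK interface, and the article does not name the AWS endpoint involved (CloudFront's `ListConflictingAliases` operation appears to match the description). The restore step in `finally` is also an assumption about what "temporarily" means here.

```typescript
interface AliasConflict {
  alias: string;
  distributionId: string; // AWS partially masks IDs belonging to other accounts
  accountId: string;
}

// Hypothetical wrapper around the CloudFront calls this flow needs.
interface CloudFrontOps {
  getCertificateArn(distributionId: string): Promise<string | null>;
  setCertificateArn(distributionId: string, certificateArn: string | null): Promise<void>;
  listConflictingAliases(distributionId: string, alias: string): Promise<AliasConflict[]>;
}

// Attach the issued certificate just long enough to query for conflicts,
// then put the distribution back the way it was.
async function detectConflicts(
  ops: CloudFrontOps,
  distributionId: string,
  issuedCertificateArn: string,
  alias: string,
): Promise<AliasConflict[]> {
  const previousArn = await ops.getCertificateArn(distributionId);
  await ops.setCertificateArn(distributionId, issuedCertificateArn);
  try {
    return await ops.listConflictingAliases(distributionId, alias);
  } finally {
    await ops.setCertificateArn(distributionId, previousArn);
  }
}

// Turn the raw conflict list into a message for the user.
function describeConflicts(alias: string, conflicts: AliasConflict[]): string {
  if (conflicts.length === 0) return `No conflicting use of ${alias} was found.`;
  return conflicts
    .map((c) => `${c.alias} is already attached to distribution ${c.distributionId} in account ${c.accountId}.`)
    .join(" ");
}
```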
In future iterations, we plan to extend this capability to CloudFront distributions that are already in use, by temporarily creating a separate CloudFront distribution whenever we need to detect conflicts. That would make the process more seamless and informative for users.
Conclusion
Through collaborative efforts with our team, I led the architectural redesign of Flightcontrol's domain management system, introducing domain groups and transparent deployment stages. This transformed our custom domain feature from a limited, opaque process into a flexible, transparent system that empowers users to manage multiple domains with confidence.
The architectural improvements we've implemented (Temporal workflows, detailed error handling, and a staged deployment process) have significantly enhanced the user experience while enabling more complex domain configurations.
This project illustrates how thoughtful architecture and user-centered design can transform even complex infrastructure processes into intuitive, guided experiences that help users accomplish their goals with minimal friction.