Architecting a solution is perhaps my favourite stage of any project: it's where I get to flex my creative muscles and see what is possible, how something might look, and how it all hangs together.
Initial High-Level Solutions
I would normally spend a lot of time researching potential solutions and offer high-level options for further discussion if I think it’s appropriate. When it came to Greenfield, there was no point building onto what they had.
There would be too much impact on the existing services, so the best solution was to build a new infrastructure from the ground up and move applications over.
I produced a couple of high-level design diagrams of what a new infrastructure might encompass.
The first HLD using SFTP-enabled storage accounts.
The second HLD with virtual machines hosting SFTP.
The designs were discussed with the team and the pros and cons weighed up. Essentially we wanted to move away from hosting SFTP on virtual machines, so the first design was favoured. With additional feedback, I went away and refined the design.
Final Design
Even though I was deploying a new infrastructure, I decided not to use Azure Landing Zones, as they would add complexity to an already established Azure environment.
I also wanted to use Terraform to deploy most of my resources in Azure, Datadog, and MongoDB as well as integrate it with other services like Azure DevOps, and a landing zone might prove to be too rigid for my needs.
I would also initially set up management groups and assign new subscriptions to them, using IAM & RBAC to manage access to the resources within.
There would be several Azure subscriptions to host each relevant environment or service. An Operations subscription for networking, monitoring, and generic resources, three Azure subscriptions to host our technical and environment stacks, and a POC subscription.
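As a rough Terraform sketch of how that subscription grouping might look with the azurerm provider (the group names and variables here are illustrative, not the actual hierarchy):

```hcl
# Illustrative management group layout; names and variables are placeholders.
resource "azurerm_management_group" "operations" {
  display_name = "Operations"
  subscription_ids = [
    var.operations_subscription_id, # networking, monitoring, generic resources
  ]
}

resource "azurerm_management_group" "workloads" {
  display_name = "Workloads"
  subscription_ids = [
    var.dev_subscription_id,
    var.test_subscription_id,
    var.prod_subscription_id,
    var.poc_subscription_id,
  ]
}
```

Assigning subscriptions to management groups up front means policy and access can then be applied once at the group level rather than per subscription.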
From a core networking perspective, it would be a wholly private infrastructure, located behind a firewall, with nothing publicly accessible except via that firewall.
Bearing in mind the phases I discussed previously here, this is a breakdown of my design decisions.
Components & Integration
Integration
The Azure backend infrastructure would be fully integrated with the following services:
• PowerBI (Fabric) for reporting;
• Azure DevOps for application deployment;
• Datadog for monitoring;
• Terraform Cloud for deployment;
• MongoDB cloud-hosted databases;
• Cloudflare Application Firewall.
Networking
We would deploy Azure VWAN, bringing networking, security, and routing functionalities together to provide a single operational interface.
Within the VWAN we would deploy a Secured Hub, i.e. a hub with an integrated Azure Firewall, meaning any resources would be locked away and not publicly visible or accessible.
Azure Firewall is a fully stateful firewall as a service with built-in high availability and unrestricted cloud scalability. It provides both east-west and north-south traffic inspection.
Anything requiring a public ingress would route in via the firewall, with a DNAT rule to manage networking.
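A minimal Terraform sketch of that pattern, assuming the azurerm provider; all names, CIDRs, and IPs below are placeholders rather than the real configuration:

```hcl
# Illustrative only: names, address space, and IPs are placeholders.
resource "azurerm_virtual_wan" "main" {
  name                = "vwan-core"
  resource_group_name = azurerm_resource_group.network.name
  location            = azurerm_resource_group.network.location
}

resource "azurerm_virtual_hub" "secure" {
  name                = "vhub-secure"
  resource_group_name = azurerm_resource_group.network.name
  location            = azurerm_resource_group.network.location
  virtual_wan_id      = azurerm_virtual_wan.main.id
  address_prefix      = "10.0.0.0/23"
}

resource "azurerm_firewall_policy" "hub" {
  name                = "fwp-hub"
  resource_group_name = azurerm_resource_group.network.name
  location            = azurerm_resource_group.network.location
}

# Deploying Azure Firewall into the hub is what makes it a Secured Hub.
resource "azurerm_firewall" "hub" {
  name                = "fw-hub"
  resource_group_name = azurerm_resource_group.network.name
  location            = azurerm_resource_group.network.location
  sku_name            = "AZFW_Hub"
  sku_tier            = "Standard"
  firewall_policy_id  = azurerm_firewall_policy.hub.id

  virtual_hub {
    virtual_hub_id  = azurerm_virtual_hub.secure.id
    public_ip_count = 1
  }
}

# A DNAT rule to let specific public ingress (e.g. SFTP) through the firewall.
resource "azurerm_firewall_policy_rule_collection_group" "dnat" {
  name               = "rcg-dnat"
  firewall_policy_id = azurerm_firewall_policy.hub.id
  priority           = 100

  nat_rule_collection {
    name     = "public-ingress"
    priority = 100
    action   = "Dnat"

    rule {
      name                = "sftp-inbound"
      protocols           = ["TCP"]
      source_addresses    = ["203.0.113.0/24"] # placeholder client ranges
      destination_address = "198.51.100.10"    # placeholder firewall public IP
      destination_ports   = ["22"]
      translated_address  = "10.0.2.4"         # placeholder private endpoint IP
      translated_port     = 22
    }
  }
}
```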
Virtual networks would be peered to the VWAN, meaning the environments would be more accessible internally.
Our Prod environment would be further isolated and only accessible through a jump box with the relevant RBAC (Role Based Access Control) permissions.
Resources used to host and run our applications and services would all have private endpoints configured with a DNS record hosted in the relevant Private DNS zone.
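That private endpoint pattern might look something like the following sketch, again with the azurerm provider and placeholder names (shown here for a storage account's blob endpoint):

```hcl
# Illustrative: a private endpoint plus its record in the matching
# Private DNS zone. Resource names and references are placeholders.
resource "azurerm_private_dns_zone" "blob" {
  name                = "privatelink.blob.core.windows.net"
  resource_group_name = azurerm_resource_group.network.name
}

resource "azurerm_private_endpoint" "storage" {
  name                = "pe-storage"
  location            = azurerm_resource_group.network.location
  resource_group_name = azurerm_resource_group.network.name
  subnet_id           = azurerm_subnet.private_endpoints.id

  private_service_connection {
    name                           = "psc-storage"
    private_connection_resource_id = azurerm_storage_account.example.id
    subresource_names              = ["blob"]
    is_manual_connection           = false
  }

  # Registers the endpoint's A record in the Private DNS zone automatically.
  private_dns_zone_group {
    name                 = "default"
    private_dns_zone_ids = [azurerm_private_dns_zone.blob.id]
  }
}
```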
SFTP
The design would move away from the current file transfer solution to a new SFTP service hosted in Azure on an ADLS-enabled storage account.
The storage account would be configured with a private endpoint and any public access switched off.
A public DNS record would route to a public IP hosted on our firewall, and a DNAT rule configured to allow traffic in.
Clients would have a local account configured on the storage account for access to their secure area.
This would ensure any services or automated processes work as expected for clients ahead of any switchover.
Access to this service would be locked down to the public address ranges each client would be transferring files from.
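A sketch of the SFTP storage account side in Terraform, assuming the azurerm provider; the account name, container, and client user below are all hypothetical:

```hcl
# Illustrative: an SFTP-enabled ADLS Gen2 storage account with a
# per-client local user. All names are placeholders.
resource "azurerm_storage_account" "sftp" {
  name                          = "stsftpexample"
  resource_group_name           = azurerm_resource_group.sftp.name
  location                      = azurerm_resource_group.sftp.location
  account_tier                  = "Standard"
  account_replication_type      = "ZRS"
  is_hns_enabled                = true  # hierarchical namespace (ADLS Gen2) is required for SFTP
  sftp_enabled                  = true
  public_network_access_enabled = false # reachable only via private endpoint / firewall DNAT
}

resource "azurerm_storage_container" "client_a" {
  name                 = "client-a"
  storage_account_name = azurerm_storage_account.sftp.name
}

# Each client gets a local user scoped to their own secure area.
resource "azurerm_storage_account_local_user" "client_a" {
  name                 = "clienta"
  storage_account_id   = azurerm_storage_account.sftp.id
  ssh_password_enabled = true
  home_directory       = "client-a"

  permission_scope {
    service       = "blob"
    resource_name = azurerm_storage_container.client_a.name

    permissions {
      read  = true
      write = true
      list  = true
    }
  }
}
```

With public network access disabled on the account, the per-client IP lockdown would sit on the firewall's DNAT rule rather than on the storage account itself.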
Stacks
With the new private infrastructure, we would organise the internal environments into Technical and Environment stacks.
A Technical Stack would be any infrastructure and resources shared across an environment, such as virtual networks, virtual machines, and Kubernetes clusters, as well as storage accounts & key vaults.
An Environment Stack would be anything specific to an environment itself, like storage accounts & key vaults, Postgres, Service Bus, and MongoDB.
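In Terraform terms, the split might map naturally onto two reusable modules per environment; the module paths and inputs here are hypothetical:

```hcl
# Illustrative layout: shared infrastructure in one module,
# environment-specific services in another. Paths are placeholders.
module "technical_stack" {
  source      = "./modules/technical-stack"
  environment = "dev"
  # vnets, VMs, AKS clusters, shared storage accounts & key vaults
}

module "environment_stack" {
  source      = "./modules/environment-stack"
  environment = "dev"
  # env-specific storage, key vaults, Postgres, Service Bus, MongoDB
}
```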
More on stacks can be found here (link to come)
Security & Best Practice
Concerning best practices, as well as following relevant vendor architecture and deployment guides, we would use consistent naming conventions. I hate chaotic random names.
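One way to enforce that in Terraform is to build names from a shared set of locals, loosely following Microsoft's Cloud Adoption Framework pattern of resource-type, workload, environment, and region; the values here are just examples:

```hcl
# Illustrative naming convention: <type>-<workload>-<env>-<region>.
locals {
  workload = "ops"
  env      = "prod"
  region   = "uks"
}

resource "azurerm_resource_group" "network" {
  name     = "rg-${local.workload}-${local.env}-${local.region}" # e.g. rg-ops-prod-uks
  location = "uksouth"
}
```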
The Azure subscriptions would be locked down at a Management Group level, and the subscription access restricted to Operations only.
We would use Role-based access control (RBAC) across all Azure resources, using defined internal Entra roles assigned to user Entra accounts. Essentially, each person has a defined job role, and access is assigned only where that role permits it.
We would ensure 2FA is enabled on Entra for extra security, so it’s not just what you know but what you have.
General access would be restricted to Reader, as resources are to be deployed via Terraform; any manual changes would simply be reverted or overwritten against the state file on the next apply anyway.
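A minimal sketch of those assignments in Terraform, with placeholder subscription IDs and Entra group object IDs:

```hcl
# Illustrative: Reader by default for team groups, broader rights
# reserved for Operations. All IDs are placeholders.
resource "azurerm_role_assignment" "team_reader" {
  scope                = "/subscriptions/${var.dev_subscription_id}"
  role_definition_name = "Reader"
  principal_id         = var.dev_team_group_object_id
}

resource "azurerm_role_assignment" "ops_contributor" {
  scope                = "/subscriptions/${var.dev_subscription_id}"
  role_definition_name = "Contributor"
  principal_id         = var.operations_group_object_id
}
```

Assigning roles to Entra groups rather than individual users keeps the access model tied to job roles, as described above.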
The network is private and locked behind a firewall, with further internal networking restrictions.
Scalability & High Availability
The new infrastructure would not be siloed, which means services could be deployed more easily across environments without having to plan for each environment separately.
We would deploy central networked services where possible.
Availability Zones would be applied to the appropriate resources and services. They wouldn’t be needed for DEV environments, but they would be for PROD.
Everything would be deployed using Terraform infrastructure as code, taking advantage of reusable code and the ability to deploy services much more quickly.
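Those two points combine neatly in Terraform: the same reusable module can be called per environment, with zone redundancy switched on only where it matters. A hypothetical sketch (the module and its inputs are placeholders):

```hcl
# Illustrative: one reusable module, zone-redundant only in PROD.
module "app_vm" {
  source = "./modules/linux-vm" # hypothetical reusable module
  name   = "vm-app-${var.environment}"

  # Spread across Availability Zones in PROD; skip zones in DEV.
  zones = var.environment == "prod" ? ["1", "2", "3"] : null
}
```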
Performance, Management & Monitoring
We would utilise Datadog and Azure Monitor & Alerts to monitor services and alert on service-impacting issues.
User Experience, Collaboration & Consistency
Having reviewed Bastion vs VPN: both provide a secure means to RDP/SSH to workstations or virtual machines; however, Bastion does not give you the ability to securely connect to the network itself and reach other services such as secure storage accounts, key vaults, AKS, etc.
From a user perspective, they would no longer use Bastion but instead use VPN to connect and access not only virtual machines but other resources using tools like Storage Explorer, Compass, or PGAdmin, giving greater access and flexibility.
I would also plan to hold regular show-and-tell sessions with internal users to show them how things would work and get any feedback, as well as discuss services & data that would need migrating.
Implementation Plans
For more details see here (link to come)
Well, I hope you found this useful in some way. Please keep following and reading, and subscribe if you want to keep updated.