<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[DevOps Diaries by Cyril Sebastian – Cloud, Automation & Infra at Scale]]></title><description><![CDATA[DevOps, Cloud & Infra insights by Cyril Sebastian. Learn Terraform, CI/CD, DevSecOps, Monitoring & automation across AWS, GCP, and hybrid environments]]></description><link>https://tech.cyrilsebastian.com</link><generator>RSS for Node</generator><lastBuildDate>Thu, 16 Apr 2026 11:21:10 GMT</lastBuildDate><atom:link href="https://tech.cyrilsebastian.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Keep It Running: Email Infrastructure Operations Guide]]></title><description><![CDATA[Making email infrastructure a pro.

Daily Operations (5 Minutes)
The Morning Health Check
Create this script and run it daily:
cat > ~/email-health.sh <<'EOF'
#!/bin/bash
YESTERDAY=$(date -d "yesterda]]></description><link>https://tech.cyrilsebastian.com/keep-it-running-email-infrastructure-operations-guide</link><guid isPermaLink="true">https://tech.cyrilsebastian.com/keep-it-running-email-infrastructure-operations-guide</guid><category><![CDATA[Devops]]></category><category><![CDATA[AWS]]></category><category><![CDATA[SES]]></category><category><![CDATA[monitoring]]></category><category><![CDATA[postfix]]></category><dc:creator><![CDATA[Cyril Sebastian]]></dc:creator><pubDate>Mon, 23 Mar 2026 05:00:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/66fff73c46655eee7fdfb5b0/352efe28-2608-4ee9-8f55-e3512d32c70f.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Running email infrastructure like a pro.</p>
<hr />
<h2>Daily Operations (5 Minutes)</h2>
<h3>The Morning Health Check</h3>
<p>Create this script and run it daily:</p>
<pre><code class="language-bash">cat &gt; ~/email-health.sh &lt;&lt;'EOF'
#!/bin/bash
YESTERDAY=$(date -d "yesterday" +"%b %d")

echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo "Email Health ($YESTERDAY)"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━"

# Service status
systemctl is-active --quiet postfix &amp;&amp; echo "Postfix" || echo "Postfix DOWN"
systemctl is-active --quiet ses-logger &amp;&amp; echo "Logger" || echo "Logger DOWN"

# Email stats
SENT=$(grep "$YESTERDAY" /var/log/postfix/postfix.log 2&gt;/dev/null | grep -c "status=sent")
DELIVERED=$(grep "$YESTERDAY" /var/log/postfix/mail.log 2&gt;/dev/null | grep -c "status=delivered")
BOUNCED=$(grep "$YESTERDAY" /var/log/postfix/mail.log 2&gt;/dev/null | grep -c "status=bounced")

echo ""
echo "📊 Volume"
echo "   Sent: $SENT"
echo "   Delivered: $DELIVERED"
echo "   Bounced: $BOUNCED"

if [ $SENT -gt 0 ]; then
  DELIVERY_RATE=$((DELIVERED * 100 / SENT))
  BOUNCE_RATE=$((BOUNCED * 100 / SENT))
  echo ""
  echo "📈 Rates"
  echo "   Delivery: ${DELIVERY_RATE}%"
  echo "   Bounce: ${BOUNCE_RATE}%"
  
  [ $BOUNCE_RATE -gt 5 ] &amp;&amp; echo "   ⚠️  High bounce rate!"
fi

# Queue status
if mailq | grep -q "Mail queue is empty"; then
  echo ""
  echo "Queue empty"
else
  QUEUE=$(mailq | tail -1 | awk '{print $5}')
  echo ""
  echo "⚠️  Queue: $QUEUE messages"
fi

echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
EOF

chmod +x ~/email-health.sh
</code></pre>
<p><strong>Run it:</strong></p>
<pre><code class="language-bash">./email-health.sh
</code></pre>
<p><strong>Automate it</strong> (runs at 9 AM, emails you results):</p>
<pre><code class="language-bash">(crontab -l 2&gt;/dev/null; echo "0 9 * * * ~/email-health.sh | mail -s 'Email Health Report' admin@yourdomain.com") | crontab -
</code></pre>
<hr />
<h2>Essential Monitoring</h2>
<h3>Real-Time Log Watching</h3>
<p><strong>Monitor live email flow:</strong></p>
<pre><code class="language-bash"># Watch everything
sudo tail -f /var/log/postfix/*.log

# Watch only delivered emails
sudo tail -f /var/log/postfix/mail.log | grep --line-buffered "status=delivered"

# Watch bounces
sudo tail -f /var/log/postfix/mail.log | grep --line-buffered "status=bounced"
</code></pre>
<hr />
<h3>Key Metrics to Track</h3>
<table>
<thead>
<tr>
<th>Metric</th>
<th>Target</th>
<th>Alert If</th>
</tr>
</thead>
<tbody><tr>
<td>Delivery Rate</td>
<td>&gt;95%</td>
<td>&lt;90%</td>
</tr>
<tr>
<td>Bounce Rate</td>
<td>&lt;5%</td>
<td>&gt;10%</td>
</tr>
<tr>
<td>Queue Size</td>
<td>0</td>
<td>&gt;100</td>
</tr>
<tr>
<td>Service Uptime</td>
<td>99.9%</td>
<td>Any downtime</td>
</tr>
</tbody></table>
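<p>The thresholds above can be wired into a small shell helper — a sketch (the function name and hard-coded limits are illustrative; feed it the counts from your own log parsing):</p>
<pre><code class="language-bash"># Evaluate the alert thresholds from the table above.
# Usage: check_thresholds SENT DELIVERED BOUNCED QUEUE_SIZE
check_thresholds() {
  local sent=$1 delivered=$2 bounced=$3 queue=$4
  if [ "$sent" -gt 0 ]; then
    local delivery_rate=$((delivered * 100 / sent))
    local bounce_rate=$((bounced * 100 / sent))
    if [ "$delivery_rate" -lt 90 ]; then echo "ALERT: delivery rate ${delivery_rate}%"; fi
    if [ "$bounce_rate" -gt 10 ]; then echo "ALERT: bounce rate ${bounce_rate}%"; fi
  fi
  if [ "$queue" -gt 100 ]; then echo "ALERT: queue size $queue"; fi
}

check_thresholds 1000 850 120 250
</code></pre>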
<p><strong>Quick metric checks:</strong></p>
<pre><code class="language-bash"># Today's delivery rate
SENT=$(grep "$(date +%b\ %d)" /var/log/postfix/postfix.log | grep -c "status=sent")
DELIVERED=$(grep "$(date +%b\ %d)" /var/log/postfix/mail.log | grep -c "status=delivered")
echo "Delivery rate: $((DELIVERED * 100 / SENT))%"

# Average delivery time (milliseconds)
grep "status=delivered" /var/log/postfix/mail.log | \
  grep -oP 'delay=\K\d+' | \
  awk '{sum+=$1; n++} END {print "Avg delay: " sum/n "ms"}'

# Top recipient domains
grep "status=delivered" /var/log/postfix/mail.log | \
  grep -oP 'to=&lt;[^@]+@\K[^&gt;]+' | \
  sort | uniq -c | sort -rn | head -5
</code></pre>
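<p>The delivery-rate one-liner above divides by <code>$SENT</code>, which errors out on a day with zero sends; a guarded version:</p>
<pre><code class="language-bash"># Delivery rate with a divide-by-zero guard.
delivery_rate() {
  local sent=$1 delivered=$2
  if [ "$sent" -eq 0 ]; then
    echo "n/a"
  else
    echo "$((delivered * 100 / sent))%"
  fi
}

delivery_rate 200 190   # prints 95%
</code></pre>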
<hr />
<h2>Common Operations</h2>
<h3>Adding New Senders</h3>
<pre><code class="language-bash"># 1. Edit whitelist
sudo vim /etc/postfix/allowed_senders

# Add line:
# newsender@yourdomain.com    OK

# 2. Rebuild database
sudo postmap /etc/postfix/allowed_senders

# 3. Reload (no restart needed!)
sudo systemctl reload postfix

# 4. Test
echo "Test" | mail -s "Test" -r newsender@yourdomain.com test@example.com
</code></pre>
<p><strong>No downtime!</strong> Reload picks up changes instantly.</p>
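<p>Adding several senders at once can also be scripted — a sketch, where <code>format_senders</code> is a hypothetical helper that just emits padded whitelist lines:</p>
<pre><code class="language-bash"># Format whitelist entries for /etc/postfix/allowed_senders.
format_senders() {
  local addr
  for addr in "$@"; do
    printf '%-35s OK\n' "$addr"
  done
}

# Usage: append, rebuild, reload
# format_senders a@yourdomain.com b@yourdomain.com | sudo tee -a /etc/postfix/allowed_senders
# sudo postmap /etc/postfix/allowed_senders
# sudo systemctl reload postfix
</code></pre>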
<hr />
<h3>Removing Senders</h3>
<pre><code class="language-bash"># 1. Comment out or remove from whitelist
sudo vim /etc/postfix/allowed_senders
# #oldsender@yourdomain.com    OK

# 2. Rebuild and reload
sudo postmap /etc/postfix/allowed_senders
sudo systemctl reload postfix

# 3. Verify rejection
echo "Test" | mail -s "Test" -r oldsender@yourdomain.com test@example.com
# Should see: "Sender address rejected"
</code></pre>
<hr />
<h3>Managing Mail Queue</h3>
<p><strong>View queue:</strong></p>
<pre><code class="language-bash">mailq
</code></pre>
<p><strong>Flush queue</strong> (retry all deferred emails):</p>
<pre><code class="language-bash">sudo postqueue -f
</code></pre>
<p><strong>Delete specific email:</strong></p>
<pre><code class="language-bash"># Get queue ID from mailq
sudo postsuper -d QUEUE_ID
</code></pre>
<p><strong>Delete all queued emails:</strong></p>
<pre><code class="language-bash">sudo postsuper -d ALL
</code></pre>
<p><strong>Delete only deferred emails:</strong></p>
<pre><code class="language-bash">sudo postsuper -d ALL deferred
</code></pre>
<hr />
<h3>Searching Email History</h3>
<p><strong>Find specific email:</strong></p>
<pre><code class="language-bash">grep "user@example.com" /var/log/postfix/*.log
</code></pre>
<p><strong>Find by sender:</strong></p>
<pre><code class="language-bash">grep "from=&lt;sender@yourdomain.com&gt;" /var/log/postfix/postfix.log
</code></pre>
<p><strong>Find bounces to specific domain:</strong></p>
<pre><code class="language-bash">grep "gmail.com" /var/log/postfix/mail.log | grep "bounced"
</code></pre>
<p><strong>Get complete email journey:</strong></p>
<pre><code class="language-bash"># Get message ID from sent log
MSG_ID=$(grep "user@example.com" /var/log/postfix/postfix.log | grep -oP 'status=sent \(250 Ok \K[^)]+' | head -1)

# Find all events for that message
grep "$MSG_ID" /var/log/postfix/*.log
</code></pre>
<hr />
<h2>Troubleshooting Guide</h2>
<h3>Problem 1: Postfix Won't Start</h3>
<p><strong>Symptoms:</strong></p>
<pre><code class="language-bash">sudo systemctl start postfix
# Job for postfix.service failed
</code></pre>
<p><strong>Fix:</strong></p>
<pre><code class="language-bash"># 1. Check config syntax
sudo postfix check

# 2. View detailed error
sudo journalctl -u postfix -n 20 --no-pager

# 3. Common issues:

# Port in use?
sudo lsof -i :25
# Kill conflicting process: sudo systemctl stop sendmail

# Permission issue?
sudo chown -R postfix:postfix /var/log/postfix
sudo chown -R postfix:postfix /var/spool/postfix

# Check line number from 'postfix check' output
sudo vim /etc/postfix/main.cf +LINE_NUMBER
</code></pre>
<hr />
<h3>Problem 2: Emails Stuck in Queue</h3>
<p><strong>Diagnosis:</strong></p>
<pre><code class="language-bash">mailq  # Shows queued emails
sudo tail -100 /var/log/postfix/postfix.log | grep "status=deferred"
</code></pre>
<p><strong>Common causes and fixes:</strong></p>
<p><strong>Wrong SES credentials:</strong></p>
<pre><code class="language-bash"># Verify credentials
sudo postmap -q "[email-smtp.ap-south-1.amazonaws.com]:587" /etc/postfix/sasl_passwd

# Update if needed
sudo vim /etc/postfix/sasl_passwd
sudo postmap /etc/postfix/sasl_passwd
sudo systemctl restart postfix
</code></pre>
<p><strong>Network blocked:</strong></p>
<pre><code class="language-bash"># Test SES connectivity
telnet email-smtp.ap-south-1.amazonaws.com 587

# Check security group allows outbound 587
# Check route table has internet gateway
</code></pre>
<p><strong>SES quota exceeded:</strong></p>
<pre><code class="language-bash">aws ses get-send-quota --region ap-south-1
# If near limit, wait or request increase
</code></pre>
<p><strong>After fixing, flush the queue:</strong></p>
<pre><code class="language-bash">sudo postqueue -f
</code></pre>
<hr />
<h3>Problem 3: Logger Service Keeps Crashing</h3>
<p><strong>Check logs:</strong></p>
<pre><code class="language-bash">sudo journalctl -u ses-logger -n 50 --no-pager
sudo tail -50 /var/log/postfix/ses-logger-error.log
</code></pre>
<p><strong>Common fixes:</strong></p>
<p><strong>boto3 missing:</strong></p>
<pre><code class="language-bash">python3 -c "import boto3" || sudo yum install -y python3-boto3
sudo systemctl restart ses-logger
</code></pre>
<p><strong>Wrong queue URL:</strong></p>
<pre><code class="language-bash"># Get correct URL
QUEUE_URL=$(aws sqs get-queue-url --queue-name ses-events-queue --region ap-south-1 --query 'QueueUrl' --output text)

# Update service
sudo sed -i "s|Environment=\"SQS_QUEUE_URL=.*\"|Environment=\"SQS_QUEUE_URL=$QUEUE_URL\"|" /etc/systemd/system/ses-logger.service

sudo systemctl daemon-reload
sudo systemctl restart ses-logger
</code></pre>
<p><strong>IAM permissions:</strong></p>
<pre><code class="language-bash"># Verify role attached
aws sts get-caller-identity

# Should show: PostfixSESLogger role
# If not, reattach IAM instance profile
</code></pre>
<hr />
<h3>Problem 4: No Delivery Events in Logs</h3>
<p><strong>Diagnosis:</strong></p>
<pre><code class="language-bash"># 1. Check SQS queue has messages
aws sqs get-queue-attributes \
  --queue-url "$(aws sqs get-queue-url --queue-name ses-events-queue --region ap-south-1 --query 'QueueUrl' --output text)" \
  --attribute-names ApproximateNumberOfMessages \
  --region ap-south-1
</code></pre>
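<p>If <code>jq</code> isn't installed, the backlog count can be pulled out of the CLI's JSON with grep — a sketch matched to the default output shape of the command above:</p>
<pre><code class="language-bash"># Extract ApproximateNumberOfMessages from get-queue-attributes output (reads stdin).
queue_backlog() {
  grep -o '"ApproximateNumberOfMessages": "[0-9]*"' | grep -o '[0-9]*'
}

# Usage:
# aws sqs get-queue-attributes ... | queue_backlog
</code></pre>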
<p><strong>If messages are accumulating:</strong></p>
<ul>
<li><p>Logger not processing → Check <code>sudo journalctl -u ses-logger</code></p>
</li>
<li><p>Restart logger → <code>sudo systemctl restart ses-logger</code></p>
</li>
</ul>
<p><strong>If no messages in queue:</strong></p>
<pre><code class="language-bash"># 2. Verify SES publishing to SNS
aws ses get-identity-notification-attributes \
  --identities yourdomain.com \
  --region ap-south-1

# Should show all three topics configured

# 3. Reconfigure if needed
SNS_ARN=$(aws sns list-topics --region ap-south-1 --query "Topics[?contains(TopicArn, 'ses-events-topic')].TopicArn | [0]" --output text)

for EVENT in Delivery Bounce Complaint; do
  aws ses set-identity-notification-topic \
    --identity yourdomain.com \
    --notification-type $EVENT \
    --sns-topic "$SNS_ARN" \
    --region ap-south-1
done
</code></pre>
<hr />
<h3>Problem 5: High Bounce Rate (&gt;10%)</h3>
<p><strong>Analyze bounce reasons:</strong></p>
<pre><code class="language-bash">grep "status=bounced" /var/log/postfix/mail.log | \
  grep -oP 'reason=\(\K[^\)]+' | \
  sort | uniq -c | sort -rn | head -10
</code></pre>
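<p>To split those bounces into permanent vs. transient counts, a small awk helper works — a sketch that assumes the <code>type=</code> field the logger writes (reads files or stdin):</p>
<pre><code class="language-bash"># Count permanent vs. transient bounces in mail logs.
classify_bounces() {
  awk '
    /status=bounced/ {
      if ($0 ~ /type=Permanent/) perm++
      else trans++
    }
    END { printf "permanent=%d transient=%d\n", perm, trans }
  ' "$@"
}

# Usage:
# classify_bounces /var/log/postfix/mail.log
</code></pre>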
<p><strong>Common reasons:</strong></p>
<p><strong>"User unknown" (invalid addresses):</strong></p>
<pre><code class="language-bash"># Extract bounced addresses
grep "status=bounced" /var/log/postfix/mail.log | \
  grep "type=Permanent" | \
  grep -oP 'to=&lt;\K[^&gt;]+' | \
  sort -u &gt; bounced_addresses.txt

# Remove from your mailing list
</code></pre>
<p><strong>"Mailbox full":</strong></p>
<ul>
<li><p>Temporary issue, will resolve</p>
</li>
<li><p>Retry after 24 hours</p>
</li>
</ul>
<p><strong>"550 Spam":</strong></p>
<ul>
<li><p>Review email content</p>
</li>
<li><p>Check SPF/DKIM/DMARC setup</p>
</li>
<li><p>Verify sender reputation</p>
</li>
</ul>
<hr />
<h3>Problem 6: Emails Going to Spam</h3>
<p><strong>Verification checklist:</strong></p>
<pre><code class="language-bash"># 1. Check SPF
dig +short TXT yourdomain.com | grep spf
# Should include: include:amazonses.com

# 2. Check DKIM
aws ses get-identity-dkim-attributes \
  --identities yourdomain.com \
  --region ap-south-1
# Should show: DkimEnabled=true, Status=Success

# 3. Check DMARC
dig +short TXT _dmarc.yourdomain.com
# Should return DMARC policy

# 4. Check SES reputation
aws ses get-account-sending-enabled --region ap-south-1
# Should be enabled
</code></pre>
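<p>For scripting the SPF step of that checklist, a pure string check is enough — this is not a full SPF evaluation, it only confirms the SES include is present:</p>
<pre><code class="language-bash"># Return 0 if an SPF record string includes amazonses.com.
spf_includes_ses() {
  case "$1" in
    *"include:amazonses.com"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage:
# spf_includes_ses "$(dig +short TXT yourdomain.com)" || echo "SPF missing SES include"
</code></pre>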
<p><strong>Content checklist:</strong></p>
<ul>
<li><p>Avoid spam trigger words (FREE!, ACT NOW!)</p>
</li>
<li><p>Include unsubscribe link</p>
</li>
<li><p>Balance text/image ratio (60% text minimum)</p>
</li>
<li><p>Use a consistent "From" name and address</p>
</li>
<li><p>Authenticate with SPF/DKIM/DMARC</p>
</li>
</ul>
<hr />
<h2>Performance Optimization</h2>
<h3>Postfix Tuning</h3>
<p><strong>For higher throughput:</strong></p>
<pre><code class="language-bash">sudo vim /etc/postfix/main.cf
</code></pre>
<p>Add/update:</p>
<pre><code class="language-ini"># Increase concurrent deliveries
default_destination_concurrency_limit = 50
default_destination_recipient_limit = 50

# Reduce queue lifetime
maximal_queue_lifetime = 1d
bounce_queue_lifetime = 1d

# Connection caching
smtp_connection_cache_on_demand = yes
smtp_connection_cache_destinations = email-smtp.ap-south-1.amazonaws.com
</code></pre>
<p>Reload:</p>
<pre><code class="language-bash">sudo systemctl reload postfix
</code></pre>
<hr />
<h3>Logger Optimization</h3>
<p><strong>For high volume (&gt;1000 events/min):</strong></p>
<p>SQS caps <code>MaxNumberOfMessages</code> at 10 per receive call, so raising it in <code>/usr/local/bin/ses_logger.py</code> has no effect — the API rejects larger values. The receive call should stay at:</p>
<pre><code class="language-python">response = sqs.receive_message(
    QueueUrl=queue_url,
    MaxNumberOfMessages=10,  # SQS hard maximum per call
    WaitTimeSeconds=20       # long poll; returns early when messages arrive
)
</code></pre>
<p>To scale beyond one consumer, run a second copy of the logger against the same queue (for example, a duplicated systemd unit) — SQS distributes messages across consumers.</p>
<hr />
<h2>Scaling Strategies</h2>
<h3>When to Scale</h3>
<table>
<thead>
<tr>
<th>Metric</th>
<th>Scale Trigger</th>
</tr>
</thead>
<tbody><tr>
<td>CPU Usage</td>
<td>Sustained &gt;70%</td>
</tr>
<tr>
<td>Emails/day</td>
<td>&gt;40,000 (80% of quota)</td>
</tr>
<tr>
<td>Queue Size</td>
<td>Sustained &gt;100</td>
</tr>
<tr>
<td>Memory</td>
<td>&gt;80% used</td>
</tr>
</tbody></table>
<hr />
<h3>Vertical Scaling (Bigger Instance)</h3>
<p><strong>Current performance by instance:</strong></p>
<table>
<thead>
<tr>
<th>Instance</th>
<th>vCPU</th>
<th>RAM</th>
<th>Emails/day</th>
</tr>
</thead>
<tbody><tr>
<td>t3a.small</td>
<td>2</td>
<td>2GB</td>
<td>10,000</td>
</tr>
<tr>
<td>t3a.medium</td>
<td>2</td>
<td>4GB</td>
<td>50,000</td>
</tr>
<tr>
<td>t3a.large</td>
<td>2</td>
<td>8GB</td>
<td>100,000</td>
</tr>
<tr>
<td>c6a.xlarge</td>
<td>4</td>
<td>8GB</td>
<td>500,000</td>
</tr>
</tbody></table>
<hr />
<h2>Security Hardening</h2>
<h3>Restrict Relay Access</h3>
<p><strong>Tighten network access:</strong></p>
<pre><code class="language-bash">sudo vim /etc/postfix/main.cf
</code></pre>
<pre><code class="language-ini"># Only specific IPs
mynetworks = 127.0.0.1, 10.10.3.125

# Or specific subnet
mynetworks = 10.10.0.0/21
</code></pre>
<hr />
<h3>Rate Limiting</h3>
<p><strong>Prevent abuse:</strong></p>
<pre><code class="language-bash">sudo vim /etc/postfix/main.cf
</code></pre>
<pre><code class="language-ini"># Max 100 connections/min per client
smtpd_client_connection_rate_limit = 100

# Max 100 emails/min per client
smtpd_client_message_rate_limit = 100
</code></pre>
<hr />
<h3>Monitor IAM Usage</h3>
<p><strong>Enable CloudTrail for audit:</strong></p>
<pre><code class="language-bash"># Note: the S3 bucket must already have a CloudTrail bucket policy
aws cloudtrail create-trail \
  --name email-infrastructure-audit \
  --s3-bucket-name my-audit-logs

# Trails don't record until logging is started
aws cloudtrail start-logging --name email-infrastructure-audit
</code></pre>
<hr />
<h2>Resources</h2>
<p><strong>AWS Documentation:</strong></p>
<ul>
<li><p><a href="https://docs.aws.amazon.com/ses/">SES Developer Guide</a></p>
</li>
<li><p><a href="https://docs.aws.amazon.com/cloudwatch/">CloudWatch Logs</a></p>
</li>
</ul>
<p><strong>Postfix:</strong></p>
<ul>
<li><p><a href="http://www.postfix.org/documentation.html">Official Docs</a></p>
</li>
<li><p><a href="http://www.postfix.org/DEBUG_README.html">Troubleshooting Guide</a></p>
</li>
</ul>
<hr />
<p><strong>Series Complete! 🎉</strong></p>
<ul>
<li><p><a href="https://tech.cyrilsebastian.com/building-production-email-infrastructure-with-postfix-aws-ses-architecture-design">Part 1: Architecture &amp; Design</a></p>
</li>
<li><p><a href="https://tech.cyrilsebastian.com/from-zero-to-production-building-postfix-aws-ses-in-2-hours">Part 2: Implementation Guide</a></p>
</li>
<li><p><strong>Part 3: Operations</strong> ← You just finished this!</p>
</li>
</ul>
<p>🔗 <strong>If this helped or resonated with you, connect with me on</strong> <a href="https://www.linkedin.com/in/sebastiancyril/"><strong>LinkedIn</strong></a>. Let’s learn and grow together.</p>
<p>👉 Stay tuned for more behind-the-scenes write-ups and system design breakdowns.</p>
<hr />
]]></content:encoded></item><item><title><![CDATA[From Zero to Production: Building Postfix + AWS SES in 2 Hours]]></title><description><![CDATA[What You Need Before Starting

AWS account with SES access

EC2 instance (t3a.medium, Amazon Linux 2023)

Your domain's DNS access

2 hours of focused time


That's it. Everything else, we'll build to]]></description><link>https://tech.cyrilsebastian.com/from-zero-to-production-building-postfix-aws-ses-in-2-hours</link><guid isPermaLink="true">https://tech.cyrilsebastian.com/from-zero-to-production-building-postfix-aws-ses-in-2-hours</guid><category><![CDATA[AWS]]></category><category><![CDATA[SES]]></category><category><![CDATA[email infrastructure]]></category><category><![CDATA[smtp]]></category><category><![CDATA[Devops]]></category><category><![CDATA[postfix]]></category><dc:creator><![CDATA[Cyril Sebastian]]></dc:creator><pubDate>Sat, 14 Mar 2026 04:54:00 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/66fff73c46655eee7fdfb5b0/8bd9c06f-53fe-4513-b8a3-e3d417d4727f.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>What You Need Before Starting</h2>
<ul>
<li><p>AWS account with SES access</p>
</li>
<li><p>EC2 instance (t3a.medium, Amazon Linux 2023)</p>
</li>
<li><p>Your domain's DNS access</p>
</li>
<li><p>2 hours of focused time</p>
</li>
</ul>
<p>That's it. Everything else, we'll build together.</p>
<hr />
<h2>Phase 1: AWS SES Setup (15 mins)</h2>
<h3>Step 1: Verify Your Domain</h3>
<p>AWS Console → SES → Verified Identities → <strong>Create Identity</strong></p>
<pre><code class="language-plaintext">Identity type: Domain
Domain: yourdomain.com
</code></pre>
<p>Add the TXT record SES provides to your DNS:</p>
<pre><code class="language-plaintext">Type: TXT
Name: _amazonses.yourdomain.com
Value: [provided by SES]
</code></pre>
<p><strong>Verify it worked:</strong></p>
<pre><code class="language-bash">aws ses get-identity-verification-attributes \
  --identities yourdomain.com \
  --region ap-south-1
</code></pre>
<p>Look for <code>"VerificationStatus": "Success"</code></p>
<hr />
<h3>Step 2: Configure DKIM (5 mins)</h3>
<p>In SES Console:</p>
<ol>
<li><p>Click your domain → DKIM tab → Edit</p>
</li>
<li><p>Enable <strong>Easy DKIM</strong> → Save</p>
</li>
</ol>
<p>Add the <strong>3 CNAME records</strong> SES provides to your DNS.</p>
<p><strong>Verify:</strong></p>
<pre><code class="language-bash">aws ses get-identity-dkim-attributes \
  --identities yourdomain.com \
  --region ap-south-1
</code></pre>
<p>Should show <code>"DkimEnabled": true</code></p>
<hr />
<h3>Step 3: SPF + DMARC (3 mins)</h3>
<p><strong>Add SPF record:</strong></p>
<pre><code class="language-plaintext">Type: TXT
Name: yourdomain.com
Value: "v=spf1 include:amazonses.com ~all"
</code></pre>
<p><strong>Add DMARC record:</strong></p>
<pre><code class="language-plaintext">Type: TXT  
Name: _dmarc.yourdomain.com
Value: "v=DMARC1; p=quarantine; rua=mailto:dmarc@yourdomain.com"
</code></pre>
<hr />
<h3>Step 4: Get SMTP Credentials (2 mins)</h3>
<p>SES Console → SMTP Settings → <strong>Create SMTP Credentials</strong></p>
<p><strong>Save these immediately</strong> (you won't see them again):</p>
<pre><code class="language-plaintext">Username: AKAWSSAMPLEEXAMPLE
Password: wJalrXUtnuTde/EXAMPLE
</code></pre>
<hr />
<h2>Phase 2: Postfix Setup (20 mins)</h2>
<p>SSH into your server and let's configure Postfix.</p>
<h3>Install Postfix</h3>
<pre><code class="language-bash">sudo yum update -y
sudo yum install -y postfix cyrus-sasl-plain mailx
sudo systemctl enable postfix
</code></pre>
<hr />
<h3>Configure SES Credentials</h3>
<pre><code class="language-bash">sudo vim /etc/postfix/sasl_passwd
</code></pre>
<p>Add this line (use YOUR credentials):</p>
<pre><code class="language-plaintext">[email-smtp.ap-south-1.amazonaws.com]:587 YOUR_USERNAME:YOUR_PASSWORD
</code></pre>
<p>Secure it:</p>
<pre><code class="language-bash">sudo chmod 600 /etc/postfix/sasl_passwd
sudo postmap /etc/postfix/sasl_passwd
</code></pre>
<hr />
<h3>Configure Postfix Main Settings</h3>
<pre><code class="language-bash">sudo vim /etc/postfix/main.cf
</code></pre>
<p><strong>Replace entire contents with:</strong></p>
<pre><code class="language-ini"># Basic Settings
myhostname = mail.yourdomain.com
mydomain = yourdomain.com
myorigin = $mydomain
inet_interfaces = all
mynetworks = 127.0.0.0/8, 10.0.0.0/16
mydestination = 

# AWS SES Relay
relayhost = [email-smtp.ap-south-1.amazonaws.com]:587
smtp_sasl_auth_enable = yes
smtp_sasl_security_options = noanonymous
smtp_sasl_password_maps = hash:/etc/postfix/sasl_passwd

# TLS Security
smtp_use_tls = yes
smtp_tls_security_level = encrypt
smtp_tls_CAfile = /etc/ssl/certs/ca-bundle.crt

# Sender Validation
smtpd_sender_restrictions =
    check_sender_access hash:/etc/postfix/allowed_senders,
    reject

smtpd_recipient_restrictions =
    permit_mynetworks,
    reject_unauth_destination

# Logging
maillog_file = /var/log/postfix/postfix.log
</code></pre>
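<p>Before moving on, it's easy to sanity-check that the relay keys made it into the file — a sketch that just greps a main.cf-style file for the settings above:</p>
<pre><code class="language-bash"># Verify required relay keys exist in a Postfix main.cf file.
check_maincf() {
  local cf=$1 missing=0 key
  for key in relayhost smtp_sasl_auth_enable smtp_sasl_password_maps smtp_tls_security_level; do
    if ! grep -q "^${key}" "$cf"; then
      echo "missing: $key"
      missing=1
    fi
  done
  return $missing
}

# Usage:
# check_maincf /etc/postfix/main.cf
</code></pre>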
<hr />
<h3>Create Sender Whitelist</h3>
<pre><code class="language-bash">sudo vim /etc/postfix/allowed_senders
</code></pre>
<p>Add approved senders:</p>
<pre><code class="language-plaintext">info@yourdomain.com         OK
noreply@yourdomain.com      OK
@yourdomain.com             REJECT Not authorized
</code></pre>
<p>Compile and setup:</p>
<pre><code class="language-bash">sudo postmap /etc/postfix/allowed_senders
sudo mkdir -p /var/log/postfix
sudo chown postfix:postfix /var/log/postfix
</code></pre>
<hr />
<h3>Start Postfix</h3>
<pre><code class="language-bash">sudo postfix check  # Should output nothing
sudo systemctl start postfix
sudo systemctl status postfix
</code></pre>
<p><strong>Test it:</strong></p>
<pre><code class="language-bash">echo "Test" | mail -s "Test Email" -r info@yourdomain.com your-email@example.com
sudo tail -f /var/log/postfix/postfix.log
</code></pre>
<p>Look for <code>status=sent (250 Ok...)</code> ✅</p>
<hr />
<h2>Phase 3: Event Pipeline (25 mins)</h2>
<p>This is where we set up bounce/delivery tracking.</p>
<h3>Create IAM Role</h3>
<pre><code class="language-bash"># Trust policy
cat &gt; trust-policy.json &lt;&lt;'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"Service": "ec2.amazonaws.com"},
    "Action": "sts:AssumeRole"
  }]
}
EOF


# Create role
aws iam create-role \
  --role-name PostfixSESLogger \
  --assume-role-policy-document file://trust-policy.json

# Permissions policy
cat &gt; policy.json &lt;&lt;'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Action": [
      "sqs:ReceiveMessage",
      "sqs:DeleteMessage",
      "sqs:GetQueueUrl"
    ],
    "Resource": "arn:aws:sqs:ap-south-1:*:ses-events-queue"
  }]
}
EOF


# Attach policy
aws iam put-role-policy \
  --role-name PostfixSESLogger \
  --policy-name SESLogging \
  --policy-document file://policy.json

# Create instance profile
aws iam create-instance-profile --instance-profile-name PostfixSESLogger
aws iam add-role-to-instance-profile \
  --instance-profile-name PostfixSESLogger \
  --role-name PostfixSESLogger

# Attach to instance
TOKEN=$(curl -s -X PUT "http://169.254.169.254/latest/api/token" -H "X-aws-ec2-metadata-token-ttl-seconds: 60")
INSTANCE_ID=$(curl -s -H "X-aws-ec2-metadata-token: $TOKEN" http://169.254.169.254/latest/meta-data/instance-id)
aws ec2 associate-iam-instance-profile \
  --instance-id $INSTANCE_ID \
  --iam-instance-profile Name=PostfixSESLogger
</code></pre>
<p>Wait 10 seconds, then verify:</p>
<pre><code class="language-bash">aws sts get-caller-identity  # Should show the role
</code></pre>
<hr />
<h3>Create SQS Queue</h3>
<pre><code class="language-bash">QUEUE_URL=$(aws sqs create-queue \
  --queue-name ses-events-queue \
  --region ap-south-1 \
  --query 'QueueUrl' \
  --output text)

QUEUE_ARN=$(aws sqs get-queue-attributes \
  --queue-url "$QUEUE_URL" \
  --attribute-names QueueArn \
  --region ap-south-1 \
  --query 'Attributes.QueueArn' \
  --output text)

echo "Queue URL: $QUEUE_URL"
echo "Queue ARN: $QUEUE_ARN"
</code></pre>
<hr />
<h3>Create SNS Topic and Subscribe SQS</h3>
<pre><code class="language-bash">SNS_ARN=$(aws sns create-topic \
  --name ses-events-topic \
  --region ap-south-1 \
  --query 'TopicArn' \
  --output text)

# Allow SNS to send to SQS
# Unquoted EOF so $QUEUE_ARN and $SNS_ARN expand
cat &gt; /tmp/sqs-policy.json &lt;&lt;EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"Service": "sns.amazonaws.com"},
    "Action": "sqs:SendMessage",
    "Resource": "$QUEUE_ARN",
    "Condition": {"ArnEquals": {"aws:SourceArn": "$SNS_ARN"}}
  }]
}
EOF


aws sqs set-queue-attributes \
  --queue-url "$QUEUE_URL" \
  --attributes Policy="$(cat /tmp/sqs-policy.json)" \
  --region ap-south-1

# Subscribe SQS to SNS
aws sns subscribe \
  --topic-arn "$SNS_ARN" \
  --protocol sqs \
  --notification-endpoint "$QUEUE_ARN" \
  --region ap-south-1
</code></pre>
<hr />
<h3>Configure SES to Publish Events</h3>
<pre><code class="language-bash">for EVENT in Delivery Bounce Complaint; do
  aws ses set-identity-notification-topic \
    --identity yourdomain.com \
    --notification-type $EVENT \
    --sns-topic "$SNS_ARN" \
    --region ap-south-1
done

# Disable email forwarding
aws ses set-identity-feedback-forwarding-enabled \
  --identity yourdomain.com \
  --no-forwarding-enabled \
  --region ap-south-1
</code></pre>
<hr />
<h2>Phase 4: Logger Deployment (30 mins)</h2>
<h3>Install Dependencies</h3>
<pre><code class="language-bash">sudo yum install -y python3-boto3
python3 -c "import boto3; print('✓ boto3 installed')"
</code></pre>
<hr />
<h3>Create Logger Script</h3>
<pre><code class="language-bash">sudo nano /usr/local/bin/ses_logger.py
</code></pre>
<p><strong>Paste this complete script:</strong></p>
<pre><code class="language-python">#!/usr/bin/env python3
import boto3, json, syslog, os, sys
from datetime import datetime

REGION = 'ap-south-1'
syslog.openlog('postfix/ses-events', logoption=syslog.LOG_PID, facility=syslog.LOG_MAIL)

def log_event(msg_id, event_type, recipient, details):
    log = f"{msg_id}: to=&lt;{recipient}&gt;, relay=amazonses.com, {details}"
    level = syslog.LOG_WARNING if event_type == "Bounce" else syslog.LOG_INFO
    syslog.syslog(level, log)
    print(f"[{datetime.now():%Y-%m-%d %H:%M:%S}] {event_type}: {log}")

def process_event(message):
    try:
        event = json.loads(message)
        event_type = event.get('notificationType')
        mail = event.get('mail', {})
        msg_id = mail.get('messageId', 'UNKNOWN')
        
        if event_type == 'Delivery':
            delivery = event.get('delivery', {})
            for recipient in delivery.get('recipients', []):
                delay = delivery.get('processingTimeMillis', 0)
                details = f"dsn=2.0.0, status=delivered, delay={delay}ms"
                log_event(msg_id, event_type, recipient, details)
        
        elif event_type == 'Bounce':
            bounce = event.get('bounce', {})
            for r in bounce.get('bouncedRecipients', []):
                details = f"dsn=5.0.0, status=bounced, type={bounce.get('bounceType')}"
                log_event(msg_id, event_type, r.get('emailAddress'), details)
        
        return True
    except Exception as e:
        print(f"Error: {e}", file=sys.stderr)
        return False

def main():
    queue_url = os.environ.get('SQS_QUEUE_URL')
    if not queue_url:
        sys.exit("ERROR: SQS_QUEUE_URL not set")
    
    print(f"SES Logger Started\nQueue: {queue_url}\n")
    sqs = boto3.client('sqs', region_name=REGION)
    
    while True:
        try:
            response = sqs.receive_message(
                QueueUrl=queue_url,
                MaxNumberOfMessages=10,
                WaitTimeSeconds=20
            )
            
            for message in response.get('Messages', []):
                body = json.loads(message['Body'])
                if process_event(body.get('Message')):
                    sqs.delete_message(
                        QueueUrl=queue_url,
                        ReceiptHandle=message['ReceiptHandle']
                    )
        except KeyboardInterrupt:
            break
        except Exception as e:
            print(f"Error: {e}", file=sys.stderr)

if __name__ == '__main__':
    main()
</code></pre>
<p>Make executable:</p>
<pre><code class="language-bash">sudo chmod +x /usr/local/bin/ses_logger.py
</code></pre>
<hr />
<h3>Create Systemd Service</h3>
<pre><code class="language-bash">QUEUE_URL=$(aws sqs get-queue-url --queue-name ses-events-queue --region ap-south-1 --query 'QueueUrl' --output text)

# Unquoted EOF so $QUEUE_URL expands into the unit file
sudo tee /etc/systemd/system/ses-logger.service &lt;&lt;EOF
[Unit]
Description=SES Event Logger
After=network.target

[Service]
Type=simple
User=root
Environment="SQS_QUEUE_URL=$QUEUE_URL"
Environment="AWS_DEFAULT_REGION=ap-south-1"
ExecStart=/usr/bin/python3 /usr/local/bin/ses_logger.py
Restart=always
RestartSec=10
StandardOutput=append:/var/log/postfix/ses-logger.log
StandardError=append:/var/log/postfix/ses-logger-error.log

[Install]
WantedBy=multi-user.target
EOF
</code></pre>
<hr />
<h3>Start Logger</h3>
<pre><code class="language-bash">sudo systemctl daemon-reload
sudo systemctl enable ses-logger
sudo systemctl start ses-logger
sudo systemctl status ses-logger
</code></pre>
<p>Should show <code>Active: active (running)</code> ✅</p>
<p><strong>View logs:</strong></p>
<pre><code class="language-bash">sudo tail -f /var/log/postfix/ses-logger.log
</code></pre>
<hr />
<h2>Phase 5: Testing (20 mins)</h2>
<h3>Test 1: Complete Flow</h3>
<p><strong>Send test email:</strong></p>
<pre><code class="language-bash">echo "Test from infrastructure" | \
  mail -s "Test Email" -r info@yourdomain.com your-email@example.com
</code></pre>
<p><strong>Watch both logs:</strong></p>
<pre><code class="language-bash"># Terminal 1 - Sent status (immediate)
sudo tail -f /var/log/postfix/postfix.log | grep "status=sent"

# Terminal 2 - Delivered status (10-30 sec delay)
sudo tail -f /var/log/postfix/mail.log | grep "status=delivered"
</code></pre>
<p><strong>Expected:</strong></p>
<pre><code class="language-plaintext"># Postfix log (immediate):
status=sent (250 Ok 0109019c...)

# Mail log (after 10-30 seconds):
status=delivered, delay=3558ms
</code></pre>
<p>✅ <strong>Both statuses visible? Success!</strong></p>
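<p>To correlate the two entries, the SES message ID from the <code>250 Ok</code> reply is the join key — a sketch assuming the reply format shown above:</p>
<pre><code class="language-bash"># Extract the SES message ID from a Postfix "status=sent" line (reads stdin).
ses_msgid() {
  sed -n 's/.*status=sent (250 Ok \([^)]*\)).*/\1/p'
}

# Usage:
# grep "status=sent" /var/log/postfix/postfix.log | ses_msgid
</code></pre>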
<hr />
<h3>Test 2: Bounce Detection</h3>
<pre><code class="language-bash"># Use SES bounce simulator
echo "Bounce test" | \
  mail -s "Bounce Test" -r info@yourdomain.com bounce@simulator.amazonses.com

# Watch for bounce (1-2 mins)
sudo tail -f /var/log/postfix/mail.log | grep "bounced"
</code></pre>
<p>Expected: <code>status=bounced, type=Permanent</code></p>
<hr />
<h3>Test 3: Sender Validation</h3>
<pre><code class="language-bash"># Try unauthorized sender
echo "Should fail" | \
  mail -s "Test" -r unauthorized@yourdomain.com test@example.com

# Check rejection
sudo tail /var/log/postfix/postfix.log | grep reject
</code></pre>
<p>Expected: <code>Sender address rejected: Access denied</code></p>
<hr />
<h2>What You Built</h2>
<p>In 2 hours, you created:</p>
<p>✅ <strong>Postfix SMTP relay</strong> with sender validation<br />✅ <strong>AWS SES integration</strong> with DKIM/SPF/DMARC<br />✅ <strong>Real-time tracking</strong> for delivery and bounces<br />✅ <strong>Unified logging</strong> - both "sent" and "delivered" in one place<br />✅ <strong>Cost-effective</strong> - ~$30/month vs $90+ for SaaS</p>
<hr />
<h2>Quick Reference</h2>
<p><strong>Restart services:</strong></p>
<pre><code class="language-bash">sudo systemctl restart postfix
sudo systemctl restart ses-logger
</code></pre>
<p><strong>View logs:</strong></p>
<pre><code class="language-bash">sudo tail -f /var/log/postfix/postfix.log  # Sent
sudo tail -f /var/log/postfix/mail.log     # Delivered
</code></pre>
<p><strong>Check queue:</strong></p>
<pre><code class="language-bash">mailq
</code></pre>
<p><strong>Search for email:</strong></p>
<pre><code class="language-bash">grep "user@example.com" /var/log/postfix/*.log
</code></pre>
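<p><strong>Average delivery delay:</strong></p>
<p>The <code>delay=</code> field written by the SES events logger makes latency questions answerable with plain <code>awk</code>. A small sketch, using sample lines in place of <code>/var/log/postfix/mail.log</code>:</p>
<pre><code class="language-bash"># Average delivery delay; feed it the real log instead of the printf sample
printf '%s\n' \
  'Feb 25 00:05:19 mail postfix/ses-events[456]: A1: status=delivered, delay=3558ms' \
  'Feb 25 00:06:02 mail postfix/ses-events[456]: B2: status=delivered, delay=1442ms' |
  awk 'match($0, /delay=[0-9]+ms/) { sum += substr($0, RSTART+6, RLENGTH-8); n++ }
       END { if (n) printf "avg %dms over %d deliveries\n", sum/n, n }'
</code></pre>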
<hr />
<h2>Common Issues</h2>
<p><strong>Email stuck in queue?</strong></p>
<pre><code class="language-bash"># Check why
sudo tail -50 /var/log/postfix/postfix.log | grep deferred
# Flush after fixing
sudo postqueue -f
</code></pre>
<p><strong>Logger not running?</strong></p>
<pre><code class="language-bash"># Check errors
sudo journalctl -u ses-logger -n 50
# Restart
sudo systemctl restart ses-logger
</code></pre>
<p><strong>No delivery events?</strong></p>
<pre><code class="language-bash"># Check SQS has messages
aws sqs get-queue-attributes \
  --queue-url "YOUR_QUEUE_URL" \
  --attribute-names ApproximateNumberOfMessages
</code></pre>
<hr />
<h2>What's Next?</h2>
<p>Read <strong>Part 3: Operations &amp; Troubleshooting</strong></p>
<p>🔗 <strong>If this helped or resonated with you, connect with me on</strong> <a href="https://www.linkedin.com/in/sebastiancyril/"><strong>LinkedIn</strong></a>. Let’s learn and grow together.</p>
<p>👉 Stay tuned for more behind-the-scenes write-ups and system design breakdowns.</p>
]]></content:encoded></item><item><title><![CDATA[Building Production Email Infrastructure with Postfix + AWS SES: Architecture & Design]]></title><description><![CDATA[Part 1: Architecture & Design Decisions

Series NavigationPart 1: Architecture & Design ← You are herePart 2: Implementation GuidePart 3: Operations & Troubleshooting


TL;DR
What we're building: A pr]]></description><link>https://tech.cyrilsebastian.com/building-production-email-infrastructure-with-postfix-aws-ses-architecture-design</link><guid isPermaLink="true">https://tech.cyrilsebastian.com/building-production-email-infrastructure-with-postfix-aws-ses-architecture-design</guid><category><![CDATA[AWS]]></category><category><![CDATA[SES]]></category><category><![CDATA[postfix]]></category><category><![CDATA[email infrastructure]]></category><category><![CDATA[Devops]]></category><category><![CDATA[cloud architecture]]></category><category><![CDATA[System Design]]></category><dc:creator><![CDATA[Cyril Sebastian]]></dc:creator><pubDate>Sat, 07 Mar 2026 13:32:25 GMT</pubDate><enclosure url="https://cdn.hashnode.com/uploads/covers/66fff73c46655eee7fdfb5b0/ec9570ac-ca51-4801-8f8f-213b0ce4a9f3.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2>Part 1: Architecture &amp; Design Decisions</h2>
<blockquote>
<p><strong>Series Navigation</strong><br /><strong>Part 1: Architecture &amp; Design</strong> ← You are here<br />Part 2: Implementation Guide<br />Part 3: Operations &amp; Troubleshooting</p>
</blockquote>
<hr />
<h2>TL;DR</h2>
<p><strong>What we're building:</strong> A production email system combining Postfix SMTP relay with AWS SES, complete with real-time bounce tracking, all visible in unified logs.</p>
<p><strong>Why it matters:</strong> Track emails from send → delivery → bounce in one log file, works in private subnets, costs ~$30/month.</p>
<p><strong>Who it's for:</strong> DevOps engineers, backend developers, and SREs managing email infrastructure.</p>
<hr />
<h2>Introduction</h2>
<p>Have you ever sent an email and wondered: "Did it actually reach the inbox? Or did it bounce? When?"</p>
<p>Traditional SMTP relays tell you when they <em>sent</em> the email, but not when it was <em>delivered</em>. This gap creates a blind spot in your infrastructure.</p>
<h3>What We're Building</h3>
<p>An email infrastructure that provides:</p>
<ul>
<li><p><strong>Complete visibility</strong>: Both "sent" and "delivered" statuses</p>
</li>
<li><p><strong>Real-time bounce tracking</strong>: Know immediately when emails fail</p>
</li>
<li><p><strong>Unified logging</strong>: Everything in one log file (grep-friendly!)</p>
</li>
<li><p><strong>Private subnet compatible</strong>: No public endpoints needed</p>
</li>
<li><p><strong>Cost-effective</strong>: ~$30/month for 50,000 emails</p>
</li>
<li><p><strong>Enterprise deliverability</strong>: AWS SES's 99.9% delivery rate</p>
</li>
</ul>
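<p>The cost figure is dominated by the relay host rather than SES itself. A rough sketch of the arithmetic (assumed prices for illustration - SES at $0.10 per 1,000 emails plus a small instance; actual pricing varies by region):</p>
<pre><code class="language-python"># Back-of-envelope monthly cost (assumed prices, not an AWS quote)
emails_per_month = 50_000
ses_cost = emails_per_month / 1_000 * 0.10   # $0.10 per 1,000 emails
ec2_cost = 25.0                              # small relay instance, assumed
print(f"SES ${ses_cost:.2f} + EC2 ${ec2_cost:.2f} = ${ses_cost + ec2_cost:.2f}/month")
</code></pre>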
<h3>Who This Series Is For</h3>
<p><strong>DevOps Engineers</strong> looking to build a reliable email infrastructure<br /><strong>Backend Developers</strong> integrating email into applications<br /><strong>SREs</strong> who need observability into email delivery</p>
<h3>Series Overview</h3>
<p><strong>Part 1</strong> (this post): Understand the architecture and design decisions<br /><strong>Part 2</strong>: Step-by-step implementation guide<br /><strong>Part 3</strong>: Operations, monitoring, and troubleshooting</p>
<hr />
<h2>The Problem: Email Observability</h2>
<h3>Traditional SMTP Relay Limitations</h3>
<p>When you send an email through a standard SMTP relay:</p>
<pre><code class="language-log">postfix: status=sent (250 Ok)
</code></pre>
<p><strong>But "sent" doesn't mean "delivered"!</strong> It just means your mail server handed the email to the next server.</p>
<h4>What You Don't Know</h4>
<ul>
<li><p>Did it reach the recipient's inbox?</p>
</li>
<li><p>Did it bounce?</p>
</li>
<li><p>Was it marked as spam?</p>
</li>
<li><p>How long did the delivery take?</p>
</li>
<li><p>Which ISP was slow?</p>
</li>
</ul>
<p>This creates a <strong>visibility gap</strong> in your infrastructure.</p>
<h3>Solution: The Best of Both Worlds</h3>
<pre><code class="language-plaintext">Your App → Postfix → SES → Recipient
     ↓         ↓        ↓
   Logs    Logs    SNS→SQS→Logger
                         ↓
                  Unified Logs ✅
</code></pre>
<p><strong>This approach delivers:</strong></p>
<p><strong>Both statuses</strong>: "sent" (Postfix) AND "delivered" (SES)<br /><strong>Private subnet</strong>: No public endpoints required<br /><strong>Cost-effective</strong>: ~$30/month<br /><strong>Standard SMTP</strong>: No code changes needed<br /><strong>Unix-friendly</strong>: Logs searchable with grep/awk<br /><strong>Full control</strong>: Own your infrastructure</p>
<hr />
<h2>Architecture Overview</h2>
<h3>The Big Picture</h3>
<pre><code class="language-plaintext">╔══════════════════════════════════╗
║     Application Layer            ║
║  "Send email to user@example.com"║
╚════════════╤═════════════════════╝
             │ SMTP • Port 25
             ▼
╔══════════════════════════════════╗
║      Postfix (Private Subnet)    ║
║  ├─ ✓ Sender authorized          ║
║  ├─ ➡ Forwarding to SES...       ║
║  └─ 📋 LOG: status=sent          ║
╚════════════╤═════════════════════╝
             │ SMTP + TLS • Port 587
             ▼
╔══════════════════════════════════╗
║     AWS Simple Email Service     ║
║  "Delivering to recipient..."    ║
╚════════════╤═════════════════════╝
             │
     ┌───────┴───────┐
     ▼               ▼
╔══════════════╗ ╔════════════════════╗
║  Recipient   ║ ║     Event Flow     ║
║  Mail Server ║ ║ SNS → SQS → Logger ║
╚══════════════╝ ╚═════════╤══════════╝
                           │
                           ▼
                 ╔════════════════════╗
                 ║    Your Logs ✓     ║
                 ║  ├─ sent           ║
                 ║  ├─ delivered      ║
                 ║  └─ bounced        ║
                 ╚════════════════════╝
</code></pre>
<h3>Data Flow Summary</h3>
<ol>
<li><p><strong>Application</strong> sends email to Postfix via SMTP</p>
</li>
<li><p><strong>Postfix</strong> validates sender, relays to SES</p>
</li>
<li><p><strong>SES</strong> delivers email to recipient</p>
</li>
<li><p><strong>SES</strong> publishes event (delivery/bounce) to SNS</p>
</li>
<li><p><strong>SNS</strong> forwards event to SQS queue</p>
</li>
<li><p><strong>Python logger</strong> polls SQS, writes to syslog</p>
</li>
<li><p><strong>Result</strong>: Both "sent" and "delivered" in your logs!</p>
</li>
</ol>
<hr />
<h2>Core Components Explained</h2>
<h3>1. Postfix: The Smart Relay</h3>
<p><strong>Role:</strong> SMTP relay with sender validation and forwarding logic</p>
<p><strong>What it does:</strong></p>
<ul>
<li><p>Receives emails from your application</p>
</li>
<li><p>Validates sender addresses against the whitelist</p>
</li>
<li><p>Forwards to AWS SES via authenticated SMTP</p>
</li>
<li><p>Logs "sent" status immediately</p>
</li>
</ul>
<p><strong>Why Postfix?</strong></p>
<p><strong>Industry standard</strong>: Powers millions of servers<br /><strong>Highly configurable</strong>: Fine-grained control over routing<br /><strong>Excellent logging</strong>: Detailed, parseable log format<br /><strong>Battle-tested</strong>: Decades of production use<br /><strong>Performance</strong>: Handles thousands of concurrent connections</p>
<p><strong>Key Configuration:</strong></p>
<pre><code class="language-ini"># Sender validation (whitelist)
smtpd_sender_restrictions = 
    check_sender_access hash:/etc/postfix/allowed_senders,
    reject

# Only approved senders can use relay
# Example whitelist:
# info@example.com    OK
# noreply@example.com OK
# @example.com        REJECT
</code></pre>
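<p>The whitelist referenced above is a Postfix lookup table, so it must be compiled with <code>postmap</code> before it takes effect. A minimal sketch (run in a scratch directory here; on the relay the file lives at <code>/etc/postfix/allowed_senders</code>):</p>
<pre><code class="language-bash"># Write the table, then compile and reload on the mail host
cd "$(mktemp -d)"
printf '%s\n' 'info@example.com    OK' 'noreply@example.com OK' | tee allowed_senders
# sudo postmap hash:allowed_senders   # on the relay: builds allowed_senders.db
# sudo systemctl reload postfix       # pick up the new table
grep -c 'OK$' allowed_senders         # sanity check: two approved senders
</code></pre>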
<p><strong>Log Output Example:</strong></p>
<pre><code class="language-log">Feb 25 00:05:15 mail postfix/smtp[123]: ABC123: 
  to=&lt;user@example.com&gt;, 
  relay=email-smtp.ap-south-1.amazonaws.com:587, 
  delay=0.14, 
  dsn=2.0.0, 
  status=sent (250 Ok 0109019c...)
</code></pre>
<p><strong>What this tells you:</strong></p>
<ul>
<li><p>Postfix accepted the email</p>
</li>
<li><p>Email forwarded to SES</p>
</li>
<li><p>SES accepted the email (250 Ok)</p>
</li>
<li><p>Took 0.14 seconds</p>
</li>
</ul>
<hr />
<h3>2. AWS SES: The Delivery Engine</h3>
<p><strong>Role:</strong> Actual email delivery to recipients</p>
<p><strong>What it does:</strong></p>
<ul>
<li><p>Delivers emails to recipient mail servers</p>
</li>
<li><p>Handles DKIM signing for authentication</p>
</li>
<li><p>Manages IP reputation</p>
</li>
<li><p>Publishes delivery/bounce events</p>
</li>
</ul>
<p><strong>Why SES?</strong></p>
<p><strong>99.9% deliverability</strong>: Enterprise-grade infrastructure<br /><strong>Global reach</strong>: AWS's worldwide network<br /><strong>Pay-as-you-go</strong>: $0.10 per 1,000 emails<br /><strong>Built-in auth</strong>: Automatic SPF, DKIM, DMARC<br /><strong>Event publishing</strong>: Real-time delivery notifications<br /><strong>Scalable</strong>: From 100 to millions of emails</p>
<p><strong>Event Types:</strong></p>
<ol>
<li><p><strong>Delivery</strong> - Email reached the recipient's inbox</p>
</li>
<li><p><strong>Bounce</strong> - Email rejected (permanent or temporary)</p>
</li>
<li><p><strong>Complaint</strong> - Recipient marked as spam</p>
</li>
</ol>
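<p>In the logger, these three types map onto Postfix-style status words. A hypothetical helper (the <code>classify</code> name and <code>STATUS</code> map are illustrative, not part of the SES API):</p>
<pre><code class="language-python">import json

# Map SES notification types to the status words used in the unified logs
STATUS = {"Delivery": "delivered", "Bounce": "bounced", "Complaint": "complained"}

def classify(raw):
    event = json.loads(raw)
    return STATUS.get(event.get("notificationType"), "unknown")

print(classify('{"notificationType": "Bounce"}'))    # bounced
print(classify('{"notificationType": "Delivery"}'))  # delivered
</code></pre>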
<p><strong>Why not use SES directly?</strong></p>
<p>While SES has an API, using Postfix as a relay provides:</p>
<ul>
<li><p>Standard SMTP interface (no code changes)</p>
</li>
<li><p>Sender validation</p>
</li>
<li><p>Easy provider switching</p>
</li>
<li><p>Centralized configuration</p>
</li>
<li><p>Better logging</p>
</li>
</ul>
<hr />
<h3>3. The Event Pipeline: SNS → SQS → Logger</h3>
<p>This is the secret sauce that brings delivery confirmations into your logs.</p>
<h4>SNS (Simple Notification Service)</h4>
<p><strong>Role:</strong> Event broadcaster from SES</p>
<p><strong>Flow:</strong></p>
<pre><code class="language-plaintext">SES delivers email
    ↓
SES publishes event to SNS
    ↓
SNS broadcasts to subscribers
</code></pre>
<p><strong>Why SNS?</strong></p>
<ul>
<li><p>Real-time notifications</p>
</li>
<li><p>Fan-out to multiple destinations</p>
</li>
<li><p>Native SES integration</p>
</li>
<li><p>Filter by event type</p>
</li>
</ul>
<h4>SQS (Simple Queue Service)</h4>
<p><strong>Role:</strong> Message buffer between SNS and logger</p>
<p><strong>Sample Event:</strong></p>
<pre><code class="language-json">{
  "notificationType": "Delivery",
  "mail": {
    "messageId": "0109019c...",
    "destination": ["user@example.com"]
  },
  "delivery": {
    "timestamp": "2026-02-25T00:05:18.000Z",
    "smtpResponse": "250 ok dirdel",
    "processingTimeMillis": 3558
  }
}
</code></pre>
<p><strong>Why SQS instead of HTTP webhooks?</strong></p>
<p>This is a <strong>critical design decision</strong>:</p>
<p><strong>HTTP Webhook Approach:</strong></p>
<pre><code class="language-plaintext">SES → SNS → HTTP POST to your server
                      ↓
                Need public endpoint
                Need ALB/Load balancer  
                Security concerns
                Webhook authentication
</code></pre>
<p><strong>SQS Polling Approach:</strong></p>
<pre><code class="language-plaintext">SES → SNS → SQS Queue
              ↓
      Python script polls (outbound only)
              ↓
        No public endpoint needed!
        Works in private subnet
        Messages buffered if logger down
        IAM-based authentication
</code></pre>
<p><strong>Benefits of SQS:</strong></p>
<p><strong>Private subnet compatible</strong>: Polling is outbound-only<br /><strong>Resilient</strong>: Messages buffered for 14 days<br /><strong>No ALB needed</strong>: Saves $18/month<br /><strong>More reliable</strong>: No missed webhooks<br /><strong>IAM auth</strong>: No webhook secrets to manage</p>
<hr />
<h3>4. The Logger: Python + Syslog</h3>
<p><strong>Role:</strong> Poll SQS and write events to Postfix logs</p>
<p><strong>What it does:</strong></p>
<ul>
<li><p>Polls SQS every 20 seconds (long polling)</p>
</li>
<li><p>Parses SES delivery/bounce events</p>
</li>
<li><p>Writes to syslog (same facility as Postfix)</p>
</li>
<li><p>Deletes processed messages from the queue</p>
</li>
</ul>
<p><strong>Code Snippet:</strong></p>
<pre><code class="language-python">import boto3, json, syslog

# Initialize SQS client (uses IAM role automatically)
sqs = boto3.client('sqs', region_name='ap-south-1')

# Poll queue (long polling reduces API calls)
response = sqs.receive_message(
    QueueUrl=queue_url,
    MaxNumberOfMessages=10,
    WaitTimeSeconds=20  # Wait up to 20s for messages
)

# Process each event
for message in response.get('Messages', []):
    # SNS wraps the SES payload as a JSON string in "Message"
    event = json.loads(json.loads(message['Body'])['Message'])
    msg_id = event['mail']['messageId']
    recipient = event['mail']['destination'][0]

    # Write to syslog (appears in Postfix logs!)
    syslog.openlog('postfix/ses-events',
                   facility=syslog.LOG_MAIL)
    syslog.syslog(syslog.LOG_INFO,
                  f"{msg_id}: to=&lt;{recipient}&gt;, "
                  f"status=delivered")

    # Remove from queue
    sqs.delete_message(QueueUrl=queue_url,
                       ReceiptHandle=message['ReceiptHandle'])
</code></pre>
<p><strong>Why Syslog?</strong></p>
<p><strong>Same log file</strong>: Appears alongside Postfix logs<br /><strong>Auto-rotation</strong>: System handles log management<br /><strong>Searchable</strong>: Standard Unix tools (grep, awk)<br /><strong>Integration</strong>: Works with existing log aggregators<br /><strong>Familiar format</strong>: Same as Postfix log entries</p>
<p><strong>Log Output:</strong></p>
<pre><code class="language-log">Feb 25 00:05:19 mail postfix/ses-events[456]: 0109019c...: 
  to=&lt;user@example.com&gt;, 
  relay=amazonses.com, 
  dsn=2.0.0, 
  status=delivered, 
  delay=3607ms,
  response=(250 ok dirdel)
</code></pre>
<p><strong>What this tells you:</strong></p>
<ul>
<li><p>Email successfully delivered</p>
</li>
<li><p>Took 3.6 seconds from SES to the inbox</p>
</li>
<li><p>Final SMTP response from recipient server</p>
</li>
</ul>
<hr />
<h2>The Complete Email Journey</h2>
<p>Let's trace a single email through the entire system with precise timings.</p>
<h3>T+0ms: Application Sends Email</h3>
<pre><code class="language-python">import smtplib
from email.mime.text import MIMEText

msg = MIMEText("Hello World")
msg['From'] = "info@example.com"
msg['To'] = "user@example.com"

s = smtplib.SMTP("10.0.0.23", 25)
s.sendmail("info@example.com", "user@example.com", msg.as_string())
s.quit()
</code></pre>
<p><strong>What happens:</strong> SMTP connection to Postfix relay</p>
<hr />
<h3>T+10ms: Postfix Validates &amp; Queues</h3>
<p><strong>Checks performed:</strong></p>
<ol>
<li><p>Is sender in whitelist? (<code>info@example.com</code> → OK)</p>
</li>
<li><p>Is sender from allowed network? (<code>10.0.0.0/16</code> → OK)</p>
</li>
<li><p>Queue email for delivery</p>
</li>
</ol>
<p><strong>Log entries:</strong></p>
<pre><code class="language-log">postfix/smtpd[123]: connect from ip-10-0-0-125
postfix/smtpd[123]: ABC123: client=ip-10-0-0-125
postfix/cleanup[124]: ABC123: message-id=&lt;...&gt;
postfix/qmgr[125]: ABC123: from=&lt;info@example.com&gt;, size=432
</code></pre>
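<p>Every one of those lines carries the same queue ID (<code>ABC123</code> here), so the whole Postfix-side journey of a message is one <code>grep</code> away. A sketch with sample lines standing in for the real log:</p>
<pre><code class="language-bash"># Count all log lines belonging to one queue ID
printf '%s\n' \
  'postfix/qmgr[125]: ABC123: from=info@example.com, size=432' \
  'postfix/smtp[126]: ABC123: to=user@example.com, status=sent (250 Ok)' |
  grep -c 'ABC123:'
</code></pre>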
<hr />
<h3>T+150ms: Postfix → SES Relay</h3>
<p><strong>Process:</strong></p>
<ol>
<li><p>Establish TLS 1.3 connection to SES</p>
</li>
<li><p>Authenticate with SMTP credentials</p>
</li>
<li><p>Transmit email content</p>
</li>
<li><p>Receive confirmation</p>
</li>
</ol>
<p><strong>Log entry (FIRST "sent" status!):</strong></p>
<pre><code class="language-log">postfix/smtp[126]: Trusted TLS connection established to 
  email-smtp.ap-south-1.amazonaws.com:587: 
  TLSv1.3 with cipher TLS_AES_256_GCM_SHA384

postfix/smtp[126]: ABC123: to=&lt;user@example.com&gt;, 
  relay=email-smtp.ap-south-1.amazonaws.com:587, 
  delay=0.14, 
  status=sent (250 Ok 0109019c...)
</code></pre>
<p><strong>What you know at this point:</strong></p>
<ul>
<li><p>Email left your infrastructure</p>
</li>
<li><p>SES accepted the email</p>
</li>
<li><p>Total time: 150ms</p>
</li>
</ul>
<hr />
<h3>T+1500ms: SES Processes &amp; Delivers</h3>
<p><strong>SES internal process:</strong></p>
<ol>
<li><p>Add DKIM signature</p>
</li>
<li><p>Perform SPF/DMARC checks</p>
</li>
<li><p>Select optimal sending IP</p>
</li>
<li><p>Connect to the recipient's mail server</p>
</li>
<li><p>Deliver email</p>
</li>
<li><p>Receive final confirmation</p>
</li>
</ol>
<p><strong>This happens entirely within AWS</strong>; you don't see these steps in your own logs.</p>
<hr />
<h3>T+1550ms: SES Publishes Event to SNS</h3>
<p><strong>Event generated:</strong></p>
<pre><code class="language-json">{
  "notificationType": "Delivery",
  "mail": {
    "timestamp": "2026-02-25T00:05:15.140Z",
    "messageId": "0109019c...",
    "source": "info@example.com",
    "destination": ["user@example.com"]
  },
  "delivery": {
    "timestamp": "2026-02-25T00:05:18.698Z",
    "recipients": ["user@example.com"],
    "smtpResponse": "250 ok dirdel",
    "processingTimeMillis": 3558,
    "remoteMtaIp": "74.198.68.21",
    "reportingMTA": "a8-123.smtp-out.amazonses.com"
  }
}
</code></pre>
<p><strong>Published to SNS topic:</strong> <code>ses-events-topic</code></p>
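<p>Note that <code>processingTimeMillis</code> is simply the gap between the event's own send and delivery timestamps. A quick check with the values from the sample above (Python 3.7+; <code>fromisoformat</code> needs the offset spelled as <code>+00:00</code>):</p>
<pre><code class="language-python">from datetime import datetime

# Timestamps taken from the sample event
sent = datetime.fromisoformat("2026-02-25T00:05:15.140+00:00")
delivered = datetime.fromisoformat("2026-02-25T00:05:18.698+00:00")
print(round((delivered - sent).total_seconds() * 1000))  # 3558
</code></pre>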
<hr />
<h3>T+1600ms: SNS → SQS Forward</h3>
<p><strong>SNS wraps the event:</strong></p>
<pre><code class="language-json">{
  "Type": "Notification",
  "MessageId": "...",
  "TopicArn": "arn:aws:sns:ap-south-1:...:ses-events-topic",
  "Message": "{\"notificationType\":\"Delivery\",...}",
  "Timestamp": "2026-02-25T00:05:18.750Z"
}
</code></pre>
<p><strong>Delivered to SQS queue:</strong> <code>ses-events-queue</code></p>
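<p>Because the SES payload arrives as a JSON <em>string</em> inside the wrapper's <code>Message</code> field, the logger has to parse twice - once for the SNS envelope, once for the event itself. A minimal sketch with an inline sample body:</p>
<pre><code class="language-python">import json

# Stand-in for one SQS message body (SNS envelope around an SES event)
sqs_body = json.dumps({
    "Type": "Notification",
    "Message": json.dumps({"notificationType": "Delivery",
                           "delivery": {"processingTimeMillis": 3558}}),
})

envelope = json.loads(sqs_body)          # first parse: SNS wrapper
event = json.loads(envelope["Message"])  # second parse: SES event
print(event["delivery"]["processingTimeMillis"])  # 3558
</code></pre>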
<hr />
<h3>T+5000ms: Logger Polls &amp; Processes</h3>
<p><strong>Logger wakes up</strong> (polls every 20 seconds with long polling)</p>
<p><strong>Process:</strong></p>
<ol>
<li><p>Retrieve message from SQS</p>
</li>
<li><p>Parse SNS wrapper</p>
</li>
<li><p>Extract SES event</p>
</li>
<li><p>Format for syslog</p>
</li>
<li><p>Write to log file</p>
</li>
<li><p>Delete message from queue</p>
</li>
</ol>
<p><strong>Log entry (SECOND "delivered" status!):</strong></p>
<pre><code class="language-log">Feb 25 00:05:19 mail postfix/ses-events[456]: 0109019c...: 
  to=&lt;user@example.com&gt;, 
  relay=amazonses.com, 
  dsn=2.0.0, 
  status=delivered, 
  delay=3558ms,
  response=(250 ok dirdel)
</code></pre>
<p><strong>What you know now:</strong></p>
<ul>
<li><p>Email delivered to inbox</p>
</li>
<li><p>Delivery took 3.5 seconds</p>
</li>
<li><p>Final confirmation from the recipient's mail server</p>
</li>
</ul>
<hr />
<h3>Final Result: Unified Logs</h3>
<p><strong>Complete journey in logs:</strong></p>
<pre><code class="language-log"># T+150ms - Postfix → SES
Feb 25 00:05:15 mail postfix/smtp[126]: ABC123: 
  to=&lt;user@example.com&gt;, 
  status=sent (250 Ok)

# T+5000ms - SES → Inbox confirmed
Feb 25 00:05:19 mail postfix/ses-events[456]: 0109019c...: 
  to=&lt;user@example.com&gt;, 
  status=delivered, 
  delay=3558ms
</code></pre>
<p><strong>Search for any email:</strong></p>
<pre><code class="language-bash">grep "user@example.com" /var/log/postfix/*.log
</code></pre>
<p><strong>Output:</strong></p>
<pre><code class="language-plaintext">postfix.log:  status=sent (handed to SES)
mail.log:     status=delivered (reached inbox)
</code></pre>
<p><strong>Complete visibility from send to delivery!</strong></p>
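<p>Since both statuses share the syslog format, everyday questions reduce to one-liners - for example, tallying outcomes per status. A sketch with sample lines in place of the real files:</p>
<pre><code class="language-bash"># Count emails per final status across the unified logs
printf '%s\n' \
  'Feb 25 00:05:15 mail postfix/smtp[126]: ABC123: status=sent (250 Ok)' \
  'Feb 25 00:05:19 mail postfix/ses-events[456]: 0109: status=delivered, delay=3558ms' |
  grep -o 'status=[a-z]*' | sort | uniq -c
</code></pre>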
<hr />
<h3>Why Private Subnet?</h3>
<p><strong>Security Benefits:</strong></p>
<p><strong>Reduced attack surface</strong>: No public IP = can't be scanned<br /><strong>No direct internet access</strong>: Blocks many attack vectors<br /><strong>Network-level isolation</strong>: Additional security layer<br /><strong>AWS best practice</strong>: Recommended architecture<br /><strong>Compliance-friendly</strong>: Easier to meet security requirements</p>
<p><strong>How it works:</strong></p>
<pre><code class="language-plaintext">┌─────────────────────────────────┐
│     Private Subnet              │
│                                 │
│  ┌──────────┐                   │
│  │ Postfix  │ (No public IP)    │
│  └────┬─────┘                   │
│       │ Outbound HTTPS only     │
└───────┼─────────────────────────┘
        │
        ↓ Via NAT Gateway
  ┌─────────────┐
  │   AWS SES   │
  │   AWS SQS   │
  └─────────────┘
</code></pre>
<p><strong>Key point:</strong> Both SES and SQS are "pull" services:</p>
<ul>
<li><p>Postfix initiates connection to SES (outbound)</p>
</li>
<li><p>Logger initiates connection to SQS (outbound)</p>
</li>
<li><p><strong>No inbound connections needed!</strong></p>
</li>
</ul>
<hr />
<p><strong>SQS Polling Benefits:</strong></p>
<pre><code class="language-plaintext">SES → SNS → SQS ← Python polls (outbound only)
</code></pre>
<p><strong>Simple infrastructure:</strong></p>
<ol>
<li><p>No public endpoints</p>
</li>
<li><p>No TLS cert management</p>
</li>
<li><p>No inbound firewall rules</p>
</li>
<li><p>IAM authentication (built-in)</p>
</li>
<li><p>Automatic retry (queue buffering)</p>
</li>
</ol>
<hr />
<h2>Security Architecture</h2>
<h3>Network Security</h3>
<p><strong>Multi-layer defense:</strong></p>
<pre><code class="language-plaintext">┌─────────────────────────────────────┐
│        Private Subnet               │
│  ┌──────────────────────────┐       │
│  │ Security Group           │       │
│  │ - Port 25: 10.0.0.0/21   │       │
│  │ - Port 22: Admin IPs     │       │
│  │ - Outbound: All          │       │
│  │                          │       │
│  │   ┌──────────┐           │       │
│  │   │ Postfix  │           │       │
│  │   │ (Private)│           │       │
│  │   └────┬─────┘           │       │
│  └────────┼─────────────────┘       │
└───────────┼─────────────────────────┘
            │ Outbound only
            ↓ (TLS 1.3)
      ┌─────────────┐
      │   AWS SES   │
      │  (Public)   │
      └─────────────┘
</code></pre>
<p><strong>Security controls:</strong></p>
<ol>
<li><p><strong>Network isolation</strong>: Private subnet, no public IP</p>
</li>
<li><p><strong>Sender validation</strong>: Whitelist checks before relay</p>
</li>
<li><p><strong>Encryption</strong>: TLS 1.3 to SES</p>
</li>
<li><p><strong>Authentication</strong>: SMTP credentials for SES</p>
</li>
</ol>
<hr />
<h2>Additional Resources</h2>
<h3>AWS Documentation</h3>
<ul>
<li><p><a href="https://docs.aws.amazon.com/ses/">Amazon SES Developer Guide</a></p>
</li>
<li><p><a href="https://docs.aws.amazon.com/sqs/">Amazon SQS Documentation</a></p>
</li>
<li><p><a href="https://docs.aws.amazon.com/sns/">Amazon SNS Documentation</a></p>
</li>
</ul>
<h3>Postfix Resources</h3>
<ul>
<li><a href="http://www.postfix.org/documentation.html">Official Postfix Documentation</a></li>
</ul>
<h3>Email Authentication</h3>
<ul>
<li><a href="https://dmarc.org/">DMARC.org</a></li>
</ul>
<hr />
<h2>About This Series</h2>
<p>This is <strong>Part 1</strong> of a 3-part series on building production email infrastructure:</p>
<ul>
<li><p><strong>Part 1: Architecture &amp; Design</strong> ← You just read this</p>
</li>
<li><p><strong>Part 2: Implementation Guide</strong> - Step-by-step setup</p>
</li>
<li><p><strong>Part 3: Operations &amp; Troubleshooting</strong> - Day-to-day management</p>
</li>
</ul>
<hr />
<p><em>Next in series:</em> <em>Part 2: Implementation Guide</em>  </p>
<p>🔗 <strong>If this helped or resonated with you, connect with me on</strong> <a href="https://www.linkedin.com/in/sebastiancyril/"><strong>LinkedIn</strong></a>. Let’s learn and grow together.</p>
<p>👉 Stay tuned for more behind-the-scenes write-ups and system design breakdowns.</p>
]]></content:encoded></item><item><title><![CDATA[Run Docker Remotely with Portainer Instead of Docker Desktop]]></title><description><![CDATA[Managing Docker remotely, especially on a Linux server, is a powerful alternative to using Docker Desktop on your local machine. In this guide, we’ll walk through how to:

Host and manage Docker containers remotely.

Use Portainer as a graphical inte...]]></description><link>https://tech.cyrilsebastian.com/run-docker-remotely-with-portainer-instead-of-docker-desktop</link><guid isPermaLink="true">https://tech.cyrilsebastian.com/run-docker-remotely-with-portainer-instead-of-docker-desktop</guid><category><![CDATA[Docker]]></category><category><![CDATA[Portainer]]></category><category><![CDATA[Devops]]></category><category><![CDATA[Cloud Computing]]></category><category><![CDATA[docker images]]></category><category><![CDATA[Productivity]]></category><dc:creator><![CDATA[Cyril Sebastian]]></dc:creator><pubDate>Mon, 28 Jul 2025 07:49:47 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1753688401474/44eee0fc-bfd3-4c9f-963a-641f7f4d7102.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Managing Docker remotely, especially on a Linux server, is a powerful alternative to using Docker Desktop on your local machine. In this guide, we’ll walk through how to:</p>
<ul>
<li><p>Host and manage Docker containers remotely.</p>
</li>
<li><p>Use <strong>Portainer</strong> as a graphical interface to control Docker.</p>
</li>
<li><p>Replace Docker Desktop with a more flexible and lightweight remote workflow.</p>
</li>
</ul>
<p>Whether you're a DevOps engineer, a cloud enthusiast, or simply exploring alternatives to Docker Desktop, this setup will enhance your productivity.</p>
<hr />
<h2 id="heading-why-portainer-instead-of-docker-desktop">🧱 Why Portainer Instead of Docker Desktop?</h2>
<p>Docker Desktop is great, but it has limitations:</p>
<ul>
<li><p>Requires a GUI and lots of system resources.</p>
</li>
<li><p>Licensing costs for teams.</p>
</li>
<li><p>Tied to your local machine.</p>
</li>
</ul>
<p><strong>Portainer</strong> is:</p>
<ul>
<li><p>Lightweight</p>
</li>
<li><p>Web-based</p>
</li>
<li><p>Platform-agnostic</p>
</li>
<li><p>Perfect for headless servers and remote access.</p>
</li>
</ul>
<hr />
<h2 id="heading-step-by-step-guide">🗂️ Step-by-Step Guide</h2>
<h3 id="heading-1-connect-to-your-server">1. Connect to Your Server</h3>
<p>SSH into your remote server:</p>
<pre><code class="lang-bash">ssh your-user@your-server-ip
</code></pre>
<p>Switch to root if necessary:</p>
<pre><code class="lang-bash">sudo -i
</code></pre>
<hr />
<h3 id="heading-2-create-a-directory-for-docker-configs">2. Create a Directory for Docker Configs</h3>
<p>We’ll organize Portainer and any other services under <code>/opt/docker</code>:</p>
<pre><code class="lang-bash">mkdir -p /opt/docker &amp;&amp; <span class="hljs-built_in">cd</span> /opt/docker
</code></pre>
<hr />
<h3 id="heading-3-prepare-portaineryml-file">3. Prepare <code>portainer.yml</code> File</h3>
<p>Create a Docker Compose file for Portainer:</p>
<pre><code class="lang-bash">nano portainer.yml
</code></pre>
<p>Paste the following content:</p>
<pre><code class="lang-yaml"><span class="hljs-attr">version:</span> <span class="hljs-string">'3.8'</span>

<span class="hljs-attr">services:</span>
  <span class="hljs-attr">portainer:</span>
    <span class="hljs-attr">image:</span> <span class="hljs-string">portainer/portainer-ce:latest</span>
    <span class="hljs-attr">container_name:</span> <span class="hljs-string">portainer</span>
    <span class="hljs-attr">restart:</span> <span class="hljs-string">unless-stopped</span>
    <span class="hljs-attr">ports:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">"9000:9000"</span>
    <span class="hljs-attr">volumes:</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">/var/run/docker.sock:/var/run/docker.sock</span>
      <span class="hljs-bullet">-</span> <span class="hljs-string">portainer_data:/data</span>

<span class="hljs-attr">volumes:</span>
  <span class="hljs-attr">portainer_data:</span>
</code></pre>
<p>Save and exit (<code>Ctrl + O</code>, <code>Enter</code>, <code>Ctrl + X</code>).</p>
<hr />
<h3 id="heading-4-start-portainer">4. Start Portainer</h3>
<p>Now launch Portainer with:</p>
<pre><code class="lang-bash">docker compose -f portainer.yml up -d
</code></pre>
<p>You should see:</p>
<pre><code class="lang-bash">✔ Container portainer  Started
</code></pre>
<p>To follow logs:</p>
<pre><code class="lang-bash">docker compose -f portainer.yml logs -f
</code></pre>
<hr />
<h3 id="heading-5-access-portainer-in-a-browser">5. Access Portainer in a Browser</h3>
<p>Open your browser and go to:</p>
<pre><code class="lang-plaintext">http://&lt;your-server-ip&gt;:9000
</code></pre>
<p>You’ll be prompted to set up an admin password and connect to your local Docker environment (it’ll detect the Docker socket you mapped).</p>
<hr />
<h2 id="heading-confirm-its-working">🔍 Confirm It’s Working</h2>
<p>To check if Portainer is running:</p>
<pre><code class="lang-bash">docker ps
</code></pre>
<p>Expected output:</p>
<pre><code class="lang-plaintext">CONTAINER ID   IMAGE                          PORTS                    NAMES
xxxxxxx        portainer/portainer-ce         0.0.0.0:9000-&gt;9000/tcp   portainer
</code></pre>
<p>You can also view logs anytime:</p>
<pre><code class="lang-bash">docker logs portainer
</code></pre>
<hr />
<h3 id="heading-run-docker-remotely-like-its-local">⚙️ Run Docker Remotely Like It’s Local</h3>
<p>You can run any Docker CLI commands remotely <strong>without touching your lab machine</strong> directly.</p>
<h4 id="heading-option-1-set-a-permanent-remote-docker-context">Option 1: Set a Permanent Remote Docker Context</h4>
<p>Edit your <code>.zshrc</code> file on your Mac:</p>
<pre><code class="lang-bash">echo 'export DOCKER_HOST=ssh://username@remote_ip' &gt;&gt; ~/.zshrc
source ~/.zshrc
</code></pre>
<p>Now, you can run <strong>any</strong> Docker CLI command like:</p>
<pre><code class="lang-bash">docker ps
docker compose up -d
docker logs &lt;container&gt;
</code></pre>
<p>…and it’ll execute on the remote machine automatically. No <code>ssh</code>, no fuss.</p>
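<p>If you’d rather not export <code>DOCKER_HOST</code> globally, recent Docker CLI versions also support named contexts, which let you flip between local and remote engines on demand. A sketch, using the same placeholder host:</p>
<pre><code class="lang-bash"># Register the remote engine once, then switch to it by name
docker context create homelab --docker "host=ssh://username@remote_ip"
docker context use homelab

# Flip back to the local engine whenever needed
docker context use default
</code></pre>
<p>Here <code>homelab</code> is just an arbitrary context name.</p>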
<h2 id="heading-conclusion">✅ Conclusion</h2>
<p>You now have a clean and efficient way to manage Docker <strong>remotely</strong> without relying on Docker Desktop. Using <strong>Portainer</strong>, you gain:</p>
<ul>
<li><p>A lightweight web UI</p>
</li>
<li><p>Easy container and image management</p>
</li>
<li><p>Portability across systems</p>
</li>
<li><p>Remote team-friendly access</p>
</li>
</ul>
<p>This approach is ideal for remote development, headless servers, or teams building CI/CD and DevOps pipelines.</p>
]]></content:encoded></item><item><title><![CDATA[Self-Hosted Streaming with Jellyfin and ngrok – A Personal Weekend Project]]></title><description><![CDATA[It all started with a simple need — I wanted to watch some old movies and personal family videos stored on my desktop. My niece had also backed up a bunch of vacation clips from her phone onto my machine, and now that she’s back home, I needed an eas...]]></description><link>https://tech.cyrilsebastian.com/self-hosted-streaming-with-jellyfin-and-ngrok-a-personal-weekend-project</link><guid isPermaLink="true">https://tech.cyrilsebastian.com/self-hosted-streaming-with-jellyfin-and-ngrok-a-personal-weekend-project</guid><category><![CDATA[Devops]]></category><category><![CDATA[self-hosted]]></category><category><![CDATA[Docker]]></category><category><![CDATA[jellyfin]]></category><category><![CDATA[ngrok]]></category><category><![CDATA[Homelab]]></category><dc:creator><![CDATA[Cyril Sebastian]]></dc:creator><pubDate>Tue, 08 Jul 2025 07:13:07 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1751957425923/faac035a-83eb-4e01-8111-5eb7963a57bc.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>It all started with a simple need — I wanted to watch some old movies and personal family videos stored on my desktop. My niece had also backed up a bunch of vacation clips from her phone onto my machine, and now that she’s back home, I needed an easy way to send them back to her without juggling USB drives or cloud uploads.</p>
<p>Being someone from a DevOps/SRE background, these small personal needs often spiral into fun infrastructure experiments. So, I figured, why not use this as a chance to self-host a media server?</p>
<p>That’s when I turned to <a target="_blank" href="https://jellyfin.org/">Jellyfin</a> — a free, open-source media streaming solution. Paired with <code>ngrok</code> for securely sharing access with family, I now had my very own private “Netflix” running directly from my Linux desktop. This was one of those side projects that started as a practical fix but turned into an unexpectedly enjoyable weekend build.</p>
<p>In this post, I’ll walk you through exactly how I set it all up using Docker and ngrok — clean, simple, and DevOps-style.</p>
<h3 id="heading-why-jellyfin">📌 Why Jellyfin?</h3>
<p>As someone who values open-source tools and self-hosted alternatives, Jellyfin hits the sweet spot:</p>
<ul>
<li><p>Fully open-source</p>
</li>
<li><p>No telemetry or licensing headaches</p>
</li>
<li><p>Supports local media, subtitles, transcoding, and even user profiles</p>
</li>
</ul>
<p>I had a folder full of personal and family videos collecting digital dust. Jellyfin gave them a Netflix-like interface without the surveillance.</p>
<hr />
<h2 id="heading-step-1-run-jellyfin-in-docker">🐳 Step 1: Run Jellyfin in Docker</h2>
<p>To keep things clean and reproducible (DevOps mantra!), I went with Docker.</p>
<pre><code class="lang-bash">docker run -d \
  --name jellyfin \
  -p 8096:8096 \
  -v /home/user/Documents/p2p:/media \
  jellyfin/jellyfin
</code></pre>
<p>A few things to note:</p>
<ul>
<li><p><code>-d</code> runs it in the background</p>
</li>
<li><p>Port 8096 is Jellyfin’s default web UI</p>
</li>
<li><p>The <code>-v</code> mount points my local media directory into the container at <code>/media</code></p>
</li>
</ul>
<p>Once the container spins up, hit <a target="_blank" href="http://localhost:8096"><code>http://localhost:8096</code></a> on your browser and follow the setup wizard. You’ll be able to:</p>
<ul>
<li><p>Create an admin account</p>
</li>
<li><p>Add your media libraries</p>
</li>
<li><p>Configure transcoding and user access</p>
</li>
</ul>
<p>Simple and smooth.</p>
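<p>If you prefer keeping the setup declarative, the same container can be captured as a Compose file. This is a sketch derived from the <code>docker run</code> flags above (same host path); the media mount is marked read-only with <code>:ro</code>, since Jellyfin only needs to read it.</p>
<pre><code class="lang-yaml">services:
  jellyfin:
    image: jellyfin/jellyfin
    container_name: jellyfin
    ports:
      - "8096:8096"
    volumes:
      - /home/user/Documents/p2p:/media:ro
    restart: unless-stopped
</code></pre>
<p>Bring it up with <code>docker compose up -d</code>.</p>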
<hr />
<h2 id="heading-step-2-access-jellyfin-from-anywhere-using-ngrok">🌍 Step 2: Access Jellyfin from Anywhere Using <code>ngrok</code></h2>
<p>Since I didn’t want to mess with router port forwarding or dynamic DNS at home (and certainly not expose ports to the internet unsafely), <code>ngrok</code> was the perfect plug-and-play solution.</p>
<h3 id="heading-install-ngrok">Install ngrok</h3>
<pre><code class="lang-bash">wget https://bin.equinox.io/c/bNyj1mQVY4c/ngrok-v3-stable-linux-amd64.tgz
tar -xvf ngrok-v3-stable-linux-amd64.tgz
sudo mv ngrok /usr/<span class="hljs-built_in">local</span>/bin
ngrok version
</code></pre>
<p>You’ll need an <a target="_blank" href="https://ngrok.com/">ngrok account</a> to get an auth token. Then:</p>
<pre><code class="lang-bash">ngrok config add-authtoken &lt;YOUR_AUTH_TOKEN&gt;
</code></pre>
<h3 id="heading-create-a-tunnel-for-port-8096">Create a Tunnel for Port 8096</h3>
<pre><code class="lang-bash">ngrok http 8096
</code></pre>
<p>Boom! You’ll get a public HTTPS URL like <code>https://abc123.ngrok.io</code> that tunnels securely to your local Jellyfin instance.</p>
<h3 id="heading-bonus-protect-it-with-basic-auth">Bonus: Protect it with Basic Auth</h3>
<p>To prevent unauthorized access, use basic auth:</p>
<pre><code class="lang-bash">ngrok http 8096 --basic-auth <span class="hljs-string">"username:password"</span>
</code></pre>
<p>Replace with your credentials. Now even if someone stumbles upon the link, they’ll need to authenticate first.</p>
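<p>A small, hedged tip: generate the password instead of inventing one. A quick sketch using standard tools:</p>
<pre><code class="lang-bash"># Keep 20 alphanumeric characters from a chunk of random bytes
PASS=$(head -c 512 /dev/urandom | tr -dc 'A-Za-z0-9' | head -c 20)
echo "$PASS"
</code></pre>
<p>Then hand ngrok <code>username:$PASS</code>.</p>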
<hr />
<h2 id="heading-alternate-run-ngrok-in-docker">📦 Alternate: Run ngrok in Docker</h2>
<p>If you like consistency (like me), you might prefer running <code>ngrok</code> in a container too:</p>
<pre><code class="lang-bash">docker run -it --net=host \
  -e NGROK_AUTHTOKEN=&lt;YOUR_AUTH_TOKEN&gt; \
  ngrok/ngrok http 8096
</code></pre>
<p>Optional: include <code>--basic-auth "username:password"</code> in the command above if you want the same protection. The <code>--net=host</code> flag lets the container reach Jellyfin on the host’s port 8096; on macOS or Windows, drop it and point ngrok at <code>host.docker.internal:8096</code> instead.</p>
<hr />
<h2 id="heading-a-quick-word-on-security">🔒 A Quick Word on Security</h2>
<p>This setup is intended for personal use. If you're planning to stream across multiple users or set up a family server, consider:</p>
<ul>
<li><p>Running Jellyfin behind a proper reverse proxy (like Nginx)</p>
</li>
<li><p>Using a free domain with Let's Encrypt certs</p>
</li>
<li><p>Disabling public tunnels when not in use</p>
</li>
<li><p>Not exposing write access to the mounted media directory (read-only is safer)</p>
</li>
</ul>
<hr />
<h2 id="heading-real-world-use-cases">🎯 Real-World Use Cases</h2>
<ol>
<li><p><strong>Personal Netflix Clone</strong> – Watch your ripped DVDs or archived home videos from anywhere.</p>
</li>
<li><p><strong>Test Media Playback Over Slow Networks</strong> – Useful if you’re tuning transcoding profiles for a home server.</p>
</li>
<li><p><strong>Portable Demos</strong> – Great for showing media apps at meetups or events without deploying to a cloud server.</p>
</li>
<li><p><strong>Media Backup Viewer</strong> – Remote preview of a NAS or cold storage drive contents.</p>
</li>
</ol>
<hr />
<h2 id="heading-final-thoughts">🧠 Final Thoughts</h2>
<p>For anyone in DevOps, self-hosting isn't just about saving money. It's about owning your stack, learning through tinkering, and reusing familiar tools (like Docker, tunneling, logs) in a low-stress, real-life scenario.</p>
<p>This Jellyfin + ngrok combo took less than an hour of a weekend, but the satisfaction it gives — seeing your own media beautifully indexed and remotely accessible — is real.</p>
<p>Give it a try. This might just become your favorite side gig for relaxing after a long sprint.</p>
<hr />
<p>If you found this helpful or tried something similar, I’d love to hear about your setup, tweaks, or war stories. Leave a comment on <a target="_blank" href="https://hashnode.com/@cyrilsebastian">Hashnode</a> or connect via <a target="_blank" href="https://linkedin.com/in/yourprofile">LinkedIn</a>, I’m always happy to connect and geek out about self-hosting and home lab fun.</p>
<p>Happy streaming! 🎥🍿</p>
<hr />
<p>ImageCredits: Photo by <a target="_blank" href="https://unsplash.com/@popcornmatch?utm_content=creditCopyText&amp;utm_medium=referral&amp;utm_source=unsplash">Marques Kaspbrak</a> on <a target="_blank" href="https://unsplash.com/photos/black-flat-screen-tv-turned-on-on-brown-wooden-tv-rack-n1amn-SHKzw?utm_content=creditCopyText&amp;utm_medium=referral&amp;utm_source=unsplash">Unsplash</a></p>
]]></content:encoded></item><item><title><![CDATA[Planning a Cloud Migration? 10 Lessons from Production Cutovers]]></title><description><![CDATA[Moving a production-grade application serving millions of daily users from one cloud provider to another is a high-stakes operation. After executing a complete GCP to AWS migration—including 21TB of data, MongoDB replica sets, MySQL clusters, and Apa...]]></description><link>https://tech.cyrilsebastian.com/planning-a-cloud-migration-10-lessons-from-production-cutovers</link><guid isPermaLink="true">https://tech.cyrilsebastian.com/planning-a-cloud-migration-10-lessons-from-production-cutovers</guid><category><![CDATA[AWS]]></category><category><![CDATA[GCP]]></category><category><![CDATA[Devops]]></category><category><![CDATA[Cloud]]></category><category><![CDATA[learning]]></category><dc:creator><![CDATA[Cyril Sebastian]]></dc:creator><pubDate>Fri, 27 Jun 2025 05:22:13 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1750771256911/d7245ccc-05f3-47ce-a494-50d6ade0baa4.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Moving a production-grade application serving millions of daily users from one cloud provider to another is a high-stakes operation. After executing a complete <a target="_blank" href="https://tech.cyrilsebastian.com/gcp-to-aws-migration-part-2-real-cutover-issues-and-recovery">GCP to AWS migration</a>—including 21TB of data, MongoDB replica sets, MySQL clusters, and Apache Solr search infrastructure—here are ten critical lessons that separate successful migrations from costly disasters. Connect with me on <a target="_blank" href="https://www.linkedin.com/in/sebastiancyril/">LinkedIn for more</a> real-world DevOps insights.</p>
<h2 id="heading-1-hidden-dependencies-surface-during-cutover-weekend">1. Hidden Dependencies Surface During Cutover Weekend</h2>
<p>Even though we had tools for the job, coordinating syncs across staging, pre-prod, and production environments required careful orchestration and monitoring. Your application architecture diagram rarely tells the complete story. During our migration, we discovered hardcoded GCP configurations buried deep in environment variables and application configs that weren't documented anywhere. Network monitoring weeks before cutover revealed missed communication patterns, including internal DNS dependencies that required AWS Route 53 private hosted zones to replicate GCP's automatic internal DNS resolution.</p>
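<p>A pre-migration sweep for cloud-specific strings would have surfaced several of these earlier. A minimal sketch (the demo directory and patterns here are assumptions; point it at your real checkout and extend the pattern list):</p>
<pre><code class="lang-bash"># Throwaway tree standing in for a repo checkout
SCAN_DIR=$(mktemp -d)
echo 'BUCKET_URL=https://storage.googleapis.com/my-bucket' | tee "$SCAN_DIR/app.env"

# Flag GCP endpoints and GCP-style internal DNS names
grep -RnE 'googleapis\.com|\.c\.[a-z0-9-]+\.internal' "$SCAN_DIR"
</code></pre>
<p>Running something like this over configs, env files, and IaC weeks before cutover is cheap insurance.</p>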
<h2 id="heading-2-data-transfer-costs-are-a-hidden-budget-killer">2. Data Transfer Costs Are a Hidden Budget Killer</h2>
<p>Data Transfer Out from GCP was a hidden but substantial cost center; plan for this when budgeting. Our 21TB migration taught us that egress charges can dramatically exceed your initial estimates. We saved thousands of dollars by identifying GCS buckets containing temporary compliance logs with auto-expiry policies and choosing to let them expire in GCP rather than migrating unnecessary data. Plan for these costs early and audit your data to ensure migration is necessary.</p>
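<p>A back-of-the-envelope estimate is worth running before any audit. A sketch (the per-GB rate is an assumption for illustration; check current egress pricing for your destination):</p>
<pre><code class="lang-bash">TB=21        # data volume to move
RATE=0.12    # assumed USD per GB of egress; verify against current pricing
awk -v tb="$TB" -v rate="$RATE" 'BEGIN { printf "Estimated egress: $%.2f\n", tb * 1024 * rate }'
</code></pre>
<p>Even at rough rates, 21TB lands in the thousands of dollars, which is exactly why letting expiring data die in place paid off.</p>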
<h2 id="heading-3-database-replication-lag-is-your-biggest-enemy">3. Database Replication Lag Is Your Biggest Enemy</h2>
<p>One of the major blockers encountered was replication lag during promotion drills, especially under active write-heavy workloads. Our MySQL master-slave setup experienced significant lag during peak traffic. The solution was implementing iptables-based rules at the OS level to block application write traffic, allowing replication to catch up safely before cutover. This gave us a clean buffer for promotion without the risk of in-flight transactions.</p>
<h2 id="heading-4-storage-behavior-differs-dramatically-between-clouds">4. Storage Behavior Differs Dramatically Between Clouds</h2>
<p>Lazy loading of EBS volumes made autoscaling unreliable for time-sensitive indexing. Our Apache Solr migration revealed that AWS EBS volumes suffer from lazy loading, meaning data wasn't instantly accessible upon instance boot-up. In GCP, persistent volumes are mounted seamlessly with boot disks, enabling autoscaling. In AWS, we had to abandon autoscaling for Solr and use scheduled start/stop scripts instead. Factor in rebuild times and whether AWS Fast Snapshot Restore fits your budget.</p>
<p>IOPS and throughput planning was another major difference. In GCP, choosing SSD persistent disks took care of IOPS and throughput, and increasing the disk size was enough when more performance was needed. In AWS, we had to plan per workload, especially for databases: EBS gp3 volumes require explicit IOPS and throughput provisioning separate from storage capacity, turning database performance tuning into a multi-dimensional optimization problem rather than GCP's simpler disk-size-based scaling.</p>
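<p>In practice, sizing gp3 meant working from observed peaks plus headroom. A sketch of that arithmetic (the peak figures and volume ID are placeholders; the <code>aws ec2 modify-volume</code> call is printed rather than executed):</p>
<pre><code class="lang-bash">PEAK_IOPS=4800    # example peak IOPS from monitoring
PEAK_MBPS=380     # example peak throughput in MB/s

# Provision 25% headroom over observed peaks
IOPS=$(( PEAK_IOPS * 125 / 100 ))
THROUGHPUT=$(( PEAK_MBPS * 125 / 100 ))

echo "aws ec2 modify-volume --volume-id vol-XXXXXXXX --volume-type gp3 --iops $IOPS --throughput $THROUGHPUT"
</code></pre>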
<h2 id="heading-5-load-balancer-architectures-require-complete-rethinking">5. Load Balancer Architectures Require Complete Rethinking</h2>
<p>The differences between GCP's global HTTPS Load Balancer and AWS Application Load Balancer go beyond simple configuration. GCP's URL maps allowed expressive, path-based routing across services. AWS required translating these into listener rules and target groups, often resulting in more granular configurations. We moved from static public IPs to CNAME-based routing, requiring DNS strategy adjustments and SSL certificate management changes through AWS Certificate Manager.</p>
<h2 id="heading-6-security-models-force-architectural-changes">6. Security Models Force Architectural Changes</h2>
<p>All EC2 instances (except load balancers) were placed in private subnets for enhanced security. We had to implement bastion host access, update CORS headers for CloudFront integration, and create explicit firewall rules using iptables to control MySQL access during migration. AWS's security group model required translating GCP firewall rules while adding WAF integration for DDoS protection.</p>
<h2 id="heading-7-network-restrictions-create-unexpected-blockers">7. Network Restrictions Create Unexpected Blockers</h2>
<p>AWS restricts outbound SMTP traffic on port 25 by default to prevent abuse. This is not the case in GCP, so ensure to factor this into your cutover timeline if you're migrating mail servers. Our Postfix mail servers required explicit AWS Support requests to open port 25, adding weeks to our timeline.</p>
<p>Beyond port restrictions, we discovered that our heavily utilized servers with small instance footprints couldn't be placed on burstable instance types like t3 due to CPU credit limitations and network throttling. High-traffic applications that ran smoothly on GCP's custom CPU/RAM configurations suffered performance degradation when mapped to AWS burstable instances. We had to carefully analyze baseline vs. burst performance patterns and move critical workloads to dedicated instance types like m6i or c6i to avoid throttling during peak loads.</p>
<h2 id="heading-8-rollback-plans-must-account-for-cloud-specific-behaviors">8. Rollback Plans Must Account for Cloud-Specific Behaviors</h2>
<p>We implemented iptables-based rules at the OS level to block application write traffic, allowing replication to catch up safely before cutover. Our rollback strategy included controlled write freezes and hybrid MongoDB nodes that acted as bridges between cloud environments. The key was testing promotion and demotion scenarios multiple times, not just hoping data backups would suffice.</p>
<h2 id="heading-9-monitoring-blind-spots-emerge-in-cloud-transitions">9. Monitoring Blind Spots Emerge in Cloud Transitions</h2>
<p>We experienced monitoring gaps during the most critical phases when existing tools didn't translate to the new environment. Setting up CloudWatch, maintaining Nagios compatibility, and ensuring Grafana dashboards worked across both environments simultaneously was crucial. Establish baseline performance metrics in both clouds and create real-time visibility before cutover day.</p>
<h2 id="heading-10-post-migration-optimization-is-where-real-value-emerges">10. Post-Migration Optimization Is Where Real Value Emerges</h2>
<p>This migration tested our nerves and processes, but ultimately, it left us with better observability, tighter security, and an infrastructure we could proudly call production-grade. After successful cutover, we rightsized EC2 instances using historical metrics, implemented Savings Plans for reserved workloads, and enabled S3 lifecycle policies. The migration wasn't just about changing providers—it forced us to modernize our entire infrastructure approach.</p>
<h2 id="heading-the-reality-of-production-cutovers">The Reality of Production Cutovers</h2>
<p><a target="_blank" href="https://tech.cyrilsebastian.com/gcp-to-aws-migration-part-1-architecture-data-transfer-and-infrastructure-setup">No plan survives contact without flexibility</a>. Our experience proved that even with meticulous planning, live production environments will surprise you. We faced MySQL promotion lags, discovered hardcoded configurations requiring emergency patches, and dealt with Solr performance issues under load. The key was having a flexible team ready to improvise while maintaining strict rollback readiness.</p>
<p>The most important lesson? Migration is more than lift-and-shift—it's evolve or expire. Successful cloud migration requires embracing architectural differences rather than fighting them. Plan thoroughly, test relentlessly, and be prepared to adapt quickly when reality differs from your runbook.</p>
<p>Your cloud migration is an opportunity to build better infrastructure, not just move existing problems to a new provider. Approach it as a chance to evolve your entire operational model, and you'll emerge stronger on the other side.</p>
]]></content:encoded></item><item><title><![CDATA[GCP to AWS Migration – Part 2: Real Cutover, Issues & Recovery]]></title><description><![CDATA[🚀 Start of Part 2: The Real Cutover & Beyond
While Part 1 laid the architectural and data groundwork, Part 2 is where the real-world complexity kicked in.
We faced:

Database promotions that didn’t go as rehearsed,

Lazy-loaded Solr indexes fighting...]]></description><link>https://tech.cyrilsebastian.com/gcp-to-aws-migration-part-2-real-cutover-issues-and-recovery</link><guid isPermaLink="true">https://tech.cyrilsebastian.com/gcp-to-aws-migration-part-2-real-cutover-issues-and-recovery</guid><category><![CDATA[AWS]]></category><category><![CDATA[GCP]]></category><category><![CDATA[Devops]]></category><category><![CDATA[Cloud]]></category><dc:creator><![CDATA[Cyril Sebastian]]></dc:creator><pubDate>Wed, 04 Jun 2025 19:22:18 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1749064506569/6c570f45-e003-4ab8-ae02-19374239cfe8.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<hr />
<h2 id="heading-start-of-part-2-the-real-cutover-amp-beyond">🚀 <strong>Start of Part 2: The Real Cutover &amp; Beyond</strong></h2>
<p><a target="_blank" href="https://tech.cyrilsebastian.com/gcp-to-aws-migration-part-1-architecture-data-transfer-and-infrastructure-setup"><strong>While Part 1 laid the architectural and data groundwork</strong></a>, Part 2 is where the real-world complexity kicked in.</p>
<p>We faced:</p>
<ul>
<li><p>Database promotions that didn’t go as rehearsed,</p>
</li>
<li><p>Lazy-loaded Solr indexes fighting with EBS latency,</p>
</li>
<li><p>Hardcoded GCP configs in the dark corners of our stack,</p>
</li>
<li><p>And the high-stakes pressure of a real-time production cutover.</p>
</li>
</ul>
<p>If Part 1 was planning and theory, <strong>Part 2 was execution and improvisation</strong>.</p>
<p>Let’s dive into the live switch, the challenges we didn’t see coming, and how we turned them into lessons and long-term wins.</p>
<hr />
<h2 id="heading-phase-4-application-amp-infrastructure-layer-adaptation">⚙️ Phase 4: Application &amp; Infrastructure Layer Adaptation</h2>
<p>As part of the migration, significant adjustments were required in both the application configuration and infrastructure setup to align with AWS's architecture and security practices.</p>
<h3 id="heading-key-changes-amp-adaptations">Key Changes &amp; Adaptations</h3>
<ul>
<li><p><strong>Private Networking &amp; Bastion Access</strong></p>
<ul>
<li><p>All EC2 instances (except load balancers) were placed in <strong>private subnets</strong> for enhanced security.</p>
</li>
<li><p>Initial access was via a <strong>VPN client → AWS Direct Connect → Bastion Host</strong> setup.</p>
</li>
<li><p>Post-migration, the <strong>bastion host's public IP was decommissioned</strong>, relying solely on secure, internal access.</p>
</li>
</ul>
</li>
<li><p><strong>CORS &amp; S3 Policy Updates</strong></p>
<ul>
<li><p>Applications required updates to <strong>CORS headers</strong> to handle static content requests from a different domain (CloudFront).</p>
</li>
<li><p><strong>S3 bucket policies</strong> were reconfigured to allow read access only via CloudFront, blocking direct public access.</p>
</li>
</ul>
</li>
<li><p><strong>Application Configuration Updates</strong></p>
<ul>
<li>All environment-specific settings, including <code>.env</code> variables, were audited and updated to replace <strong>hardcoded GCP endpoints</strong> with dynamic, AWS-native configurations (e.g., RDS endpoints, S3 URLs).</li>
</ul>
</li>
<li><p><strong>Internal DNS Transition</strong></p>
<ul>
<li><p>In GCP, internal DNS resolution is automatically managed.</p>
</li>
<li><p>AWS replicated this behavior using <strong>Route 53 private hosted zones</strong>, ensuring seamless service discovery across private subnets.</p>
</li>
</ul>
</li>
<li><p><strong>Static Asset Delivery via CloudFront</strong></p>
<ul>
<li>All requests to static assets in S3 were redirected through <strong>Amazon CloudFront</strong>, improving performance and reducing latency for global users.</li>
</ul>
</li>
<li><p><strong>Security Hardening with WAF</strong></p>
<ul>
<li><p>Integrated <strong>AWS Web Application Firewall (WAF)</strong> in front of the CloudFront distribution.</p>
</li>
<li><p>Applied enterprise-grade rules:</p>
<ul>
<li><p><strong>Rate limiting</strong> to prevent abuse</p>
</li>
<li><p><strong>Geo-blocking and IP filtering</strong> based on security policies</p>
</li>
<li><p><strong>DDoS protection</strong> via AWS Shield integration</p>
</li>
</ul>
</li>
</ul>
</li>
</ul>
<h3 id="heading-firewall-rules-securing-aws-mysql-from-legacy-sources">Firewall Rules: Securing AWS MySQL from Legacy Sources</h3>
<p>To ensure <strong>controlled access to the new MySQL server in AWS</strong>, we hardened the instance using explicit <code>iptables</code> rules. These rules:</p>
<ul>
<li><p><strong>Blocked direct MySQL access</strong> from legacy or untrusted subnets (e.g., GCP App subnets)</p>
</li>
<li><p><strong>Allowed SSH access</strong> only from trusted bastion/admin IPs during the migration window</p>
</li>
</ul>
<pre><code class="lang-plaintext">FIREWALL RULES FLOW:

[Blocked Sources] ──❌──┐
                        │
10.AAA.0.0/22     ──────┤
10.BBB.248.0/21   ──────┤
                        ├─── DROP:3306 ───┐
                        │                 │
[Allowed Sources] ──✅──┤                 ▼
                        │         ┌─────────────────┐
10.AAA.0.4/32     ──────┤         │ 10.BBB.CCC.223  │
10.BBB.248.158/32 ──────┤         │  MySQL Server   │
10.BBB.251.107/32 ──────┼─ACCEPT──│     (AWS)       │
10.BBB.253.9/32   ──────┤  :22    │                 │
                        │         └─────────────────┘
</code></pre>
<p><strong>Legend:</strong></p>
<ul>
<li><p><code>10.AAA.x.x</code> = Source network (GCP)</p>
</li>
<li><p><code>10.BBB.CCC.223</code> = Target MySQL server in AWS</p>
</li>
<li><p>IPs like <code>10.BBB.248.158</code> = Bastion or trusted admin IPs allowed for SSH</p>
</li>
</ul>
<blockquote>
<p>This rule-based approach gave us an <strong>extra layer of protection</strong> beyond AWS Security Groups during the critical migration phase.</p>
</blockquote>
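<p>For reference, the diagram above translates to rules along these lines, shown in <code>iptables-restore</code> format with the same anonymized subnets (a sketch, not a complete ruleset):</p>
<pre><code class="lang-plaintext">*filter
-A INPUT -p tcp -s 10.AAA.0.4/32 --dport 22 -j ACCEPT
-A INPUT -p tcp -s 10.BBB.248.158/32 --dport 22 -j ACCEPT
-A INPUT -p tcp -s 10.BBB.251.107/32 --dport 22 -j ACCEPT
-A INPUT -p tcp -s 10.BBB.253.9/32 --dport 22 -j ACCEPT
-A INPUT -p tcp -s 10.AAA.0.0/22 --dport 3306 -j DROP
-A INPUT -p tcp -s 10.BBB.248.0/21 --dport 3306 -j DROP
COMMIT
</code></pre>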
<h2 id="heading-load-balancer-differences-gcp-vs-aws">🌐 Load Balancer Differences: GCP vs AWS</h2>
<p>During the migration, we encountered significant differences in how load balancing is handled between GCP and AWS. This required architectural adjustments and deeper routing, SSL, and compute scaling planning.</p>
<h3 id="heading-comparison-overview">📊 Comparison Overview</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Feature</td><td><strong>GCP HTTPS Load Balancer</strong></td><td><strong>AWS Application Load Balancer (ALB)</strong></td></tr>
</thead>
<tbody>
<tr>
<td><strong>Scope</strong></td><td>Global by default</td><td>Regional</td></tr>
<tr>
<td><strong>TLS/SSL</strong></td><td>Wildcard SSL was uploaded to GCP</td><td>Managed manually via <strong>AWS Certificate Manager (ACM)</strong></td></tr>
<tr>
<td><strong>Routing Logic</strong></td><td>URL Maps</td><td>Target Groups with Listener Rules</td></tr>
<tr>
<td><strong>IP Type</strong></td><td>Static Public IP</td><td>CNAME with DNS-based routing</td></tr>
<tr>
<td><strong>Backend Integration</strong></td><td>Global Load Balancer → MIG (Managed Instance Groups)</td><td>ALB → Target Group → ASG (Auto Scaling Group)</td></tr>
</tbody>
</table>
</div><h3 id="heading-key-migration-notes">🧩 Key Migration Notes</h3>
<ul>
<li><p><strong>Static IP vs DNS Routing</strong></p>
<ul>
<li><p>In GCP, the HTTPS Load Balancer was fronted with a <strong>static public IP</strong>, offering low-latency global access.</p>
</li>
<li><p>In AWS, ALB uses <strong>CNAME-based routing</strong>, meaning clients resolve the ALB DNS name (e.g., <code>abc-region.elb.amazonaws.com</code>) via Route 53 or third-party DNS.</p>
</li>
</ul>
</li>
<li><p><strong>Routing Mechanism Differences</strong></p>
<ul>
<li><p>GCP’s <strong>URL maps</strong> allowed expressive, path-based routing across services.</p>
</li>
<li><p>AWS required translating these into <strong>listener rules</strong> and <strong>target groups</strong>, often resulting in more granular configurations.</p>
</li>
</ul>
</li>
<li><p><strong>SSL/TLS Certificates</strong></p>
<ul>
<li><p>GCP handled our custom wildcard SSL certificate.</p>
</li>
<li><p>In AWS, we migrated to <strong>ACM (AWS Certificate Manager)</strong> for easier management of domain validations, renewals, and usage across ALBs and CloudFront.</p>
</li>
</ul>
</li>
<li><p><strong>Application-Specific Custom Rules</strong></p>
<ul>
<li><p>In AWS, we created <strong>custom listener rules</strong> to forward traffic based on request path or headers, similar to GCP’s URL maps.</p>
</li>
<li><p>These rules were especially useful for routing requests to internal APIs and static content.</p>
</li>
</ul>
</li>
</ul>
<h3 id="heading-special-case-postfix-amp-port-25-restrictions">📮 Special Case: Postfix &amp; Port 25 Restrictions</h3>
<p>To migrate our <strong>Postfix mail servers</strong> that use port 25 for SMTP, we had to:</p>
<ul>
<li><p>Submit an <strong>explicit request to AWS Support</strong> for <strong>port 25 to be opened</strong> (outbound) on our AWS account in the specific region.</p>
</li>
<li><p>This was a prerequisite for creating a <strong>Network Load Balancer (NLB)</strong> that could pass traffic directly to the Postfix instances.</p>
</li>
</ul>
<p><strong>Note</strong>: AWS restricts outbound SMTP traffic on port 25 by default to prevent abuse. This is not the case in GCP, so ensure to factor this into your cutover timeline if you're migrating mail servers.</p>
<hr />
<h2 id="heading-phase-5-apache-solr-migration">🔍 Phase 5: Apache Solr Migration</h2>
<p>Apache Solr powered our platform's search functionality with complex indexing and fast response times. Migrating it to AWS introduced both architectural and operational complexities.</p>
<h3 id="heading-migration-strategy">🛠️ Migration Strategy</h3>
<ul>
<li><p><strong>AMI Creation Was Non-Trivial</strong>:<br />  We created custom AMIs for Solr nodes with large EBS volumes. However, this surfaced two key challenges:</p>
<ul>
<li><p><strong>Large volume AMI creation took longer than expected</strong></p>
</li>
<li><p><strong>Lazy loading of attached volumes</strong> in AWS meant the data wasn’t instantly accessible upon instance boot-up.</p>
</li>
</ul>
</li>
<li><p><strong>No AWS FSR</strong>:<br />  AWS <strong>Fast Snapshot Restore (FSR)</strong> could have helped—but was ruled out due to budget constraints. Without FSR, we observed delayed volume readiness post-launch.</p>
</li>
<li><p><strong>Index Rebuild from Source DB</strong>:<br />  Post-migration, we <strong>rebuilt Solr indexes</strong> from source data stored in MongoDB and MySQL, ensuring consistency and avoiding partial data issues.</p>
</li>
<li><p><strong>Master-Slave Architecture</strong>:<br />  We finalized a <strong>standalone Solr master-slave setup</strong> on EC2 after a dedicated PoC. This provided better control compared to GCP's managed instance groups.</p>
</li>
</ul>
<hr />
<h3 id="heading-gcp-vs-aws-deployment-model">🏗️ GCP vs AWS Deployment Model</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Feature</td><td>GCP MIGs</td><td>AWS EC2 Standalone</td></tr>
</thead>
<tbody>
<tr>
<td><strong>Deployment</strong></td><td>Solr slaves ran in <strong>Managed Instance Groups</strong></td><td>Solr nodes deployed on <strong>standalone EC2s</strong></td></tr>
<tr>
<td><strong>Volume Attachment</strong></td><td>Persistent volumes mounted with boot disk</td><td>EBS volumes suffered from <strong>lazy loading</strong>, slowing boot</td></tr>
<tr>
<td><strong>Autoscaling</strong></td><td>Fully autoscaled Solr slaves based on demand</td><td>Autoscaling impractical due to volume readiness delays</td></tr>
<tr>
<td><strong>Cost Management</strong></td><td>On-demand scaling saved costs</td><td>Used <strong>EC2 scheduling</strong> (shutdown/startup) to control spend</td></tr>
</tbody>
</table>
</div><h3 id="heading-operational-decision-no-autoscaling-for-solr-in-aws">⚡ Operational Decision: No Autoscaling for Solr in AWS</h3>
<p>In GCP, autoscaling Solr slaves was seamless—new instances booted with attached volumes and joined the cluster dynamically.</p>
<p>However, in AWS:</p>
<ul>
<li><p>Lazy loading of EBS volumes made autoscaling <strong>unreliable for time-sensitive indexing</strong>.</p>
</li>
<li><p>Instead, we:</p>
<ul>
<li><p><strong>Kept EC2 nodes in a fixed topology</strong></p>
</li>
<li><p>Used <strong>scheduled start/stop scripts</strong> (via cron) to manage uptime during peak/off-peak hours.</p>
</li>
</ul>
</li>
</ul>
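<p>The scheduled start/stop approach can be sketched with two crontab entries driving the AWS CLI. This is an illustrative fragment, not our production script: instance IDs and times are placeholders, and the crontab clock is UTC (22:00/06:00 IST shown here):</p>
<pre><code class="lang-bash"># Stop Solr EC2 nodes at 22:00 IST, start them at 06:00 IST (crontab runs in UTC)
30 16 * * * aws ec2 stop-instances  --instance-ids i-0aaa1111 i-0bbb2222
30 0  * * * aws ec2 start-instances --instance-ids i-0aaa1111 i-0bbb2222
</code></pre>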
<hr />
<h3 id="heading-lessons-learned">Lessons Learned</h3>
<p>Solr migrations demand careful attention to disk behavior in AWS. Unless you are using Fast Snapshot Restore (FSR), <strong>do not assume volume availability equals data availability</strong>. Factor in rebuild times, cost impact, and whether autoscaling truly benefits your workload.</p>
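<p>For reference, Fast Snapshot Restore can be enabled per snapshot and Availability Zone with a single CLI call (the snapshot ID and AZ below are placeholders). Note that FSR carries an hourly charge per snapshot per AZ, so weigh it against the rebuild time it saves:</p>
<pre><code class="lang-bash">aws ec2 enable-fast-snapshot-restores \
  --availability-zones ap-south-1a \
  --source-snapshot-ids snap-0123456789abcdef0
</code></pre>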
<hr />
<h2 id="heading-the-cutover-weekend">🛑 The Cutover Weekend</h2>
<p>We declared a <strong>deployment freeze 7 days before</strong> the migration to maintain stability and reduce last-minute surprises.</p>
<h3 id="heading-pre-cutover-checklist">Pre-Cutover Checklist</h3>
<ul>
<li><p>TTL reduced to <strong>60 seconds</strong> to allow quick DNS propagation.</p>
</li>
<li><p>Final <strong>S3 and database sync</strong> performed.</p>
</li>
<li><p><strong>Checksums validated</strong> for critical data.</p>
</li>
<li><p><strong>Route 53</strong> routing policies configured to mimic GCP’s internal DNS.</p>
</li>
<li><p><strong>CloudWatch</strong>, Nagios, and Grafana set up for monitoring.</p>
</li>
<li><p>Final <strong>fallback snapshot</strong> captured.</p>
</li>
<li><p>A comprehensive <strong>cutover runbook</strong> was prepared with clear task ownership and escalation paths.</p>
</li>
</ul>
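<p>Lowering the TTL ahead of cutover is a one-off record update. A hedged sketch with the AWS CLI, where the hosted zone ID, record name, and target IP are all placeholders:</p>
<pre><code class="lang-bash">aws route53 change-resource-record-sets \
  --hosted-zone-id Z0123456789EXAMPLE \
  --change-batch '{
    "Changes": [{
      "Action": "UPSERT",
      "ResourceRecordSet": {
        "Name": "www.example.com",
        "Type": "A",
        "TTL": 60,
        "ResourceRecords": [{"Value": "203.0.113.10"}]
      }
    }]
  }'
</code></pre>
<p>Once traffic is stable on AWS, the TTL can be raised again to reduce resolver load.</p>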
<hr />
<h3 id="heading-cutover-timeline">🕒 Cutover Timeline</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Time Slot</td><td>Task</td></tr>
</thead>
<tbody>
<tr>
<td>Hour 1</td><td>Final S3 + DB sync</td></tr>
<tr>
<td>Hour 2–3</td><td>DB failover and validation</td></tr>
<tr>
<td>Hour 4</td><td>DNS switch from GCP to Route 53</td></tr>
<tr>
<td>Hour 5–6</td><td>Traffic validation + rollback readiness</td></tr>
</tbody>
</table>
</div><hr />
<h2 id="heading-unexpected-issues-and-what-we-did">😮 Unexpected Issues (and What We Did)</h2>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Problem</td><td>Solution</td></tr>
</thead>
<tbody>
<tr>
<td><strong>MySQL master switch had lag</strong></td><td>Improved replica promotion playbook</td></tr>
<tr>
<td><strong>Hardcoded GCP configs found</strong></td><td>Emergency patching of ENV &amp; redeploy</td></tr>
<tr>
<td><strong>Solr slow to boot under load</strong></td><td>Temporarily pre-warmed EC2 nodes</td></tr>
</tbody>
</table>
</div><hr />
<h2 id="heading-post-migration-optimizations">🚀 Post-Migration Optimizations</h2>
<ul>
<li><p><strong>Rightsized EC2 instances</strong> using historical metrics</p>
</li>
<li><p>Committed to <strong>Savings Plans</strong> for reserved workloads</p>
</li>
<li><p>Enabled and tuned <strong>S3 lifecycle policies</strong></p>
</li>
<li><p>Set up <strong>automated AMI rotations</strong> and <strong>DB snapshots</strong></p>
</li>
</ul>
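<p>The AMI rotation can be approximated with a nightly cron job. This is a minimal sketch: the instance ID is a placeholder, and pruning of old images (we handled it with tags plus a cleanup script) is not shown:</p>
<pre><code class="lang-bash"># Nightly AMI of the app node without rebooting it; the name carries the date for rotation
# (% must be escaped in crontab entries)
0 2 * * * aws ec2 create-image \
  --instance-id i-0aaa1111 \
  --name "app-node-$(date +\%F)" \
  --no-reboot
</code></pre>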
<hr />
<h2 id="heading-end-of-part-2-final-thoughts-amp-whats-next">🧠 <strong>End of Part 2: Final Thoughts &amp; What’s Next</strong></h2>
<p>This journey from GCP to AWS wasn’t just about swapping clouds—it was a <strong>masterclass in operational resilience</strong>, <strong>cross-team coordination</strong>, and <strong>cloud-native rethinking</strong>.</p>
<p>We learned that:</p>
<ul>
<li><p>No plan survives contact without flexibility.</p>
</li>
<li><p>Owning your infrastructure also means owning your edge cases.</p>
</li>
<li><p>Migration is more than lift-and-shift—it's evolve or expire.</p>
</li>
</ul>
<blockquote>
<p>“Smooth seas never made a skilled sailor.”<br />— <em>Franklin D. Roosevelt</em></p>
</blockquote>
<p>This migration tested our nerves and processes, but ultimately, it left us with better observability, tighter security, and an infrastructure we could proudly call production-grade.</p>
<hr />
<p>🔗 <strong>If this helped or resonated with you, connect with me on</strong> <a target="_blank" href="https://www.linkedin.com/in/sebastiancyril/"><strong>LinkedIn</strong></a>. Let’s learn and grow together.</p>
<p>👉 Stay tuned for more behind-the-scenes write-ups and system design breakdowns.</p>
<hr />
]]></content:encoded></item><item><title><![CDATA[GCP to AWS Migration – Part 1: Architecture, Data Transfer & Infrastructure Setup]]></title><description><![CDATA[🧭 Why We Migrated: Business Drivers Behind the Move
Our platform, serving millions of daily users, was running smoothly on GCP. However, evolving business goals, pricing considerations, and long-term cloud ecosystem alignment led us to migrate to AW...]]></description><link>https://tech.cyrilsebastian.com/gcp-to-aws-migration-part-1-architecture-data-transfer-and-infrastructure-setup</link><guid isPermaLink="true">https://tech.cyrilsebastian.com/gcp-to-aws-migration-part-1-architecture-data-transfer-and-infrastructure-setup</guid><category><![CDATA[AWS]]></category><category><![CDATA[GCP]]></category><category><![CDATA[Cloud]]></category><category><![CDATA[Devops]]></category><category><![CDATA[Infrastructure as code]]></category><dc:creator><![CDATA[Cyril Sebastian]]></dc:creator><pubDate>Tue, 03 Jun 2025 03:04:13 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1748919396131/6ec85e51-0333-4ad1-9d03-d9d8b4582494.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<hr />
<h2 id="heading-why-we-migrated-business-drivers-behind-the-move">🧭 Why We Migrated: Business Drivers Behind the Move</h2>
<p>Our platform, serving millions of daily users, was running smoothly on GCP. However, evolving business goals, pricing considerations, and long-term cloud ecosystem alignment led us to migrate to AWS.</p>
<p>Key components of our GCP-based stack:</p>
<ul>
<li><p><strong>Web Tier</strong>: Next.js frontend + Django backend</p>
</li>
<li><p><strong>Databases</strong>: MongoDB replica sets, MySQL clusters</p>
</li>
<li><p><strong>Asynchronous Services</strong>: Redis, RabbitMQ</p>
</li>
<li><p><strong>Search</strong>: Apache Solr for full-text search</p>
</li>
<li><p><strong>Infrastructure</strong>: GCP Compute Engine VMs, managed instance groups, HTTPS Load Balancer</p>
</li>
<li><p><strong>Storage</strong>: 21 TB of data in Google Cloud Storage (GCS)</p>
</li>
</ul>
<hr />
<h2 id="heading-step-0-creating-a-migration-runbook">📋 Step 0: Creating a Migration Runbook</h2>
<p>We treated this as a mission-critical project. Our runbook included:</p>
<ul>
<li><p><strong>Stakeholders</strong>: CTO, DevOps Lead, Database Architect, Application Owners</p>
</li>
<li><p><strong>Timeline</strong>: 8 weeks from planning to cutover</p>
</li>
<li><p><strong>Phases</strong>: Network Setup → Data Migration → Database Sync → Application Migration → Cutover</p>
</li>
<li><p><strong>Rollback Plan</strong>: Prepared and rehearsed with timelines for failback</p>
</li>
</ul>
<hr />
<h2 id="heading-infrastructure-mapping-gcp-vs-aws">🛠️ Infrastructure Mapping: GCP vs AWS</h2>
<p><strong>Challenges in Mapping:</strong></p>
<ul>
<li><p>GCP allows <em>custom CPU and RAM</em> combinations, whereas AWS offers fixed instance types (t3, m6i, r6g, etc.)</p>
</li>
<li><p>IOPS differences between GCP SSDs and AWS EBS gp3 required tuning</p>
</li>
<li><p>Cost model varies significantly (especially egress from GCP)</p>
</li>
</ul>
<p>We used <a target="_blank" href="https://calculator.aws.amazon.com/">AWS Pricing Calculator</a> and <a target="_blank" href="https://cloud.google.com/products/calculator">GCP Pricing Calculator</a> to simulate monthly billing and select cost-optimized instance types.</p>
<hr />
<h2 id="heading-phase-1-aws-network-infrastructure-setup">🌐 Phase 1: AWS Network Infrastructure Setup</h2>
<h3 id="heading-aws-network-infrastructure-ap-south-1"><strong>AWS Network Infrastructure (ap-south-1)</strong></h3>
<pre><code class="lang-bash">                            ┌──────────────────────┐
                            │    GCP / DC VPC      │
                            └────────┬─────────────┘
                                     │
                           ┌─────────▼─────────┐
                           │ Site-to-Site VPN  │
                           └─────────┬─────────┘
                                     │
                           ┌─────────▼─────────┐
                           │      VPC          │
                           │  (ap-south-1)     │
                           └─────────┬─────────┘
                                     │
          ┌──────────────────────────┼─────────────────────────────┐
          │                          │                             │
┌─────────▼─────────┐     ┌──────────▼──────────┐       ┌──────────▼──────────┐
│ Public Subnet AZ1 │     │ Public Subnet AZ2   │       │ Public Subnet AZ3   │
│ - Bastion Host    │     │ - NAT Gateway       │       │ - Internet Gateway  │
└─────────┬─────────┘     └──────────┬──────────┘       └──────────┬──────────┘
          │                          │                             │
          ▼                          ▼                             ▼
┌────────────────┐        ┌────────────────┐              ┌────────────────┐
│Private Subnet 1│        │Private Subnet 2│              │Private Subnet 3│
│App / DB Tier   │        │App / DB Tier   │              │App / DB Tier   │
└────────────────┘        └────────────────┘              └────────────────┘

                 Security Groups + NACLs as per GCP mapping
                 VPC Flow Logs → CloudWatch Logs
</code></pre>
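<p>The flow-log wiring shown at the bottom of the diagram is a single CLI call once a log group and delivery role exist. The resource IDs, log group name, and role ARN below are placeholders:</p>
<pre><code class="lang-bash">aws ec2 create-flow-logs \
  --resource-type VPC \
  --resource-ids vpc-0abc1234 \
  --traffic-type ALL \
  --log-group-name /vpc/flow-logs \
  --deliver-logs-permission-arn arn:aws:iam::123456789012:role/vpc-flow-logs-role
</code></pre>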
<hr />
<h3 id="heading-components-breakdown">🧩 Components Breakdown</h3>
<div class="hn-table">
<table>
<thead>
<tr>
<td>Component</td><td>Purpose</td></tr>
</thead>
<tbody>
<tr>
<td><strong>3 AZs</strong></td><td>High availability and fault tolerance</td></tr>
<tr>
<td><strong>Public Subnets</strong></td><td>Bastion, NAT, IGW for ingress/egress</td></tr>
<tr>
<td><strong>Private Subnets</strong></td><td>Isolated app and DB tiers</td></tr>
<tr>
<td><strong>VPN</strong></td><td>Secure hybrid GCP–AWS connectivity</td></tr>
<tr>
<td><strong>Security</strong></td><td>Security Groups + NACLs derived from GCP firewall rules</td></tr>
<tr>
<td><strong>Monitoring</strong></td><td>VPC Flow Logs + CloudWatch Metrics for visibility</td></tr>
</tbody>
</table>
</div><hr />
<h2 id="heading-phase-2-data-migration-gcs-s3">📦 Phase 2: Data Migration (GCS → S3)</h2>
<p>We migrated over <strong>21 TB of user-generated and application asset data</strong> from Google Cloud Storage (GCS) to Amazon S3. Given the scale, this phase required surgical precision in planning, execution, and cost control.</p>
<h3 id="heading-tools-amp-techniques-used">Tools &amp; Techniques Used</h3>
<ul>
<li><p><a target="_blank" href="https://docs.aws.amazon.com/datasync/latest/userguide/what-is-datasync.html"><strong>AWS DataSync</strong></a>:<br />  Chosen for its efficiency, security, and ability to handle large-scale object transfers.</p>
</li>
<li><p><strong>Service Account HMAC Credentials</strong>:<br />  Used for secure bucket-to-bucket authentication between GCP and AWS.</p>
</li>
<li><p><strong>Phased Sync Strategy</strong>:</p>
<ul>
<li><p><strong>Initial Full Sync</strong> — Began 3 weeks before cutover</p>
</li>
<li><p><strong>Delta Syncs</strong> — Repeated every 2–3 days</p>
</li>
<li><p><strong>Final Cutover Sync</strong> — During the 6-hour cutover window</p>
</li>
</ul>
</li>
</ul>
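<p>DataSync reads GCS through its object-storage location type, authenticating with the HMAC key pair. A sketch of the two location definitions; the agent ARN, bucket names, keys, and role ARN are all placeholders:</p>
<pre><code class="lang-bash"># Source: GCS bucket exposed via its S3-compatible endpoint, using HMAC credentials
aws datasync create-location-object-storage \
  --server-hostname storage.googleapis.com \
  --server-protocol HTTPS \
  --bucket-name my-gcs-bucket \
  --access-key GOOG1EXAMPLEHMACKEY \
  --secret-key EXAMPLESECRET \
  --agent-arns arn:aws:datasync:ap-south-1:123456789012:agent/agent-0abc1234

# Destination: the target S3 bucket
aws datasync create-location-s3 \
  --s3-bucket-arn arn:aws:s3:::my-s3-bucket \
  --s3-config BucketAccessRoleArn=arn:aws:iam::123456789012:role/datasync-s3-role
</code></pre>
<p>A DataSync task is then created between the two locations and re-run for each delta sync.</p>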
<blockquote>
<p><em>Note</em>: We carefully validated <strong>checksums and object counts</strong> after each sync phase to ensure data integrity and avoid overwriting unchanged files.</p>
</blockquote>
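<p>The object-count and checksum validation reduces to comparing sorted listings of both sides. A simplified sketch of the comparison step; the listing commands that produce the manifests (e.g. <code>gsutil</code> on the GCS side, <code>aws s3api list-objects-v2</code> on the S3 side) are deliberately left out:</p>
<pre><code class="lang-bash"># compare_manifests SRC DST
# Each manifest is a listing of "object-name checksum" lines, one per object.
compare_manifests() {
  sort -o /tmp/src.sorted "$1"
  sort -o /tmp/dst.sorted "$2"
  if diff -q /tmp/src.sorted /tmp/dst.sorted; then
    echo "OK: object counts and checksums match"
  else
    echo "MISMATCH: listings differ; re-run a delta sync before cutover"
    return 1
  fi
}
</code></pre>
<p>Running this after every sync phase catches both missing objects and silently corrupted ones.</p>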
<hr />
<h3 id="heading-smart-optimization-decisions">💡 Smart Optimization Decisions</h3>
<ul>
<li><p><strong>Selective Data Migration</strong>:</p>
<ul>
<li><p>Identified several GCS buckets containing <strong>temporary compliance logs</strong> with auto-expiry policies.</p>
</li>
<li><p>Instead of migrating them and incurring egress charges, we chose to <strong>let them expire in GCP</strong>.</p>
</li>
<li><p>This alone <strong>saved several thousand dollars</strong> in unnecessary transfer costs.</p>
</li>
</ul>
</li>
<li><p><strong>Delta Awareness</strong>:</p>
<ul>
<li><p>Designed the sync jobs to be <strong>delta-aware</strong> to prevent redundant data movement.</p>
</li>
<li><p>Ensured that only <strong>modified/new objects</strong> were transferred during delta and final syncs.</p>
</li>
</ul>
</li>
</ul>
<h3 id="heading-post-migration-s3-tuning">Post-Migration S3 Tuning</h3>
<p>After the bulk migration was completed, we fine-tuned our S3 environment for <strong>cost optimization, data hygiene</strong>, and <strong>long-term sustainability</strong>.</p>
<ul>
<li><p><strong>Lifecycle Policies Implemented</strong>:</p>
<ul>
<li><p>Automatic archival of infrequently accessed data to <strong>S3 Glacier</strong>.</p>
</li>
<li><p>Expiry rules for:</p>
<ul>
<li><p>Temporary staging files.</p>
</li>
<li><p>Orphaned or abandoned data older than a defined threshold.</p>
</li>
</ul>
</li>
</ul>
</li>
<li><p>Configured <strong>S3 Incomplete Multipart Upload Aborts</strong>:</p>
<ul>
<li>Incomplete uploads are now <strong>automatically aborted after 7 days</strong>, preventing unnecessary storage billing from partial uploads caused by network or user errors.</li>
</ul>
</li>
</ul>
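<p>The lifecycle rules above roughly translate to a configuration like the following, applied with <code>aws s3api put-bucket-lifecycle-configuration</code>. Prefixes and day counts here are illustrative; ours varied per bucket:</p>
<pre><code class="lang-json">{
  "Rules": [
    {
      "ID": "archive-infrequent",
      "Status": "Enabled",
      "Filter": {"Prefix": "assets/"},
      "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}]
    },
    {
      "ID": "expire-staging",
      "Status": "Enabled",
      "Filter": {"Prefix": "staging/"},
      "Expiration": {"Days": 30}
    },
    {
      "ID": "abort-incomplete-mpu",
      "Status": "Enabled",
      "Filter": {"Prefix": ""},
      "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7}
    }
  ]
}
</code></pre>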
<hr />
<h3 id="heading-lessons-learned">Lessons Learned</h3>
<ul>
<li><p><strong>Data Volume ≠ Data Complexity</strong>:<br />  Even though we had tools for the job, coordinating syncs across staging, pre-prod, and production environments required careful orchestration and monitoring.</p>
</li>
<li><p><strong>Egress and DTO Costs</strong>:<br />  Data Transfer Out from GCP was a <strong>hidden but substantial cost center</strong>—plan ahead for this when budgeting.</p>
</li>
<li><p><strong>S3 Behavior Is Not GCS</strong>:<br />  We had to adjust application logic and IAM policies post-migration to align with <strong>S3 object handling, access policies, and permissions model</strong>.</p>
</li>
</ul>
<hr />
<h2 id="heading-phase-3-database-migration">🗄️ Phase 3: Database Migration</h2>
<h3 id="heading-mongodb-migration"><strong>MongoDB Migration</strong></h3>
<p>Migrating MongoDB from GCP to AWS was one of the most sensitive components of the move due to its role in powering real-time operations and user sessions.</p>
<p><strong>Our Strategy</strong>:</p>
<ul>
<li><p><strong>Replica Set Initialization</strong>: Set up MongoDB replica sets on AWS EC2 instances to mirror the topology running in GCP.</p>
</li>
<li><p><strong>Oplog-Based Sync</strong>: Enabled <strong>oplog-based replication</strong> between AWS and GCP MongoDB nodes to ensure near real-time data synchronization without full data dumps.</p>
</li>
<li><p><strong>Hybrid Node Integration</strong>: Deployed a <strong>MongoDB node in AWS</strong>, directly connected to the <strong>GCP replica set</strong>, acting as a bridge before full cutover.</p>
</li>
<li><p><strong>iptables for Controlled Access</strong>: Used <strong>iptables rules</strong> to restrict write access during the sync period. This allowed <strong>inter-DB synchronization traffic only</strong>, blocking application-level writes and ensuring data consistency before switchover.</p>
</li>
<li><p><strong>Failover Testing</strong>: Conducted multiple <strong>failover and promotion drills</strong> to validate readiness, with rollback plans in place.</p>
</li>
</ul>
<p><strong>Key Takeaway</strong>: Setting up a hybrid node and controlling access at the OS level allowed us to minimize data drift and test production-grade failovers without service disruption.</p>
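<p>The OS-level freeze boils down to a handful of iptables rules on the database hosts: accept port 27017 only from replica-set peers, and drop everything else. The peer IPs below are placeholders; ours came from the replica set configuration:</p>
<pre><code class="lang-bash"># Allow replication traffic from the GCP replica-set peers only
iptables -A INPUT -p tcp --dport 27017 -s 10.10.1.11 -j ACCEPT
iptables -A INPUT -p tcp --dport 27017 -s 10.10.1.12 -j ACCEPT
# Block all other MongoDB traffic, i.e. application-level writes
iptables -A INPUT -p tcp --dport 27017 -j DROP
</code></pre>
<p>Deleting the DROP rule after promotion restores application access instantly, which made the freeze easy to lift during cutover.</p>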
<h3 id="heading-mysql-migration"><strong>MySQL Migration</strong></h3>
<p>The MySQL component required careful orchestration to ensure transactional consistency and minimal downtime.</p>
<p><strong>Our Approach</strong>:</p>
<ul>
<li><p><strong>Master-Slave Topology</strong>: Established a classic <strong>master-slave setup</strong> on AWS EC2 instances to replicate data from the GCP-hosted MySQL master.</p>
</li>
<li><p><strong>Replication Lag Challenges</strong>: One of the <strong>major blockers</strong> encountered was <strong>replication lag</strong> during promotion drills, especially under active write-heavy workloads.</p>
</li>
<li><p><strong>Controlled Write Freeze</strong>: We implemented <strong>iptables-based rules</strong> at the OS level to <strong>block application write traffic</strong>, allowing replication to catch up safely before cutover.</p>
</li>
<li><p><strong>Promotion Strategy</strong>:</p>
<ul>
<li><p>Executed a <strong>time-based cutover window</strong>.</p>
</li>
<li><p>Promoted the AWS slave node to master using a <strong>custom validation script</strong> to check replication offsets and ensure data integrity.</p>
</li>
<li><p>All secondary nodes were reconfigured to follow the <strong>new AWS master</strong>, ensuring consistency across the cluster.</p>
</li>
</ul>
</li>
</ul>
<p><strong>Key Takeaway</strong>: Blocking writes via iptables provided a clean buffer for promotion without the risk of in-flight transactions, making the cutover smooth and predictable.</p>
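<p>The replication-offset check in a promotion script reduces to parsing <code>SHOW SLAVE STATUS</code> output and refusing to promote until lag reaches zero. A simplified sketch; the <code>mysql</code> invocation and host name are shown only as illustrative comments:</p>
<pre><code class="lang-bash"># replication_lag: read `SHOW SLAVE STATUS\G` output on stdin and print the
# Seconds_Behind_Master value; empty output means replication is not running.
replication_lag() {
  awk -F': ' '/Seconds_Behind_Master/ {print $2}'
}

# Promotion gate (illustrative):
#   lag=$(mysql -h replica.internal -e 'SHOW SLAVE STATUS\G' | replication_lag)
#   test "$lag" = "0" || { echo "replica is ${lag}s behind; aborting promotion"; exit 1; }
</code></pre>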
<hr />
<h2 id="heading-end-of-part-1-setting-the-stage-for-migration"><strong>End of Part 1: Setting the Stage for Migration</strong></h2>
<p>You’ve seen how we architected an AWS environment from scratch, replicated critical systems like MongoDB and MySQL, and seamlessly migrated over <strong>21 TB</strong> of assets from GCP to S3—all while optimizing for cost, security, and scalability.</p>
<p>But this was just the calm before the storm.</p>
<blockquote>
<p><strong>"Give me six hours to chop down a tree and I will spend the first four sharpening the axe."</strong><br />— <em>Abraham Lincoln</em></p>
</blockquote>
<p>We were well-prepared. But would the systems—and the team—hold up during <strong>live cutover</strong>?</p>
<p>In <a target="_blank" href="https://tech.cyrilsebastian.com/gcp-to-aws-migration-part-2-real-cutover-issues-and-recovery"><strong>Part 2: The Real Cutover &amp; Beyond</strong></a>, we’ll step into the fire:</p>
<ul>
<li><p>What went wrong,</p>
</li>
<li><p>What we had to patch live,</p>
</li>
<li><p>And what we did to walk away from it stronger.</p>
</li>
</ul>
<p>👉 Don't miss it. <strong>Follow me on</strong> <a target="_blank" href="https://www.linkedin.com/in/sebastiancyril/"><strong>LinkedIn</strong></a> for more deep-dive case studies and real-world DevOps/CloudOps stories like this.</p>
]]></content:encoded></item><item><title><![CDATA[Akamai Staging vs. Production: How to Set Up Environments Efficiently]]></title><description><![CDATA[Content Delivery Networks (CDNs) have become indispensable in modern web applications to ensure faster load times, enhanced security, and improved global performance. Among the leading CDN providers, Akamai stands out with its robust platform and mat...]]></description><link>https://tech.cyrilsebastian.com/akamai-staging-vs-production-how-to-set-up-environments-efficiently</link><guid isPermaLink="true">https://tech.cyrilsebastian.com/akamai-staging-vs-production-how-to-set-up-environments-efficiently</guid><category><![CDATA[akamai]]></category><category><![CDATA[CDN]]></category><dc:creator><![CDATA[Cyril Sebastian]]></dc:creator><pubDate>Thu, 26 Dec 2024 12:39:14 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1735211246728/db065802-0d74-4237-b579-5575deb072e7.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>Content Delivery Networks (CDNs)</strong> have become indispensable in modern web applications to ensure faster load times, enhanced security, and improved global performance. Among the leading CDN providers, <strong>Akamai</strong> stands out with its robust platform and mature features.</p>
<p>This article covers the basics of working with Akamai properties: after setting up rules, how to test them in the staging environment and then deploy the changes to production.</p>
<h2 id="heading-why-use-a-cdn">Why Use a CDN?</h2>
<p>A CDN is a distributed network of servers that caches content closer to the end users. Here are the key reasons to use a CDN:</p>
<ul>
<li><p><strong>Reduced Latency</strong>: By serving requests from the nearest edge servers, CDNs significantly decrease load times.</p>
</li>
<li><p><strong>Scalability</strong>: Seamlessly handles high traffic during peak usage by distributing the load.</p>
</li>
<li><p><strong>Enhanced Security</strong>: Protects against DDoS attacks, provides Web Application Firewalls (WAF), and mitigates other security threats.</p>
</li>
<li><p><strong>Cost Optimization</strong>: Reduces the burden on origin servers by caching static assets and efficiently managing traffic.</p>
</li>
</ul>
<h2 id="heading-best-practices-for-routing-requests-in-akamai">Best Practices for Routing Requests in Akamai</h2>
<p>When configuring Akamai, follow these best practices for optimal results:</p>
<ol>
<li><p><strong>Use Appropriate Cache Keys</strong>: Ensure caching is configured for specific query strings, cookies, or headers to serve accurate content.</p>
</li>
<li><p><strong>Implement Cache Invalidation Policies</strong>: Use Akamai’s Content Purge to remove outdated content.</p>
<pre><code class="lang-bash"> akamai purge --edgerc ~/.edgerc --section default invalidate <span class="hljs-string">"www.example.com/path1"</span> <span class="hljs-string">"www.example.com/path2"</span>
</code></pre>
</li>
<li><p><strong>Enable Gzip or Brotli Compression</strong>: Compress text-based assets (e.g., HTML, CSS, JS) to reduce bandwidth usage.</p>
</li>
<li><p><strong>Strategic Traffic Routing</strong>: Utilize geo-blocking and geo-routing to enhance performance and minimize latency by routing users to the nearest server.</p>
</li>
<li><p><strong>Monitor Cache Hit/Miss Ratios</strong>: Use Akamai’s dashboard or logs to analyze and improve caching efficiency.</p>
</li>
</ol>
<h2 id="heading-setting-up-akamai-staging-and-production-environments">Setting Up Akamai Staging and Production Environments</h2>
<p>Akamai allows testing configurations in a staging environment before deploying them live. Here's how to set it up:</p>
<h4 id="heading-1-staging-environment-setup"><strong>1. Staging Environment Setup</strong></h4>
<p>The staging environment mimics production, ensuring safe testing. Follow these steps:</p>
<ol>
<li><p><strong>Determine the Staging URL</strong>: Add the Akamai “-staging” suffix to your Edge Hostname:</p>
<pre><code class="lang-bash"> example.com.edgesuite-staging.net
 example.com.edgekey-staging.net
</code></pre>
</li>
<li><p><strong>Perform DNS Lookup</strong>: Use <code>dig</code> commands to resolve IPv4 and IPv6 addresses from staging Edge Hostname:</p>
<pre><code class="lang-bash"> dig +short example.com.edgesuite-staging.net  
 dig +short example.com.edgesuite-staging.net AAAA
</code></pre>
</li>
<li><p><strong>Update</strong> <code>/etc/hosts</code>: Map the resolved IP address to the staging URL. Prefer IPv6 if IPv4 causes issues:</p>
<pre><code class="lang-bash"> 2600:XXXX:X:X::XXXX:XXXX example.com.edgesuite-staging.net
</code></pre>
</li>
<li><p><strong>Test Staging with</strong> <code>curl</code>: Confirm the request routes to Akamai’s staging servers:</p>
<pre><code class="lang-bash"> curl -vvv -L \
 -H <span class="hljs-string">'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0 Safari/605.1.15'</span> \
 -H <span class="hljs-string">"Pragma: akamai-x-cache-on"</span> -H <span class="hljs-string">"Pragma: akamai-x-check-cacheable"</span> -H <span class="hljs-string">"x-akamai-internal: true"</span> -H <span class="hljs-string">"Pragma: akamai-x-cache-remote-on"</span> \
 -H <span class="hljs-string">"Pragma: akamai-x-get-cache-key"</span> -H <span class="hljs-string">"Pragma: akamai-x-get-extracted-values"</span> -H <span class="hljs-string">"Pragma: akamai-x-get-ssl-client-session-id"</span> \
 -H <span class="hljs-string">"Pragma: akamai-x-get-true-cache-key"</span> -H <span class="hljs-string">"Pragma: akamai-x-serial-no"</span> -H <span class="hljs-string">"Pragma: akamai-x-get-request-id"</span> -H <span class="hljs-string">"Pragma: X-Akamai-CacheTrack"</span> \
 -I <span class="hljs-string">"https://example.com.edgesuite-staging.net"</span>
</code></pre>
</li>
</ol>
<h4 id="heading-2-verifying-akamai-requests"><strong>2. Verifying Akamai Requests</strong></h4>
<ul>
<li><p><strong>Response Headers</strong>: Check for Akamai-specific headers:</p>
<ul>
<li><p><code>X-Cache: TCP_HIT</code> (served from cache)</p>
</li>
<li><p><code>X-Cache: TCP_MISS</code> (fetched from origin server)</p>
</li>
<li><p><code>X-Akamai-Staging: ESSL</code> (indicates staging environment)</p>
</li>
</ul>
</li>
<li><p><strong>Browser Developer Tools</strong>:</p>
<ul>
<li><p>Open DevTools (Ctrl+Shift+I or Cmd+Option+I).</p>
</li>
<li><p>Inspect response headers under the <strong>Network</strong> tab for Akamai-specific details.</p>
</li>
</ul>
</li>
</ul>
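<p>Header verification is easy to script: pipe the <code>curl -I</code> output through a small check for the staging marker. The function below is a hypothetical helper, not an Akamai tool:</p>
<pre><code class="lang-bash"># staging_check: read HTTP response headers on stdin; confirm the request was
# served by the Akamai staging network and report the cache status line.
staging_check() {
  headers=$(cat)
  if printf '%s\n' "$headers" | grep -qi 'X-Akamai-Staging'; then
    echo "staging: yes"
  else
    echo "staging: NO - check the /etc/hosts mapping"
  fi
  printf '%s\n' "$headers" | grep -i 'X-Cache:' || true
}
</code></pre>
<p>Seeing <code>staging: yes</code> together with an <code>X-Cache</code> line confirms both that the staging network handled the request and how it was cached.</p>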
<h4 id="heading-3-deploying-to-production"><strong>3. Deploying to Production</strong></h4>
<p>Once testing is complete, promote the configuration to production via Akamai’s property manager. Activating the changes ensures they are live without downtime.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1735201694214/ec2ae931-e202-4443-91b6-72a1ea0cae45.png" alt class="image--center mx-auto" /></p>
<hr />
<h3 id="heading-conclusion"><strong>Conclusion</strong></h3>
<p>Akamai's staging and production environments simplify the process of testing and deploying secure, high-performing web applications. By adhering to best practices like optimizing caching, compressing assets, and leveraging Akamai’s robust tools, you can ensure your web application delivers a seamless experience to users globally.</p>
]]></content:encoded></item><item><title><![CDATA[init(): Setting Up My Tech Journey.]]></title><description><![CDATA[Learning is complete when it’s shared, and there’s immense joy in passing on that knowledge—it’s deeply satisfying.
Welcome to my space! Here, I’ll share my journey, lessons, and challenges in the ever-evolving world of DevOps and Site Reliability En...]]></description><link>https://tech.cyrilsebastian.com/init-setting-up-my-tech-journey</link><guid isPermaLink="true">https://tech.cyrilsebastian.com/init-setting-up-my-tech-journey</guid><category><![CDATA[introduction]]></category><dc:creator><![CDATA[Cyril Sebastian]]></dc:creator><pubDate>Sun, 01 Dec 2024 18:01:05 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1733075273008/3a506e7c-2d29-4812-a568-976b75d2624a.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Learning is complete when it’s shared, and there’s immense joy in passing on that knowledge—it’s deeply satisfying.</p>
<p>Welcome to my space! Here, I’ll share my journey, lessons, and challenges in the ever-evolving world of DevOps and Site Reliability Engineering (SRE). But let me tell you upfront—this blog isn’t limited to just these two topics.</p>
<p>You’ll find a blend of my <a target="_blank" href="https://blog.cyrilsebastian.com/p/tracking-my-pulse-conquering-challenges-and-mastering-life-s-juggle">day-to-day experiences</a>, lessons learned, and strategies for navigating the tech landscape. The goal is to grow into a better version of myself while keeping pace with the relentless updates in technology.</p>
<blockquote>
<p>"The best way to predict the future is to create it."</p>
<p>– Abraham Lincoln</p>
</blockquote>
<p>This blog is my small effort to create a more collaborative, informed, and innovative tech community.</p>
<p>Starting this blog is both exciting and nerve-wracking. But as they say in SRE, “If it’s hard, do it often. If it’s easy, automate it.” Writing may not be code, but I’m committing to doing it often, learning as I go, and hopefully inspiring a few people along the way.</p>
<p><a target="_blank" href="https://cyrilsebastian.com/">Stay Curious! Stay tuned!</a></p>
]]></content:encoded></item></channel></rss>