Keep It Running: Email Infrastructure Operations Guide

Your infrastructure is live. This guide shows you how to monitor, troubleshoot, and scale your Postfix + AWS SES system without the headaches.


I’m Cyril Sebastian, a DevOps and Cloud Infrastructure architect with 10+ years of experience building, scaling, and securing cloud-native and hybrid systems. I specialize in automation, cost optimization, observability, and platform engineering across AWS, GCP, and Oracle Cloud. My passion lies in solving complex infrastructure challenges—from cloud migrations to Infrastructure as Code (IaC), and from deployment automation to scalable monitoring strategies. I blog here about:

  • Cloud strategy and migration playbooks

  • Real-world DevOps and automation with Terraform, Jenkins, and Ansible

  • DevSecOps practices and security-first thinking in production

  • Monitoring, cost optimization, and incident response at scale

If you're building in the cloud, optimizing infra, or exploring DevOps culture—let’s connect and share ideas! 🔗 linkedin.com/in/sebastiancyril

Run your email infrastructure like a pro.


Daily Operations (5 Minutes)

The Morning Health Check

Create this script and run it daily:

cat > ~/email-health.sh <<'EOF'
#!/bin/bash
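# Note: traditional syslog pads single-digit days with a space ("Jan  5").
# If your logs do that, use %e instead of %d here.
YESTERDAY=$(date -d "yesterday" +"%b %d")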

echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo "Email Health ($YESTERDAY)"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━"

# Service status
systemctl is-active --quiet postfix && echo "Postfix: running" || echo "Postfix: DOWN"
systemctl is-active --quiet ses-logger && echo "Logger: running" || echo "Logger: DOWN"

# Email stats
SENT=$(grep "$YESTERDAY" /var/log/postfix/postfix.log 2>/dev/null | grep -c "status=sent")
DELIVERED=$(grep "$YESTERDAY" /var/log/postfix/mail.log 2>/dev/null | grep -c "status=delivered")
BOUNCED=$(grep "$YESTERDAY" /var/log/postfix/mail.log 2>/dev/null | grep -c "status=bounced")

echo ""
echo "📊 Volume"
echo "   Sent: $SENT"
echo "   Delivered: $DELIVERED"
echo "   Bounced: $BOUNCED"

if [ $SENT -gt 0 ]; then
  DELIVERY_RATE=$((DELIVERED * 100 / SENT))
  BOUNCE_RATE=$((BOUNCED * 100 / SENT))
  echo ""
  echo "📈 Rates"
  echo "   Delivery: ${DELIVERY_RATE}%"
  echo "   Bounce: ${BOUNCE_RATE}%"
  
  [ $BOUNCE_RATE -gt 5 ] && echo "   ⚠️  High bounce rate!"
fi

# Queue status
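QUEUE_SUMMARY=$(mailq | tail -1)
echo ""
if echo "$QUEUE_SUMMARY" | grep -q "is empty"; then
  echo "Queue empty"
else
  # Last line looks like "-- 3 Kbytes in 2 Requests."; field 5 is the message count
  QUEUE=$(echo "$QUEUE_SUMMARY" | awk '{print $5}')
  echo "⚠️  Queue: $QUEUE messages"
fi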

echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
EOF

chmod +x ~/email-health.sh

Run it:

./email-health.sh

Automate it (runs at 9 AM, emails you results):

(crontab -l 2>/dev/null; echo "0 9 * * * ~/email-health.sh | mail -s 'Email Health Report' admin@yourdomain.com") | crontab -

Essential Monitoring

Real-Time Log Watching

Monitor live email flow:

# Watch everything
sudo tail -f /var/log/postfix/*.log

# Watch only delivered emails
sudo tail -f /var/log/postfix/mail.log | grep --line-buffered "status=delivered"

# Watch bounces
sudo tail -f /var/log/postfix/mail.log | grep --line-buffered "status=bounced"

Key Metrics to Track

Metric           Target   Alert If
Delivery Rate    >95%     <90%
Bounce Rate      <5%      >10%
Queue Size       0        >100
Service Uptime   99.9%    Any downtime
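
The alert thresholds in the table can be turned into a small check that a cron job or the health script could call. This is only a sketch: the function name `check_metrics` and its argument order are made up for illustration, and it expects whole-number percentages like the ones the health script computes.

```shell
# check_metrics DELIVERY_PCT BOUNCE_PCT QUEUE_SIZE   (hypothetical helper)
# Prints one line per breached "Alert If" threshold from the table above.
# Returns 0 when everything is within limits, 1 otherwise.
check_metrics() {
  cm_delivery=$1
  cm_bounce=$2
  cm_queue=$3
  cm_rc=0
  if [ "$cm_delivery" -lt 90 ]; then
    echo "ALERT: delivery rate ${cm_delivery}% is below 90%"
    cm_rc=1
  fi
  if [ "$cm_bounce" -gt 10 ]; then
    echo "ALERT: bounce rate ${cm_bounce}% is above 10%"
    cm_rc=1
  fi
  if [ "$cm_queue" -gt 100 ]; then
    echo "ALERT: queue size ${cm_queue} is above 100"
    cm_rc=1
  fi
  return "$cm_rc"
}
```

For example, `check_metrics "$DELIVERY_RATE" "$BOUNCE_RATE" "$QUEUE"` at the end of the health script would print only the metrics that need attention.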

Quick metric checks:

# Today's delivery rate
TODAY=$(date +"%b %d")
SENT=$(grep "$TODAY" /var/log/postfix/postfix.log | grep -c "status=sent")
DELIVERED=$(grep "$TODAY" /var/log/postfix/mail.log | grep -c "status=delivered")
if [ "$SENT" -gt 0 ]; then
  echo "Delivery rate: $((DELIVERED * 100 / SENT))%"
else
  echo "No emails sent today"
fi

# Average delivery time (milliseconds)
grep "status=delivered" /var/log/postfix/mail.log | \
  grep -oP 'delay=\K\d+' | \
  awk '{sum+=$1; n++} END {if (n) print "Avg delay: " sum/n "ms"; else print "No deliveries found"}'

# Top recipient domains
grep "status=delivered" /var/log/postfix/mail.log | \
  grep -oP 'to=<[^@]+@\K[^>]+' | \
  sort | uniq -c | sort -rn | head -5

Common Operations

Adding New Senders

# 1. Edit whitelist
sudo vim /etc/postfix/allowed_senders

# Add line:
# newsender@yourdomain.com    OK

# 2. Rebuild database
sudo postmap /etc/postfix/allowed_senders

# 3. Reload (no restart needed!)
sudo systemctl reload postfix

# 4. Test
echo "Test" | mail -s "Test" -r newsender@yourdomain.com test@example.com

No downtime! Reload picks up changes instantly.


Removing Senders

# 1. Comment out or remove from whitelist
sudo vim /etc/postfix/allowed_senders
# #oldsender@yourdomain.com    OK

# 2. Rebuild and reload
sudo postmap /etc/postfix/allowed_senders
sudo systemctl reload postfix

# 3. Verify rejection
echo "Test" | mail -s "Test" -r oldsender@yourdomain.com test@example.com
# Should see: "Sender address rejected"

Managing Mail Queue

View queue:

mailq

Flush queue (retry all deferred emails):

sudo postqueue -f

Delete specific email:

# Get queue ID from mailq
sudo postsuper -d QUEUE_ID

Delete all queued emails:

sudo postsuper -d ALL

Delete only deferred emails:

sudo postsuper -d ALL deferred
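
Between deleting one message and deleting everything, a common need is removing all queued mail from a single sender. The sketch below is one way to do that by parsing `mailq` output. The helper name `queue_ids_for_sender` is hypothetical, and it assumes the standard `mailq` layout: a header line, then one blank-line-separated entry per message with the sender as the seventh field.

```shell
# queue_ids_for_sender SENDER   (hypothetical helper)
# Reads `mailq` output on stdin and prints the queue ID of every message
# whose envelope sender is SENDER, one ID per line.
queue_ids_for_sender() {
  tail -n +2 |                 # drop the column header line
    grep -v '^ *(' |           # drop "(reason)" lines shown for deferred mail
    awk -v sender="$1" '
      BEGIN { RS = "" }        # paragraph mode: one queue entry per record
      $7 == sender { print $1 }
    ' |
    tr -d '*!'                 # strip the active (*) and hold (!) markers
}
```

The IDs can then be piped to `postsuper`, which reads queue IDs from standard input when given `-`: `mailq | queue_ids_for_sender oldsender@yourdomain.com | sudo postsuper -d -`. Running the first two stages alone is a good way to review what would be deleted.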

Searching Email History

Find specific email:

grep "user@example.com" /var/log/postfix/*.log

Find by sender:

grep "from=<sender@yourdomain.com>" /var/log/postfix/postfix.log

Find bounces to specific domain:

grep "gmail.com" /var/log/postfix/mail.log | grep "bounced"

Get complete email journey:

# Get message ID from sent log
MSG_ID=$(grep "user@example.com" /var/log/postfix/postfix.log | grep -oP 'status=sent \(250 Ok \K[^)]+' | head -1)

# Find all events for that message
grep "$MSG_ID" /var/log/postfix/*.log

Troubleshooting Guide

Problem 1: Postfix Won't Start

Symptoms:

sudo systemctl start postfix
# Job for postfix.service failed

Fix:

# 1. Check config syntax
sudo postfix check

# 2. View detailed error
sudo journalctl -u postfix -n 20 --no-pager

# 3. Common issues:

# Port in use?
sudo lsof -i :25
# Kill conflicting process: sudo systemctl stop sendmail

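# Permission issue?
sudo chown -R postfix:postfix /var/log/postfix
# Don't chown the spool recursively: parts of /var/spool/postfix must stay
# owned by root or the postdrop group. Let Postfix restore them instead:
sudo postfix set-permissions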

# Check line number from 'postfix check' output
sudo vim /etc/postfix/main.cf +LINE_NUMBER

Problem 2: Emails Stuck in Queue

Diagnosis:

mailq  # Shows queued emails
sudo tail -100 /var/log/postfix/postfix.log | grep "status=deferred"
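
When many messages are deferred, grouping them by reason usually points to which of the causes below applies. A minimal sketch, assuming the usual Postfix log format where the reason follows `status=deferred` in parentheses at the end of the line; the function name `deferred_reasons` is made up for this example.

```shell
# deferred_reasons LOGFILE   (hypothetical helper)
# Counts the distinct "(reason)" strings on status=deferred lines,
# most frequent first.
deferred_reasons() {
  grep 'status=deferred' "$1" |
    sed 's/.*status=deferred (\(.*\))$/\1/' |   # keep only the text inside the parentheses
    sort | uniq -c | sort -rn
}
```

Usage: `deferred_reasons /var/log/postfix/postfix.log | head -5`. A wall of "Connection timed out" suggests a network problem, while "authentication failed" points at the SES credentials.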

Common causes and fixes:

Wrong SES credentials:

# Verify credentials
sudo postmap -q "[email-smtp.ap-south-1.amazonaws.com]:587" /etc/postfix/sasl_passwd

# Update if needed
sudo vim /etc/postfix/sasl_passwd
sudo postmap /etc/postfix/sasl_passwd
sudo systemctl restart postfix

Network blocked:

# Test SES connectivity
telnet email-smtp.ap-south-1.amazonaws.com 587

# Check security group allows outbound 587
# Check route table has internet gateway

SES quota exceeded:

aws ses get-send-quota --region ap-south-1
# If near limit, wait or request increase
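
To judge "near limit" with a number rather than by eye, the two quota fields can be reduced to a percentage. This is a sketch under assumptions: `quota_percent` is a hypothetical helper, and the usage line in the comment relies on `--query` with `--output text` printing the two values on one line.

```shell
# quota_percent SENT MAX   (hypothetical helper)
# Prints the share of the 24-hour quota already used, as a whole percent.
quota_percent() {
  awk -v sent="$1" -v max="$2" 'BEGIN {
    if (max <= 0) { print "n/a"; exit }   # SES reports -1 for an unlimited quota
    printf "%d\n", sent * 100 / max
  }'
}

# Possible usage (not run here, needs AWS credentials):
#   quota_percent $(aws ses get-send-quota --region ap-south-1 \
#     --query '[SentLast24Hours,Max24HourSend]' --output text)
```

A result of 80 or more lines up with the 80%-of-quota scale trigger used later in this guide.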

After fixing, flush the queue:

sudo postqueue -f

Problem 3: Logger Service Keeps Crashing

Check logs:

sudo journalctl -u ses-logger -n 50 --no-pager
sudo tail -50 /var/log/ses-logger-error.log

Common fixes:

boto3 missing:

python3 -c "import boto3" || sudo yum install -y python3-boto3
sudo systemctl restart ses-logger

Wrong queue URL:

# Get correct URL
QUEUE_URL=$(aws sqs get-queue-url --queue-name ses-events-queue --region ap-south-1 --query 'QueueUrl' --output text)

# Update service
sudo sed -i "s|Environment=\"SQS_QUEUE_URL=.*\"|Environment=\"SQS_QUEUE_URL=$QUEUE_URL\"|" /etc/systemd/system/ses-logger.service

sudo systemctl daemon-reload
sudo systemctl restart ses-logger

IAM permissions:

# Verify role attached
aws sts get-caller-identity

# Should show: PostfixSESLogger role
# If not, reattach IAM instance profile

Problem 4: No Delivery Events in Logs

Diagnosis:

# 1. Check SQS queue has messages
aws sqs get-queue-attributes \
  --queue-url "$(aws sqs get-queue-url --queue-name ses-events-queue --region ap-south-1 --query 'QueueUrl' --output text)" \
  --attribute-names ApproximateNumberOfMessages \
  --region ap-south-1

If messages are accumulating:

  • Logger not processing → Check sudo journalctl -u ses-logger

  • Restart logger → sudo systemctl restart ses-logger

If no messages in queue:

# 2. Verify SES publishing to SNS
aws ses get-identity-notification-attributes \
  --identities yourdomain.com \
  --region ap-south-1

# Should show all three topics configured

# 3. Reconfigure if needed
SNS_ARN=$(aws sns list-topics --region ap-south-1 --query "Topics[?contains(TopicArn, 'ses-events-topic')].TopicArn | [0]" --output text)

for EVENT in Delivery Bounce Complaint; do
  aws ses set-identity-notification-topic \
    --identity yourdomain.com \
    --notification-type $EVENT \
    --sns-topic "$SNS_ARN" \
    --region ap-south-1
done

Problem 5: High Bounce Rate (>10%)

Analyze bounce reasons:

grep "status=bounced" /var/log/postfix/mail.log | \
  grep -oP 'reason=\(\K[^\)]+' | \
  sort | uniq -c | sort -rn | head -10

Common reasons:

"User unknown" (invalid addresses):

# Extract bounced addresses
grep "status=bounced" /var/log/postfix/mail.log | \
  grep "bounce_type=Permanent" | \
  grep -oP 'to=<\K[^>]+' | \
  sort -u > bounced_addresses.txt

# Remove from your mailing list

"Mailbox full":

  • Temporary issue, will resolve

  • Retry after 24 hours

"550 Spam":

  • Review email content

  • Check SPF/DKIM/DMARC setup

  • Verify sender reputation


Problem 6: Emails Going to Spam

Verification checklist:

# 1. Check SPF
dig +short TXT yourdomain.com | grep spf
# Should include: include:amazonses.com

# 2. Check DKIM
aws ses get-identity-dkim-attributes \
  --identities yourdomain.com \
  --region ap-south-1
# Should show: DkimEnabled: true, DkimVerificationStatus: Success

# 3. Check DMARC
dig +short TXT _dmarc.yourdomain.com
# Should return DMARC policy

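# 4. Check that sending is enabled for the account
aws ses get-account-sending-enabled --region ap-south-1
# Should show: Enabled: true
# (bounce and complaint rates themselves appear under reputation metrics in the SES console)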
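
The SPF step from the checklist can be scripted so it returns a pass/fail status instead of relying on reading `grep` output. A small sketch; `spf_authorizes_ses` is a hypothetical name, and it only checks for the `include:amazonses.com` mechanism mentioned above, nothing else about the record.

```shell
# spf_authorizes_ses "TXT_RECORD_TEXT"   (hypothetical helper)
# Succeeds if the given text contains an SPF record that includes amazonses.com.
spf_authorizes_ses() {
  case "$1" in
    *v=spf1*include:amazonses.com*) return 0 ;;
    *) return 1 ;;
  esac
}

# Possible usage (not run here, needs DNS access):
#   spf_authorizes_ses "$(dig +short TXT yourdomain.com)" || echo "SPF does not include amazonses.com"
```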

Content checklist:

  • Avoid spam trigger words (FREE!, ACT NOW!)

  • Include unsubscribe link

  • Balance text/image ratio (60% text minimum)

  • Use a consistent "From" name and address

  • Authenticate with SPF/DKIM/DMARC


Performance Optimization

Postfix Tuning

For higher throughput:

sudo vim /etc/postfix/main.cf

Add/update:

# Increase concurrent deliveries
default_destination_concurrency_limit = 50
default_destination_recipient_limit = 50

# Reduce queue lifetime
maximal_queue_lifetime = 1d
bounce_queue_lifetime = 1d

# Connection caching
smtp_connection_cache_on_demand = yes
smtp_connection_cache_destinations = email-smtp.ap-south-1.amazonaws.com

Reload:

sudo systemctl reload postfix

Logger Optimization

For high volume (>1000 events/min):

Edit /usr/local/bin/ses_logger.py:

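# Use the largest batch SQS allows: receive_message accepts at most 10
# messages per call and rejects higher values
response = sqs.receive_message(
    QueueUrl=queue_url,
    MaxNumberOfMessages=10,  # API maximum
    WaitTimeSeconds=20       # long polling, fewer empty responses
)
# For more throughput, run additional logger processes or polling threads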

Restart:

sudo systemctl restart ses-logger

Scaling Strategies

When to Scale

Metric       Scale Trigger
CPU Usage    Sustained >70%
Emails/day   >40,000 (80% of quota)
Queue Size   Sustained >100
Memory       >80% used

Vertical Scaling (Bigger Instance)

Approximate capacity by instance type:

Instance     vCPU   RAM   Emails/day
t3a.small    2      2GB   10,000
t3a.medium   2      4GB   50,000
t3a.large    2      8GB   100,000
c6a.xlarge   4      8GB   500,000

Security Hardening

Restrict Relay Access

Tighten network access:

sudo vim /etc/postfix/main.cf
# Only specific IPs
mynetworks = 127.0.0.1, 10.10.3.125

# Or specific subnet
mynetworks = 10.10.0.0/21

Rate Limiting

Prevent abuse:

sudo vim /etc/postfix/main.cf
# Max 100 connections/min per client
smtpd_client_connection_rate_limit = 100

# Max 100 emails/min per client
smtpd_client_message_rate_limit = 100

Monitor IAM Usage

Enable CloudTrail for audit:

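aws cloudtrail create-trail \
  --name email-infrastructure-audit \
  --s3-bucket-name my-audit-logs

# A new trail does not record events until logging is started
aws cloudtrail start-logging --name email-infrastructure-audit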


Series Complete! 🎉

🔗 If this helped or resonated with you, connect with me on LinkedIn. Let’s learn and grow together.

👉 Stay tuned for more behind-the-scenes write-ups and system design breakdowns.