Keep It Running: Email Infrastructure Operations Guide
Your infrastructure is live. This guide shows you how to monitor, troubleshoot, and scale your Postfix + AWS SES system without the headaches.

I’m Cyril Sebastian, a DevOps and Cloud Infrastructure architect with 10+ years of experience building, scaling, and securing cloud-native and hybrid systems. I specialize in automation, cost optimization, observability, and platform engineering across AWS, GCP, and Oracle Cloud. My passion lies in solving complex infrastructure challenges—from cloud migrations to Infrastructure as Code (IaC), and from deployment automation to scalable monitoring strategies. I blog here about:
Cloud strategy and migration playbooks
Real-world DevOps and automation with Terraform, Jenkins, and Ansible
DevSecOps practices and security-first thinking in production
Monitoring, cost optimization, and incident response at scale
If you're building in the cloud, optimizing infra, or exploring DevOps culture—let’s connect and share ideas! 🔗 linkedin.com/in/sebastiancyril
Managing email infrastructure like a pro.
Daily Operations (5 Minutes)
The Morning Health Check
Create this script and run it daily:
cat > ~/email-health.sh <<'EOF'
#!/bin/bash
YESTERDAY=$(date -d "yesterday" +"%b %d")
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo "Email Health ($YESTERDAY)"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
# Service status
systemctl is-active --quiet postfix && echo "Postfix OK" || echo "Postfix DOWN"
systemctl is-active --quiet ses-logger && echo "Logger OK" || echo "Logger DOWN"
# Email stats
SENT=$(grep "$YESTERDAY" /var/log/postfix/postfix.log 2>/dev/null | grep -c "status=sent")
DELIVERED=$(grep "$YESTERDAY" /var/log/postfix/mail.log 2>/dev/null | grep -c "status=delivered")
BOUNCED=$(grep "$YESTERDAY" /var/log/postfix/mail.log 2>/dev/null | grep -c "status=bounced")
echo ""
echo "📊 Volume"
echo " Sent: $SENT"
echo " Delivered: $DELIVERED"
echo " Bounced: $BOUNCED"
if [ $SENT -gt 0 ]; then
DELIVERY_RATE=$((DELIVERED * 100 / SENT))
BOUNCE_RATE=$((BOUNCED * 100 / SENT))
echo ""
echo "📈 Rates"
echo " Delivery: ${DELIVERY_RATE}%"
echo " Bounce: ${BOUNCE_RATE}%"
[ $BOUNCE_RATE -gt 5 ] && echo " ⚠️ High bounce rate!"
fi
# Queue status
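# mailq ends with "Mail queue is empty" or "-- N Kbytes in M Requests."
QUEUE_SUMMARY=$(mailq | tail -1)
echo ""
if echo "$QUEUE_SUMMARY" | grep -q "is empty"; then
echo "Queue empty"
else
echo "⚠️ Queue: $(echo "$QUEUE_SUMMARY" | awk '{print $5}') messages"
fi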
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
EOF
chmod +x ~/email-health.sh
Run it:
./email-health.sh
Automate it (runs at 9 AM, emails you results):
(crontab -l 2>/dev/null; echo "0 9 * * * ~/email-health.sh | mail -s 'Email Health Report' admin@yourdomain.com") | crontab -
Essential Monitoring
Real-Time Log Watching
Monitor live email flow:
# Watch everything
sudo tail -f /var/log/postfix/*.log
# Watch only delivered emails
sudo tail -f /var/log/postfix/mail.log | grep --line-buffered "status=delivered"
# Watch bounces
sudo tail -f /var/log/postfix/mail.log | grep --line-buffered "status=bounced"
Key Metrics to Track
| Metric | Target | Alert If |
|---|---|---|
| Delivery Rate | >95% | <90% |
| Bounce Rate | <5% | >10% |
| Queue Size | 0 | >100 |
| Service Uptime | 99.9% | Any downtime |
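The alert column of the table above can be turned into a small check that a cron job or the health script could call. This is only a sketch: the function name `check_metrics` and its argument order are made up for illustration, and it assumes you pass whole-number percentages and a queue count.

```shell
# check_metrics DELIVERY_RATE BOUNCE_RATE QUEUE_SIZE
# Hypothetical helper: prints one ALERT line per breached threshold
# from the table above, or "OK" when nothing is breached.
check_metrics() {
    cm_alerts=0
    if [ "$1" -lt 90 ]; then
        echo "ALERT: delivery rate $1% is below 90%"
        cm_alerts=$((cm_alerts + 1))
    fi
    if [ "$2" -gt 10 ]; then
        echo "ALERT: bounce rate $2% is above 10%"
        cm_alerts=$((cm_alerts + 1))
    fi
    if [ "$3" -gt 100 ]; then
        echo "ALERT: queue size $3 is above 100"
        cm_alerts=$((cm_alerts + 1))
    fi
    [ "$cm_alerts" -eq 0 ] && echo "OK"
    # Exit status is non-zero when any threshold is breached
    [ "$cm_alerts" -eq 0 ]
}
```

For example, `check_metrics 96 2 0` prints `OK`, while `check_metrics 85 12 150` prints three ALERT lines and returns a non-zero status.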
Quick metric checks:
# Today's delivery rate
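SENT=$(grep "$(date +%b\ %d)" /var/log/postfix/postfix.log | grep -c "status=sent")
DELIVERED=$(grep "$(date +%b\ %d)" /var/log/postfix/mail.log | grep -c "status=delivered")
# Guard against division by zero before any mail has gone out today
[ "$SENT" -gt 0 ] && echo "Delivery rate: $((DELIVERED * 100 / SENT))%" || echo "No mail sent yet today"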
# Average delivery time (milliseconds)
grep "status=delivered" /var/log/postfix/mail.log | \
grep -oP 'delay=\K\d+' | \
awk '{sum+=$1; n++} END {if (n) print "Avg delay: " sum/n "ms"; else print "No delivery events found"}'
# Top recipient domains
grep "status=delivered" /var/log/postfix/mail.log | \
grep -oP 'to=<[^@]+@\K[^>]+' | \
sort | uniq -c | sort -rn | head -5
Common Operations
Adding New Senders
# 1. Edit whitelist
sudo vim /etc/postfix/allowed_senders
# Add line:
# newsender@yourdomain.com OK
# 2. Rebuild database
sudo postmap /etc/postfix/allowed_senders
# 3. Reload (no restart needed!)
sudo systemctl reload postfix
# 4. Test
echo "Test" | mail -s "Test" -r newsender@yourdomain.com test@example.com
No downtime! Reload picks up changes instantly.
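Before reloading, it can help to confirm the entry actually landed in the whitelist source file. A minimal sketch, assuming the file uses the "address OK" layout shown above; `sender_allowed` is a hypothetical helper name, not part of Postfix.

```shell
# sender_allowed FILE ADDRESS
# Hypothetical helper: succeeds if ADDRESS has an "OK" entry in a
# whitelist file laid out like /etc/postfix/allowed_senders
# ("address OK" per line, "#" starts a comment).
sender_allowed() {
    awk -v addr="$2" '
        /^[[:space:]]*#/ { next }    # skip commented-out entries
        tolower($1) == tolower(addr) && $2 == "OK" { found = 1 }
        END { exit found ? 0 : 1 }
    ' "$1"
}
```

Usage: `sender_allowed /etc/postfix/allowed_senders newsender@yourdomain.com && echo "listed"`. This only reads the text file; to query the compiled map that Postfix itself consults, `postmap -q newsender@yourdomain.com /etc/postfix/allowed_senders` should print `OK` after the rebuild step.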
Removing Senders
# 1. Comment out or remove from whitelist
sudo vim /etc/postfix/allowed_senders
# #oldsender@yourdomain.com OK
# 2. Rebuild and reload
sudo postmap /etc/postfix/allowed_senders
sudo systemctl reload postfix
# 3. Verify rejection
echo "Test" | mail -s "Test" -r oldsender@yourdomain.com test@example.com
# Should see: "Sender address rejected"
Managing Mail Queue
View queue:
mailq
Flush queue (retry all deferred emails):
sudo postqueue -f
Delete specific email:
# Get queue ID from mailq
sudo postsuper -d QUEUE_ID
Delete all queued emails:
sudo postsuper -d ALL
Delete only deferred emails:
sudo postsuper -d ALL deferred
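Before deleting anything, it is usually worth seeing why messages are sitting in the queue. The sketch below summarizes the reasons from a queue listing; it assumes the standard listing layout where each deferral reason appears on its own line in parentheses, and `queue_reasons` is a name made up for this example.

```shell
# queue_reasons: reads `mailq` output on stdin and prints each
# distinct deferral reason with its count, most frequent first.
# Assumes reasons appear on their own lines wrapped in parentheses.
queue_reasons() {
    grep -E '^[[:space:]]*\(' |
        sed -E 's/^[[:space:]]*\((.*)\)[[:space:]]*$/\1/' |
        sort | uniq -c | sort -rn
}
```

Usage: `mailq | queue_reasons`. If one reason dominates, fix that cause first and then flush, rather than purging the queue.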
Searching Email History
Find specific email:
grep "user@example.com" /var/log/postfix/*.log
Find by sender:
grep "from=<sender@yourdomain.com>" /var/log/postfix/postfix.log
Find bounces to specific domain:
grep "gmail.com" /var/log/postfix/mail.log | grep "bounced"
Get complete email journey:
# Get message ID from sent log
MSG_ID=$(grep "user@example.com" /var/log/postfix/postfix.log | grep -oP 'status=sent \(250 Ok \K[^)]+' | head -1)
# Find all events for that message
grep "$MSG_ID" /var/log/postfix/*.log
Troubleshooting Guide
Problem 1: Postfix Won't Start
Symptoms:
sudo systemctl start postfix
# Job for postfix.service failed
Fix:
# 1. Check config syntax
sudo postfix check
# 2. View detailed error
sudo journalctl -u postfix -n 20 --no-pager
# 3. Common issues:
# Port in use?
sudo lsof -i :25
# Kill conflicting process: sudo systemctl stop sendmail
# Permission issue?
sudo chown -R postfix:postfix /var/log/postfix
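# Let Postfix restore the correct owners and modes on its queue directories
# (a blanket chown -R would break the root- and postdrop-owned parts)
sudo postfix set-permissions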
# Check line number from 'postfix check' output
sudo vim /etc/postfix/main.cf +LINE_NUMBER
Problem 2: Emails Stuck in Queue
Diagnosis:
mailq # Shows queued emails
sudo tail -100 /var/log/postfix/postfix.log | grep "status=deferred"
Common causes and fixes:
Wrong SES credentials:
# Verify credentials
sudo postmap -q "[email-smtp.ap-south-1.amazonaws.com]:587" /etc/postfix/sasl_passwd
# Update if needed
sudo vim /etc/postfix/sasl_passwd
sudo postmap /etc/postfix/sasl_passwd
sudo systemctl restart postfix
Network blocked:
# Test SES connectivity
telnet email-smtp.ap-south-1.amazonaws.com 587
# Check security group allows outbound 587
# Check route table has internet gateway
SES quota exceeded:
aws ses get-send-quota --region ap-south-1
# If near limit, wait or request increase
After fixing, flush the queue:
sudo postqueue -f
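The three causes above tend to leave different text in the deferred log lines, so a rough classifier can point you at the right fix faster. This is a sketch only: `classify_deferral` is a hypothetical helper, and the match strings are assumptions based on typical SES SMTP and Postfix messages, so adjust them to what your own logs show.

```shell
# classify_deferral "LOG LINE"
# Hypothetical helper: maps a deferred-delivery log line to the likely
# cause from the list above. Match strings are assumptions; tune them
# against your own logs.
classify_deferral() {
    case "$1" in
        *"535 Authentication Credentials Invalid"*|*"SASL authentication failed"*)
            echo "credentials" ;;
        *"Connection timed out"*|*"Connection refused"*|*"Network is unreachable"*)
            echo "network" ;;
        *"454 Throttling failure"*|*"quota exceeded"*)
            echo "quota" ;;
        *)
            echo "unknown" ;;
    esac
}
```

One way to use it: `sudo grep "status=deferred" /var/log/postfix/postfix.log | tail -100 | while IFS= read -r line; do classify_deferral "$line"; done | sort | uniq -c`.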
Problem 3: Logger Service Keeps Crashing
Check logs:
sudo journalctl -u ses-logger -n 50 --no-pager
sudo tail -50 /var/log/ses-logger-error.log
Common fixes:
boto3 missing:
python3 -c "import boto3" || sudo yum install -y python3-boto3
sudo systemctl restart ses-logger
Wrong queue URL:
# Get correct URL
QUEUE_URL=$(aws sqs get-queue-url --queue-name ses-events-queue --region ap-south-1 --query 'QueueUrl' --output text)
# Update service
sudo sed -i "s|Environment=\"SQS_QUEUE_URL=.*\"|Environment=\"SQS_QUEUE_URL=$QUEUE_URL\"|" /etc/systemd/system/ses-logger.service
sudo systemctl daemon-reload
sudo systemctl restart ses-logger
IAM permissions:
# Verify role attached
aws sts get-caller-identity
# Should show: PostfixSESLogger role
# If not, reattach IAM instance profile
Problem 4: No Delivery Events in Logs
Diagnosis:
# 1. Check SQS queue has messages
aws sqs get-queue-attributes \
--queue-url "$(aws sqs get-queue-url --queue-name ses-events-queue --region ap-south-1 --query 'QueueUrl' --output text)" \
--attribute-names ApproximateNumberOfMessages \
--region ap-south-1
If messages are accumulating:
Logger not processing → Check: sudo journalctl -u ses-logger
Restart logger → sudo systemctl restart ses-logger
If no messages in queue:
# 2. Verify SES publishing to SNS
aws ses get-identity-notification-attributes \
--identities yourdomain.com \
--region ap-south-1
# Should show all three topics configured
# 3. Reconfigure if needed
SNS_ARN=$(aws sns list-topics --region ap-south-1 --query "Topics[?contains(TopicArn, 'ses-events-topic')].TopicArn | [0]" --output text)
for EVENT in Delivery Bounce Complaint; do
aws ses set-identity-notification-topic \
--identity yourdomain.com \
--notification-type $EVENT \
--sns-topic "$SNS_ARN" \
--region ap-south-1
done
Problem 5: High Bounce Rate (>10%)
Analyze bounce reasons:
grep "status=bounced" /var/log/postfix/mail.log | \
grep -oP 'reason=\(\K[^\)]+' | \
sort | uniq -c | sort -rn | head -10
Common reasons:
"User unknown" (invalid addresses):
# Extract bounced addresses
grep "status=bounced" /var/log/postfix/mail.log | \
grep "bounce_type=Permanent" | \
grep -oP 'to=<\K[^>]+' | \
sort -u > bounced_addresses.txt
# Remove from your mailing list
"Mailbox full":
Temporary issue, will resolve
Retry after 24 hours
"550 Spam":
Review email content
Check SPF/DKIM/DMARC setup
Verify sender reputation
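For the "User unknown" case, the cleanup step can be scripted once you have `bounced_addresses.txt`. A minimal sketch, assuming your mailing list is a plain text file with one address per line; `prune_list` and the file names in the usage line are illustrative.

```shell
# prune_list LIST_FILE BOUNCED_FILE
# Prints LIST_FILE minus any address that appears in BOUNCED_FILE.
# grep flags: -F fixed strings, -x whole-line match, -i ignore case,
#             -v invert the match, -f read patterns from a file.
prune_list() {
    if [ -s "$2" ]; then
        # grep exits 1 when every line was pruned; treat that as success
        grep -v -x -i -F -f "$2" "$1" || [ $? -eq 1 ]
    else
        cat "$1"    # nothing bounced: list is unchanged
    fi
}
```

Usage: `prune_list mailing_list.txt bounced_addresses.txt > mailing_list.clean.txt`, then review the result before swapping it in.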
Problem 6: Emails Going to Spam
Verification checklist:
# 1. Check SPF
dig +short TXT yourdomain.com | grep spf
# Should include: include:amazonses.com
# 2. Check DKIM
aws ses get-identity-dkim-attributes \
--identities yourdomain.com \
--region ap-south-1
# Should show: DkimEnabled=true, Status=Success
# 3. Check DMARC
dig +short TXT _dmarc.yourdomain.com
# Should return DMARC policy
# 4. Check that account-level sending is enabled
aws ses get-account-sending-enabled --region ap-south-1
# Should show: Enabled=true (SES pauses sending when reputation drops too far)
Content checklist:
Avoid spam trigger words (FREE!, ACT NOW!)
Include unsubscribe link
Balance text/image ratio (60% text minimum)
Use a consistent "From" name and address
Authenticate with SPF/DKIM/DMARC
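The SPF and DMARC checks from the verification list can be wrapped so they give a yes/no answer instead of raw DNS output. A small sketch, assuming TXT records arrive one per line as `dig +short` prints them; both function names are made up for this example.

```shell
# spf_authorizes_ses: reads TXT records on stdin and succeeds only if
# an SPF record containing include:amazonses.com is present.
spf_authorizes_ses() {
    grep -i 'v=spf1' | grep -qi 'include:amazonses\.com'
}

# dmarc_policy: reads a DMARC TXT record on stdin and prints the value
# of its p= tag (none, quarantine, or reject). Prints nothing if absent.
dmarc_policy() {
    sed -n 's/.*[; "]p=\([a-z]*\).*/\1/p'
}
```

Usage: `dig +short TXT yourdomain.com | spf_authorizes_ses && echo "SPF covers SES"` and `dig +short TXT _dmarc.yourdomain.com | dmarc_policy`.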
Performance Optimization
Postfix Tuning
For higher throughput:
sudo vim /etc/postfix/main.cf
Add/update:
# Increase concurrent deliveries
default_destination_concurrency_limit = 50
default_destination_recipient_limit = 50
# Reduce queue lifetime
maximal_queue_lifetime = 1d
bounce_queue_lifetime = 1d
# Connection caching
smtp_connection_cache_on_demand = yes
smtp_connection_cache_destinations = email-smtp.ap-south-1.amazonaws.com
Reload:
sudo systemctl reload postfix
Logger Optimization
For high volume (>1000 events/min):
Edit /usr/local/bin/ses_logger.py:
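# Keep the batch size at the SQS maximum and rely on long polling.
# For more throughput, run additional poller threads or processes.
response = sqs.receive_message(
QueueUrl=queue_url,
MaxNumberOfMessages=10, # SQS rejects values above 10
WaitTimeSeconds=20
)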
Restart:
sudo systemctl restart ses-logger
Scaling Strategies
When to Scale
| Metric | Scale Trigger |
|---|---|
| CPU Usage | Sustained >70% |
| Emails/day | >40,000 (80% of quota) |
| Queue Size | Sustained >100 |
| Memory | >80% used |
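The "80% of quota" trigger in the table is easy to compute from the two numbers `aws ses get-send-quota` returns. A minimal sketch: `quota_percent` is a hypothetical helper, and it uses awk because the CLI reports these values as floats.

```shell
# quota_percent MAX_24H SENT_24H
# Prints the share of the daily SES quota already used, as a whole
# percentage. awk handles float inputs such as 50000.0.
quota_percent() {
    awk -v max="$1" -v sent="$2" 'BEGIN {
        if (max <= 0) { print "n/a"; exit 1 }   # SES reports -1 for unlimited
        printf "%d\n", (sent * 100) / max
    }'
}
```

Usage, assuming the CLI is configured: `quota_percent $(aws ses get-send-quota --region ap-south-1 --query '[Max24HourSend,SentLast24Hours]' --output text)`. A result of 80 or more means it is time to request a quota increase or spread the load.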
Vertical Scaling (Bigger Instance)
Approximate daily capacity by instance type:
| Instance | vCPU | RAM | Emails/day |
|---|---|---|---|
| t3a.small | 2 | 2GB | 10,000 |
| t3a.medium | 2 | 4GB | 50,000 |
| t3a.large | 2 | 8GB | 100,000 |
| c6a.xlarge | 4 | 8GB | 500,000 |
Security Hardening
Restrict Relay Access
Tighten network access:
sudo vim /etc/postfix/main.cf
# Only specific IPs
mynetworks = 127.0.0.1, 10.10.3.125
# Or specific subnet
mynetworks = 10.10.0.0/21
Rate Limiting
Prevent abuse:
sudo vim /etc/postfix/main.cf
# Max 100 connections/min per client
smtpd_client_connection_rate_limit = 100
# Max 100 emails/min per client
smtpd_client_message_rate_limit = 100
Monitor IAM Usage
Enable CloudTrail for audit:
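aws cloudtrail create-trail \
--name email-infrastructure-audit \
--s3-bucket-name my-audit-logs
# A trail created from the CLI records nothing until logging is started
aws cloudtrail start-logging --name email-infrastructure-audit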
Series Complete! 🎉
Part 3: Operations ← You just finished this
🔗 If this helped or resonated with you, connect with me on LinkedIn. Let’s learn and grow together.
👉 Stay tuned for more behind-the-scenes write-ups and system design breakdowns.



