# Keep It Running: Email Infrastructure Operations Guide

Run your email infrastructure like a pro.

* * *

## Daily Operations (5 Minutes)

### The Morning Health Check

Create this script and run it daily:

```bash
cat > ~/email-health.sh <<'EOF'
#!/bin/bash
YESTERDAY=$(date -d "yesterday" +"%b %d")

echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo "Email Health ($YESTERDAY)"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━"

# Service status
systemctl is-active --quiet postfix && echo "Postfix OK" || echo "Postfix DOWN"
systemctl is-active --quiet ses-logger && echo "Logger OK" || echo "Logger DOWN"

# Email stats
SENT=$(grep "$YESTERDAY" /var/log/postfix/postfix.log 2>/dev/null | grep -c "status=sent")
DELIVERED=$(grep "$YESTERDAY" /var/log/postfix/mail.log 2>/dev/null | grep -c "status=delivered")
BOUNCED=$(grep "$YESTERDAY" /var/log/postfix/mail.log 2>/dev/null | grep -c "status=bounced")

echo ""
echo "📊 Volume"
echo "   Sent: $SENT"
echo "   Delivered: $DELIVERED"
echo "   Bounced: $BOUNCED"

if [ $SENT -gt 0 ]; then
  DELIVERY_RATE=$((DELIVERED * 100 / SENT))
  BOUNCE_RATE=$((BOUNCED * 100 / SENT))
  echo ""
  echo "📈 Rates"
  echo "   Delivery: ${DELIVERY_RATE}%"
  echo "   Bounce: ${BOUNCE_RATE}%"
  
  [ $BOUNCE_RATE -gt 5 ] && echo "   ⚠️  High bounce rate!"
fi

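# Queue status (the last line of mailq is either "Mail queue is empty"
# or "-- N Kbytes in M Requests.")
QUEUE_SUMMARY=$(mailq | tail -1)
echo ""
if echo "$QUEUE_SUMMARY" | grep -q "is empty"; then
  echo "Queue empty"
else
  QUEUE=$(echo "$QUEUE_SUMMARY" | awk '{print $5}')
  echo "⚠️  Queue: $QUEUE messages"
fi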

echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
EOF

chmod +x ~/email-health.sh
```

**Run it:**

```bash
./email-health.sh
```

**Automate it** (runs at 9 AM, emails you results):

```bash
(crontab -l 2>/dev/null; echo "0 9 * * * ~/email-health.sh | mail -s 'Email Health Report' admin@yourdomain.com") | crontab -
```

* * *

## Essential Monitoring

### Real-Time Log Watching

**Monitor live email flow:**

```bash
# Watch everything
sudo tail -f /var/log/postfix/*.log

# Watch only delivered emails
sudo tail -f /var/log/postfix/mail.log | grep --line-buffered "status=delivered"

# Watch bounces
sudo tail -f /var/log/postfix/mail.log | grep --line-buffered "status=bounced"
```

* * *

### Key Metrics to Track

| Metric | Target | Alert If |
| --- | --- | --- |
| Delivery Rate | \>95% | <90% |
| Bounce Rate | <5% | \>10% |
| Queue Size | 0 | \>100 |
| Service Uptime | 99.9% | Any downtime |

**Quick metric checks:**

```bash
# Today's delivery rate (guarded so an empty day doesn't divide by zero)
SENT=$(grep "$(date +%b\ %d)" /var/log/postfix/postfix.log | grep -c "status=sent")
DELIVERED=$(grep "$(date +%b\ %d)" /var/log/postfix/mail.log | grep -c "status=delivered")
if [ "$SENT" -gt 0 ]; then echo "Delivery rate: $((DELIVERED * 100 / SENT))%"; else echo "No mail sent yet today"; fi

# Average delivery time (milliseconds)
grep "status=delivered" /var/log/postfix/mail.log | \
  grep -oP 'delay=\K\d+' | \
  awk '{sum+=$1; n++} END {if (n) print "Avg delay: " sum/n "ms"; else print "No deliveries logged"}'

# Top recipient domains
grep "status=delivered" /var/log/postfix/mail.log | \
  grep -oP 'to=<[^@]+@\K[^>]+' | \
  sort | uniq -c | sort -rn | head -5
```
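
The "Alert If" column in the table can also be checked automatically. Here is a minimal sketch, assuming you already have the counts from the grep commands above; `check_metrics` is a hypothetical helper name, not something Postfix or the logger ships with.

```shell
# Hypothetical helper: compare counts against the "Alert If" thresholds.
# Usage: check_metrics SENT DELIVERED BOUNCED QUEUE_SIZE
# Prints one line per breached threshold; returns 1 if anything was breached.
check_metrics() {
  cm_sent=$1
  cm_delivered=$2
  cm_bounced=$3
  cm_queue=$4
  cm_status=0

  if [ "$cm_sent" -gt 0 ]; then
    cm_delivery=$((cm_delivered * 100 / cm_sent))
    cm_bounce=$((cm_bounced * 100 / cm_sent))
    if [ "$cm_delivery" -lt 90 ]; then
      echo "ALERT: delivery rate ${cm_delivery}% (<90%)"
      cm_status=1
    fi
    if [ "$cm_bounce" -gt 10 ]; then
      echo "ALERT: bounce rate ${cm_bounce}% (>10%)"
      cm_status=1
    fi
  fi

  if [ "$cm_queue" -gt 100 ]; then
    echo "ALERT: queue size ${cm_queue} (>100)"
    cm_status=1
  fi

  return "$cm_status"
}
```

For example, `check_metrics 1000 870 120 150` reports all three alerts, while `check_metrics 1000 980 10 0` prints nothing and returns 0. The non-zero return code makes it easy to hook into cron or a paging script.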

* * *

## Common Operations

### Adding New Senders

```bash
# 1. Edit whitelist
sudo vim /etc/postfix/allowed_senders

# Add line:
# newsender@yourdomain.com    OK

# 2. Rebuild database
sudo postmap /etc/postfix/allowed_senders

# 3. Reload (no restart needed!)
sudo systemctl reload postfix

# 4. Test
echo "Test" | mail -s "Test" -r newsender@yourdomain.com test@example.com
```

**No downtime!** Reload picks up changes instantly.
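
If you add senders often, step 1 can be scripted so the same address never gets listed twice. This is only a sketch: `add_sender` is a hypothetical helper, and it assumes the whitelist keeps the `address    OK` layout shown above.

```shell
# Hypothetical helper: append "ADDRESS<TAB>OK" to a Postfix lookup-table
# source file unless the address is already listed.
# Commented-out entries do not count as listed.
# Usage: add_sender /etc/postfix/allowed_senders newsender@yourdomain.com
add_sender() {
  as_file=$1
  as_addr=$2

  # Look for the address as the first whitespace-delimited field of any line
  if awk -v a="$as_addr" 'tolower($1) == tolower(a) { found = 1 } END { exit !found }' "$as_file"; then
    echo "already present: $as_addr"
    return 0
  fi

  printf '%s\tOK\n' "$as_addr" >> "$as_file"
  echo "added: $as_addr"
}
```

After it reports `added`, you would still run `postmap` and reload Postfix as in steps 2 and 3.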

* * *

### Removing Senders

```bash
# 1. Comment out or remove from whitelist
sudo vim /etc/postfix/allowed_senders
# #oldsender@yourdomain.com    OK

# 2. Rebuild and reload
sudo postmap /etc/postfix/allowed_senders
sudo systemctl reload postfix

# 3. Verify rejection
echo "Test" | mail -s "Test" -r oldsender@yourdomain.com test@example.com
# Should see: "Sender address rejected"
```

* * *

### Managing Mail Queue

**View queue:**

```bash
mailq
```

**Flush queue** (retry all deferred emails):

```bash
sudo postqueue -f
```

**Delete specific email:**

```bash
# Get queue ID from mailq
sudo postsuper -d QUEUE_ID
```

**Delete all queued emails:**

```bash
# Irreversible: removes every message in every queue
sudo postsuper -d ALL
```

**Delete only deferred emails:**

```bash
sudo postsuper -d ALL deferred
```
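
**Delete emails to one recipient:**

Sometimes you only want to clear mail for a single address. The sketch below is adapted from the example in the `postsuper(1)` manual page; `queue_ids_for_recipient` is a hypothetical wrapper name, and it assumes the standard `mailq` listing layout.

```shell
# Hypothetical helper: read `mailq` output on stdin and print the queue IDs
# of messages whose only recipient is the given address.
# Usage: mailq | queue_ids_for_recipient user@example.com
queue_ids_for_recipient() {
  tail -n +2 |                 # drop the header line
    grep -v '^ *(' |           # drop "(reason ...)" lines shown for deferred mail
    awk -v rcpt="$1" 'BEGIN { RS = "" }
      # Each blank-line-separated record is one message:
      # $1 = queue ID, $7 = sender, $8 = first recipient, $9 = second recipient
      $8 == rcpt && $9 == "" { print $1 }' |
    tr -d '*!'                 # strip the active (*) and hold (!) markers
}
```

Review the IDs first, then pipe them into `sudo postsuper -d -` (the trailing `-` tells `postsuper` to read queue IDs from standard input).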

* * *

### Searching Email History

**Find specific email:**

```bash
grep "user@example.com" /var/log/postfix/*.log
```

**Find by sender:**

```bash
grep "from=<sender@yourdomain.com>" /var/log/postfix/postfix.log
```

**Find bounces to specific domain:**

```bash
grep "gmail.com" /var/log/postfix/mail.log | grep "bounced"
```

**Get complete email journey:**

```bash
# Get message ID from sent log
MSG_ID=$(grep "user@example.com" /var/log/postfix/postfix.log | grep -oP 'status=sent \(250 Ok \K[^)]+' | head -1)

# Find all events for that message
grep "$MSG_ID" /var/log/postfix/*.log
```

* * *

## Troubleshooting Guide

### Problem 1: Postfix Won't Start

**Symptoms:**

```bash
sudo systemctl start postfix
# Job for postfix.service failed
```

**Fix:**

```bash
# 1. Check config syntax
sudo postfix check

# 2. View detailed error
sudo journalctl -u postfix -n 20 --no-pager

# 3. Common issues:

# Port in use?
sudo lsof -i :25
# Kill conflicting process: sudo systemctl stop sendmail

# Permission issue?
sudo chown -R postfix:postfix /var/log/postfix
# Let Postfix repair its own spool ownership (a blanket chown breaks it)
sudo postfix set-permissions

# Check line number from 'postfix check' output
sudo vim /etc/postfix/main.cf +LINE_NUMBER
```

* * *

### Problem 2: Emails Stuck in Queue

**Diagnosis:**

```bash
mailq  # Shows queued emails
sudo tail -100 /var/log/postfix/postfix.log | grep "status=deferred"
```
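
When many emails are deferred, it helps to group them by reason before picking a fix. A minimal sketch, assuming the standard Postfix log format where the reason follows `status=deferred` in parentheses; `deferred_reasons` is a hypothetical helper name.

```shell
# Hypothetical helper: read Postfix log lines on stdin and print each
# distinct deferral reason with its count, most frequent first.
# Usage: sudo cat /var/log/postfix/postfix.log | deferred_reasons
deferred_reasons() {
  # Keep only the text inside the parentheses after status=deferred
  sed -n 's/.*status=deferred (\(.*\))$/\1/p' |
    sort | uniq -c | sort -rn
}
```

A reason mentioning authentication points to the SES credentials, while connection timeouts point to the network path.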

**Common causes and fixes:**

**Wrong SES credentials:**

```bash
# Verify credentials
sudo postmap -q "[email-smtp.ap-south-1.amazonaws.com]:587" /etc/postfix/sasl_passwd

# Update if needed
sudo vim /etc/postfix/sasl_passwd
sudo postmap /etc/postfix/sasl_passwd
sudo systemctl restart postfix
```

**Network blocked:**

```bash
# Test SES connectivity
telnet email-smtp.ap-south-1.amazonaws.com 587

# Check security group allows outbound 587
# Check route table has internet gateway
```

**SES quota exceeded:**

```bash
aws ses get-send-quota --region ap-south-1
# If near limit, wait or request increase
```
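
To see how close you are to the limit as a percentage, a small sketch like this may help. `quota_percent` is a hypothetical helper; the two inputs are the `SentLast24Hours` and `Max24HourSend` values that `get-send-quota` returns.

```shell
# Hypothetical helper: percentage of the 24-hour SES quota already used.
# Usage: quota_percent SENT_LAST_24H MAX_24H_SEND
quota_percent() {
  awk -v sent="$1" -v max="$2" 'BEGIN {
    if (max <= 0) { print "n/a"; exit }   # SES reports -1 for an unlimited quota
    printf "%d\n", sent * 100 / max       # truncated to a whole percent
  }'
}

# Fetching both numbers in one call (left commented out so the helper
# can be tried without AWS credentials):
# set -- $(aws ses get-send-quota --region ap-south-1 \
#   --query '[SentLast24Hours, Max24HourSend]' --output text)
# echo "Quota used: $(quota_percent "$1" "$2")%"
```

With 40,000 emails sent against a 50,000 quota, `quota_percent 40000 50000` prints `80`.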

**After fixing, flush the queue:**

```bash
sudo postqueue -f
```

* * *

### Problem 3: Logger Service Keeps Crashing

**Check logs:**

```bash
sudo journalctl -u ses-logger -n 50 --no-pager
sudo tail -50 /var/log/ses-logger-error.log
```

**Common fixes:**

**boto3 missing:**

```bash
python3 -c "import boto3" || sudo yum install -y python3-boto3
sudo systemctl restart ses-logger
```

**Wrong queue URL:**

```bash
# Get correct URL
QUEUE_URL=$(aws sqs get-queue-url --queue-name ses-events-queue --region ap-south-1 --query 'QueueUrl' --output text)

# Update service
sudo sed -i "s|Environment=\"SQS_QUEUE_URL=.*\"|Environment=\"SQS_QUEUE_URL=$QUEUE_URL\"|" /etc/systemd/system/ses-logger.service

sudo systemctl daemon-reload
sudo systemctl restart ses-logger
```

**IAM permissions:**

```bash
# Verify role attached
aws sts get-caller-identity

# Should show: PostfixSESLogger role
# If not, reattach IAM instance profile
```

* * *

### Problem 4: No Delivery Events in Logs

**Diagnosis:**

```bash
# 1. Check SQS queue has messages
aws sqs get-queue-attributes \
  --queue-url "$(aws sqs get-queue-url --queue-name ses-events-queue --region ap-south-1 --query 'QueueUrl' --output text)" \
  --attribute-names ApproximateNumberOfMessages \
  --region ap-south-1
```

**If messages are accumulating:**

*   Logger not processing → Check `sudo journalctl -u ses-logger`
    
*   Restart logger → `sudo systemctl restart ses-logger`
    

**If no messages in queue:**

```bash
# 2. Verify SES publishing to SNS
aws ses get-identity-notification-attributes \
  --identities yourdomain.com \
  --region ap-south-1

# Should show all three topics configured

# 3. Reconfigure if needed
SNS_ARN=$(aws sns list-topics --region ap-south-1 --query "Topics[?contains(TopicArn, 'ses-events-topic')].TopicArn | [0]" --output text)

for EVENT in Delivery Bounce Complaint; do
  aws ses set-identity-notification-topic \
    --identity yourdomain.com \
    --notification-type $EVENT \
    --sns-topic "$SNS_ARN" \
    --region ap-south-1
done
```

* * *

### Problem 5: High Bounce Rate (>10%)

**Analyze bounce reasons:**

```bash
grep "status=bounced" /var/log/postfix/mail.log | \
  grep -oP 'reason=\(\K[^\)]+' | \
  sort | uniq -c | sort -rn | head -10
```

**Common reasons:**

**"User unknown" (invalid addresses):**

```bash
# Extract bounced addresses
grep "status=bounced" /var/log/postfix/mail.log | \
  grep "bounce_type=Permanent" | \
  grep -oP 'to=<\K[^>]+' | \
  sort -u > bounced_addresses.txt

# Remove from your mailing list
```
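
If your mailing list lives in a plain text file, removing the bounced addresses can be scripted. This is a sketch under that assumption (one address per line in both files, GNU grep); `remove_bounced` and the file names in the usage line are hypothetical.

```shell
# Hypothetical helper: print LIST_FILE minus every address in BOUNCED_FILE.
# Usage: remove_bounced mailing_list.txt bounced_addresses.txt > cleaned_list.txt
remove_bounced() {
  # -v invert match, -x whole line only, -i ignore case,
  # -F fixed strings (no regex), -f read patterns from file
  grep -v -x -i -F -f "$2" "$1" || [ $? -eq 1 ]   # exit 1 only means "nothing left"
}
```

The `-x` flag matters here: without it, bouncing `bob@example.com` would also drop `bigbob@example.com`.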

**"Mailbox full":**

*   Temporary issue, will resolve
    
*   Retry after 24 hours
    

**"550 Spam":**

*   Review email content
    
*   Check SPF/DKIM/DMARC setup
    
*   Verify sender reputation
    

* * *

### Problem 6: Emails Going to Spam

**Verification checklist:**

```bash
# 1. Check SPF
dig +short TXT yourdomain.com | grep spf
# Should include: include:amazonses.com

# 2. Check DKIM
aws ses get-identity-dkim-attributes \
  --identities yourdomain.com \
  --region ap-south-1
# Should show: DkimEnabled=true, Status=Success

# 3. Check DMARC
dig +short TXT _dmarc.yourdomain.com
# Should return DMARC policy

# 4. Check that SES has not paused sending for the account
aws ses get-account-sending-enabled --region ap-south-1
# Should show: Enabled=true
```
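
The SPF check in step 1 can be made stricter than a plain `grep spf`. A minimal sketch, assuming you pass in the TXT record with the surrounding quotes removed; `spf_includes_ses` is a hypothetical helper name.

```shell
# Hypothetical helper: does an SPF TXT record authorize Amazon SES?
# Returns 0 if the record starts with v=spf1 and contains the
# include:amazonses.com mechanism as a separate term.
# Usage: spf_includes_ses "v=spf1 include:amazonses.com ~all"
spf_includes_ses() {
  case $1 in
    v=spf1\ *) ;;            # looks like an SPF record, keep checking
    *) return 1 ;;           # some other TXT record
  esac
  case " $1 " in
    *" include:amazonses.com "*) return 0 ;;
    *) return 1 ;;
  esac
}
```

To feed it live data, something like `spf_includes_ses "$(dig +short TXT yourdomain.com | grep 'v=spf1' | tr -d '"')"` should work, since `dig +short` wraps TXT records in double quotes.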

**Content checklist:**

*   Avoid spam trigger words (FREE!, ACT NOW!)
    
*   Include unsubscribe link
    
*   Balance text/image ratio (60% text minimum)
    
*   Use a consistent "From" name and address
    
*   Authenticate with SPF/DKIM/DMARC
    

* * *

## Performance Optimization

### Postfix Tuning

**For higher throughput:**

```bash
sudo vim /etc/postfix/main.cf
```

Add/update:

```ini
# Increase concurrent deliveries
default_destination_concurrency_limit = 50
default_destination_recipient_limit = 50

# Reduce queue lifetime
maximal_queue_lifetime = 1d
bounce_queue_lifetime = 1d

# Connection caching
smtp_connection_cache_on_demand = yes
smtp_connection_cache_destinations = email-smtp.ap-south-1.amazonaws.com
```

Reload:

```bash
sudo systemctl reload postfix
```

* * *

### Logger Optimization

**For high volume (>1000 events/min):**

Edit `/usr/local/bin/ses_logger.py`:

```python
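# SQS returns at most 10 messages per receive_message call, so keep the
# batch at that maximum; for more throughput, run additional polling
# workers rather than raising the batch size
response = sqs.receive_message(
    QueueUrl=queue_url,
    MaxNumberOfMessages=10,  # API maximum; larger values are rejected
    WaitTimeSeconds=20       # Long polling; 20 seconds is the maximum
)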
```

Restart:

```bash
sudo systemctl restart ses-logger
```

* * *

## Scaling Strategies

### When to Scale

| Metric | Scale Trigger |
| --- | --- |
| CPU Usage | Sustained >70% |
| Emails/day | \>40,000 (80% of quota) |
| Queue Size | Sustained >100 |
| Memory | \>80% used |

* * *

### Vertical Scaling (Bigger Instance)

**Approximate capacity by instance size:**

| Instance | vCPU | RAM | Emails/day |
| --- | --- | --- | --- |
| t3a.small | 2 | 2GB | 10,000 |
| t3a.medium | 2 | 4GB | 50,000 |
| t3a.large | 2 | 8GB | 100,000 |
| c6a.xlarge | 4 | 8GB | 500,000 |
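
The table can be read as a simple lookup: pick the smallest instance whose capacity covers your daily volume. A sketch of that lookup, where `recommend_instance` is a hypothetical helper and the capacities are the table's figures rather than fresh benchmarks:

```shell
# Hypothetical helper: smallest instance from the table above that covers
# a given number of emails per day.
# Usage: recommend_instance 30000
recommend_instance() {
  ri_volume=$1
  if   [ "$ri_volume" -le 10000 ];  then echo "t3a.small"
  elif [ "$ri_volume" -le 50000 ];  then echo "t3a.medium"
  elif [ "$ri_volume" -le 100000 ]; then echo "t3a.large"
  elif [ "$ri_volume" -le 500000 ]; then echo "c6a.xlarge"
  else echo "above table range"
  fi
}
```

For example, `recommend_instance 30000` prints `t3a.medium`.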

* * *

## Security Hardening

### Restrict Relay Access

**Tighten network access:**

```bash
sudo vim /etc/postfix/main.cf
```

```ini
# Only specific IPs
mynetworks = 127.0.0.1, 10.10.3.125

# Or specific subnet
mynetworks = 10.10.0.0/21
```

* * *

### Rate Limiting

**Prevent abuse:**

```bash
sudo vim /etc/postfix/main.cf
```

```ini
# Max 100 connections/min per client
smtpd_client_connection_rate_limit = 100

# Max 100 emails/min per client
smtpd_client_message_rate_limit = 100
```

* * *

### Monitor IAM Usage

**Enable CloudTrail for audit:**

```bash
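aws cloudtrail create-trail \
  --name email-infrastructure-audit \
  --s3-bucket-name my-audit-logs

# A new trail records nothing until logging is started
aws cloudtrail start-logging --name email-infrastructure-audit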
```

* * *

## Resources

**AWS Documentation:**

*   [SES Developer Guide](https://docs.aws.amazon.com/ses/)
    
*   [CloudWatch Logs](https://docs.aws.amazon.com/cloudwatch/)
    

**Postfix:**

*   [Official Docs](http://www.postfix.org/documentation.html)
    
*   [Troubleshooting Guide](http://www.postfix.org/DEBUG_README.html)
    

* * *

**Series Complete! 🎉**

*   [Part 1: Architecture & Design](https://tech.cyrilsebastian.com/building-production-email-infrastructure-with-postfix-aws-ses-architecture-design)
    
*   [Part 2: Implementation Guide](https://tech.cyrilsebastian.com/from-zero-to-production-building-postfix-aws-ses-in-2-hours)
    
*   **Part 3: Operations** ← You just finished this
    

🔗 **If this helped or resonated with you, connect with me on** [**LinkedIn**](https://www.linkedin.com/in/sebastiancyril/). Let’s learn and grow together.

👉 Stay tuned for more behind-the-scenes write-ups and system design breakdowns.

* * *
