AWS Certified Cloud Practitioner

AWS Cloud Practitioner #22: CloudWatch - Monitoring and Logs

Master CloudWatch: metrics, alarms, logs, dashboards, and how to monitor AWS resources effectively.

🎯 What You'll Learn Today

  • Explain CloudWatch metrics and alarms
  • Configure logging with CloudWatch Logs
  • Create dashboards for visualization
  • Set up effective alerts
  • Distinguish CloudWatch from CloudTrail

Amazon CloudWatch

What is it? A monitoring service for AWS resources and applications.

plaintext
CloudWatch collects and tracks:
✅ Metrics (CPU, network, disk; EC2 memory requires the CloudWatch agent)
✅ Logs (application, system)
✅ Events (resource changes)
✅ Alarms (thresholds)
 
Use cases:
- Monitor EC2 CPU utilization
- Track application errors
- Alert if database connections spike
- Dashboard showing system health

CloudWatch Metrics

What are they? Time-series data points (measurements).

Default Metrics (Free)

plaintext
EC2:
- CPUUtilization
- NetworkIn/NetworkOut
- DiskReadOps/DiskWriteOps
- StatusCheckFailed
 
RDS:
- CPUUtilization
- DatabaseConnections
- FreeStorageSpace
- ReadLatency/WriteLatency
 
ELB:
- RequestCount
- TargetResponseTime
- HealthyHostCount
 
S3:
- BucketSizeBytes
- NumberOfObjects
 
Lambda:
- Invocations
- Duration
- Errors
- Throttles
 
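Default frequency: 5 minutes for EC2 basic monitoring (other services differ; S3 storage metrics, for example, are reported once a day)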
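
To make "time-series data points" and the 5-minute period more concrete, here is a minimal sketch in plain Python (no AWS calls). It groups raw samples into fixed periods and averages each one, which is roughly what an Average statistic over a period represents. The function name and the sample values are hypothetical, and this is a simplified model rather than how CloudWatch computes statistics internally.

```python
from collections import defaultdict
from datetime import datetime, timezone


def average_by_period(datapoints, period_seconds=300):
    """Group (timestamp, value) samples into fixed periods and average each.

    Illustrative only: a simplified model of a per-period Average statistic,
    not CloudWatch's actual implementation.
    """
    buckets = defaultdict(list)
    for ts, value in datapoints:
        epoch = int(ts.timestamp())
        # Align each sample to the start of the period it falls in.
        bucket_start = epoch - (epoch % period_seconds)
        buckets[bucket_start].append(value)
    return {
        datetime.fromtimestamp(start, tz=timezone.utc): sum(vals) / len(vals)
        for start, vals in sorted(buckets.items())
    }


# Hypothetical CPU samples (percent).
samples = [
    (datetime(2025, 11, 16, 10, 0, tzinfo=timezone.utc), 40.0),
    (datetime(2025, 11, 16, 10, 1, tzinfo=timezone.utc), 60.0),
    (datetime(2025, 11, 16, 10, 5, tzinfo=timezone.utc), 90.0),
]
for start, avg in average_by_period(samples).items():
    print(start.isoformat(), avg)
```

The first two samples fall into the same 5-minute period and are averaged together; the third starts a new period.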

Custom Metrics

plaintext
Application-specific metrics:
 
Examples:
- Active users count
- Orders per minute
- Payment processing time
- API response times
 
Push to CloudWatch:
aws cloudwatch put-metric-data \
  --namespace "MyApp" \
  --metric-name "ActiveUsers" \
  --value 150 \
  --timestamp 2025-11-16T10:00:00Z
 
Resolution: 1 minute by default; high-resolution metrics go down to 1 second
Cost: $0.30 per metric/month
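
The same custom metric can also be published from application code. The sketch below only builds the request in the shape that boto3's `put_metric_data` expects and prints it, so it runs without AWS credentials. The helper name `build_metric_request` is hypothetical, and the actual send is left as a comment because it assumes boto3 is installed and credentials are configured.

```python
from datetime import datetime, timezone


def build_metric_request(namespace, name, value, unit="Count", timestamp=None):
    """Build keyword arguments shaped for boto3's put_metric_data call."""
    return {
        "Namespace": namespace,
        "MetricData": [
            {
                "MetricName": name,
                "Value": float(value),
                "Unit": unit,
                # Default to "now" in UTC when no timestamp is supplied.
                "Timestamp": timestamp or datetime.now(timezone.utc),
            }
        ],
    }


request = build_metric_request(
    "MyApp", "ActiveUsers", 150,
    timestamp=datetime(2025, 11, 16, 10, 0, tzinfo=timezone.utc),
)
print(request)

# Assumption: boto3 installed and AWS credentials configured.
#   import boto3
#   boto3.client("cloudwatch").put_metric_data(**request)
```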

CloudWatch Alarms

What are they? Automated actions based on metric thresholds.

plaintext
Alarm states:
- OK: Within threshold
- ALARM: Threshold exceeded
- INSUFFICIENT_DATA: Not enough data
 
Actions when ALARM:
- SNS notification (email/SMS)
- Auto Scaling action (add instances)
- EC2 action (stop, terminate, reboot)
- Systems Manager action

Creating Alarm

bash
# CPU > 80% alarm
aws cloudwatch put-metric-alarm \
  --alarm-name cpu-mon \
  --alarm-description "CPU exceeds 80%" \
  --metric-name CPUUtilization \
  --namespace AWS/EC2 \
  --statistic Average \
  --period 300 \
  --evaluation-periods 2 \
  --threshold 80 \
  --comparison-operator GreaterThanThreshold \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:ops-team
 
# Evaluation:
#   Period: 5 minutes
#   Evaluation periods: 2
#   Trigger: average CPU > 80% for 2 consecutive 5-minute periods (10 minutes total)
#   Action: send an SNS notification
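
The evaluation rule (2 consecutive periods above 80%) can be sketched in a few lines of Python. This is a deliberately simplified model: it treats "too few values" as INSUFFICIENT_DATA and ignores CloudWatch's missing-data handling and "M out of N" datapoint settings. The function name is hypothetical.

```python
def evaluate_alarm(period_averages, threshold=80.0, evaluation_periods=2):
    """Return the alarm state based on the most recent period averages.

    Simplified model: ALARM only when every one of the last
    `evaluation_periods` values is strictly above the threshold
    (the GreaterThanThreshold comparison).
    """
    if len(period_averages) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    recent = period_averages[-evaluation_periods:]
    if all(value > threshold for value in recent):
        return "ALARM"
    return "OK"


# Hypothetical 5-minute CPU averages, oldest first.
print(evaluate_alarm([85.0]))
print(evaluate_alarm([70.0, 85.0, 90.0]))
print(evaluate_alarm([85.0, 90.0, 75.0]))
```

A single spike does not trigger the alarm in this model; the value has to stay above the threshold for both evaluated periods.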

Common Alarms

plaintext
1. High CPU:
   EC2 CPUUtilization > 80%
   Action: Scale out (add instances)
 
2. Low Healthy Hosts:
   ELB HealthyHostCount < 2
   Action: Alert ops team
 
3. High Error Rate:
   Lambda Errors > 10 per minute
   Action: Page on-call engineer
 
4. Billing:
   EstimatedCharges > $1000
   Action: Email finance team
 
5. RDS Storage:
   FreeStorageSpace < 10 GB
   Action: Increase storage / alert DBA

CloudWatch Logs

What is it? Centralized log management.

plaintext
Log sources:
- EC2 instances (application logs)
- Lambda functions (console.log)
- RDS (error logs, slow query logs)
- CloudTrail (API logs)
- VPC Flow Logs (network traffic)
- Route 53 (DNS queries)
 
Benefits:
✅ Centralized (all logs in one place)
✅ Searchable
✅ Retention policies
✅ Metric filters (log → metrics → alarms)

Log Groups and Streams

plaintext
Hierarchy:
 
Log Group: /aws/lambda/my-function
  ├── Log Stream: 2025/11/16/[$LATEST]abc123
  │   └── Events:
  │       - "START RequestId: xyz"
  │       - "Processing order #1234"
  │       - "END RequestId: xyz"
  │
  └── Log Stream: 2025/11/16/[$LATEST]def456
      └── Events: ...
 
Log Group: Collection of related streams
Log Stream: Sequence of log events from source
Log Event: Single log entry

Sending Logs

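python
# Lambda sends anything written through logging (or stdout/stderr) to CloudWatch Logs automatically
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)  # without this, INFO messages are dropped by the default WARNING level


def lambda_handler(event, context):
    # This line ends up in the function's log group, /aws/lambda/<function-name>
    logger.info(f"Processing order {event['orderId']}")

bash
# EC2: install the CloudWatch agent (Amazon Linux)
sudo yum install -y amazon-cloudwatch-agent

Agent configuration (logs section):

json
{
  "logs": {
    "logs_collected": {
      "files": {
        "collect_list": [{
          "file_path": "/var/log/app.log",
          "log_group_name": "/aws/ec2/my-app",
          "log_stream_name": "{instance_id}"
        }]
      }
    }
  }
}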
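
When several log files need to be collected, the same configuration can be generated instead of written by hand. The sketch below builds the `logs` section with the keys shown above and prints it as JSON. The helper name and the second log file are hypothetical, and a real agent configuration may need other sections that are not covered here.

```python
import json


def build_agent_logs_config(log_files):
    """Build the `logs` section of a CloudWatch agent configuration.

    `log_files` maps a file path on the instance to a log group name.
    """
    collect_list = [
        {
            "file_path": path,
            "log_group_name": group,
            # One stream per instance, as in the example above.
            "log_stream_name": "{instance_id}",
        }
        for path, group in log_files.items()
    ]
    return {"logs": {"logs_collected": {"files": {"collect_list": collect_list}}}}


config = build_agent_logs_config({
    "/var/log/app.log": "/aws/ec2/my-app",
    "/var/log/app-error.log": "/aws/ec2/my-app-errors",  # hypothetical second file
})
print(json.dumps(config, indent=2))
```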

Metric Filters

plaintext
Create metric from logs:
 
Example: Count ERROR logs
 
Log entry:
"[ERROR] Database connection failed"
 
Metric filter:
Pattern: "[ERROR]"   (quoted, because unquoted square brackets have a special meaning in filter patterns)
Metric: AppErrors
Namespace: MyApp
 
Result:
CloudWatch metric AppErrors increments on every ERROR
→ Create alarm: AppErrors > 10/min
→ Send notification
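
The log-to-metric idea can be imitated locally to see what the resulting metric would look like. The sketch below counts matching log lines per minute using plain substring matching, which is an assumption made for simplicity and not the CloudWatch Logs filter pattern syntax. The function name and log events are hypothetical.

```python
from collections import Counter


def count_matches_per_minute(log_events, term="[ERROR]"):
    """Count log events containing `term`, grouped by minute.

    `log_events` is a list of (timestamp, message) pairs where the timestamp
    is an ISO 8601 string. Simplified stand-in for a metric filter.
    """
    counts = Counter()
    for timestamp, message in log_events:
        if term in message:
            minute = timestamp[:16]  # keep "YYYY-MM-DDTHH:MM"
            counts[minute] += 1
    return dict(counts)


events = [
    ("2025-11-16T10:00:05Z", "[ERROR] Database connection failed"),
    ("2025-11-16T10:00:40Z", "[INFO] Retrying"),
    ("2025-11-16T10:00:41Z", "[ERROR] Database connection failed"),
    ("2025-11-16T10:01:02Z", "[ERROR] Timeout"),
]
print(count_matches_per_minute(events))
```

Each per-minute count plays the role of one AppErrors data point that an alarm could then evaluate.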

CloudWatch Dashboards

What are they? Visual interfaces for metrics.

plaintext
Dashboard components:
- Line graphs (CPU over time)
- Number widgets (current value)
- Stacked area (multiple metrics)
- Logs widget (query results)
 
Example dashboard:
┌─────────────────────────────────┐
│ EC2 CPU Utilization (24h)       │
│ [Line graph]                    │
└─────────────────────────────────┘
 
┌─────────────────────────────────┐
│ Active Users: 1,234             │
└─────────────────────────────────┘
 
┌─────────────────────────────────┐
│ Error Rate (last hour)          │
│ [Stacked area]                  │
└─────────────────────────────────┘
 
Use cases:
- NOC (Network Operations Center)
- Daily standup meetings
- Real-time monitoring
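
Dashboards are defined as a JSON document listing widgets, so they can be generated from code. The sketch below builds a body with a single line-graph widget for EC2 CPU and prints it. The helper name, dashboard name, and instance ID are placeholders, and the upload step is only shown as a comment because it assumes boto3 and AWS credentials.

```python
import json


def metric_widget(title, namespace, metric, dimension, value, x=0, y=0):
    """Return one line-graph widget in the dashboard body format."""
    return {
        "type": "metric",
        "x": x, "y": y, "width": 12, "height": 6,
        "properties": {
            "title": title,
            "view": "timeSeries",
            "metrics": [[namespace, metric, dimension, value]],
            "stat": "Average",
            "period": 300,
            "region": "us-east-1",
        },
    }


body = {
    "widgets": [
        metric_widget("EC2 CPU Utilization", "AWS/EC2", "CPUUtilization",
                      "InstanceId", "i-1234567890abcdef0"),
    ]
}
dashboard_body = json.dumps(body)
print(dashboard_body)

# Assumption: boto3 installed and AWS credentials configured.
#   import boto3
#   boto3.client("cloudwatch").put_dashboard(
#       DashboardName="my-app-overview", DashboardBody=dashboard_body)
```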

CloudWatch vs. CloudTrail

Aspect     | CloudWatch              | CloudTrail
-----------+-------------------------+-------------------
Purpose    | Monitoring              | Auditing
Data       | Performance metrics     | API calls
Question   | How is it performing?   | Who did what?
Use case   | High CPU, errors        | User deleted a DB
Alerts     | Performance thresholds  | Security events
plaintext
Example distinction:
 
CloudWatch:
"EC2 instance CPU at 95%"
"Lambda errors spiking"
"RDS connections increasing"
 
CloudTrail:
"User juan@empresa.com terminated instance i-123"
"IAM policy modified at 10:30 AM"
"S3 bucket made public"
 
Often used together:
CloudTrail logs → CloudWatch Logs → Metric filter → Alarm
Example: Alert if root account used
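
The root-account example boils down to a condition on each CloudTrail record. The sketch below expresses one commonly used form of that condition in Python over dictionaries shaped like CloudTrail events. The sample records are hypothetical and heavily trimmed, and the exact condition a team uses may differ.

```python
def is_root_activity(record):
    """Return True if a CloudTrail-style record looks like direct root usage.

    The identity type must be Root, the call must not have been made on the
    user's behalf by an AWS service, and it must not be a service event.
    """
    identity = record.get("userIdentity", {})
    return (
        identity.get("type") == "Root"
        and "invokedBy" not in identity
        and record.get("eventType") != "AwsServiceEvent"
    )


# Hypothetical, trimmed-down records.
records = [
    {"eventName": "ConsoleLogin", "eventType": "AwsConsoleSignIn",
     "userIdentity": {"type": "Root"}},
    {"eventName": "TerminateInstances", "eventType": "AwsApiCall",
     "userIdentity": {"type": "IAMUser", "userName": "juan"}},
]
for record in records:
    print(record["eventName"], is_root_activity(record))
```

In the pipeline described above, a metric filter would apply an equivalent condition to the CloudTrail log group, and an alarm on the resulting metric would send the notification.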

Pricing

plaintext
Metrics:
- Default metrics: FREE
- Custom metrics: $0.30/metric/month
- Detailed monitoring (1-min): $2.10/instance/month
 
Alarms:
- Standard: $0.10/alarm/month
- High-resolution: $0.30/alarm/month
 
Logs:
- Ingestion: $0.50/GB
- Storage: $0.03/GB/month
- Data scan (Insights): $0.005/GB
 
Dashboards:
- 3 dashboards FREE
- $3/dashboard/month after that
 
Free Tier:
- 10 custom metrics
- 10 alarms
- 5 GB logs ingestion
- 5 GB logs storage
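
A quick way to internalize these numbers is to run a monthly estimate. The sketch below uses the rates and free-tier allowances exactly as listed above and treats them as illustrative, since actual prices vary by region and change over time. The usage figures are hypothetical, and the free tier is applied per item as a simplification.

```python
from decimal import Decimal

# Rates in USD, copied from the list above; treat them as illustrative.
RATES = {
    "custom_metric": Decimal("0.30"),
    "standard_alarm": Decimal("0.10"),
    "log_ingestion_gb": Decimal("0.50"),
    "log_storage_gb": Decimal("0.03"),
    "dashboard": Decimal("3.00"),
}
# Free allowances as listed above (simplified to one allowance per item).
FREE = {
    "custom_metric": 10,
    "standard_alarm": 10,
    "log_ingestion_gb": 5,
    "log_storage_gb": 5,
    "dashboard": 3,
}


def estimate_monthly_cost(usage):
    """Sum the cost of usage above the free allowance for each item."""
    total = Decimal("0")
    for item, quantity in usage.items():
        billable = max(quantity - FREE[item], 0)
        total += billable * RATES[item]
    return total


# Hypothetical monthly usage.
usage = {
    "custom_metric": 25,
    "standard_alarm": 15,
    "log_ingestion_gb": 50,
    "log_storage_gb": 100,
    "dashboard": 5,
}
print(estimate_monthly_cost(usage))  # 36.35
```

In this example log ingestion dominates the bill (45 billable GB at $0.50), which is typical of why log volume and retention deserve attention.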

Best Practices

plaintext
1. Set meaningful alarms:
   ✅ Critical: Page on-call
   ✅ Warning: Email team
   ❌ Don't over-alert (alarm fatigue)
 
2. Use dashboards:
   Create for each team/service
   Share the URL for visibility
 
3. Log retention:
   Balance cost vs. compliance
   30 days for most, 90+ for compliance
 
4. Metric filters:
   Convert logs to metrics
   Track business KPIs
 
5. Use namespaces:
   Organize custom metrics
   MyApp/Production, MyApp/Dev
 
6. Tag resources:
   Easy filtering in dashboards
   Cost allocation
 
7. Use CloudWatch Logs Insights:
   Query logs at scale
   Faster than grep

📝 Exam Preparation

Key Points

CloudWatch:

  • 📌 Monitoring: Metrics, logs, alarms
  • 📌 Default metrics: CPU, network, disk (5-min)
  • 📌 Custom metrics: Application-specific
  • 📌 Alarms: Automated actions on thresholds

Components:

  • 📌 Metrics: Time-series data
  • 📌 Logs: Centralized logging
  • 📌 Alarms: Threshold-based actions
  • 📌 Dashboards: Visualization

vs. CloudTrail:

  • 📌 CloudWatch: Performance (HOW)
  • 📌 CloudTrail: Auditing (WHO/WHAT)

Practice Questions

Question 1:

What is the default frequency of CloudWatch metrics for EC2?

A) 1 minute B) 5 minutes C) 15 minutes D) 1 hour

Answer: B) 5 minutes

Default CloudWatch metrics for EC2 are published every 5 minutes (free). Detailed monitoring (1-minute) costs extra.

Question 2:

Which service monitors the performance of resources?

A) CloudTrail B) CloudWatch C) Config D) Inspector

Answer: B) CloudWatch

CloudWatch monitors performance (CPU, memory, errors). CloudTrail is for auditing (API calls).


🎓 Summary

  1. CloudWatch: Monitoring service (metrics, logs, alarms)
  2. Metrics: Performance measurements (CPU, network)
  3. Alarms: Automated actions on thresholds
  4. Logs: Centralized logging
  5. Dashboards: Visualization
  6. vs. CloudTrail: Performance vs. Auditing

⏭️ Next Post

Post #23: AWS Organizations - Multi-Account Management


Tags: #AWS #CloudPractitioner #CloudWatch #Monitoring #Logs #Alarms #Metrics #Certification

Written by Jhonny Lorenzo

Researcher at TrautsLab
