RDS Enhanced Monitoring

In this blog, we will see how to enable RDS Enhanced Monitoring and visualize the metrics in CloudWatch. I will also guide you through exporting the RDS CloudWatch metrics to a Grafana dashboard.

RDS (Relational Database Service) Enhanced Monitoring is a feature offered by Amazon Web Services (AWS) that provides a deeper level of insight into the performance of your Amazon RDS database instances. Amazon RDS is a fully managed database service that makes it easy to set up, operate, and scale relational databases in the cloud.

Enhanced Monitoring goes beyond the basic monitoring capabilities of RDS by collecting and displaying additional operating system (OS) and resource-level metrics. These metrics include CPU utilization, memory usage, disk I/O, and network I/O. The data is captured at a higher granularity, allowing for more detailed analysis and troubleshooting of performance-related issues.

By enabling enhanced monitoring, you gain better visibility into the behaviour and resource utilization of your RDS instances. This helps you identify performance bottlenecks, optimize database configurations, and ensure efficient utilization of resources. The collected metrics are made available in Amazon CloudWatch, a monitoring and management service provided by AWS, where you can visualize and analyze the data.

Follow the steps below to enable Enhanced Monitoring.

Prerequisites:

1. RDS DB is up and running.
2. Grafana is installed and accessible.

IAM Role for Enhanced Monitoring

First, we need to create an IAM role that allows Amazon RDS to publish Enhanced Monitoring metrics to CloudWatch Logs. To create the role from the console, follow the steps below (a CLI alternative is sketched after the list).

  1. Open the IAM console at https://console.aws.amazon.com/iam/.
  2. In the navigation pane, select Roles.
  3. Click on Create role.
  4. Choose the AWS service tab, and then choose RDS from the list of services.
  5. Choose RDS - Enhanced Monitoring, and then choose Next.
  6. Ensure that the Permissions policies show AmazonRDSEnhancedMonitoringRole, and then choose Next.
  7. For role name, enter a name for your role. For example, enter demo.
  8. The trusted entity for your role is the AWS service monitoring.rds.amazonaws.com.
  9. Click on Create role.
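
If you prefer the AWS CLI, the same role can be created with a couple of commands. The sketch below assumes the role name demo and a local file trust-policy.json; adjust both to your environment.

# Trust policy that lets the RDS Enhanced Monitoring service assume the role
cat > trust-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "monitoring.rds.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF

# Create the role and attach the AWS-managed Enhanced Monitoring policy
aws iam create-role --role-name demo \
  --assume-role-policy-document file://trust-policy.json
aws iam attach-role-policy --role-name demo \
  --policy-arn arn:aws:iam::aws:policy/service-role/AmazonRDSEnhancedMonitoringRole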

Turning on Enhanced Monitoring

You can turn Enhanced Monitoring on and off using the AWS Management Console, the AWS CLI, or the RDS API, choosing the RDS DB instances on which you want it enabled. In this tutorial, we use the AWS console (a CLI equivalent is sketched after the console steps).

  1. Select your RDS instance, click on Modify, and scroll to Additional Configuration.
  2. In Monitoring, choose Enable Enhanced Monitoring for your DB instance or read replica.
  3. Set the Monitoring Role property to the IAM role that you created in the previous step to permit Amazon RDS to communicate with Amazon CloudWatch Logs for you, or choose Default to have RDS create a role for you named rds-monitoring-role.
  4. Set the Granularity property to the interval, in seconds, between points when metrics are collected for your DB instance or read replica. The Granularity property can be set to one of the following values: 1, 5, 10, 15, 30, or 60.

The fastest that the RDS console refreshes is every 5 seconds. If you set the granularity to 1 second in the RDS console, you still see updated metrics only every 5 seconds. You can retrieve 1-second metric updates by using CloudWatch Logs.
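
For reference, here is a minimal sketch of doing the same from the AWS CLI; the instance identifier, account ID, and role name are placeholders for your own values.

# Enable Enhanced Monitoring with a 60-second interval on an existing instance
aws rds modify-db-instance \
  --db-instance-identifier my-rds-instance \
  --monitoring-interval 60 \
  --monitoring-role-arn arn:aws:iam::123456789012:role/demo \
  --apply-immediately

# Enhanced Monitoring OS metrics are written to the RDSOSMetrics log group;
# with AWS CLI v2 you can tail them directly from CloudWatch Logs
aws logs tail RDSOSMetrics --since 5m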

You can use CloudWatch to view metrics and implement alerts.

To view Enhanced Monitoring metrics in CloudWatch

  1. You can view the metrics in the Monitoring tab of the RDS console. This tab shows several metric graphs for each database.
  2. You can also use the CloudWatch console to view the metrics. Open the CloudWatch console at https://console.aws.amazon.com/cloudwatch/.
  3. Under Metrics, click on All Metrics.
  4. Select the AWS region.
  5. Click on Browse and select RDS. You can now choose how you want to group and view the RDS metrics.
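
The same metrics can also be listed from the AWS CLI; a small sketch, assuming the instance identifier my-rds-instance and the us-west-2 region.

# List the RDS metrics that CloudWatch has for a given DB instance
aws cloudwatch list-metrics \
  --namespace AWS/RDS \
  --dimensions Name=DBInstanceIdentifier,Value=my-rds-instance \
  --region us-west-2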

To implement alerts

(You can skip implementing alerts in CloudWatch if you want to use the Prometheus Alertmanager instead.)

Navigate to the CloudWatch console, since CloudWatch alarms are created there.

Click on Create alarm.

Setting Alarms

Click on Select Metric and type the name of the metric.

Metrics

Choose the metric for the database you are going to monitor (which you can find in the tile labelled Per-Database Metrics). Click Select Metric.

Metrics

Select the threshold for your alarm.

Implementing conditions for alarm

Click on Next and select the notification method for your alarm.

Setting Notification method.

If your SNS topic is already created, you can select it; if you are creating a new topic, click on Create topic. Once done, click on Next.

In the next window, give your alarm a name.

SNS topic

Click on Next, review your alarm, and click on Create alarm.

Alarm State

Your alarm is now created; you will receive a notification whenever it is triggered.
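
If you prefer to script the alarm instead of clicking through the console, here is a minimal sketch using the AWS CLI; the alarm name, threshold, instance identifier, and SNS topic ARN are placeholders.

# Alarm when average CPU utilization stays above 80% for two 5-minute periods
aws cloudwatch put-metric-alarm \
  --alarm-name rds-high-cpu \
  --namespace AWS/RDS \
  --metric-name CPUUtilization \
  --dimensions Name=DBInstanceIdentifier,Value=my-rds-instance \
  --statistic Average \
  --period 300 \
  --evaluation-periods 2 \
  --threshold 80 \
  --comparison-operator GreaterThanThreshold \
  --alarm-actions arn:aws:sns:us-west-2:123456789012:my-alerts-topic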

If you want to view this data in Grafana, please continue reading.

Viewing metrics in Grafana

To view CloudWatch metrics in Grafana, we have two options:

  1. Adding a CloudWatch data source in Grafana
  2. Adding metrics from Prometheus.

Before moving further, we need to create an IAM role with read access to CloudWatch so that Grafana and Prometheus can read data from CloudWatch. Follow this link to create the IAM role.

1. Adding a CloudWatch data source.

In this approach, we use CloudWatch as a data source. In the data source settings, you will find the option to add connection details. You can select the SDK default authentication provider, but you have to pass the ARN of the role you created in the previous step. If you want to use an AWS IAM user to access CloudWatch instead, you have to pass the access key and secret key in Grafana, or create a credentials file with the access key and secret key and upload it to Grafana.

Adding Connection in Grafana

Once you have entered your authentication provider details, select the Default Region and click "Save & test". Your CloudWatch data source is now added.
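
If you would rather script the data source than click through the UI, Grafana's HTTP API can create it as well. The sketch below is an assumption-heavy example: the Grafana URL, API token, region, and role ARN are placeholders, and the jsonData fields follow the CloudWatch data source's provisioning options.

# Create a CloudWatch data source through the Grafana HTTP API (sketch)
curl -s -X POST "$GRAFANA_URL/api/datasources" \
  -H "Authorization: Bearer $GRAFANA_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "name": "CloudWatch",
        "type": "cloudwatch",
        "access": "proxy",
        "jsonData": {
          "authType": "default",
          "defaultRegion": "us-west-2",
          "assumeRoleArn": "arn:aws:iam::123456789012:role/grafana-cloudwatch-read"
        }
      }'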

Import a dashboard to access the metrics; you can use dashboard ID 707. Your dashboard will look like the one shown below.

Grafana Dashboard

2. Adding metrics from Prometheus.

To pull metrics from CloudWatch, we will deploy YACE (Yet Another CloudWatch Exporter) in our EKS cluster. For that, we need an IAM user with privileges to read data from CloudWatch.

Create a credentials file with the access key and secret key of the user. Once done, run the below-mentioned command.

cat ./credentials | base64
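
For reference, the credentials file follows the standard AWS credentials format; a minimal sketch with placeholder keys:

# Standard AWS credentials file; replace the values with your IAM user's keys
cat > ./credentials <<'EOF'
[default]
aws_access_key_id = AKIAXXXXXXXXXXXXXXXX
aws_secret_access_key = XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
EOF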

Change the path of the credentials file according to your scenario. Now create a deployment file with the following manifest.

apiVersion: v1
kind: Namespace
metadata:
  name: yace
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: yace-rds
  namespace: yace
spec:
  selector:
    matchLabels:
      app: yace-rds
  replicas: 1
  template:
    metadata:
      labels:
        app: yace-rds
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "5000"
    spec:
      containers:
        - name: yace
          image: quay.io/invisionag/yet-another-cloudwatch-exporter:v0.21.0-alpha
          ports:
            - containerPort: 5000
          volumeMounts:
            - name: yace-rds-config
              mountPath: /tmp/config.yml
              subPath: config.yml
            - name: yace-rds-credentials
              mountPath: /exporter/.aws/credentials
              subPath: credentials
          resources:
            limits:
              memory: "128Mi"
              cpu: "500m"
      volumes:
        - configMap:
            defaultMode: 420
            name: yace-rds-config
          name: yace-rds-config
        - secret:
            defaultMode: 420
            secretName: yace-rds-credentials
          name: yace-rds-credentials
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: yace-rds-config
  namespace: yace
data:
  config.yml: |
    discovery:
      jobs:
        - regions:
            - us-west-2
          type: rds
          enableMetricData: true
          metrics:
            - name: BinLogDiskUsage
              statistics:
                - Average
              period: 300
              length: 3600
            - name: BurstBalance
              statistics:
                - Average
              period: 300
              length: 3600
            - name: CPUUtilization
              statistics:
                - Average
              period: 300
              length: 3600
            - name: CPUCreditUsage
              statistics:
                - Average
              period: 300
              length: 3600
            - name: CPUCreditBalance
              statistics:
                - Average
              period: 300
              length: 3600
            - name: DatabaseConnections
              statistics:
                - Average
              period: 300
              length: 3600
            - name: DiskQueueDepth
              statistics:
                - Average
                - Maximum
              period: 300
              length: 3600
            - name: FailedSQLServerAgentJobsCount
              statistics:
                - Average
              period: 300
              length: 3600
            - name: FreeableMemory
              statistics:
                - Average
              period: 300
              length: 3600
            - name: FreeStorageSpace
              statistics:
                - Average
              period: 300
              length: 3600
            - name: MaximumUsedTransactionIDs
              statistics:
                - Average
              period: 300
              length: 3600
            - name: NetworkReceiveThroughput
              statistics:
                - Average
              period: 300
              length: 3600
            - name: NetworkTransmitThroughput
              statistics:
                - Average
              period: 300
              length: 3600
            - name: OldestReplicationSlotLag
              statistics:
                - Average
              period: 300
              length: 3600
            - name: ReadIOPS
              statistics:
                - Average
              period: 300
              length: 3600
            - name: ReadLatency
              statistics:
                - Maximum
                - Average
              period: 300
              length: 3600
            - name: ReadThroughput
              statistics:
                - Average
              period: 300
              length: 3600
            - name: ReplicaLag
              statistics:
                - Average
              period: 300
              length: 3600
            - name: ReplicationSlotDiskUsage
              statistics:
                - Average
              period: 300
              length: 3600
            - name: SwapUsage
              statistics:
                - Average
              period: 300
              length: 3600
            - name: TransactionLogsDiskUsage
              statistics:
                - Average
              period: 300
              length: 3600
            - name: TransactionLogsGeneration
              statistics:
                - Average
              period: 300
              length: 3600
            - name: WriteIOPS
              statistics:
                - Average
              period: 300
              length: 3600
            - name: WriteLatency
              statistics:
                - Maximum
                - Average
              period: 300
              length: 3600
            - name: WriteThroughput
              statistics:
                - Average
              period: 300
              length: 3600
---
apiVersion: v1
kind: Secret
metadata:
  name: yace-rds-credentials
  namespace: yace
data:
  # Add in credentials the result of:
  # cat ./credentials | base64
  credentials: |
    XXXX
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: yace-rds
  name: yace-svc
  namespace: yace
spec:
  ports:
    - port: 5000
      protocol: TCP
      targetPort: 5000
  selector:
    app: yace-rds

In the above manifest, replace the region value in the ConfigMap with your region, and in the Secret replace XXXX with the value you received from running cat ./credentials | base64 in the previous step.
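
Apply the manifest and check that the exporter is serving metrics; a quick sketch, assuming the manifest was saved as yace.yaml.

# Deploy the namespace, exporter Deployment, ConfigMap, Secret, and Service
kubectl apply -f yace.yaml

# Verify the pod is running and that metrics are exposed on port 5000
kubectl -n yace get pods
kubectl -n yace port-forward deploy/yace-rds 5000:5000 &
curl -s http://localhost:5000/metrics | head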

Before moving further, make sure Prometheus is deployed on the cluster with the following configuration.

Add the following scrape job to your Prometheus configuration:

  - job_name: "yace-exporter"
    static_configs:
      - targets: ["yace-svc.yace.svc:5000"]

Prometheus will now scrape the metrics exposed by the YACE pod. For alerting, you need to add the below-mentioned alerting rules in Prometheus.

groups:
  - name: AWS-RDS
    rules:
      - alert: LongCPUThrottling
        expr: |
          aws_rds_cpuutilization_average > 95
        for: 10m
        labels:
          severity: page
        annotations:
          summary: CPU over 95% for 10 minutes in instance {{$labels.dimension_DBInstanceIdentifier}}
      - alert: LowFreeMemory
        expr: |
          aws_rds_freeable_memory_average < 128*1024*1024
        for: 10m
        labels:
          severity: page
        annotations:
          summary: Free memory under 128MB in instance {{$labels.dimension_DBInstanceIdentifier}}
      - alert: LowFreeDisk
        expr: |
          aws_rds_free_storage_space_average < 512*1024*1024
        for: 10m
        labels:
          severity: page
        annotations:
          summary: Free disk under 512MB {{$labels.dimension_DBInstanceIdentifier}}
      - alert: DiskFullIn48H
        expr: |
          predict_linear(aws_rds_free_storage_space_average[48h], 48 * 3600) < 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: Disk will be full within 48 hours in instance {{$labels.dimension_DBInstanceIdentifier}}
      - alert: DiskFullIn12H
        expr: |
          predict_linear(aws_rds_free_storage_space_average[12h], 12 * 3600) < 0
        for: 10m
        labels:
          severity: page
        annotations:
          summary: Disk will be full within 12 hours in instance {{$labels.dimension_DBInstanceIdentifier}}
      - alert: HighReadLatency
        expr: |
          aws_rds_read_latency_average > 0.250
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: Average read latency over 250ms in instance {{$labels.dimension_DBInstanceIdentifier}}
      - alert: HighWriteLatency
        expr: |
          aws_rds_write_latency_average > 0.250
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: Average write latency over 250ms in instance {{$labels.dimension_DBInstanceIdentifier}}
      - alert: HighDiskQueue
        expr: |
          aws_rds_disk_queue_depth_average > 25
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: High disk queue depth in instance {{$labels.dimension_DBInstanceIdentifier}}
      - alert: ReplicaLag
        expr: |
          aws_rds_oldest_replication_slot_lag_average > 10
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: Average replication slot lag {{$labels.dimension_DBInstanceIdentifier}}

This configuration enables alerts on the CloudWatch metrics; you can change the thresholds in the configuration according to your needs.
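
Before reloading Prometheus, it is worth validating the changes; a small sketch using promtool, assuming the files are named prometheus.yml and rds-alert-rules.yml.

# Validate the scrape configuration and the alerting rules
promtool check config prometheus.yml
promtool check rules rds-alert-rules.yml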

Once all the configuration is done, you can see the alerts in Prometheus.

Prometheus Alert Manager

And under Targets, you can see that the yace-exporter is up.

Prometheus Target Status

Now, in Grafana, add Prometheus as a data source and import this dashboard. Use the wget command to download the dashboard JSON.

wget https://raw.githubusercontent.com/sagar-rafay/grafana-files/main/rds-promcat-dashboard.json

Once done, import the dashboard into your Grafana and explore the metrics. You can also implement alerts from the Grafana dashboard.

Grafana Dashboard

Summary

In this blog post, we have gone through how you can easily enable Enhanced Monitoring on RDS and visualise the metrics in CloudWatch. We also explored how you can export the RDS metrics from CloudWatch to a self-hosted Grafana.

I hope you found this post informative and engaging. I would love to hear your thoughts on this post, so do start a conversation on Twitter or LinkedIn.

Here are some of my other articles that you may find interesting.

Until next time…

References:

https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/CloudWatch-Grafana-support.html
