This guide will help you integrate Amazon Redshift with Twing Data to optimize your data warehouse performance, reduce costs, and gain valuable insights into query patterns and resource utilization.
Twing Data analyzes your Redshift metadata to provide actionable insights into query patterns and resource utilization. We collect information from several key system tables, including SYS_QUERY_HISTORY, SYS_QUERY_TEXT, SVL_USER_INFO, STL_QUERY, STL_WLM_QUERY, SVCS_CONCURRENCY_SCALING_USAGE, and STV_SLICES.
The live integration creates a dedicated user with minimal permissions to access only the necessary system tables. This ensures secure, continuous monitoring while maintaining strict access controls.
-- Create user with minimal permissions
CREATE USER twing_user WITH PASSWORD 'RANDOM_PASSWORD_HERE';

-- Grant read-only access to the necessary system tables
ALTER USER twing_user WITH SYSLOG ACCESS UNRESTRICTED;
Note: The created user has read-only access to system tables and cannot access your actual data tables.
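If you'd like to verify this yourself, here's a quick sanity check. In the sketch below, your_schema.your_table is a placeholder for one of your own tables, and the second statement assumes you haven't granted broad read access to PUBLIC:

-- Connected as twing_user: reading system tables should succeed
select count(*) from SYS_QUERY_HISTORY;

-- Reading your data tables should fail with a permission error
-- (your_schema.your_table is a placeholder for one of your own tables)
select count(*) from your_schema.your_table;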
This is a one-time pull from multiple Redshift system tables; the exported results can be uploaded manually to Twing Data for a comprehensive analysis of query performance and resource utilization.
-- The query history
select
s.user_id,
u.usename as usename,
s.query_id,
s.query_type,
s.database_name,
trim(s.status) as status,
s.result_cache_hit,
s.start_time,
s.end_time,
trim(s.user_query_hash) as user_query_hash,
trim(s.generic_query_hash) as generic_query_hash,
s.query_hash_version,
s.service_class_id,
s.service_class_name,
s.error_message,
s.returned_rows,
s.returned_bytes,
q.concurrency_scaling_status
from SYS_QUERY_HISTORY s
left join SVL_USER_INFO u on u.usesysid = s.user_id
left join (select xid, userid, pid, cast(median(concurrency_scaling_status) as int) as concurrency_scaling_status
from STL_QUERY
--where starttime > 'yyyy-mm-dd' -- Specify a query start time if desired
group by 1, 2, 3) q
on s.user_id = q.userid
and s.transaction_id = q.xid
and s.session_id = q.pid
where 1=1
and s.query_type != 'UTILITY'
and trim(s.query_text) not in ('SELECT pg_backend_pid()', 'select version()', 'select current_schema()',
'select current_database()', 'select current_user()')
and trim(s.status) not in ('RUNNING', 'QUEUED', 'RETURNING')
--and start_time > 'yyyy-mm-dd' -- Filter on the same time here if included above (note the different column name)
order by s.start_time asc
limit 5000; -- Specify limit to match desired rows and file size
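To choose a LIMIT that matches your desired row count and file size, it can help to count the matching rows first. A minimal sketch reusing the filters above:

-- Count the rows the export filters would match before choosing a LIMIT
select count(*)
from SYS_QUERY_HISTORY s
where s.query_type != 'UTILITY'
and trim(s.status) not in ('RUNNING', 'QUEUED', 'RETURNING')
--and s.start_time > 'yyyy-mm-dd' -- Same optional time filter as above
;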
-- The query details, since SYS_QUERY_HISTORY doesn't store the full query text
select query_id, sequence, text
from SYS_QUERY_TEXT
where query_id in (
select s.query_id
from SYS_QUERY_HISTORY s
where s.query_type != 'UTILITY'
and trim(s.query_text) not in ('SELECT pg_backend_pid()', 'select version()', 'select current_schema()',
'select current_database()', 'select current_user()')
limit 5000
)
order by query_id, sequence;
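Each statement is split across multiple rows keyed by query_id and ordered by sequence. If you want to inspect a full statement yourself, one option is to concatenate the segments with LISTAGG; this is a sketch, and since LISTAGG output is capped at 65,535 characters, extremely long queries may still come back truncated:

-- Reassemble each statement by concatenating its segments in order
select query_id,
listagg(text) within group (order by sequence) as full_query_text
from SYS_QUERY_TEXT
group by query_id;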
-- The WLM Query Metadata
select
s.query_id as sys_query_id,
s.user_id,
w.service_class as service_class_id,
w.service_class_name,
q.query as stl_query_id,
q.concurrency_scaling_status,
q.aborted,
w.exec_start_time,
w.exec_end_time,
w.queue_start_time,
w.queue_end_time,
w.total_queue_time,
w.total_exec_time,
trim(w.query_priority) as query_priority,
w.task,
w.slot_count
from SYS_QUERY_HISTORY s
left join
(select xid, userid, pid, query, aborted, cast(median(concurrency_scaling_status) as int) as concurrency_scaling_status
from STL_QUERY
where 1=1
--and starttime > 'yyyy-mm-dd' -- Specify a query start time if desired
group by 1, 2, 3, 4, 5) q
on s.user_id = q.userid
and s.transaction_id = q.xid
and s.session_id = q.pid
left join STL_WLM_QUERY w
on q.query = w.query
and s.user_id = w.userid
and s.transaction_id = w.xid
and s.service_class_id = w.service_class
where 1=1
and s.query_type != 'UTILITY'
and trim(s.query_text) not in ('SELECT pg_backend_pid()', 'select version()', 'select current_schema()',
'select current_database()', 'select current_user()')
and trim(s.status) not in ('RUNNING', 'QUEUED', 'RETURNING')
--and start_time > 'yyyy-mm-dd' -- Filter on the same time here if included above (note the different column name)
and (
(w.queue_start_time is not null)
or (w.queue_end_time is not null)
or (w.total_queue_time is not null)
or (w.total_exec_time is not null)
or (w.query_priority is not null)
)
order by s.start_time asc
limit 20000; -- Using an approximate ratio of 4 WLM queries to 1 user query (query history), scale your LIMIT accordingly
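The 4:1 ratio above is a rule of thumb; if you'd rather measure it on your cluster, you can compare row counts directly. A minimal sketch (if your client complains about combining these system views in one statement, run the two counts separately):

-- Estimate the ratio of WLM rows to query-history rows to scale your LIMIT
select
(select count(*) from STL_WLM_QUERY) as wlm_rows,
(select count(*) from SYS_QUERY_HISTORY) as history_rows;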
-- Concurrency Scaling Information
select start_time, end_time, usage_in_seconds, queries as num_queries
from svcs_concurrency_scaling_usage
-- where start_time > 'yyyy-mm-dd' -- Consistent with other chosen start times
-- limit 1000 -- There's less data here, so this may not be necessary
;
-- The Node Metadata
select node, COUNT(*) as slice_count
from stv_slices
group by node;

The extracted data provides insights into query performance, resource utilization, and optimization opportunities while excluding system queries and maintaining data privacy.