This guide will help you integrate Amazon Redshift with Twing Data to optimize your data warehouse performance, reduce costs, and gain valuable insights into query patterns and resource utilization.
Twing Data analyzes your Redshift metadata to provide actionable insights. We collect information from several key system tables:

- SYS_QUERY_HISTORY and SYS_QUERY_TEXT for query history and full query text
- SVL_USER_INFO for user metadata
- STL_QUERY and STL_WLM_QUERY for workload management (WLM) and concurrency scaling details
- SVCS_CONCURRENCY_SCALING_USAGE for concurrency scaling usage
- STV_SLICES for node and slice topology

From this metadata, Twing Data surfaces query patterns, resource utilization, and optimization opportunities.
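Before setting anything up, a quick way to gauge how much metadata is available on your cluster is to check the window covered by SYS_QUERY_HISTORY. This is a simple sketch; the view only retains a limited rolling window of history, so the earliest timestamp tells you how far back the analysis can reach:

-- Check the available query history window and volume
select
    min(start_time) as earliest_query,
    max(start_time) as latest_query,
    count(*) as total_queries
from SYS_QUERY_HISTORY;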
The live integration creates a dedicated user with minimal permissions to access only the necessary system tables. This ensures secure, continuous monitoring while maintaining strict access controls.
-- Create user with minimal permissions
CREATE USER twing_user WITH PASSWORD 'RANDOM_PASSWORD_HERE';

-- Grant read-only access to the necessary system tables
ALTER USER twing_user WITH SYSLOG ACCESS UNRESTRICTED;
Note: The created user has read-only access to system tables and cannot access your actual data tables.
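If you want to confirm the grant took effect, one way (assuming your own user can read SVL_USER_INFO) is to check the new user's syslog access level, which should report UNRESTRICTED:

-- Verify the new user's system-table access level
select usename, syslogaccess
from SVL_USER_INFO
where usename = 'twing_user';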
This is a one-time pull from multiple Redshift system tables; the exported results can be uploaded manually for a comprehensive analysis of query performance and resource utilization.
-- The query history
select
    s.user_id,
    u.usename as usename,
    s.query_id,
    s.query_type,
    s.database_name,
    trim(s.status) as status,
    s.result_cache_hit,
    s.start_time,
    s.end_time,
    trim(s.user_query_hash) as user_query_hash,
    trim(s.generic_query_hash) as generic_query_hash,
    s.query_hash_version,
    s.service_class_id,
    s.service_class_name,
    s.error_message,
    s.returned_rows,
    s.returned_bytes,
    q.concurrency_scaling_status
from SYS_QUERY_HISTORY s
left join SVL_USER_INFO u
    on u.usesysid = s.user_id
left join (
    select xid, userid, pid,
        cast(median(concurrency_scaling_status) as int) as concurrency_scaling_status
    from STL_QUERY
    --where starttime > 'yyyy-mm-dd' -- Specify a query start time if desired
    group by 1, 2, 3
) q
    on s.user_id = q.userid
    and s.transaction_id = q.xid
    and s.session_id = q.pid
where 1=1
    and s.query_type != 'UTILITY'
    and trim(s.query_text) not in (
        'SELECT pg_backend_pid()',
        'select version()',
        'select current_schema()',
        'select current_database()',
        'select current_user()'
    )
    and trim(s.status) not in ('RUNNING', 'QUEUED', 'RETURNING')
    --and start_time > 'yyyy-mm-dd' -- Filter on the same time here if included above (note the different column name)
order by s.start_time asc
limit 5000; -- Specify limit to match desired rows and file size

-- The query details, since Redshift doesn't have the full query text in SYS_QUERY_HISTORY
select query_id, sequence, text
from SYS_QUERY_TEXT
where query_id in (
    select s.query_id
    from SYS_QUERY_HISTORY s
    where s.query_type != 'UTILITY'
        and trim(s.query_text) not in (
            'SELECT pg_backend_pid()',
            'select version()',
            'select current_schema()',
            'select current_database()',
            'select current_user()'
        )
    limit 5000
)
order by query_id, sequence;

-- The WLM query metadata
select
    s.query_id as sys_query_id,
    s.user_id,
    w.service_class as service_class_id,
    w.service_class_name,
    q.query as stl_query_id,
    q.concurrency_scaling_status,
    q.aborted,
    w.exec_start_time,
    w.exec_end_time,
    w.queue_start_time,
    w.queue_end_time,
    w.total_queue_time,
    w.total_exec_time,
    trim(w.query_priority) as query_priority,
    w.task,
    w.slot_count
from SYS_QUERY_HISTORY s
left join (
    select xid, userid, pid, query, aborted,
        cast(median(concurrency_scaling_status) as int) as concurrency_scaling_status
    from STL_QUERY
    where 1=1
    --and starttime > 'yyyy-mm-dd' -- Specify a query start time if desired
    group by 1, 2, 3, 4, 5
) q
    on s.user_id = q.userid
    and s.transaction_id = q.xid
    and s.session_id = q.pid
left join STL_WLM_QUERY w
    on q.query = w.query
    and s.user_id = w.userid
    and s.transaction_id = w.xid
    and s.service_class_id = w.service_class
where 1=1
    and s.query_type != 'UTILITY'
    and trim(s.query_text) not in (
        'SELECT pg_backend_pid()',
        'select version()',
        'select current_schema()',
        'select current_database()',
        'select current_user()'
    )
    and trim(s.status) not in ('RUNNING', 'QUEUED', 'RETURNING')
    --and start_time > 'yyyy-mm-dd' -- Filter on the same time here if included above (note the different column name)
    and (
        w.queue_start_time is not null
        or w.queue_end_time is not null
        or w.total_queue_time is not null
        or w.total_exec_time is not null
        or w.query_priority is not null
    )
order by s.start_time asc
limit 20000; -- Using an approximate ratio of 4 WLM queries to 1 user query (query history), scale your LIMIT accordingly

-- Concurrency scaling information
select
    start_time,
    end_time,
    usage_in_seconds,
    queries as num_queries
from SVCS_CONCURRENCY_SCALING_USAGE
-- where start_time > 'yyyy-mm-dd' -- Consistent with other chosen start times
-- limit 1000 -- Less data here, so this may not be necessary
;
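If your SQL client makes it awkward to save large result sets, each query above can also be exported directly to S3 with UNLOAD. This is a minimal sketch, not part of the required setup; the bucket path and IAM role ARN are placeholders you would replace with your own (note that single quotes inside the wrapped query must be doubled):

-- Export one of the extraction queries to S3 as a single CSV with a header row
UNLOAD ('select * from SYS_QUERY_HISTORY where query_type != ''UTILITY''')
TO 's3://your-bucket/twing-data/query_history_'
IAM_ROLE 'arn:aws:iam::123456789012:role/your-redshift-unload-role'
FORMAT AS CSV
HEADER
PARALLEL OFF;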
-- The node metadata
select node, count(*) as slice_count
from STV_SLICES
group by node;
The extracted data provides insights into query performance, resource utilization, and optimization opportunities while excluding system queries and maintaining data privacy.