Assume you’ve got S3 bucket and Redshift cluster running in EU (Frankfurt).
Set up IAM role, e.g. s3acces with the following policies
Make sure you:
- Enabled IAM role for your Redshift cluster
- Activated Security Token Service Regions
Otherwise you’ll get[2019-09-18 17:38:35] [42804][500310] [Amazon](500310) Invalid operation: User arn:aws:redshift:eu-central-1:${account_number}:${user}:${db}/${db_user} is not authorized to assume IAM Role arn:aws:iam::${account}:role/s3acces;
More detail can be found at http://docs.aws.amazon.com/IAM/latest/UserGuide/id_credentials_temp_enable-regions.html
Navigate to Athena query editor https://eu-central-1.console.aws.amazon.com/athena/home?region=eu-central-1#query and create a database that points to your S3 bucket:
CREATE DATABASE IF NOT EXISTS athenadb LOCATION 's3://${your_backet_name}';
After that you can create an external schema and tables:
create external schema if not exists athena_test from database 'athenadb' iam_role 'arn:aws:iam::${role_id}:role/s3acces' ;
create external table athena_test.test( venueid integer, venuename varchar(200), venuecity varchar(200), venuestate varchar(200), venueseats integer ) row format serde 'org.apache.hadoop.hive.serde2.RegexSerDe' with serdeproperties ( 'serialization.format' = ',', 'field.delim' = ',', 'input.regex' = 'ff' ) stored as textfile location 's3://${your_bucket}/data/' table properties ('has_encrypted_data'='false') ;