Medium · 1 mark · Multiple Choice
Tags: Athena · Performance · Cost Optimization · S3

AWS SAP-C02 · Question 15 · Domain 1.5: Cost Optimization

A company has a massive data lake in Amazon S3. They use Amazon Athena for querying. Over time, query performance has degraded, and costs have skyrocketed. The data is currently stored in raw CSV format, partitioned by year. What is the MOST effective strategy to improve performance and reduce costs?

Answer options:

A. Migrate the data to Amazon Redshift and use Redshift Spectrum to query the CSV files.

B. Convert the CSV files to Apache Parquet format using AWS Glue. Update the partition strategy to year, month, and day.

C. Enable S3 Intelligent-Tiering for the data lake bucket to reduce storage costs.

D. Provision an Amazon EMR cluster to run queries instead of Athena.

How to approach this question

Identify the best practices for Athena: columnar formats (Parquet/ORC) and granular partitioning.
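As a rough illustration of what "granular partitioning" means in practice, the sketch below builds a Hive-style (key=value) S3 prefix for a given date. The bucket name `example-data-lake`, the `events` path, and the helper `partition_prefix` are hypothetical and not part of the question.

```python
from datetime import date


def partition_prefix(base: str, d: date) -> str:
    """Build a Hive-style partition prefix (key=value folders) for one day.

    `base` is a hypothetical table location; the key names year/month/day
    mirror the partition strategy named in the question.
    """
    return (
        f"{base.rstrip('/')}/"
        f"year={d.year:04d}/month={d.month:02d}/day={d.day:02d}/"
    )


# Hypothetical bucket and path, used only for illustration.
print(partition_prefix("s3://example-data-lake/events", date(2024, 3, 7)))
```

With a layout like this, a query that filters on year, month, and day only needs to read objects under the matching prefix.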

Full Answer

B. Convert the CSV files to Apache Parquet format using AWS Glue. Update the partition strategy to year, month, and day. ✓ Correct
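Amazon Athena charges per terabyte of data scanned, so the way to cut both cost and query time is to scan less data per query. Converting the data to a columnar format such as Parquet lets Athena read only the columns a query references, and Parquet's compression reduces the bytes scanned further. Finer partitioning (year/month/day) lets Athena skip entire partitions when a query filters on those partition columns.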
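The effect on the bill can be approximated with a back-of-the-envelope calculation. The sketch below is only an estimate under stated assumptions: a price of $5 per TB scanned (check current pricing for your Region), data spread evenly across days, columns of roughly equal size, and a hypothetical 3x size reduction from Parquet compression. The dataset size, column counts, and function name are illustrative, not taken from the question.

```python
PRICE_PER_TB_USD = 5.0  # assumed per-TB-scanned price; verify for your Region


def estimated_query_cost(
    total_tb: float,
    partition_fraction: float,
    column_fraction: float,
    compression_ratio: float = 1.0,
    price_per_tb: float = PRICE_PER_TB_USD,
) -> float:
    """Estimate the cost of one query from the share of data it must scan.

    partition_fraction: share of the dataset left after partition pruning.
    column_fraction: share of columns read (1.0 for row formats like CSV).
    compression_ratio: size reduction versus raw CSV (assumed, not measured).
    """
    scanned_tb = total_tb * partition_fraction * column_fraction / compression_ratio
    return scanned_tb * price_per_tb


# Hypothetical workload: 100 TB of CSV covering 5 years; the query wants
# one day of data and 5 of 50 columns.
csv_cost = estimated_query_cost(
    total_tb=100, partition_fraction=1 / 5, column_fraction=1.0
)
parquet_cost = estimated_query_cost(
    total_tb=100,
    partition_fraction=1 / (5 * 365),
    column_fraction=5 / 50,
    compression_ratio=3.0,
)

print(f"CSV, partitioned by year:    ${csv_cost:,.2f}")
print(f"Parquet, partitioned by day: ${parquet_cost:,.4f}")
```

Under these assumptions the same query drops from about $100 to about one cent, which is why option B addresses cost and performance together while the other options leave the amount of data scanned unchanged.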

Common mistakes

Focusing only on storage costs (S3 Intelligent-Tiering) instead of query costs.
