BigQuery Cost Optimization Tips - Partitioning a Regular BQ Table from a Public Dataset



BigQuery Cost Optimization

BigQuery is Google Cloud’s enterprise, multicloud data warehouse designed for business agility. It is serverless, meaning Google handles all resource provisioning behind the scenes, so you can focus on data and analysis rather than on upgrading, securing, or managing infrastructure.

Because it is serverless and billed per use, you need to know which SQL queries will be charged and which will be free. At Evonence we run SQL queries carefully to reduce cost while performing ELT and analytics, and we are happy to share some of these cost optimization tips with the community.

The game starts now...

Make sure you have a GCP project with billing enabled. The BigQuery and Cloud Storage APIs are enabled by default.

  1. Create a BigQuery Dataset

(Screenshot: creating a dataset in the BigQuery console)
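
If you prefer the command line, the dataset can also be created with bq mk. This is a sketch: the dataset name bigquery_cost_optimization matches the one used in the load step below, and the US location is an assumption (the Stack Overflow public dataset lives in the US multi-region).

bq --location=US mk --dataset bigquery_cost_optimization
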
2. Let’s partition the bigquery-public-data.stackoverflow.stackoverflow_posts table in a cost-optimized way.

  • Unoptimized way: Using a SQL query with SELECT * performs a full table scan, which incurs a charge for processing 29.4 GiB of data (a representative query is sketched below).

(Screenshots: query job details for the full table scan of bigquery-public-data.stackoverflow.stackoverflow_posts)
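
The exact query from the screenshot is not reproduced here, but one plausible shape for it, creating a partitioned copy directly with DDL, looks like the following. The table name partition_paid is illustrative; the SELECT * forces BigQuery to bill for scanning the full 29.4 GiB source table.

bq query --nouse_legacy_sql \
'CREATE TABLE bigquery_cost_optimization.partition_paid
PARTITION BY DATE(creation_date) AS
SELECT * FROM `bigquery-public-data.stackoverflow.stackoverflow_posts`'
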

  • Optimized way in four steps: Loading data from Cloud Storage into BigQuery and exporting (extracting) data from BigQuery to Cloud Storage are both completely free. We will leverage this to build a partitioned copy of the bigquery-public-data.stackoverflow.stackoverflow_posts table.

Step #1: Create a Standard storage class, regional Cloud Storage bucket using the below command.

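The command from the original screenshot is not preserved; a sketch using gsutil follows. The bucket name is taken from the later steps, and us-central1 is an assumed region (a US multi-region dataset can export to a bucket in any US region).

gsutil mb -c standard -l us-central1 gs://evonence-bq-cost-optimization-tips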

Step #2: Run bq extract to export the table to this Cloud Storage bucket as Snappy-compressed Avro files using the below command.

bq extract --compression=SNAPPY \
--destination_format=AVRO \
bigquery-public-data:stackoverflow.stackoverflow_posts \
gs://evonence-bq-cost-optimization-tips/avro/*.zip

(Screenshot: extract job details)
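
To confirm the export completed, list the exported shards. Note that the .zip suffix is just the filename pattern chosen above; the files are Snappy-compressed Avro, not zip archives.

gsutil ls -l gs://evonence-bq-cost-optimization-tips/avro/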

Step #3: Run bq load to load the data from this Cloud Storage bucket into a new partitioned BQ table, with creation_date as the partitioning column, using the below command.

bq load --source_format=AVRO \
--time_partitioning_type DAY \
--time_partitioning_field creation_date \
--require_partition_filter=true \
--use_avro_logical_types=true \
bigquery_cost_optimization.partition_free \
gs://evonence-bq-cost-optimization-tips/avro/*.zip

(Screenshots: the new partitioned BQ table and the load job details)
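
You can verify the savings without spending anything: a dry run reports how many bytes a query would process. Because the table was created with --require_partition_filter=true, every query must filter on creation_date; the date range below is an arbitrary example.

bq query --nouse_legacy_sql --dry_run \
'SELECT id, title FROM bigquery_cost_optimization.partition_free
WHERE creation_date BETWEEN TIMESTAMP("2018-01-01") AND TIMESTAMP("2018-01-02")'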


Step #4: Finally, delete the Cloud Storage bucket using the below command.

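The command from the original screenshot is not preserved; a sketch that removes the bucket and everything in it:

gsutil rm -r gs://evonence-bq-cost-optimization-tips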

In these four steps, bq extract and bq load are totally free. The only charge is for storing the compressed Avro files in Cloud Storage, and only for the few minutes before the bucket is deleted in step #4.

Benefits

  1. Operations that use BigQuery’s shared slot pool, such as loading data, copying data, exporting data, and deleting datasets, tables, and views, are totally free of charge (see the copy example after this list).

  2. Use Cloud Storage as a staging environment while working with BigQuery.
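
For example, copying a table with bq cp runs in the shared slot pool and is free of charge. This is a sketch; the destination name partition_free_backup is illustrative.

bq cp bigquery_cost_optimization.partition_free \
bigquery_cost_optimization.partition_free_backup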

Conclusion

Leverage BigQuery’s free operations, such as loading, copying, and exporting data, fully and smartly to reduce your BigQuery spend. For more such tips and tricks, stay tuned…










