Evonence | Google Cloud Partner

View Original

BigQuery Cost Optimization Tips - Partitioning a Regular BQ Table from a Public Dataset



BigQuery is Google Cloud’s Enterprise Multicloud Data warehouse designed for business agility. It is a Serverless DWH meaning Google does all resource provisioning behind the scenes, so you can focus on data and analysis rather than worrying about upgrading, securing, or managing the infrastructure.

Since it is Serverless you need to know which SQL Query will be charged and which will be free. We at Evonence very cautiously run SQL Queries in a smart way to reduce cost while performing various ELT and Analytics. Happy to share some of the Cost Optimization Tips with the community.

The game starts now...

Make sure you have a GCP Project with Billing enabled. BigQuery & Cloud Storage APIs’ will be enabled by default.

  1. Create a BigQuery Dataset








2. Lets partition bigquery-public-data.stackoverflow.stackoverflow_posts table in a cost optimized way. 

  • Unoptimized way: If we use the below SQL query which consists of a SELECT * which is a full table scan that incurs a cost for processing 29.4GiB of data.





  • Optimized way in four steps:  Since Data Loading from Cloud Storage and Data Export (extract) to Cloud Storage is totally free. Will leverage this to have a partitioned of bigquery-public-data.stackoverflow.stackoverflow_posts table

Step #1: Create a Standard Region Cloud Storage bucket using the below command.

Step #2: Run bq extract to export the data to this Cloud Storage bucket in an AVRO compressed format using the below command.

bq extract --compression=SNAPPY \

--destination_format=AVRO \

bigquery-public-data:stackoverflow.stackoverflow_posts \ 

gs://evonence-bq-cost-optimization-tips/avro/*.zip

Step #3: Run bq load to load the data from this Cloud Storage bucket to new partitioned BQ table with creation_date as partition column using the below command.

 

bq load --source_format=AVRO \

--time_partitioning_type DAY \

--time_partitioning_field creation_date \

--require_partition_filter=true \

--use_avro_logical_types=true \

bigquery_cost_optimization.partition_free \

gs://evonence-bq-cost-optimization-tips/avro/*.zip


Step #4: Finally, delete the Cloud Storage bucket using the below command.

In these four steps, bq extract and bq load is totally free. Storing the AVRO in compressed form in Cloud Storage will only incur cost but only for few minutes because in step #4 the bucket is deleted.

Benefits

  1. Using the shared slot pool operations like Loading data, Copying data, Exporting data, Delete datasets, tables, view are totally free of charge in BigQuery.  

  2. Use Cloud Storage as staging environment while using BigQuery

Conclusion

Leverage BigQuery’s Free Operations like Loading data, Copying data, Exporting data and so on fully & smartly to reduce your BigQuery spendings. For more such tips and tricks stay tuned…