Google Cloud Services for Big Data Projects

Google Cloud Platform provides a range of services for data analysis and Big Data applications. All of these services integrate with other Google Cloud products, and each has its pros and cons.

This article will review what services Google Cloud Platform can offer for data and Big Data applications and what those services do. We’ll also check out what benefits and limitations they have, the pricing strategy of each service, and their alternatives.

Cloud PubSub

Cloud PubSub is a message queue broker that allows applications to exchange messages reliably, quickly, and asynchronously. It is based on the publish-subscribe pattern.

[Visualization of PubSub workflow]

The diagram above describes the basic flow of the PubSub. First, publisher applications publish messages to a PubSub topic. Then the topic sends messages to PubSub subscriptions; the subscriptions store messages; subscriber applications read messages from the subscriptions.
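
The flow above can be sketched with a small in-memory model. This is not the PubSub client library: the `Topic` and `Subscription` classes and their methods are hypothetical names, used only to show how a topic fans messages out to subscriptions that store them until a subscriber reads them.

```python
from collections import deque


class Subscription:
    """Stores messages until a subscriber pulls them (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self._queue = deque()

    def deliver(self, message):
        self._queue.append(message)

    def pull(self, max_messages=10):
        """Return up to max_messages stored messages, oldest first."""
        pulled = []
        while self._queue and len(pulled) < max_messages:
            pulled.append(self._queue.popleft())
        return pulled


class Topic:
    """Fans every published message out to all attached subscriptions."""

    def __init__(self, name):
        self.name = name
        self._subscriptions = []

    def attach(self, subscription):
        self._subscriptions.append(subscription)

    def publish(self, message):
        for subscription in self._subscriptions:
            subscription.deliver(message)


topic = Topic("orders")
billing = Subscription("billing")
analytics = Subscription("analytics")
topic.attach(billing)
topic.attach(analytics)

topic.publish("order-1")
topic.publish("order-2")

billing_messages = billing.pull()
analytics_messages = analytics.pull(max_messages=1)
```

Each subscription keeps its own copy of every message, so the billing and analytics subscribers read at their own pace without affecting each other.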

Benefits

  • A highly reliable communication layer
  • High capacity

Limitations

  • 10 MB is the maximum size for one message
  • 10 MB is the maximum size for one request, so if we send ten messages per request, each message can average at most 1 MB.
  • The maximum attribute value size is 1,024 bytes
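
One way to stay under the request limit is to group messages into batches before publishing. The sketch below is a simplified illustration, not part of the PubSub client library: `batch_messages` is a hypothetical helper, the 10 MB limit is treated as 10,000,000 bytes (an assumption), and only payload bytes are counted.

```python
MAX_REQUEST_BYTES = 10 * 1000 * 1000  # the 10 MB request limit listed above


def batch_messages(messages, max_request_bytes=MAX_REQUEST_BYTES):
    """Group byte-string messages into batches that fit the request limit.

    Simplification: only payload bytes are counted; attributes and
    protocol overhead are ignored.
    """
    batches, current, current_size = [], [], 0
    for message in messages:
        size = len(message)
        if size > max_request_bytes:
            raise ValueError("a single message exceeds the request limit")
        if current and current_size + size > max_request_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(message)
        current_size += size
    if current:
        batches.append(current)
    return batches


# Ten 1 MB messages fit in one request; an eleventh starts a new one.
one_mb = b"x" * 1_000_000
batches = batch_messages([one_mb] * 11)
batch_sizes = [len(batch) for batch in batches]
```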

Pricing strategy

You pay for transferred data per GB.

Analogs & alternatives

  • Apache Kafka
  • RabbitMQ
  • Amazon SQS
  • Azure Service Bus
  • Other Open Source Message Brokers

Google Cloud IoT Core

[The architecture of Cloud IoT Core]

Cloud IoT Core is an IoT devices registry. This service allows devices to connect to the Google Cloud Platform, receive messages from other devices, and send messages to those devices. To receive messages from devices, IoT Core uses Google PubSub.

Benefits

  • MQTT and HTTPS transfer protocols
  • Secure device connection and management

Pricing Strategy

You pay for the data volume that you transfer across this service.

Analogs & alternatives

  • AWS IoT Core
  • Azure IoT

Cloud Dataproc

[Cloud Dataproc for Apache Spark and Apache Hadoop]

Cloud Dataproc is a faster, easier, and more cost-effective way to run Apache Spark and Apache Hadoop in Google Cloud. Cloud Dataproc is a cloud-native solution covering all operations related to deploying and managing Spark or Hadoop clusters. 

In simple terms, with Dataproc, you can create a cluster of instances on Google Cloud Platform, dynamically change the size of the cluster, configure it, and run MapReduce jobs.
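
To make the MapReduce model mentioned above concrete, here is a single-process word count in plain Python. It only illustrates the map, shuffle, and reduce phases; a real Dataproc job would run on Hadoop or Spark across the cluster, and the function names here are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in the line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


lines = ["big data on Google Cloud", "Big Data pipelines"]
pairs = [pair for line in lines for pair in map_phase(line)]
word_counts = reduce_phase(shuffle_phase(pairs))
```

On a cluster, the map and reduce phases run in parallel on different nodes, which is why resizing the cluster changes how fast a job finishes.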

Benefits

  • Fast deployment
  • A fully managed service: you supply the code, and there is no operations work
  • Dynamically resize the cluster
  • Auto-Scaling feature

Limitations

  • You cannot select a specific version of the framework used
  • You cannot pause or stop a Dataproc cluster to save money; you can only delete it. Deletion can be automated with Cloud Composer
  • You cannot choose a cluster manager; only YARN is available

Pricing strategy

You pay for each instance you use plus a Dataproc surcharge. Google Cloud Platform bills for each minute the cluster runs.

Analogs & alternatives

  • Set up a cluster on virtual machines
  • Amazon EMR
  • Azure HDInsight

Cloud Dataflow

[The place of Cloud Dataflow in a Big Data application on Google Cloud Platform]

Cloud Dataflow is a managed service for developing and executing a wide range of data processing patterns, including ETL, batch processing, and stream processing. In addition, Dataflow is used for building data pipelines. This service is based on Apache Beam and supports Python and Java jobs.

Benefits

  • Combines batch and streaming with a single API
  • Speedy deployment
  • A fully managed service, no operation work
  • Dynamic work rebalancing
  • Autoscaling

Limitations

  • Built on a single framework, so it inherits all the limitations of Apache Beam
  • The maximum size for a single element value in Streaming Engine is 100 MB

Pricing strategy

Cloud Dataflow jobs are billed per second, based on the actual use of Cloud Dataflow.

Analogs & alternatives

  • Set up a cluster on virtual machines and run Apache Beam via a built-in runner
  • As far as I know, other cloud providers don’t have analogs.

Google Cloud Dataprep

[The interface of Dataprep]

Dataprep is a tool for visualizing, exploring, and preparing the data you work with. You can build pipelines that ETL your data into different storage systems, all from a simple and intuitive web interface.

For example, you can use Dataprep to build an ETL pipeline that extracts raw data from GCS, cleans it up, transforms it into the required format, and loads it into BigQuery. You can also schedule a daily, weekly, or other recurring job that runs this pipeline on new raw data.
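
Dataprep builds such pipelines visually, but the same extract, clean, transform, and load steps can be sketched in plain Python to show what each stage does. Everything here is a stand-in: the CSV text represents a raw file in GCS, the list represents a BigQuery table, and the function names and sample data are hypothetical.

```python
import csv
import io

# Stand-in for a raw CSV file in GCS (hypothetical sample data).
RAW_CSV = """user_id,country,amount
1, us ,10.50
2,DE,
3,us,4.25
"""


def extract(raw_text):
    """Read raw rows from CSV text."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def clean(rows):
    """Strip stray whitespace and drop rows with a missing amount."""
    cleaned = []
    for row in rows:
        row = {key: value.strip() for key, value in row.items()}
        if row["amount"]:
            cleaned.append(row)
    return cleaned


def transform(rows):
    """Convert fields to the types the target table expects."""
    return [
        {
            "user_id": int(row["user_id"]),
            "country": row["country"].upper(),
            "amount": float(row["amount"]),
        }
        for row in rows
    ]


def load(rows, table):
    """Append rows to the target table (a list standing in for BigQuery)."""
    table.extend(rows)
    return len(rows)


warehouse_table = []
loaded = load(transform(clean(extract(RAW_CSV))), warehouse_table)
```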

Benefits

  • Simplifies building ETL pipelines
  • Provides a clear and helpful web interface
  • Automates a lot of manual work for data engineers
  • Built-in scheduler
  • Uses Google Dataflow to perform ETL jobs

Limitations

  • Works only with BigQuery and GCS

Pricing Strategy

You pay for data storage, and for executing ETL jobs you pay for Google Dataflow.

Cloud Composer

Cloud Composer is a workflow orchestration service for managing data processing. It is a cloud interface for Apache Airflow, and it automates ETL jobs. For example, a Composer workflow can create a Dataproc cluster, transform extracted data with a Dataproc PySpark job, upload the results to BigQuery, and then shut down the Dataproc cluster.
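
The example workflow is a chain of dependent tasks, which is the kind of structure an orchestrator resolves before running anything. The sketch below models it with Python's standard `graphlib` module (Python 3.9+). It is not Airflow code: the task names are hypothetical, and `run_task` is a placeholder that only records the order.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks that must finish before it starts.
workflow = {
    "create_dataproc_cluster": set(),
    "run_pyspark_job": {"create_dataproc_cluster"},
    "load_results_to_bigquery": {"run_pyspark_job"},
    "delete_dataproc_cluster": {"load_results_to_bigquery"},
}

execution_log = []


def run_task(name):
    """Placeholder for real work; here it only records the task name."""
    execution_log.append(name)


for task in TopologicalSorter(workflow).static_order():
    run_task(task)
```

Deleting the cluster as the last task is what keeps the bill down when a cluster cannot simply be paused.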

Benefits

  • Fills the gaps of other Google Cloud Platform solutions, like Dataproc
  • Inherits all advantages of Apache Airflow

Limitations

  • Provides the Airflow web UI on a public IP address
  • Inherits all limitations of Apache Airflow

Pricing Strategy

You pay only for the resources on which Composer is deployed. Note, however, that Composer is deployed to three instances.

Analogs & alternatives

  • Custom deployed Apache Airflow
  • Other open-source orchestration solutions

BigQuery

[Example of integration BigQuery into a data processing solution with different front-end integrations] 

BigQuery is a data warehouse. It lets us store and query massive datasets of up to hundreds of petabytes. BigQuery is very similar to relational databases in structure: it has a table structure, uses SQL, supports batch and streaming writes, and is integrated with all Google Cloud Platform services, including Dataflow, Apache Spark, Apache Hadoop, etc. It’s best suited to interactive querying and offline analytics.

Benefits

  • Huge capacity, up to hundreds of Petabytes
  • SQL
  • Batch and streaming writing
  • Supports complex queries
  • Built-in ML
  • Serverless
  • Shared datasets — you can share datasets between different projects
  • Global locations
  • All popular data processing tools have interfaces to BigQuery

Limitations

  • It doesn’t support transactions, but who needs transactions in an OLAP solution?
  • The maximum size of a row is 10 MB

Pricing strategy

You pay separately for stored data (per GB) and for executed queries.

You can choose one of two payment models concerning executed queries, either paying for each processed Terabyte or a stable monthly cost depending on your preferences.
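
Which query pricing model is cheaper depends on how much data your queries process each month. The sketch below compares the two; the prices are hypothetical placeholders chosen for easy arithmetic, not Google's actual rates, and the function names are illustrative.

```python
def on_demand_cost(terabytes_processed, price_per_tb):
    """Monthly query cost when paying per processed terabyte."""
    return terabytes_processed * price_per_tb


def cheaper_model(terabytes_processed, price_per_tb, flat_monthly_fee):
    """Return which query pricing model costs less for the given volume."""
    if on_demand_cost(terabytes_processed, price_per_tb) <= flat_monthly_fee:
        return "on-demand"
    return "flat-rate"


# Hypothetical prices for illustration only; check the current price list.
PRICE_PER_TB = 5.0
FLAT_MONTHLY_FEE = 2000.0

break_even_tb = FLAT_MONTHLY_FEE / PRICE_PER_TB
light_usage = cheaper_model(50, PRICE_PER_TB, FLAT_MONTHLY_FEE)
heavy_usage = cheaper_model(1000, PRICE_PER_TB, FLAT_MONTHLY_FEE)
```

With these placeholder numbers the break-even point is 400 TB per month: below it, paying per processed terabyte is cheaper, and above it the fixed monthly cost wins.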

Analogs & alternatives

  • Amazon Redshift
  • Azure Cosmos DB

Cloud BigTable

Google Cloud BigTable is Google’s NoSQL Big Data database service. The same database powers many core Google services, including Search, Analytics, Maps, and Gmail. Bigtable is designed to handle massive workloads at consistent low latency and high throughput, so it’s an excellent choice for operational and analytical applications, including IoT, user analytics, and financial data analysis.

Cloud Bigtable supports the Apache HBase API (HBase itself is modeled on Bigtable). This database has an enormous capacity and is recommended for datasets larger than a terabyte. For example, BigTable is a good fit for time-series data and IoT data.

Benefits

  • Performs well on 1 TB or more of data
  • Cluster resizing without downtime
  • Incredible scalability
  • Support API of Apache HBase

Limitations

  • Performs poorly on less than 300 GB of data
  • It doesn’t suit real-time workloads
  • It doesn’t support ACID operations
  • The maximum size of a single value is 100 MB
  • The maximum size of all values in a row is 256 MB
  • The maximum size of the hard disk is 8 TB per node
  • A minimum of three nodes in the cluster

Pricing Strategy

BigTable is very expensive. You pay for nodes (minimum $0.65 per hour per node) and storage capacity (minimum $26 per terabyte per month).
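
To see what those minimums add up to, here is a rough monthly estimate for the smallest possible cluster. It uses only the figures quoted in this article plus one assumption, an average of 730 hours per month; `monthly_cost` is a hypothetical helper, and real bills depend on region and storage type.

```python
NODE_PRICE_PER_HOUR = 0.65    # minimum node price quoted above, USD
STORAGE_PRICE_PER_TB = 26.0   # minimum storage price quoted above, USD
MIN_NODES = 3                 # minimum cluster size from the limitations
HOURS_PER_MONTH = 730         # assumption: average hours in a month


def monthly_cost(nodes, stored_tb):
    """Estimate the monthly bill for a cluster that runs all month."""
    if nodes < MIN_NODES:
        raise ValueError("a cluster needs at least three nodes")
    node_cost = nodes * NODE_PRICE_PER_HOUR * HOURS_PER_MONTH
    storage_cost = stored_tb * STORAGE_PRICE_PER_TB
    return node_cost + storage_cost


smallest_cluster = monthly_cost(nodes=3, stored_tb=1)
```

Under these assumptions, a three-node cluster holding 1 TB costs about $1,449.50 per month, almost all of it for the nodes rather than the storage.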

Analogs & alternatives

  • Custom deployed Apache HBase

Cloud Storage

GCS is blob storage for files. You can store any number of files of any size there.

Benefits

  • Good API for all popular programming languages and operating systems
  • Immutable files
  • Versions of files
  • Suitable for files of any size
  • Suitable for any number of files
  • Etc

Pricing Strategy

GCS has a couple of pricing plans. In the standard plan, you pay per GB of stored data.

Analogs & alternatives

  • Amazon S3
  • Azure Blob Storage

Other Google Cloud Services

There are a few more services that I should mention.

Google Cloud Compute Engine provides virtual machines with any performance capacity.

Google CloudSQL is a cloud-native solution for hosting MySQL and PostgreSQL databases. It has built-in vertical and horizontal scaling, a firewall, encryption, backups, and other benefits of cloud solutions. It has terabyte-scale capacity and supports complex queries and transactions.

Google Cloud Spanner is a fully managed, scalable, relational database service. It supports SQL queries, automatic replication, and transactions. It has a one-petabyte capacity and is best suited to large-scale database applications that store more than a couple of terabytes of data.

Google StackDriver monitors Google services, infrastructure, and your applications hosted on Google Cloud Platform.

Cloud Datalab is a way to visualize and explore your data. This service provides a cloud-native way to host Python Jupyter notebooks.

Google Cloud AutoML and Google AI Platform allow training and hosting of high-quality custom machine learning models with minimal effort.

Conclusion

Now you are familiar with the primary data services that Google Cloud Platform provides. This knowledge can help you build a good data solution. But, of course, the cloud is not a silver bullet, and using it the wrong way can significantly increase your monthly infrastructure bill.

Thus, design your solution’s architecture carefully and choose the services you need to reach your business goals. Explore the benefits and limitations of each service for your particular case. Keep an eye on costs. And, of course, remember the scalability, reliability, and maintainability of your solution.

10 Steps for Building a Successful Cloud Migration Strategy

Imagine that you recently launched a social networking app. To host the app’s infrastructure, you decided to use an existing on-premise server because you did not expect it to handle many users immediately. Then the app goes viral and, within just one month, over 1,000,000 users download it and use it daily. Do you know what happens next? Since your server infrastructure was not ready for such a load, it stops working correctly. Instead of your app’s interface, users see an error message, and you lose a significant number of them because the app failed to live up to their expectations.

To avoid situations where you jeopardize user trust, use cloud platforms for both hosting databases and running app infrastructure. 

Data giants such as Facebook, Netflix, and Airbnb have already migrated to the cloud because of lower costs, auto-scaling features, and add-ons such as real-time analytics. Oracle research says 90% of enterprises will run their workloads on the cloud by 2025. If you already run data centers or on-premise infrastructure and will need more capacity in the future, consider migrating to the cloud.

Yet migrating to the cloud is not as simple as it seems. To migrate successfully, you need not only an experienced developer but also a solid cloud application migration strategy.

If you are ready to leverage cloud solutions for your business, read this article to the end. 

By the end of this blog post, you will know about cloud platform types and how to successfully migrate to cloud computing.

Cloud migration strategies: essential types

Migration to the cloud means transferring your data from physical servers to a cloud hosting environment. The definition also applies to moving data from one cloud platform to another. Cloud migration comes in several types, which differ in the number of code changes developers need to make, because not all data is ready to be moved to the cloud by default.

Let’s go through the main types of application migration to the cloud one by one. 

  • Rehosting. This is the process of moving data from on-premise storage and redeploying it on cloud servers. 
  • Restructuring. Such a migration requires changes in the initial code to meet the cloud requirements. Only then can you move the system to a platform-as-a-service (PaaS) cloud model. 
  • Replacement migration means switching from existing native apps to third-party apps. An example of replacement is migrating data from custom CRM to Salesforce CRM. 
  • Revisionist migration. During such a migration, you make global changes in the infrastructure to allow the app to leverage cloud services. By ‘cloud services’ we mean auto-scaling, data analytics, and virtual machines. 
  • Rebuild is the most drastic type of cloud migration. This type means discarding the existing code base and building a new one on the cloud. Apply this strategy if the current system architecture does not meet your goals. 

How to nail cloud computing migration: essential steps

For successful migration to the cloud, you need to go through the following steps of the cloud computing migration strategy. 

Step 1. Build a cloud migration team 

First, you need to hire the necessary specialists and assign roles. In our experience, a cloud migration team should include:

  • Executive Sponsor, a person who handles creating a cloud data migration strategy. If you have enough tech experience, you can take this role. If not, your CTO or a certified cloud developer will ideally suit you. 
  • Field General handles project management and migration strategy execution. This role will suit your project manager if you have one. If not, you can hire a dedicated specialist with the necessary skills. 
  • Solution Architect is an experienced developer who has completed several cloud migration projects. This person will build and maintain the architecture of your cloud. 
  • Cloud Administrator ensures that your organization has enough cloud resources. You need an expert in virtual machines, cloud networking, development, and deployment on IaaS and PaaS. 
  • Cloud Security Manager will set up and manage access to cloud resources via groups, users, and accounts. This team member configures, maintains, and deploys security baselines to a cloud platform. 
  • Compliance Specialist ensures that your organization meets the privacy requirements. 

Step 2. Choose a cloud service model

There are several types of cloud platforms. Each of them provides different services to meet various business needs. Thus, you need to define your requirements for a cloud solution and select the one with the intended set of workflows. However, this step is challenging, especially if you have no previous experience with cloud platforms. To make the right decision, consult experienced cloud developers. But to be on the same page with your cloud migration team, you need to know the essential types of cloud platform services (SaaS, PaaS, and IaaS) and the differences between them.

  • SaaS (Software as a Service)

Choose SaaS to get the advantages of running apps without maintaining and updating infrastructure. SaaS providers also offer you cloud-based software, programs, and applications. SaaS platforms charge a monthly or yearly subscription fee.

  • IaaS (Infrastructure as a Service)

This cloud model suits businesses that need more computing power to run variable workloads at a lower cost. With IaaS, you will receive a ready-made computing infrastructure, networking resources, servers, and storage. IaaS solutions apply a pay-as-you-go pricing policy. Thus, you can increase the cloud solution’s capacity anytime you need it.

  • PaaS (Platform as a Service)

Choose this cloud platform type if you are adopting agile methodology in your development team, since PaaS allows faster release of app updates. You will also receive an infrastructure environment to develop, test, and deploy your apps, thus increasing the performance of your development team.

[Cloud migration: cloud service models]

Step 3. Define cloud solution type

Now you need to select the nature of your cloud solution from among the following:

  • Public Cloud is the best option when you need a development and testing environment for the app’s code. Yet the public cloud migration strategy is not the best option for moving sensitive data, because public clouds carry a higher risk of data breaches.
  • Private Cloud providers give you complete control over your system and its security. Thus, private clouds are the best choice for storing sensitive data.
  • The hybrid cloud migration strategy combines the characteristics of public and private cloud solutions. Choose a hybrid cloud if you want to use a SaaS app and still get advanced security. Thus, you can operate your data in the most suitable environment. The main drawback is that tracking several security infrastructures at once is challenging.

Step 4. Decide the level of cloud integration

Before moving to cloud solutions, you need to choose the level of cloud integration: shallow or deep. Let’s find out what the difference is between them.

  • Shallow cloud integration (lift-and-shift). To complete a shallow cloud migration, developers need to make only minimal changes to the server infrastructure. However, you cannot use the extra services of cloud providers.
  • Deep cloud integration means changing an app’s infrastructure. Choose this strategy if you need serverless computing capabilities (Google Cloud Platform services) and cloud-specific data storage (Google Cloud Bigtable, Google Cloud Storage).

Step 5. Select a single cloud or multi-cloud environment

You need to choose whether to migrate your application to one cloud platform or use several cloud providers at once. Your choice will affect the time required to prepare the infrastructure for cloud migration. Let’s look at both options in more detail.

Running an app on one cloud is the more straightforward option. Your team will need to optimize it for the selected cloud provider and learn one set of cloud APIs. But this approach has a drawback: vendor lock-in, which means that changing the cloud provider later will be difficult and costly.

If you want to leverage multiple cloud providers, choose among the following options: 

  • Run one set of application components on one cloud and another set on a different cloud platform. The benefit is that you can try different cloud providers at once and choose where to migrate apps in the future.
  • Split applications across many different cloud platforms. This way, you can use the key advantages of each platform. However, consider that the poor performance of just one cloud provider may increase your app’s downtime.
  • Build a cloud-agnostic application that can run on any cloud. The main drawback is the complicated process of app development and feature validation.
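
A common way to keep an application cloud-agnostic is to hide each provider behind an interface that the application code depends on. The sketch below shows the idea for file storage. All names are hypothetical, and the in-memory backend stands in for a real one that would wrap a provider's SDK.

```python
from abc import ABC, abstractmethod


class BlobStore(ABC):
    """Provider-neutral storage interface (hypothetical)."""

    @abstractmethod
    def put(self, key, data):
        """Store data under the given key."""

    @abstractmethod
    def get(self, key):
        """Return the data stored under the given key."""


class InMemoryBlobStore(BlobStore):
    """Stand-in backend; a real one would wrap a provider's SDK."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]


def save_report(store, report_id, content):
    """Application code depends only on the BlobStore interface."""
    key = f"reports/{report_id}.txt"
    store.put(key, content)
    return key


store = InMemoryBlobStore()
saved_key = save_report(store, 42, b"monthly totals")
```

Switching providers then means writing one new backend class rather than touching application code, but every backend has to be developed and validated separately, which is the drawback noted above.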

Step 6. Prioritize app services

You can move all your app components at once, or migrate them gradually. To find out which approach suits you the best, you need to detect the dependencies of your app. You can identify the connections between components and services manually or generate a dependencies diagram via a service map. 

Now, select services with the fewest dependencies to migrate them first. Next, migrate services with more dependencies that are closest to users.
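
As a rough illustration of this prioritization, the sketch below orders a hypothetical service map by the number of dependencies each service has. The service names and the `migration_order` helper are made up for the example; a real plan would also weigh how close each service is to users.

```python
# Hypothetical service map: each service lists the services it depends on.
dependencies = {
    "web_frontend": ["auth", "catalog", "orders"],
    "orders": ["auth", "catalog"],
    "catalog": ["image_store"],
    "auth": [],
    "image_store": [],
}


def migration_order(service_map):
    """Order services so those with the fewest dependencies move first.

    Ties are broken alphabetically to keep the plan deterministic.
    """
    return sorted(service_map, key=lambda name: (len(service_map[name]), name))


plan = migration_order(dependencies)
```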

Step 7. Perform refactoring

In some cases, you will need to refactor code before moving to the cloud. This ensures that all your services will work in the cloud environment. The most common reasons for code refactoring are:

  • Ensuring the app performs well with a varying number of running instances and supports dynamic scaling
  • Making the app’s resource use rely on dynamic cloud capabilities rather than allocating resources beforehand

Step 8. Create a cloud migration project plan

Now you and your team can outline a migration roadmap with milestones. Schedule the migration according to your data location and the number of dependencies. Also, keep in mind that your app must stay accessible to users during the migration.

Step 9. Establish cloud KPIs

Before moving data to a cloud, you need to define Key Performance Indicators. These indicators will help you measure how well your app performs in the new cloud environment.

In our experience, most businesses track the following KPIs:

  • Page loading speed
  • Response time
  • Session length
  • Number of errors
  • Disk performance
  • Memory usage

There are others as well. You can also measure industry-specific KPIs, like the average purchase order value for mobile e-commerce apps.

Step 10. Test, review, and make adjustments as needed

After you’ve migrated several components, run tests and compare the results with your pre-defined KPIs. If the migrated services meet their KPI targets, migrate the other parts. After migrating all elements, run tests again to ensure that your app architecture runs smoothly.
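
The comparison with pre-defined KPIs can be automated with a simple check like the one below. The metric names, target values, and measurements are hypothetical, and each target is treated as the largest acceptable value for that metric.

```python
# Hypothetical KPI targets: the largest acceptable value for each metric.
kpi_targets = {
    "page_load_seconds": 2.0,
    "response_time_ms": 300,
    "errors_per_hour": 5,
}


def failed_kpis(measured, targets):
    """Return the metrics whose measured value exceeds its target.

    A metric that was not measured at all is also reported as failed.
    """
    failures = []
    for metric, limit in targets.items():
        value = measured.get(metric)
        if value is None or value > limit:
            failures.append(metric)
    return failures


# Hypothetical measurements taken after migrating the first components.
measured = {"page_load_seconds": 1.4, "response_time_ms": 420, "errors_per_hour": 2}
failures = failed_kpis(measured, kpi_targets)
ready_for_next_batch = not failures
```

Here the response time misses its target, so the next batch of components should wait until that is fixed.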

Cloud migration checklist from The APP Solutions

Cloud providers provide different services to meet the needs of various businesses. You need help from professionals to choose the right cloud solution. 

We often meet clients who have trouble selecting a cloud provider. In these cases, we audit the existing project’s infrastructure. Next, we help clients define their expectations for the new cloud environment. To achieve this, we compare different cloud providers and their pros and cons. Then we adapt the project to the cloud infrastructure, which is essential for a successful migration.

When looking for a cloud provider, consider the following parameters: 

  • Your budget, which means not only the cost of cloud solutions but also the budget for the cloud migration itself
  • The location of your project, target audience, and security regulations (HIPAA, GDPR)
  • The number of extra features you want to receive, including CDN, autoscaling, backup requirements, etc. 

Migration to a cloud platform is the next step for all business infrastructures. However, cloud migration is a complex process. It requires not only time and money but also a solid cloud migration strategy. To make sure your cloud migration is on track, you need to establish and track KPIs. Fill in the contact form to receive a consultation or hire a certified cloud developer.