FinTech startup Branch taps BigQuery for data analytics

Editor’s note: Here we take a look at how Branch, a fintech startup, built their data platform with BigQuery and other Google Cloud solutions that democratized data for their analysts and scientists.

As a startup in the fintech sector, Branch helps redefine the future of work by building innovative, simple-to-use tech solutions. We’re an employer payments platform, helping businesses provide faster pay and fee-free digital banking to their employees. As head of the Behavioral and Data Science team, I was tapped last year to build out Branch’s team and data platform. I brought my enthusiasm for Google Cloud and its easy-to-use solutions to the first day on the job.

We chose Google Cloud for ease-of-use, data & savings

I had worked with Google Cloud previously, and one of the primary mandates from our CTO was “Google Cloud-first,” with the larger goals of eliminating unnecessary complexity in the system architecture and controlling the costs of running on multiple cloud platforms.

From the start, Google Cloud’s suite of solutions supported my vision of how to design a data team. There’s no one-size-fits-all approach. It starts with asking questions: what does Branch need? Which stage are we at? Will we be distributed or centralized? But above all, what parameters in the product will need to be optimized with analytics and data science approaches? With team design, product parameterization is critical. In a product-driven company, the data science team can be most effective by tuning a product’s parameters. For example, a recommendation engine for an ecommerce site is driven by algorithms and underlying models that continually update parameters: in “Show X to this type of person but Y to that type of person,” X and Y are the parameters optimized by modeling behavioral patterns. Data scientists behind the scenes can run models of how that engine should work and determine which changes are needed.
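
To make that concrete, here is a tiny, hypothetical sketch of a product parameter driven by a behavioral model. The feature names, scoring rule, and threshold are invented for illustration and are not Branch's code:

```python
# Hypothetical sketch: a product parameter ("show X or Y") chosen per user,
# driven by a model of behavioral patterns. Feature names, the scoring rule,
# and the threshold are invented for illustration.

def score_engagement(user_features: dict) -> float:
    """Stand-in for a trained propensity model: P(engage | variant X)."""
    return 0.8 if user_features.get("recent_logins", 0) > 3 else 0.3

def choose_variant(user_features: dict, threshold: float = 0.5) -> str:
    """The tunable parameter: which variant to show this type of user."""
    return "X" if score_engagement(user_features) >= threshold else "Y"

print(choose_variant({"recent_logins": 5}))  # -> X
```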

By focusing on tuning parameters, the team is designed around determining and optimizing an objective function. That of course relies heavily on the data behind it. How do we label the outcome variable? Is a whole labeling service required? Is it clean data with a pipeline that won’t require a lot of engineering work? What data augmentation will be needed?

With that data science team design envisioned, I started by focusing on user behavior—deciding how to monitor and track it, how to partner with the product team to ensure it’s in line with the product objectives, then spinning up A/B testing and monitoring. On the optimization side, transaction monitoring is critical in fintech. We need to look for low-probability events and abnormal patterns in the data, and then take action, either reaching out to the user as quickly as possible to inform them, or stopping the transaction directly. In the design phase, we need to determine if these actions need to be done in real-time or after the fact. Is it useful to the user to have that information in real time? For example, if we are working to encourage engagement, and we miss an event or an interaction, it’s not the end of the world. It’s different with a fraud monitoring system, for which you’ve got to be much more strict about real-time notifications.
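
As a rough illustration of the monitoring idea, here is a minimal sketch that flags a transaction as a low-probability event when its amount falls far outside a user's historical pattern. The fields, rule, and threshold are hypothetical; a production system would use richer features and models:

```python
# Hypothetical transaction-monitoring sketch: flag amounts that are far outside
# a user's historical pattern so we can alert the user or hold the transaction.
# The z-score rule and threshold are illustrative only.
from statistics import mean, stdev

def is_anomalous(amount: float, user_history: list[float], z_threshold: float = 4.0) -> bool:
    """Return True when the amount is a low-probability event for this user."""
    if len(user_history) < 10:        # not enough history to judge
        return False
    mu, sigma = mean(user_history), stdev(user_history)
    if sigma == 0:
        return amount != mu
    return abs(amount - mu) / sigma > z_threshold

history = [20.0, 35.5, 18.0, 42.0, 25.0, 30.0, 22.5, 28.0, 33.0, 27.5]
print(is_anomalous(950.0, history))   # -> True: candidate for a real-time alert
print(is_anomalous(31.0, history))    # -> False: normal spending
```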

Our data infrastructure

There are many use cases at Branch for data cloud technologies from Google Cloud. One is “basic” data work. It’s been incredibly easy to use BigQuery, Google’s serverless data warehouse, which is where we’ve replicated all of our SQL databases, and Cloud Scheduler, the fully managed enterprise-grade cron job scheduler. These two tools, working together, make it easy to organize our data pipelines. And because of their deep integration, they play well with other Google Cloud solutions like Cloud Composer and Dataform, as well as with services from other providers, like Airflow. Especially for us as a startup, the whole Google Cloud suite of products accelerates the process of getting established and up and running, so we can perform the “bread-and-butter” work of data science.
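
As an illustration of that pipelining pattern, here is a minimal sketch of a nightly transformation written against the BigQuery Python client; the project, dataset, and table names are placeholders. In practice, a Cloud Scheduler job could trigger this code (for example via Cloud Functions or Cloud Run), or the SQL could be registered directly as a BigQuery scheduled query:

```python
# Minimal sketch of a nightly pipeline step: run a transformation in BigQuery
# and write the result to a reporting table. Project, dataset, and table names
# are hypothetical. A Cloud Scheduler job (e.g. cron "0 6 * * *") would trigger
# the service that calls this function.
from google.cloud import bigquery

def refresh_daily_summary() -> None:
    client = bigquery.Client()
    job_config = bigquery.QueryJobConfig(
        destination="my-project.reporting.daily_summary",   # hypothetical table
        write_disposition="WRITE_TRUNCATE",                  # rebuild on each run
    )
    query = """
        SELECT DATE(created_at) AS day,
               COUNT(*) AS transactions,
               SUM(amount) AS volume
        FROM `my-project.replicated.transactions`
        GROUP BY day
    """
    client.query(query, job_config=job_config).result()  # wait for completion

if __name__ == "__main__":
    refresh_daily_summary()
```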

We also use BigQuery to hold our heavier statistical work, and we train our models there weekly, monthly, or nightly, depending on how much data we collect. Then we use Pub/Sub, the messaging and ingestion service, and its event streams to get responses in real time. We evaluate the output of those models in a Dataproc cluster or in Dataform, and run all of it from Python notebooks, which can call out to BigQuery to train a model, evaluate it, and pass results back through the event system.
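
Here is a hedged sketch of what that real-time path can look like with the Pub/Sub Python client: events arrive on a subscription, are scored by a stand-in for a model trained offline in BigQuery, and flagged events are published to an alerts topic. All resource names are hypothetical:

```python
# Hypothetical real-time scoring path: pull transaction events from Pub/Sub,
# score them with a toy stand-in for the model trained in BigQuery, and publish
# flagged events to an alerts topic. Project, topic, and subscription names are
# placeholders.
import json
from concurrent.futures import TimeoutError
from google.cloud import pubsub_v1

PROJECT = "my-project"
subscriber = pubsub_v1.SubscriberClient()
publisher = pubsub_v1.PublisherClient()
subscription = subscriber.subscription_path(PROJECT, "transactions-sub")
alerts_topic = publisher.topic_path(PROJECT, "transaction-alerts")

def score(event: dict) -> float:
    """Toy stand-in for a model trained offline in BigQuery."""
    return 1.0 if event.get("amount", 0) > 1000 else 0.0

def callback(message: pubsub_v1.subscriber.message.Message) -> None:
    event = json.loads(message.data)
    if score(event) > 0.5:
        publisher.publish(alerts_topic, json.dumps(event).encode("utf-8"))
    message.ack()

streaming_pull = subscriber.subscribe(subscription, callback=callback)
with subscriber:
    try:
        streaming_pull.result(timeout=60)  # listen for one minute in this sketch
    except TimeoutError:
        streaming_pull.cancel()
        streaming_pull.result()
```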

Full integration of data solutions

At the next level, you need to push data out to your internal teams. We are growing and evolving, so I looked for ways to save on costs during this transition. We do a heavy amount of work in Google Sheets because it integrates well with other Google services, getting data and visuals out to the people who need them and letting them access raw data and refresh it as needed.
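
One possible way to do that push, sketched with the gspread library and the BigQuery Python client; the spreadsheet key, worksheet name, table, and key file are placeholders, and Connected Sheets is a no-code alternative:

```python
# Hypothetical sketch: copy a BigQuery query result into a Google Sheet so the
# people who need the data can view and refresh it. The spreadsheet key,
# worksheet name, table, and service-account file are placeholders.
import gspread
from google.cloud import bigquery

bq = bigquery.Client()
rows = bq.query(
    "SELECT day, transactions, volume "
    "FROM `my-project.reporting.daily_summary` ORDER BY day"
).result()

gc = gspread.service_account(filename="service-account.json")
worksheet = gc.open_by_key("SPREADSHEET_KEY").worksheet("daily_summary")
worksheet.clear()
worksheet.append_rows(
    [["day", "transactions", "volume"]]
    + [[str(r.day), r.transactions, float(r.volume)] for r in rows]
)
```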

Google Groups also makes it easy to restrict access to data tables, which is a vital concern in the fintech space. The infrastructure management and integration of Google Groups make it super useful. If an employee departs the organization, we can easily revoke or adjust their level of access. We can add new employees to a group that has a certain level of rights, such as read or write access to the underlying databases. As we grow with Google Cloud, I also envision being able to track usage at the user level, including who’s running which SQL queries and who’s straining the database and raising our costs.
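
As a sketch of that group-based access control, the snippet below grants a Google Group read access to a BigQuery dataset using the Python client, so onboarding and offboarding reduce to group membership changes. The dataset and group address are placeholders:

```python
# Hypothetical sketch: grant a Google Group read access to a BigQuery dataset,
# so access is managed by group membership rather than per user. The dataset
# and group email are placeholders.
from google.cloud import bigquery

client = bigquery.Client()
dataset = client.get_dataset("my-project.reporting")   # hypothetical dataset

entries = list(dataset.access_entries)
entries.append(
    bigquery.AccessEntry(
        role="READER",                        # read-only access for analysts
        entity_type="groupByEmail",
        entity_id="data-analysts@example.com",
    )
)
dataset.access_entries = entries
client.update_dataset(dataset, ["access_entries"])
```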

A streamlined data science team saves costs

I’d estimate that Google Cloud’s solutions have saved us the equivalent of one full-time engineer we’d otherwise need to hire to link the various tools together, making sure that they are functional and adding more monitoring. Because of the fully managed features of many of Google Cloud’s products, that work is done for us, and we can focus on expanding our customer products. We’re now 100% Google Cloud for all production systems, having consolidated from IBM, AWS, and other cloud point solutions.

For example, Branch is now expanding financial wellness offerings for our customers to encourage better financial behavior through transaction monitoring, forecasting their spend and deposits, and notifying them of risks or anomalies. With those products and others, we’ll be using and benefiting from the speed, scalability, and ease of use of Google Cloud solutions, where they always keep data—and data teams—top of mind.

Learn more about Branch. Curious about other use cases for BigQuery? Read how retailers can use BigQuery ML to create demand forecasting models.

Posted in
  • Data Analytics
  • Google Cloud

FAQs

Does Google Analytics use BigQuery?

BigQuery is a cloud data warehouse that lets you run super-fast queries of large datasets. You can export session and hit data from a Google Analytics 360 account to BigQuery, and then use a SQL-like syntax to query all of your Analytics data.

What benefits can you see Google BigQuery providing for the analysis of business datasets?

BigQuery BI Engine is a fast, in-memory analysis service that lets you build rich, interactive dashboards and reports without compromising performance, scalability, security, or data freshness.

Is Snowflake better than BigQuery?

BigQuery is generally more cost-effective for workloads with high data processing requirements. However, Snowflake can be more cost-effective for workloads with high storage requirements or for workloads that need to be scaled up quickly.

What are the disadvantages of BigQuery?

Cost implications: streaming data into BigQuery can be more expensive than batch loading, especially with large volumes of data. Limited features in some ingestion paths: connectors designed primarily for loading data into BigQuery (for example, SAP-to-BigQuery replication tools) might not support extensive ETL operations.

When should you not use BigQuery?

Avoid the standard BigQuery API for bulk data movement. If you only want to read the data, use the BigQuery Storage API; if you want to make a copy within BigQuery, use a copy job. At scale, the standard BigQuery API is the least efficient method and shouldn't be used for high-volume reads.
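
As a rough illustration of those two recommended paths, the sketch below runs a copy job and then performs a bulk read that uses the BigQuery Storage API when the google-cloud-bigquery-storage and pandas packages are installed; the table IDs are placeholders:

```python
# Hypothetical sketch: a copy job for duplicating a table inside BigQuery, and
# a bulk read that uses the Storage API under the hood. Table IDs are placeholders.
from google.cloud import bigquery

client = bigquery.Client()

# 1) Copy a table within BigQuery -- no data leaves the service.
client.copy_table(
    "my-project.analytics.events",
    "my-project.analytics.events_backup",
).result()

# 2) Bulk-read a table; the client uses the BigQuery Storage API when available.
df = client.list_rows("my-project.analytics.events").to_dataframe(
    create_bqstorage_client=True
)
print(len(df))
```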

What are the two services that BigQuery provides?

BigQuery provides two primary services: data storage and data analysis. BigQuery also offers powerful analytics capabilities, such as BigQuery SQL queries, user-defined functions, scripts, and machine learning models. With BigQuery, you can quickly analyze data to gain valuable insights and make informed decisions.

What is BigQuery in data analytics?

BigQuery is Google Cloud's serverless data warehouse for analytics. BigQuery ML provides machine learning and predictive analytics, and BigQuery Studio offers features such as Python notebooks and version control for both notebooks and saved queries. These features make it easier for you to complete your data analysis and machine learning (ML) workflows in BigQuery.

Is BigQuery easy to learn?

Learning BigQuery can be easy for those familiar with SQL. Its advantages include fast data processing and scalability. It can help with Google Analytics 360 by enabling complex data analysis and integration with other Google Cloud services.

What is BigQuery good for?

Querying and viewing data: BigQuery allows you to run interactive queries. You can also run batch queries and create virtual tables from your data. Managing data: BigQuery allows you to list projects, jobs, datasets, and tables.

How to fetch data from BigQuery?

To fetch data programmatically, you typically authenticate with a service account. Download a JSON file that contains the service account credentials (if you don't have a service account, create one first), store it locally, and point your client or connector at the path to that JSON file.
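
A minimal sketch of that flow with the BigQuery Python client; the key path, project, and table are placeholders:

```python
# Minimal sketch of fetching rows with a service-account JSON key. The key path
# and table name are placeholders to replace with your own.
from google.cloud import bigquery
from google.oauth2 import service_account

credentials = service_account.Credentials.from_service_account_file(
    "/path/to/service-account-key.json"
)
client = bigquery.Client(credentials=credentials, project=credentials.project_id)

query = "SELECT * FROM `my-project.my_dataset.my_table` LIMIT 10"
for row in client.query(query).result():
    print(dict(row))
```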

How do you think you can use public datasets in BigQuery to help develop your data analysis skills?

You can access BigQuery public datasets by using the Google Cloud console, by using the bq command-line tool, or by making calls to the BigQuery REST API using a variety of client libraries such as Java, .NET, or Python.
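
For example, here is a small sketch that queries the public usa_names dataset from Python (the Cloud console and the bq tool work just as well); query costs are billed to your own project:

```python
# Query a BigQuery public dataset from Python using application-default credentials.
# usa_names is a real public dataset commonly used in Google's tutorials.
from google.cloud import bigquery

client = bigquery.Client()
query = """
    SELECT name, SUM(number) AS total
    FROM `bigquery-public-data.usa_names.usa_1910_2013`
    WHERE state = 'TX'
    GROUP BY name
    ORDER BY total DESC
    LIMIT 5
"""
for row in client.query(query).result():
    print(row.name, row.total)
```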

Is Databricks similar to BigQuery?

Performance: Databricks and BigQuery are both very fast, but Databricks can be faster for certain workloads, such as machine learning and data processing. BigQuery can be faster for other workloads, such as business intelligence and data warehousing.

What is the alternative to Google BigQuery?

Commonly cited alternatives and competitors to Google Cloud BigQuery include Snowflake, Databricks Data Intelligence Platform, Amazon Redshift, and Teradata Vantage.

Why is Databricks better than Snowflake?

Databricks lets you optimize data processing jobs to run high-performance queries. Snowflake is primarily batch-based and computes results over the full dataset, while Databricks is a continuous (streaming) data processing system that also offers batch processing.

Does Google use BigQuery?

BigQuery is Google's fully managed, petabyte-scale, low-cost data warehouse for analytics.

Do data analysts use BigQuery?

Yes. Data analysts face a range of analytics challenges whether big data lives on premises or in the cloud, and BigQuery, Google Cloud's enterprise data warehouse, has features that make it a great option for data analytics needs.

What data does Google Analytics use?

Google Analytics collects the following information through the default implementation: the number of users, session statistics, and approximate geolocation.

What is the difference between GA4 and BigQuery?

The GA4 UI and the BigQuery export might process and attribute session data differently. In the GA4 UI, the session medium might be updated in real time based on user interactions, while in BigQuery the data is more static, reflecting its state at the time of export.
