Geoffrey Keating on November 10th 2021

Twilio Segment releases its 2021 Growth Report, an exclusive view into how customer data fuels today’s most high-performing businesses.

Recent articles

Guest author: Dan McGaw on November 1st 2021

I've watched thousands of startups build, grow, and optimize their tech stacks. Throughout the years, and while implementing Segment, teams have asked me the same questions time and time again.

  1. Which events should we track?

  2. What about metrics and customer data?

The answers are consequential; I've seen too many promising startups with great products fail due to fixable logistical issues and insufficient resources. That's why we’re collaborating with Segment’s Startup Program to give you this take on the tools, metrics, and user events crucial for success: this time for ecommerce companies.

The Added Value of a Solid Ecommerce Stack 

The nature of ecommerce necessitates a speedy tech stack to seamlessly collect user data, transmit it to the relevant destination, and respond to customer activity in real time. A complete customer journey and a closed reporting loop will provide a stable framework from which you can scale your business.

As your business grows, you'll use your tech stack to further refine and optimize your practices, now armed with historical data that covers the full customer journey. From here, you'll find and grow the lifetime value of your customers, allowing you to optimize your customer acquisition for actual ROI and ROAS. You’ll also have what it takes to improve drop-off rates, or automate communication throughout the touchpoints.

Ultimately, you’ll be able to snowball the revenue.

Ecommerce Events to Track

So, which events are most relevant to ecommerce startups? 

You want to focus on the customer journey and cover its full range. Cover the various stages in the purchase flow. Pay attention to the following interactions:

  • Product Viewed: Occurs when a user visits a product page.

  • Product Added: Whenever a visitor adds a product to their shopping cart.

  • Checkout Started: Once a visitor adds a product and clicks the “checkout” button.

  • Checkout Step Completed: After a visitor adds a product, clicks the “checkout” button, enters their payment information, and moves on to the final stage of checkout.

  • Order Completed: Once a visitor completes entering all their information for purchase and reaches the thank you screen.
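As a rough illustration of how these five events might be represented in code, the sketch below builds a generic track-style payload for each one. The helper name build_track_payload and the property names (sku, price, quantity) are assumptions made for this example, not a definitive Segment implementation.

```python
from datetime import datetime, timezone

# Event names taken from the list above.
FUNNEL_EVENTS = [
    "Product Viewed",
    "Product Added",
    "Checkout Started",
    "Checkout Step Completed",
    "Order Completed",
]


def build_track_payload(user_id, event, properties):
    """Return a dict shaped like a generic 'track' call (hypothetical helper)."""
    if event not in FUNNEL_EVENTS:
        raise ValueError(f"unexpected event name: {event}")
    return {
        "type": "track",
        "userId": user_id,
        "event": event,
        "properties": dict(properties),  # copy so callers can reuse their dict
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }


# Illustrative property names; adapt them to your own tracking plan.
payload = build_track_payload(
    "user-123", "Product Added", {"sku": "TSHIRT-M", "price": 19.0, "quantity": 1}
)
```

Validating event names against one shared list is a simple way to keep the purchase-flow taxonomy consistent across pages.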

Top Ecommerce Metrics: How a Stack Helps Measure What Matters

As transactional sites, ecommerce stores are focused on converting visitors into customers and first-time customers into repeat customers. The metrics most valuable to you will be those that uncover subtle shopping behavior, connecting your customers' average purchase spend with their repurchase rate, lifetime value, and finally, your sales totals.

Visit to Order Rate

The percentage of visits that convert into orders. The metric helps you answer “What makes the customers purchase the first time?”
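A minimal sketch of the calculation, assuming you already have visit and order counts for the same period (the function name and the sample numbers are illustrative):

```python
def visit_to_order_rate(visits, orders):
    """Percentage of visits that ended in an order."""
    if visits == 0:
        return 0.0  # avoid dividing by zero when there was no traffic
    return 100.0 * orders / visits


rate = visit_to_order_rate(visits=2000, orders=50)  # 2.5 percent
```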

Average Order Value (AOV)

The averaged total value of every order placed on a store over a specified period. The metric helps you answer “What makes the customers spend more?” or “Which campaign leads to the highest purchases?”
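A minimal sketch, assuming the order totals for the period have already been pulled from your Order Completed events (the sample values are made up):

```python
def average_order_value(order_totals):
    """Total revenue divided by the number of orders in the period."""
    if not order_totals:
        return 0.0
    return sum(order_totals) / len(order_totals)


aov = average_order_value([40.0, 60.0, 110.0])  # 210 / 3 = 70.0
```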

Total Revenue & Orders 

Shows the revenue you've earned across all channels and orders placed. This is often the KPI for ecommerce marketing. The metric helps you answer “Which products are our 80-20?” or “What are the main movers of our top-line revenue?”

Customer Lifetime Value (CLV)

The total value in dollars the customer will contribute over a lifetime. Lifetime is usually calculated between 12 and 24 months. The metric helps you answer “What is the real value of the customer we acquired?”

CLV takes reporting from immediate to real value. Lifetime value is often a lot higher than immediate value. Also, when one channel, campaign or product stands out in immediate value, it doesn’t necessarily stand out in lifetime value.
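To make the difference between immediate and lifetime value concrete, here is a rough sketch that compares the two per acquisition channel. It assumes each order row carries a customer ID, the customer's acquisition channel, an order date, and an amount; the function name, the 365-day window, and the sample data are all illustrative.

```python
from collections import defaultdict
from datetime import date, timedelta


def value_by_channel(orders, window_days=365):
    """Return {channel: (avg first-order value, avg value within the window)}."""
    first = {}  # customer_id -> (date, channel, amount) of their first order
    for cust, channel, day, amount in sorted(orders, key=lambda o: o[2]):
        first.setdefault(cust, (day, channel, amount))

    lifetime = defaultdict(float)  # customer_id -> spend inside the window
    for cust, _, day, amount in orders:
        if day - first[cust][0] <= timedelta(days=window_days):
            lifetime[cust] += amount

    totals = defaultdict(lambda: [0.0, 0.0, 0])  # immediate, lifetime, customers
    for cust, (_, channel, amount) in first.items():
        t = totals[channel]
        t[0] += amount
        t[1] += lifetime[cust]
        t[2] += 1
    return {ch: (imm / n, life / n) for ch, (imm, life, n) in totals.items()}


orders = [
    ("a", "ads", date(2021, 1, 5), 50.0),
    ("a", "ads", date(2021, 6, 1), 30.0),
    ("b", "email", date(2021, 2, 1), 80.0),
]
result = value_by_channel(orders)
```

In this made-up data, the email channel looks stronger on the first order (80 versus 50), yet both channels end up with the same value per customer once repeat orders are counted.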

Repurchase Rate & Frequency

Another valuable metric that measures repeat purchases. Repurchase rate is calculated by dividing the number of customers who made at least two purchases in a given timeframe by the total customer count.

One of your most actionable metrics in ecommerce is your repeat purchase rate. You can measure it with the help of your order completed event, and tying it to the customer whose journey you follow. Then you’ll build out a funnel to see what channels and products drive repeat purchases. So the metric will help answer “What makes our customers come back?”
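The repurchase-rate calculation described above can be sketched as follows, assuming you have one customer ID per Order Completed event in the timeframe (names and data are illustrative):

```python
from collections import Counter


def repurchase_rate(order_customer_ids):
    """Share of customers with at least two orders in the timeframe."""
    orders_per_customer = Counter(order_customer_ids)
    if not orders_per_customer:
        return 0.0
    repeaters = sum(1 for n in orders_per_customer.values() if n >= 2)
    return repeaters / len(orders_per_customer)


# Customers a and b ordered more than once; c and d ordered once.
repeat_rate = repurchase_rate(["a", "b", "a", "c", "b", "a", "d"])  # 2 of 4
```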

Use Cases of Integrations for Ecommerce Growth

By combining the power and utility of different MarTech tools, you expand the capabilities of each. That’s how you use a stack to improve or scale ecommerce revenue. Let’s walk through examples.

Add Customer.io Email Events to Customer Journeys

Email events from platforms such as Customer.io often do not make it to user analytics or data warehouses. That’s a huge missed opportunity for personalization or touchpoint automation. Segment translates event data from your email platform and passes it on to tools such as BigQuery and Amplitude.

You can then follow the full customer journey, optimize based on events such as Email Delivered / Opened / Clicked / Unsubscribe, or create complete reporting about your email flows.

Report on JavaScript Data Sources

JavaScript is the standard language for analytics. Along with the biggest tools such as Google Analytics or Tag Manager, and many others, you’ll see JavaScript in custom attribution models. The models often populate data that’s hard to integrate precisely because it is custom, and doesn’t always follow the formats required by your favorite reporting tools. This can make reporting much less insightful.

With Segment, you can translate your custom attribution data and taxonomy into a format that’s understood by a data warehouse such as BigQuery. From there, you can easily send it to a reporting tool such as Chartio or PopSQL. Think data such as page and identity tables, the common attribution data points. You’ll be on your way to a closed reporting loop, with accurate and complete ROAS insights.

Calculate the Full Customer Acquisition Cost

Just like with the above-mentioned ability to translate custom JavaScript to the rest of your MarTech stack, Segment plays the Rosetta Stone of APIs for your ad platforms. Segment will pull data from Facebook Ads, Google Ads, and other ad marketplaces, then push it into your BigQuery data warehouse. You’ll be able to get the full picture of how the different ads push the customers down the funnel, and you can also add data about interactions such as checkout events.

As a result, you’ll be able to send complete acquisition data into a reporting tool such as Chartio or PopSQL. There, you can, for example, build models for calculating key metrics such as CAC (Customer Acquisition Cost).
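As a rough sketch of such a model, CAC per channel can be computed as ad spend divided by the number of new customers attributed to that channel. The function name, channel keys, and figures below are illustrative assumptions, not output from any real ad platform.

```python
def cac_by_channel(spend, new_customers):
    """Ad spend divided by newly acquired customers, per channel."""
    return {
        channel: spend[channel] / new_customers[channel]
        for channel in spend
        if new_customers.get(channel)  # skip channels with no acquisitions
    }


spend = {"facebook": 1000.0, "google": 1500.0}
new_customers = {"facebook": 40, "google": 50}

cac = cac_by_channel(spend, new_customers)
blended_cac = sum(spend.values()) / sum(new_customers.values())
```

Keeping the blended figure next to the per-channel figures makes it easier to see which channel pulls the average up or down.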

Power and Automate Custom Messaging

In MarTech stacks based on Segment, you can have the same tool receive or send data. Customer.io is an example. Above, we had an example of Customer.io sitting upstream, so it could send data for use in customer journey analytics. Here, Customer.io sits downstream, receiving data from a JavaScript source.

Segment can even send your JS data to multiple communication platforms at a time: for example, Customer.io for emails and Drift for chat and text messages. This way, all of your messaging can be both custom and automated based on events or custom user traits. User events like shopping cart abandonment, newsletter signups, and purchases can help segment the customers. You’ll be able to automatically trigger personalized email sends, push notifications, or SMS messages, or make your chatbot much more personal and useful.

Follow Product-level Activity

Amplitude is an analytics platform that's grown in popularity thanks to its depth, predictive and personalization capabilities, and automated optimization features. When it is enriched with clean, normalized data, users can optimize across the entire customer journey, improve user acquisition, uncover purchase behavior patterns, and connect product-level data with revenue.

However, imagine your user data is tied up in a JS source such as Google Analytics. So we’ll once again connect Segment and have it feed data into Amplitude. This links Order Completed, an event from Segment’s ecommerce spec, with the Product Purchased event in Amplitude. You could also do that for events such as Product Viewed, or Added to Cart. You’ll then be able to slice revenue data by product, product category, or SKU.

Your Visual Reference of Ecommerce Tool Integrations

The diagrams you see above come from our infographic with examples and explanations of ecommerce stacks integrated through Segment. Get your own pdf below for future use, so you can quickly scan it and remind yourself of ideas for your own stack.

Download your copy of the ecommerce stack infographic.

Join the Segment Startup Program, Build a Strong Stack, Grow Your Ecommerce Business

Segment's Startup Program is here to give early-stage startups the tools necessary to build stacks like this and thrive. Eligible startups get $25k in Segment credits for up to two years, which can be used for Segment’s Team Plan. Additionally, Segment is throwing in over $1 million in free marketing and analytics platforms like Amplitude and Amazon Web Services, on top of a number of heavy software discounts. You’ll even get access to level-up resources such as Segment’s Analytics Academy or Analytics office hours.

Eligible startups must have been incorporated less than two years ago and have raised no greater than $5 million in total funding.

Don’t wait any longer. Go learn more about Segment’s one-of-a-kind Startup Program. And if you’d like a hand picking your tools along the way, feel free to use our WYSIWYG MarTech stack builder.

About the Author

Dan McGaw is the founder of McGaw.io, MarTech speaker, and co-founder of analytics tools such as UTM.io. He’s worked extensively with Segment implementations and led the creation of tools such as the Segment CSV importer.

Guest author: Dan McGaw on October 29th 2021

In my many years managing MarTech implementations, I have received two questions more than any other:

1. Which user events are worth tracking?

2. Which marketing metrics should I collect?

The answer depends on a range of factors. First and foremost, your business model. Mobile marketers face a unique user base with a low tolerance for apps that don't meet high expectations for functionality and reliability. That's why we’re collaborating with Segment’s Startup Program to give you this take on the tools, metrics, and user events crucial for success.

The Added Value of a Solid Mobile Stack

I have a hard time imagining a successful app developer that reliably makes and scales quality apps without the feedback, performance improvements, and added functionality that tech stacks provide. Doing without means making do with guesswork. Maybe it will work, but maybe it won’t.

Mobile purchase cycles leave developers only a moment to hook users. That requires prompt action to attract and retain users by intervening in crucial moments to improve their customer experience.

As a mobile app creator, you need your tech stack to capture a much wider array of user data, and respond to user activity and needs in a way that keeps them engaged, creates beneficial functionality, and fixes any technical stumbling blocks. 

Top User Events to Track

Which user events are most actionable in the analytics of mobile startups? Those that track the part of the journey from app install to first order.

  • Application Installed: Triggered after users first download your app or upon opening it via their home screen.

  • Install Attributed: Credits the right marketing channel for delivering new users.

  • User Created: This middle-funnel event occurs after users install and once they register an account with your service. It identifies active users.

  • [Feature Used]: Triggered whenever users launch a feature of your choice. This can be cross-referenced with retention data to uncover the stickiest features. 

  • Order Completed: Tracks when users make in-app purchases and contribute to your revenue. 

Top Mobile Metrics—How a Stack Helps Measure What Matters

Tracking the key mobile events will generate actionable metrics for you. To spend your effort where it can make a difference in a predictable way, focus on the revenue metric. Then break it down to the steps that bring your users closer to revenue.

Revenue

Track revenue the way you need it, whether subscription-based or ecommerce. Slice it by DAU (daily active users) or MAU (monthly active users), or by marketing channel. The Order Completed event is what connects the dots for you here.

Install to Signup Rate

The install-to-signup-rate metric, also called the activation rate, puts your app's onboarding process under a microscope. It helps you answer the question “How many of the users who install the app actually start using it, too?” The magic is done by relating the Application Installed and User Created events.

Your job is to find where and how users get confused or otherwise discouraged before registration. You may also want to look into which channel or campaign brought the engaged users.
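A minimal sketch of the install-to-signup rate, assuming you can export the set of user or device IDs that fired Application Installed and the set that fired User Created, and that the IDs are comparable across both events (that identity stitching is an assumption of this example):

```python
def install_to_signup_rate(installed_ids, created_ids):
    """Share of installers who also registered an account."""
    if not installed_ids:
        return 0.0
    activated = installed_ids & created_ids  # installed AND signed up
    return len(activated) / len(installed_ids)


installed = {"u1", "u2", "u3", "u4"}
created = {"u1", "u3", "u9"}  # u9 signed up without a tracked install

activation_rate = install_to_signup_rate(installed, created)  # 2 of 4
```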

Signup to Pay Rate

This metric picks up where our last metric left off, answering “How often do new users become paying customers?” It relates the User Created and Order Completed events. By extension, it tells you how well your user activation strategy is doing.

Compare across marketing channels to identify the most reliable channels for creating real-deal customers. Get granular and analyze cohorts based on their install date or app activity. Or calculate the average cost of converting a visitor to a paid user.
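The channel comparison could be sketched like this, assuming each signed-up user is mapped to the marketing channel that acquired them and that you have the set of users with at least one Order Completed event (all names and data are illustrative):

```python
from collections import defaultdict


def signup_to_pay_by_channel(signups, payers):
    """signups: {user_id: channel}; payers: set of user_ids who ordered."""
    created = defaultdict(int)
    paid = defaultdict(int)
    for user, channel in signups.items():
        created[channel] += 1
        if user in payers:
            paid[channel] += 1
    return {channel: paid[channel] / created[channel] for channel in created}


signups = {"u1": "ads", "u2": "ads", "u3": "organic", "u4": "ads", "u5": "organic"}
payers = {"u1", "u3", "u5"}

rates = signup_to_pay_by_channel(signups, payers)
```

Here the ads channel converts one of three signups while organic converts both of its signups, which is the kind of gap this comparison is meant to surface.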

New, Retained & Churned Active Users

These metrics connect retention to user activity, and answer questions such as: “Which features make the users stick?” or “Which features make the users churn?” They do so by relating the Install Attributed and [Feature Used] events to data about DAU.
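One simple way to split active users into new, retained, and churned is to compare the sets of active user IDs from two consecutive periods. This is a sketch under that assumption; how you define "active" (for example, any [Feature Used] event) is up to you.

```python
def classify_active_users(previous_period, current_period):
    """Compare two sets of active user IDs from consecutive periods."""
    return {
        "new": current_period - previous_period,       # active now, not before
        "retained": current_period & previous_period,  # active in both periods
        "churned": previous_period - current_period,   # active before, not now
    }


last_month = {"a", "b", "c"}
this_month = {"b", "c", "d", "e"}

segments = classify_active_users(last_month, this_month)
```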

Retention and revenue are two sides of the same coin. Get the data that’ll enable you to retain users better, and it’ll make a world of difference to the bottom line.

Use Cases of Integrations for Mobile Growth

The applications you choose for your stack are important, second only to the quality of integrations that hold your stack together. After all, a good stack gives you new tools and expanded capabilities to improve the customer experience. That’s how you use a stack to improve revenue. Let’s walk through a few examples.

Add Email Events to Customer Journeys

Free the data that would otherwise be siloed in Braze or another messaging platform. You'll use Segment to help pass data about user activity that occurs off app—in email and text. 

Now, events like Email Delivered will flow down to your analytics platform, Amplitude, which can be combined with data from other marketing channels and app activity. The email events will also be sent to your data warehouse, BigQuery, for additional analysis and backup.

The combined data stream also helps you track the full customer journey: the full set of touchpoints you create with users, whether to engage existing ones or to convert new ones. This will help you improve and grow your email flows.

Use App Usage Data in Marketing Attribution

Tie in-app user engagement data with your marketing attribution. In the above example, you can do so by extracting the Install Attributed event, created by AppsFlyer. Then you’ll translate it in Segment and pass it on to your attribution reports in Amplitude.

Your marketing reporting will level up, letting you compare user engagement across marketing channels or campaigns. You will also put yourself in the position to reveal differences in purchasing behavior and retention.

Load and Model Ad Spend

Get Facebook Ads and Google Ads data to play nice together. The ad spend numbers get piped through Segment, then flow into your visualization and modeling platforms such as Chartio or PopSQL. You can then model and optimize CAC (customer acquisition cost). The two ad platforms will get a fair comparison.

As you’ll often want to do, you can also send the data to BigQuery, where it can be processed further.

Personalize and Automate Messaging

Make your messaging matter. So much so that it will be relevant to the user’s location and app usage.

In this use case, Radar collects location-specific events and connects them with the user’s history. But the data is only valuable when integrated with a platform that acts on it. So you can use Segment to send the location data and user traits to Braze. As a result, your messaging with the users can be both personalized and automated. That’s custom messaging at its best.

You’ll enable interactions such as location-specific deals, geofencing, localized inventories, geotargeting, or store locators.

Your Visual Reference of Mobile Tool Integrations

The diagrams we're using in this post come from our infographic, which you can download below for future use.

Download your copy of the mobile stack infographic.

Join the Segment Startup Program, Build a Strong Stack, Grow Your Mobile Business

Our Startup Program is here to give early-stage startups the tools necessary to build stacks like this and thrive. Eligible startups get $25k in Segment credits for up to two years, using Segment’s Team Plan. Additionally, Segment is throwing in over $1 million in free marketing and analytics platforms like Amplitude and Amazon Web Services, on top of a number of heavy software discounts. You’ll even get access to level-up resources such as Segment’s Analytics Academy or Analytics office hours.

Eligible startups must have been incorporated less than two years ago and have raised no greater than $5 million in total funding.

Don’t wait any longer. Go learn more about Segment’s one-of-a-kind Startup Program. And if you’d like a hand picking your tools along the way, feel free to use our WYSIWYG MarTech stack builder.

Learn more about the Segment Startup Program.

About the Author

Dan McGaw is the founder of McGaw.io, MarTech speaker, and co-founder of analytics tools such as UTM.io. He’s worked extensively with Segment implementations and led the creation of tools such as the Segment CSV importer.

Geoffrey Keating on October 28th 2021

Digital acceleration means data migration is inevitable. Learn all about data migration, important things to consider, and the basic process.

Guest author: Dan McGaw on October 28th 2021

As a growth hacker, MarTech founder and implementation consultant, I often get the following questions:

  • Which user events are worth tracking?

  • Which metrics should I use for decision making?

  • How should I build and use my stack to scale?

The answer is different for every business. But B2C subscription-based businesses need a responsive stack that draws high volumes of prospects and reliably pulls them through their conversion funnels with as little drop-off as possible.

That's why we’re collaborating with Segment’s Startup Program to give you this take on the tools, metrics, and user events crucial for success.

The Added Value of a Solid B2C Subscription Stack

Getting B2C marketing right means aligning your team members with their responsibility for the purchase cycle, helping them identify and fast-track high-value leads, and ensuring smooth transitions between funnel stages and business functions. 

I have a hard time imagining a B2C subscription company thriving without the responsive infrastructure, digestible user feedback, event-based data, and added functionality a well-implemented stack brings.

Top B2C Events Your Stack Should Track 

Your stack’s toolset is only as good as its implementation. Also, a good stack gives you new tools and expanded capabilities to improve the customer experience. That’s how you use a stack to improve revenue. Let’s walk through a few examples of the user events that help you get there.

  • Lead Created: Unknown users trigger the Lead Created event by submitting their contact information (and usually other traits).

  • User Created: After leads or visitors create their first login, the User Created event occurs. This means the user has started using your application properly, and is closer to spending money in case that hasn’t happened yet.

  • [Feature Used]: This custom, product-based event can help you determine the relationship between key feature usage and long-term customer value, in addition to retention. Find ways to encourage repeat usage to optimize. Examples include Song Played, Lesson Completed, or Note Created.

  • Order Completed: Users have completed the customer journey by the time they reach Order Completed. This event is often used to track retention, revenue, and conversion rates from free to paid users.

Top B2C Metrics: How a Stack Helps Measure What Matters

Marketers in B2C subscription businesses need a broad view of the customer purchase cycle, with the ability to get granular and tinker with each conversion step. Use metrics to evaluate audiences and identify the best channels for high-volume conversion. 

Visitor-to-Signup Conversion Rate

This metric combines page views with the User Created event and helps users compare marketing channels to identify key audiences. Find a way to move this one a little, and you can see substantial gains in revenue. 

Trial Subscription Conversion Rate

Understand how frequently your visitors become users after sampling your services. By cross-referencing page views and the User Created event, you open up further examination of marketing channels and their varying ability to convert visitors.

New and Total Monthly Recurring Revenue (MRR)

Once users trigger Order Completed, they'll find themselves in this camp, where new and total revenue are tracked. Identify best-contributing channels and product features. Switch between MRR and ARR as you need.
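A rough sketch of new versus total MRR, and the switch to ARR, assuming each subscription is a simple monthly plan that is still active (cancellations and annual plans are left out to keep the example short; names and prices are made up):

```python
def mrr_summary(subscriptions, month):
    """subscriptions: (customer_id, monthly_price, start_month as 'YYYY-MM')."""
    # 'YYYY-MM' strings sort chronologically, so plain comparison works here.
    active = [s for s in subscriptions if s[2] <= month]
    total = sum(price for _, price, _ in active)
    new = sum(price for _, price, start in active if start == month)
    return {"new_mrr": new, "total_mrr": total, "arr": 12 * total}


subscriptions = [
    ("a", 10.0, "2021-09"),
    ("b", 25.0, "2021-10"),
    ("c", 15.0, "2021-10"),
]
summary = mrr_summary(subscriptions, "2021-10")
```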

Monthly Churn

Retention is no different from revenue in the subscription model. With the help of a product analytics platform such as Amplitude and the Order Completed event, users can slice monthly revenue by overall retention, engagement, and feature usage to shed light on the levers that make your customer experience stickier. For early-stage startups, the retention may be even more important than revenue.
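Churn can be counted in customers or in revenue, and the two can diverge. The sketch below computes both from the MRR per customer at the start and end of a month; the function name and figures are illustrative assumptions.

```python
def monthly_churn(start_mrr, end_mrr):
    """start_mrr / end_mrr: {customer_id: MRR} at the start and end of a month."""
    churned = set(start_mrr) - set(end_mrr)  # paying at start, gone at end
    lost_mrr = sum(start_mrr[c] for c in churned)
    return {
        "customer_churn": len(churned) / len(start_mrr),
        "revenue_churn": lost_mrr / sum(start_mrr.values()),
    }


start = {"a": 10.0, "b": 25.0, "c": 15.0, "d": 50.0}
end = {"a": 10.0, "b": 25.0, "d": 50.0, "e": 20.0}  # c left, e is new

churn = monthly_churn(start, end)
```

In this example a quarter of the customers churned but only 15 percent of the revenue did, because the customer who left was on a cheaper plan.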

Use Cases of Integrations for B2C Growth

When you integrate numerous MarTech tools, you not only pool resources, you expand each tool’s functional capacity. Making good use of the ecosystem can help you scale B2C revenue. Below are examples for your inspiration.

Fuel and Automate Personalized Messaging

One of the best features Segment brings to stacks: allowing the same tool to send and receive data. Where other use cases such as the next one in this article see Customer.io sending event signals upstream to Segment, in this instance, we have a custom JavaScript (JS) data source pushing event and user traits to Customer.io.

And since Segment can send JS data to multiple communication platforms at once, we're delivering the same user data to Autopilot and Drift. With Drift, event data can trigger automated and personalized chat and text experiences (designed to qualify and convert users). Once it gets to Autopilot, it's combined with other customer data sources for personalized lead nurturing email campaigns (which are also automated).

Add Customer.io Email Events

By intervening in crucial (context-specific) moments in the customer journey, marketers can improve the customer experience, increasing engagement and avoiding churn. However, event-based data from communications platforms like Customer.io are frequently disconnected from analytics platforms and marketing automations.

Segment connects event-based triggers from Customer.io with our product analytics platform Amplitude and our data warehouse BigQuery. Now events like Email Clicked can trigger any number of actions, like personalized purchase incentives or assigning users to various cohorts based on their demonstrated level of interest.

Report on JavaScript Data Sources

Custom attribution models use JS to send data such as Page and Identity tables. But such custom data sources can be hard to sync with your reporting and data exploration tools such as Chartio.

With Segment, you can translate data and consolidate taxonomy, so they live alongside data from other sources in your warehouse and reporting tools. As a result, you can build accurate attribution and calculate your ROAS.

Load Ad Spend & Compare Ad Inventory Performance

Facebook Ads and Google Ads don't play well together, which means you have to be creative in your reporting if you want to show the full picture. Segment comes to the rescue once again. Both ad inventory sources can be loaded into BigQuery. There, you’ll combine the ad spend data with web activity data that’s also piped through Segment.

The infrastructure will unlock new analytics possibilities. You’ll be able to push the combined data into visualization and modeling platforms like Chartio or PopSQL, and build models on key metrics such as Customer Acquisition Cost (CAC).

Your Visual Reference of B2C Subscription Tool Integrations

The diagrams you see above come from our infographic with examples and explanations of B2C subscription stacks integrated through Segment. Get your own pdf below for future use, so you can quickly scan it and remind yourself of ideas for your own stack.

Download your copy of the B2C subscription stack infographic.

Join the Segment Startup Program, Build a Strong Stack, Grow Your B2C Subscription Business

Segment's Startup Program is here to give early-stage startups the tools necessary to build stacks like this and thrive. Eligible startups get $25k in Segment credits for up to two years, which can be used for Segment’s Team Plan.

Additionally, Segment is throwing in over $1 million in free marketing and analytics platforms like Amplitude and Amazon Web Services, on top of a number of heavy software discounts. You’ll even get access to level-up resources such as Segment’s Analytics Academy or Analytics office hours.

Eligible startups must have been incorporated less than two years ago and have raised no greater than $5 million in total funding.

Don’t wait any longer. Learn more about Segment’s one-of-a-kind Startup Program. And if you’d like a hand picking your tools along the way, feel free to use our WYSIWYG MarTech stack builder.

About the Author

Dan McGaw is the founder of McGaw.io, MarTech speaker, and co-founder of analytics tools such as UTM.io. He’s worked extensively with Segment implementations and led the creation of tools such as the Segment CSV importer.

Jim Young on October 28th 2021

To scale Growth, you need to define its purpose, set goals, structure collaboration, and master your customer data. Three experts share their insights.

Benjamin Yolken on October 26th 2021

At Segment, we use Apache Kafka extensively to store customer events and connect the key pieces of our data processing pipeline. Last year we open-sourced topicctl, a tool that we developed for safer and easier management of the topics in our Kafka clusters; see our previous blog post for more details.

Since the initial release of topicctl, we’ve been working on several enhancements to the tool, with a particular focus on removing its dependencies on the Apache ZooKeeper APIs; as described more below, this is needed for a future world in which Kafka runs without ZooKeeper. We’ve also added authentication on broker API calls and fixed a number of user-reported bugs.

After months of internal testing, we’re pleased to announce that the new version, which we’re referring to as “v1”, is now ready for general use! See the repo README for more details on installing and using the latest version.

In the remainder of this post, we’d like to go into more detail on these changes and explain some of the technical challenges we faced in the process.

Kafka, ZooKeeper, and topicctl

A Kafka cluster consists of one or more brokers (i.e. nodes), which expose a set of APIs that allow clients to read and write data, among other use cases. The brokers coordinate to ensure that each has the latest version of the configuration metadata, that there is a single, agreed-upon leader for each partition, that messages are replicated to the right locations, and so forth.

Original architecture

Historically, the coordination described above has been done via a distributed key-value store called Apache ZooKeeper. The latter system stores shared metadata about everything in the cluster (brokers, topics, partitions, etc.) and has primitives to support coordination activities like leader election.

ZooKeeper was not just used internally by Kafka, but also externally by clients as an interface for interacting with cluster metadata. To fetch all topics in the cluster, for instance, a client would hit the ZooKeeper API and read the keys and values in a particular place in the ZooKeeper data hierarchy. Similarly, updates to metadata, e.g. changing the brokers assigned to each partition in a topic, were done by writing JSON blobs into ZooKeeper with the expected format in the expected place.

Some of these operations could also be done through the broker APIs, but many could only be done via ZooKeeper.

Given these conditions, we decided to use ZooKeeper APIs extensively in the original version of topicctl. Although it might have been possible to provide some subset of functionality without going through ZooKeeper, this “mixed access” mode would have made the code significantly more complex and made troubleshooting connection issues harder because different operations would be talking to different systems.

Towards a ZooKeeper-less World

In 2019, a proposal was made to remove the ZooKeeper dependency from Kafka. This would require handling all coordination activities internally within the cluster (involving some significant architectural changes) and also adding new APIs so that clients would no longer need to hit ZooKeeper for any administrative operations.

The motivation behind this proposal was pretty straightforward — ZooKeeper is a robust system and generally works well for the coordination use cases of Kafka, but can be complex to set up and manage. Removing it would significantly simplify the Kafka architecture and improve its scalability.

This proposal was on our radar when we originally created topicctl, but the implementation was so far off in the future that we weren’t worried about it interfering with our initial release. Recently, however, the first Kafka version that can run without ZooKeeper landed. We realized that we needed to embrace this new world so the tool would continue to work with newer Kafka versions.

At the same time, we got feedback both internally and externally that depending on ZooKeeper APIs for the tool would make security significantly harder. ZooKeeper does have its own ACL system, but managing this in parallel with the Kafka one is a pain, so many companies just block ZooKeeper API access completely for everything except the Kafka brokers. Many users would be reluctant to open this access up (rightfully so!) and thus the ZooKeeper requirement was blocking the adoption of the tool in many environments.

Given these multiple factors, removing the ZooKeeper requirement from topicctl became a high priority.

Removing the ZooKeeper requirement

In the original code for topicctl, all cluster admin access went through a single struct type, the admin client, which then used a private ZooKeeper client for fetching configs, updating topics, etc. This struct exposed methods that could be called by other parts of the tool; the golang code for triggering a leader election in the cluster, for instance, looked like the following (some details omitted for simplicity):

Note that the client in this case isn’t actually communicating with the Kafka brokers or using any Kafka APIs. It’s just writing some JSON into /admin/preferred_replica_election, which, by convention, is where the Kafka brokers will look to start the process of running a leader election.
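As a rough illustration of the mechanism described here, the sketch below writes an election request as JSON to the well-known path. The `zkWriter` interface, the in-memory `memZK` fake, and the JSON field names are assumptions made for this example; they are not taken from the topicctl source.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// zkWriter is a hypothetical stand-in for a ZooKeeper client; only the one
// call this sketch needs is modeled.
type zkWriter interface {
	Create(path string, data []byte) error
}

// memZK is an in-memory fake so the example runs without a real ensemble.
type memZK struct{ nodes map[string][]byte }

func (m *memZK) Create(path string, data []byte) error {
	if _, ok := m.nodes[path]; ok {
		return fmt.Errorf("node %s already exists", path)
	}
	m.nodes[path] = data
	return nil
}

// electionPath is where the brokers look for pending election requests.
const electionPath = "/admin/preferred_replica_election"

// The field names below are assumed for illustration.
type topicPartition struct {
	Topic     string `json:"topic"`
	Partition int    `json:"partition"`
}

type electionRequest struct {
	Version    int              `json:"version"`
	Partitions []topicPartition `json:"partitions"`
}

// runLeaderElection serializes the requested partitions and writes them to
// the election path. No Kafka broker API is called.
func runLeaderElection(zk zkWriter, topic string, partitions []int) error {
	req := electionRequest{Version: 1}
	for _, p := range partitions {
		req.Partitions = append(req.Partitions, topicPartition{Topic: topic, Partition: p})
	}
	data, err := json.Marshal(req)
	if err != nil {
		return err
	}
	return zk.Create(electionPath, data)
}

func main() {
	zk := &memZK{nodes: map[string][]byte{}}
	if err := runLeaderElection(zk, "events", []int{0, 1}); err != nil {
		panic(err)
	}
	fmt.Println(string(zk.nodes[electionPath]))
}
```

The point of the sketch is that the only side effect is a write to a coordination path; the brokers pick the request up on their own.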

Our first step was to convert the APIs exposed by this struct into a golang interface with two implementations: one that depended on ZooKeeper, i.e. using the code from our original version of the admin client, and a second that only used Kafka broker APIs.

So, the Client above became:

with the RunLeaderElection implementations becoming the following for the ZooKeeper and ZooKeeper-less versions, respectively:
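A minimal sketch of this interface-plus-two-implementations pattern might look like the following. All names here (`Client`, `zkClient`, `brokerClient`, `newClient`) and the trimmed-down method signature are hypothetical, and both implementations are stubs rather than real ZooKeeper or Kafka calls.

```go
package main

import (
	"errors"
	"fmt"
)

// Client is a hypothetical, trimmed-down admin interface. A real one would
// expose many more methods (fetching configs, updating topics, and so on).
type Client interface {
	RunLeaderElection(topic string, partitions []int) error
}

// zkClient stands in for the ZooKeeper-backed implementation. Here it only
// records what it would have written.
type zkClient struct{ writes []string }

func (c *zkClient) RunLeaderElection(topic string, partitions []int) error {
	c.writes = append(c.writes, fmt.Sprintf("%s %v", topic, partitions))
	return nil
}

// brokerClient stands in for the broker-API implementation, which in this
// sketch has not been filled in yet.
type brokerClient struct{}

func (c *brokerClient) RunLeaderElection(topic string, partitions []int) error {
	return errors.New("not implemented")
}

// newClient picks an implementation; callers depend only on the interface.
func newClient(useZooKeeper bool) Client {
	if useZooKeeper {
		return &zkClient{}
	}
	return &brokerClient{}
}

func main() {
	for _, useZK := range []bool{true, false} {
		err := newClient(useZK).RunLeaderElection("events", []int{0})
		fmt.Printf("useZooKeeper=%v err=%v\n", useZK, err)
	}
}
```

Because the rest of the tool only sees `Client`, swapping the backing implementation does not ripple into the calling code.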

The next step was to fill out the details of the broker-based admin client so that it actually worked. topicctl was already using the excellent kafka-go library for its functionality that depended on broker APIs (e.g., tailing topics), so we wanted to use that here as well. Unfortunately, however, this library was designed primarily for reading and writing data, as opposed to metadata, so it only supported a subset of the admin-related Kafka API.

After doing an inventory of our client’s requirements, we determined that there were six API calls we needed that were not yet supported by kafka-go:

Our next step was to update kafka-go to support these! At first, it looked easy: this library already had a nice interface for adding new Kafka APIs; all you had to do was create Go structs to match the API message specs, and then add some helper functions to do the calls.

But, as often happens, we ran into a wrinkle: a new variant of the Kafka protocol had been recently introduced (described here) to make API messages more space-efficient. Although most of the APIs we needed had versions predating the update, a few only supported the new protocol format. To add all of the APIs we needed, we’d have to update kafka-go to support the new format.
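To give a feel for where the space savings come from, here is a small sketch comparing two ways of encoding a string on the wire. It assumes the variant in question is the "flexible versions" format, in which strings carry an unsigned-varint length (stored as length plus one) instead of a fixed two-byte length; the function names are made up for this example and are not kafka-go's.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeString uses the older layout: a 2-byte big-endian length followed by
// the bytes. This sketch assumes strings shorter than 32768 bytes.
func encodeString(s string) []byte {
	buf := make([]byte, 2, 2+len(s))
	binary.BigEndian.PutUint16(buf, uint16(len(s)))
	return append(buf, s...)
}

// encodeCompactString uses the newer layout: an unsigned varint holding
// len(s)+1, followed by the bytes.
func encodeCompactString(s string) []byte {
	buf := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(buf, uint64(len(s))+1)
	return append(buf[:n], s...)
}

func main() {
	topic := "events"
	fmt.Println("old format bytes:", len(encodeString(topic)))
	fmt.Println("new format bytes:", len(encodeCompactString(topic)))
}
```

One byte per short string is a small saving on its own, but it adds up across messages that carry many topic names and IDs. It also means a client has to know which layout each API version expects.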

Thus, we first went through all of the protocol code in kafka-go, updating it to support both the old and new formats. The proposal linked above didn’t have 100% of the details we needed, so in several cases, we also had to consult the Kafka code to fully understand how newer messages were formatted. After much trial and error, we eventually got this code working and merged.

Once that was done, we were unblocked from adding the additional APIs, which we did in this change. Finally, we could go back to the topicctl code and fill out the implementation of the broker-based admin client.

Returning to the RunLeaderElection example from above, we now had something like:

The end result is that we were able to get topicctl working end-to-end with either the ZooKeeper-based implementation (required for older clusters) or the ZooKeeper-less one (for newer clusters), with only minimal changes in the other parts of the code.

Security updates

In addition to removing the ZooKeeper requirement from topicctl, we also got several requests to support secure communication between the tool and the brokers in a cluster. We didn’t include these in the original version because we don’t (yet) depend on these features internally at Segment, but they’re becoming increasingly important, particularly as users adopt externally hosted Kafka solutions like AWS MSK and Confluent Cloud.

We went ahead and fixed this, at least for the most common security mechanisms that Kafka supports. First, and most significantly, topicctl can now use TLS (called “SSL” for historical reasons in the Kafka documentation) to encrypt all communication between the tool and the brokers. 

In addition to TLS, we also added support for SASL authentication on these links. This provides a secure way for a client to present a username and password to the API; the permissions for each authenticated user can then be controlled in a fine-grained way via Kafka’s authorization settings.

Testing and release

As we updated the internals of topicctl, we extended our unit tests to run through the core flows like applying a topic change under multiple conditions, e.g. using ZooKeeper vs. only using Kafka APIs. We also used docker-compose to create local clusters with different combinations of Kafka versions, security settings, and client settings to ensure that the tool worked as expected in all cases. 

Once this initial testing was done, we updated the internal tooling that Segment engineers use to run topicctl to use either the old version or the new one, depending on the cluster. In this way, we could roll out to newer, lower-risk clusters first, then eventually work up to the bigger, riskier ones. 

After several months of usage, we felt confident enough to use v1 for all of our clusters and deprecate the old version for both internal and external users of the tool.

Conclusion

topicctl v1 is ready for general use! You might find it a useful addition to your Kafka toolkit for understanding the data and metadata in your clusters, and for making config changes. Also, feel free to create issues in our GitHub repository to report problems or request features for future versions.

Kelly Kirwan on October 26th 2021

Big data is a big deal to manage. And data integrity is vital for business growth and customer trust. Learn what it is, why it’s crucial, and how to ensure it.

Jim Young on October 22nd 2021

From October 20-21, Segment joined SIGNAL, Twilio’s annual customer and developer conference, alongside 50,000+ developers, product leaders, enterprises, and startups.

Guest author: Dan McGaw on November 1st 2021

I've watched thousands of startups build, grow, and optimize their tech stacks. Throughout the years, and while implementing Segment, teams have asked me the same questions time and time again.

  1. Which events should we track?

  2. What about metrics and customer data?

The answers are consequential; I've seen too many promising startups with great products fail due to fixable logistical issues and insufficient resources. That's why we’re collaborating with Segment’s Startup Program to give you this take on the tools, metrics, and user events crucial for success: this time for ecommerce companies.

The Added Value of a Solid Ecommerce Stack 

The nature of ecommerce necessitates a speedy tech stack to seamlessly collect user data, transmit it to the relevant destination, and respond to customer activity in real time. A complete view of the customer journey and a closed reporting loop will provide a stable framework from which you can scale your business.

As your business grows, you'll use your tech stack to further refine and optimize your practices, now armed with historical data that covers the full customer journey. From here, you'll find and grow the lifetime value of your customers, allowing you to optimize your customer acquisition for actual ROI and ROAS. You’ll also have what it takes to improve drop-off rates, or automate communication throughout the touchpoints.

Ultimately, you’ll be able to snowball the revenue.

Ecommerce Events to Track

So, which events are most relevant to ecommerce startups? 

You want to focus on the customer journey and cover its full range. Cover the various stages in the purchase flow. Pay attention to the following interactions:

  • Product Viewed: Occurs when a user visits a product page.

  • Product Added: Whenever a visitor adds a product to their shopping cart.

  • Checkout Started: Once a visitor adds a product and clicks the “checkout” button.

  • Checkout Step Completed: After a visitor adds a product, clicks the “checkout” button, enters their payment information, and moves on to the final stage of checkout.

  • Order Completed: Once a visitor completes entering all their information for purchase and reaches the thank you screen.

Top Ecommerce Metrics: How a Stack Helps Measure What Matters

As part of transactional sites, ecommerce pages are focused on converting visitors into customers and first-time customers into repeat customers. The metrics most valuable to you will be those that uncover subtle shopping behavior, connecting your customers' average purchase spend with their repurchase rate, lifetime value, and, finally, your sales totals.

Visit to Order Rate

The percentage of visits that convert into orders. The metric helps you answer “What makes the customers purchase the first time?”

Average Order Value (AOV)

The averaged total value of every order placed on a store over a specified period. The metric helps you answer “What makes the customers spend more?” or “Which campaign leads to the highest purchases?”
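As a small worked example, AOV is total order value divided by the number of orders, and grouping by campaign lets you compare campaigns side by side. The `order` struct and its fields are assumptions for this sketch, standing in for whatever your Order Completed events carry.

```go
package main

import "fmt"

// order is a hypothetical, minimal record derived from an Order Completed event.
type order struct {
	Campaign string
	Total    float64
}

// aovByCampaign returns the average order value for each campaign:
// the sum of order totals divided by the number of orders.
func aovByCampaign(orders []order) map[string]float64 {
	sums := map[string]float64{}
	counts := map[string]int{}
	for _, o := range orders {
		sums[o.Campaign] += o.Total
		counts[o.Campaign]++
	}
	aov := map[string]float64{}
	for campaign, sum := range sums {
		aov[campaign] = sum / float64(counts[campaign])
	}
	return aov
}

func main() {
	orders := []order{
		{Campaign: "spring-sale", Total: 40},
		{Campaign: "spring-sale", Total: 60},
		{Campaign: "newsletter", Total: 30},
	}
	for campaign, value := range aovByCampaign(orders) {
		fmt.Printf("%s: %.2f\n", campaign, value)
	}
}
```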

Total Revenue & Orders 

Shows the revenue you've earned across all channels and orders placed. This is often the KPI for ecommerce marketing. The metric helps you answer “Which products are our 80-20?” or “What are the main movers of our top-line revenue?”

Customer Lifetime Value (CLV)

The total value in dollars the customer will contribute over a lifetime. Lifetime is usually calculated between 12 and 24 months. The metric helps you answer “What is the real value of the customer we acquired?”

CLV takes reporting from immediate to real value. Lifetime value is often a lot higher than immediate value. Also, when one channel, campaign or product stands out in immediate value, it doesn’t necessarily stand out in lifetime value.

Repurchase Rate & Frequency

Another valuable pair of metrics that measure repeat purchases. Repurchase rate is calculated by dividing the number of customers who made at least two purchases in a given timeframe by the total customer count.

One of your most actionable metrics in ecommerce is your repeat purchase rate. You can measure it with the help of your Order Completed event by tying each order to the customer whose journey you follow. Then you’ll build out a funnel to see what channels and products drive repeat purchases. So the metric will help answer “What makes our customers come back?”
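The calculation described above can be sketched in a few lines. This assumes each Order Completed event has already been reduced to a customer ID, and that "total customer count" means customers with at least one order in the timeframe; both are assumptions of this example.

```go
package main

import "fmt"

// repurchaseRate takes the customer ID from each Order Completed event in a
// timeframe and returns the share of customers with two or more orders.
func repurchaseRate(customerIDs []string) float64 {
	ordersPerCustomer := map[string]int{}
	for _, id := range customerIDs {
		ordersPerCustomer[id]++
	}
	if len(ordersPerCustomer) == 0 {
		return 0
	}
	repeat := 0
	for _, n := range ordersPerCustomer {
		if n >= 2 {
			repeat++
		}
	}
	return float64(repeat) / float64(len(ordersPerCustomer))
}

func main() {
	// Four customers; "a" and "c" ordered more than once.
	orders := []string{"a", "a", "b", "c", "c", "c", "d"}
	fmt.Printf("repurchase rate: %.2f\n", repurchaseRate(orders))
}
```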

Use Cases of Integrations for Ecommerce Growth

By combining the power and utility of different MarTech tools, you expand the capabilities of each. That’s how you use a stack to improve or scale ecommerce revenue. Let’s walk through examples.

Add Customer.io Email Events to Customer Journeys

Email events from platforms such as Customer.io often do not make it to user analytics or data warehouses. That’s a huge missed opportunity for personalization or touchpoint automation. Segment translates event data from your email platform and passes it on to tools such as BigQuery and Amplitude.

You can then follow the full customer journey, optimize based on events such as Email Delivered / Opened / Clicked / Unsubscribe, or create complete reporting about your email flows.

Report on JavaScript Data Sources

JavaScript is the standard language for analytics. Along with the biggest tools, such as Google Analytics or Tag Manager, you’ll see JavaScript in custom attribution models. These models often produce data that’s hard to integrate precisely because it is custom and doesn’t always follow the formats required by your favorite reporting tools. This can make reporting much less insightful.

With Segment, you can translate your custom attribution data and taxonomy into a format that’s understood by a data warehouse such as BigQuery. From there, you can easily send it to a reporting tool such as Chartio or PopSQL. Think data such as page and identity tables, the common attribution data points. You’ll be on your way to a closed reporting loop, with accurate and complete ROAS insights.

Calculate the Full Customer Acquisition Cost

Just like with the above-mentioned ability to translate custom JavaScript data for the rest of your MarTech stack, Segment plays the Rosetta Stone of APIs for your ad platforms. Segment will pull data from Facebook Ads, Google Ads, and other ad marketplaces, then push it into your BigQuery data warehouse. You’ll be able to get the full picture of how the different ads push the customers down the funnel, and you can also add data about interactions such as checkout events.

As a result, you’ll be able to send complete acquisition data into a reporting tool such as Chartio or PopSQL. There, you can, for example, build models for calculating key metrics such as CAC (Customer Acquisition Cost).

Power and Automate Custom Messaging

In MarTech stacks based on Segment, you can have the same tool receive or send data. Customer.io is an example. Above, we had an example of Customer.io sitting upstream, so it could send data for use in customer journey analytics. Here, Customer.io sits downstream, receiving data from a JavaScript source.

Segment can even send your JS data to multiple communication platforms at a time: for example, Customer.io for emails and Drift for chat and text messages. This way, all of your messaging can be both custom and automated based on events or custom user traits. User events like shopping cart abandonment, newsletter signups, and purchases can help segment the customers. You’ll be able to automatically trigger personalized email sends, push notifications, or SMS messages. Or you’ll be able to make your chatbot much more personal and useful.

Follow Product-level Activity

Amplitude is an analytics platform that has grown in popularity thanks to its depth, predictive and personalization capabilities, and automated optimization features. When it is fed clean, normalized data, you can optimize across the entire customer journey, improve user acquisition, uncover purchase behavior patterns, and connect product-level data with revenue.

However, imagine your user data is tied up in a JS source such as Google Analytics. So we’ll once again connect Segment and have it feed data into Amplitude. This links Order Completed, an event from Segment’s ecommerce spec, with the Product Purchased event in Amplitude. You could also do that for events such as Product Viewed, or Added to Cart. You’ll then be able to slice revenue data by product, product category, or SKU.

Your Visual Reference of Ecommerce Tool Integrations

The diagrams you see above come from our infographic with examples and explanations of ecommerce stacks integrated through Segment. Get your own PDF below for future use, so you can quickly scan it and remind yourself of ideas for your own stack.

Download your copy of the ecommerce stack infographic.

Join the Segment Startup Program, Build a Strong Stack, Grow Your Ecommerce Business

Segment's Startup Program is here to give early-stage startups the tools necessary to build stacks like this and thrive. Eligible startups get $25k in Segment credits for up to two years, which can be used for Segment’s Team Plan. Additionally, Segment is throwing in over $1 million in free marketing and analytics platforms like Amplitude and Amazon Web Services, on top of a number of heavy software discounts. You’ll even get access to level-up resources such as Segment’s Analytics Academy or Analytics office hours.

Eligible startups must have been incorporated less than two years ago and have raised no greater than $5 million in total funding.

Don’t wait any longer. Go learn more about Segment’s one-of-a-kind Startup Program. And if you’d like a hand picking your tools along the way, feel free to use our WYSIWYG MarTech stack builder.

About the Author

Dan McGaw is the founder of McGaw.io, MarTech speaker, and co-founder of analytics tools such as UTM.io. He’s worked extensively with Segment implementations and led the creation of tools such as the Segment CSV importer.

Guest author: Dan McGaw on October 29th 2021

In my many years managing MarTech implementations, I have received two questions more than any other:

1. Which user events are worth tracking?

2. Which marketing metrics should I collect?

The answer depends on a range of factors. First and foremost, your business model. Mobile marketers face a unique user base with a low tolerance for apps that don't meet high expectations for functionality and reliability. That's why we’re collaborating with Segment’s Startup Program to give you this take on the tools, metrics, and user events crucial for success.

The Added Value of a Solid Mobile Stack

I have a hard time imagining a successful app developer that reliably makes and scales quality apps without the feedback, performance improvements, and added functionality that tech stacks provide. Doing without means making do with guesswork. Maybe it will work, but maybe it won’t.

Mobile purchase cycles leave developers only a moment to hook users. That requires swift and prompt action to attract and retain users by intervening in crucial moments to improve their customer experience.

As a mobile app creator, you need your tech stack to capture a much wider array of user data, and respond to user activity and needs in a way that keeps them engaged, creates beneficial functionality, and fixes any technical stumbling blocks. 

Top User Events to Track

Which user events are most actionable in the analytics of mobile startups? Those that track the part of the journey from app install to first order.

  • Application Installed: Triggered after users first download your app or upon opening it via their home screen.

  • Install Attributed: Credits the right marketing channel for delivering new users.

  • User Created: This middle-funnel event occurs after users install and once they register an account with your service. It identifies active users.

  • [Feature Used]: Triggered whenever users launch a feature of your choice. This can be cross-referenced with retention data to uncover the stickiest features. 

  • Order Completed: Tracks when users make in-app purchases and contribute to your revenue. 

Top Mobile Metrics—How a Stack Helps Measure What Matters

Tracking the key mobile events will generate actionable metrics for you. To spend your effort where it can make a difference in a predictable way, focus on the revenue metric. Then break it down to the steps that bring your users closer to revenue.

Revenue

Track revenue just the way you need, whether subscription-based or ecommerce. Slice it by DAU (daily active users) or MAU (monthly active users), or marketing channel. The Order Completed event is what connects the dots for you here.

Install to Signup Rate

The install-to-signup-rate metric, also called the activation rate, puts your app's onboarding process under a microscope. It helps you answer the question “How many of the users who install the app actually start using it, too?” The magic is done by relating the Application Installed and User Created events.
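Relating the two events can be sketched as a simple two-step funnel: of the users who fired the first event, what share also fired the second? The `event` struct and the `conversionRate` helper are hypothetical names for this example, and the sketch ignores event ordering and time windows for brevity.

```go
package main

import "fmt"

// event is a hypothetical, minimal tracking record.
type event struct {
	UserID string
	Name   string
}

// conversionRate returns the share of users who fired `from` and also fired `to`.
// Users who fired `to` without `from` are not counted.
func conversionRate(events []event, from, to string) float64 {
	didFrom := map[string]bool{}
	didTo := map[string]bool{}
	for _, e := range events {
		switch e.Name {
		case from:
			didFrom[e.UserID] = true
		case to:
			didTo[e.UserID] = true
		}
	}
	if len(didFrom) == 0 {
		return 0
	}
	converted := 0
	for id := range didFrom {
		if didTo[id] {
			converted++
		}
	}
	return float64(converted) / float64(len(didFrom))
}

func main() {
	events := []event{
		{"u1", "Application Installed"}, {"u1", "User Created"},
		{"u2", "Application Installed"},
		{"u3", "Application Installed"}, {"u3", "User Created"},
		{"u4", "Application Installed"},
	}
	rate := conversionRate(events, "Application Installed", "User Created")
	fmt.Printf("install to signup rate: %.2f\n", rate)
}
```

The same helper works for any pair of funnel steps, such as User Created to Order Completed.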

Your job is to find where and how users get confused or otherwise discouraged before registration. You may also want to look into which channel or campaign brought the engaged users.

Signup to Pay Rate

This metric picks up where our last metric left off, answering “How often do new users become paying customers?” It relates the User Created and Order Completed events. By extension, it tells you about how well your user activation strategy is doing.

Compare across marketing channels to identify the most reliable channels for creating real-deal customers. Get granular and analyze cohorts based on their install date or app activity. Or calculate the average cost of converting a visitor to a paid user.

New, Retained & Churned Active Users

These metrics connect retention to user activity, and answer questions such as: “Which features make the users stick?” or “Which features make the users churn?” They do so by relating the Install Attributed and [Feature Used] events to data about DAU.

Retention and revenue are two sides of the same coin. Get the data that’ll enable you to retain users better, and it’ll make a world of difference to the bottom line.

Use Cases of Integrations for Mobile Growth

The applications you choose for your stack are important, second only to the quality of integrations that hold your stack together. After all, a good stack gives you new tools and expanded capabilities to improve the customer experience.  That’s how you use a stack to improve revenue. Let’s walk through a few examples.

Add Email Events to Customer Journeys

Free the data that would otherwise be siloed in Braze or another messaging platform. You'll use Segment to help pass data about user activity that occurs off app—in email and text. 

Now, events like Email Delivered will flow down to your analytics platform, Amplitude, which can be combined with data from other marketing channels and app activity. The email events will also be sent to your data warehouse, BigQuery, for additional analysis and backup.

The combined data stream also helps you track the full customer journey: the full set of touch points you create with users, whether it’s to engage existing ones or to convert new ones. This will help you improve and grow your email flows.

Use App Usage Data in Marketing Attribution

Tie in-app user engagement data with your marketing attribution. In the above example, you can do so by extracting the Install Attributed event, created by AppsFlyer. Then you’ll translate it in Segment and pass it on to your attribution reports in Amplitude.

Your marketing reporting will level up, letting you compare user engagement across marketing channels or campaigns. You will also put yourself in the position to reveal differences in purchasing behavior and retention.

Load and Model Ad Spend

Get Facebook Ads and Google Ads data to play nice together. The ad spend numbers get piped through Segment, then flow into your visualization and modeling platforms such as Chartio or PopSQL. You can then model and optimize CAC (customer acquisition cost). The two ad platforms will get a fair comparison.

As you’ll often want to do, you can also send the data to BigQuery, where it can be processed further.

Personalize and Automate Messaging

Make your messaging matter. So much so that it will be relevant to the user’s location and app usage.

In this use case, Radar collects location-specific events and connects them with the user’s history. But the data is only valuable when integrated with a platform that acts on it. So you can use Segment to send the location data and user traits to Braze. As a result, your messaging with the users can be both personalized and automated. That’s custom messaging at its best.

You’ll enable interactions such as location-specific deals, geofencing, localized inventories, geotargeting, or store locators.

Your Visual Reference of Mobile Tool Integrations

The diagrams we're using in this post come from our infographic, which you can download below for future use.

Download your copy of the mobile stack infographic.

Join the Segment Startup Program, Build a Strong Stack, Grow Your Mobile Business

Our Startup Program is here to give early-stage startups the tools necessary to build stacks like this and thrive. Eligible startups get $25k in Segment credits for up to two years, using Segment’s Team Plan. Additionally, Segment is throwing in over $1 million in free marketing and analytics platforms like Amplitude and Amazon Web Services, on top of a number of heavy software discounts. You’ll even get access to level-up resources such as Segment’s Analytics Academy or Analytics office hours.

Eligible startups must have been incorporated less than two years ago and have raised no greater than $5 million in total funding.

Don’t wait any longer. Go learn more about Segment’s one-of-a-kind Startup Program. And if you’d like a hand picking your tools along the way, feel free to use our WYSIWYG MarTech stack builder.

Learn more about the Segment Startup Program.

About the Author

Dan McGaw is the founder of McGaw.io, MarTech speaker, and co-founder of analytics tools such as UTM.io. He’s worked extensively with Segment implementations and led the creation of tools such as the Segment CSV importer.

Geoffrey Keating on October 28th 2021

Digital acceleration means data migration is inevitable. Learn all about data migration, important things to consider, and the basic process.

Guest author: Dan McGaw on October 28th 2021

As a growth hacker, MarTech founder and implementation consultant, I often get the following questions:

  • Which user events are worth tracking?

  • Which metrics should I use for decision making?

  • How should I build and use my stack to scale?

The answer is different for every business. But B2C subscription-based businesses need a responsive stack that draws high volumes of prospects and reliably pulls them through the conversion funnel with as little drop-off as possible.

That's why we’re collaborating with Segment’s Startup Program to give you this take on the tools, metrics, and user events crucial for success.

The Added Value of a Solid B2C Subscription Stack

Getting B2C marketing right means aligning your team members with their responsibility for the purchase cycle, helping them identify and fast-track high-value leads, and ensuring smooth transitions between funnel stages and business functions. 

I have a hard time imagining a B2C subscription company thriving without the responsive infrastructure, digestible user feedback,  event-based data, and added functionality a well-implemented stack brings.

Top B2C Events Your Stack Should Track 

Your stack’s toolset is only as good as its implementation. Also, a good stack gives you new tools and expanded capabilities to improve the customer experience. That’s how you use a stack to improve revenue. Let’s walk through a few examples of the user events that help you get there.

  • Lead Created: Unknown users trigger the Lead Created event by submitting their contact information (and usually other traits).

  • User Created: After leads or visitors create their first login, the User Created event occurs. This means the user has started using your application properly, and is closer to spending money in case that hasn’t happened yet.

  • [Feature Used]: This custom, product-based event can help you determine the relationship between key feature usage and long-term customer value, in addition to retention. Find ways to encourage repeat usage to optimize. Examples include Song Played, Lesson Completed, or Note Created.

  • Order Completed: Users have completed the customer journey by the time they reach Order Completed. This event is often used to track retention, revenue, and conversion rates from free to paid users.

Top B2C Metrics: How a Stack Helps Measure What Matters

Marketers in B2C subscription businesses need a broad view of the customer purchase cycle, with the ability to get granular and tinker with each conversion step. Use metrics to evaluate audiences and identify the best channels for high-volume conversion. 

Visitor-to-Signup Conversion Rate

This metric combines page views with the User Created event and helps users compare marketing channels to identify key audiences. Find a way to move this one a little, and you can see substantial gains in revenue. 

Trial Subscription Conversion Rate

Understand how frequently your visitors become users after sampling your services. By cross-referencing page views and the User Created event, you open up further examination of marketing channels and their varying ability to convert visitors.

New and Total Monthly Recurring Revenue (MRR)

Once users trigger Order Completed, they'll find themselves in this camp, where new and total revenue are tracked. Identify best-contributing channels and product features. Switch between MRR and ARR as you need.

Monthly Churn

Retention is no different from revenue in the subscription model. With the help of a product analytics platform such as Amplitude and the Order Completed event, users can slice monthly revenue by overall retention, engagement, and feature usage to shed light on the levers that make your customer experience stickier. For early-stage startups, retention may be even more important than revenue.
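One common way to put a number on monthly churn is the share of last month's paying customers who did not pay again this month. The sketch below assumes that every subscription payment fires an Order Completed event and that you have already collected the customer IDs per month; the function name and input shape are made up for this example.

```go
package main

import "fmt"

// monthlyChurn returns the share of last month's paying customers who did not
// pay again this month. Each slice holds the customer IDs seen on
// Order Completed events in that month (an assumed input shape).
func monthlyChurn(prevMonth, thisMonth []string) float64 {
	prev := map[string]bool{}
	for _, id := range prevMonth {
		prev[id] = true
	}
	if len(prev) == 0 {
		return 0
	}
	renewed := map[string]bool{}
	for _, id := range thisMonth {
		if prev[id] {
			renewed[id] = true
		}
	}
	return float64(len(prev)-len(renewed)) / float64(len(prev))
}

func main() {
	september := []string{"a", "b", "c", "d"}
	october := []string{"a", "b", "e"} // "c" and "d" churned; "e" is new
	fmt.Printf("monthly churn: %.2f\n", monthlyChurn(september, october))
}
```

New customers such as "e" do not affect the result, since churn is measured against the customers you already had.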

Use Cases of Integrations for B2C Growth

When you integrate numerous MarTech tools, you not only pool resources, you expand each tool’s functional capacity. Making good use of the ecosystem can help you scale B2C revenue. Below are examples for your inspiration.

Fuel and Automate Personalized Messaging

One of the best features Segment brings to stacks: allowing the same tool to send and receive data. Where other use cases such as the next one in this article see Customer.io sending event signals upstream to Segment, in this instance, we have a custom JavaScript (JS) data source pushing event and user traits to Customer.io.

And since Segment can send JS data to multiple communication platforms at once, we're delivering the same user data to Autopilot and Drift. With Drift, event data can trigger automated and personalized chat and text experiences (designed to qualify and convert users). Once it gets to Autopilot, it's combined with other customer data sources for personalized lead nurturing email campaigns (which are also automated).

Add Customer.io Email Events

By intervening in crucial (context-specific) moments in the customer journey, marketers can improve the customer experience—increasing engagement and avoiding churn.  However, event-based data from communications platforms like Customer.io are frequently disconnected from analytics platforms and marketing automations.

Segment connects event-based triggers from Customer.io with our product analytics platform Amplitude and our data warehouse BigQuery. Now events like Email Clicked can trigger any number of actions, like personalized purchase incentives or assigning users to various cohorts based on their demonstrated level of interest.

Report on JavaScript Data Sources

Custom attribution models use JS to send the data such as Page and Identity tables. But such custom data sources can be hard to sync with your reporting and data exploration tools such as Chartio.

With Segment, you can translate data and consolidate taxonomy, so they live alongside data from other sources in your warehouse and reporting tools. As a result, you can build accurate attribution and calculate your ROAS.

Load Ad Spend & Compare Ad Inventory Performance

Facebook Ads and Google Ads don't play well together, which means you have to be creative in your reporting if you want to show the full picture. Segment comes to the rescue once again. Both ad inventory sources can be loaded into BigQuery. There, you’ll combine the ad spend data with web activity data that’s also piped through Segment.

The infrastructure will unlock new analytics possibilities. You’ll be able to push the combined data into visualization and modeling platforms like Chartio or PopSQL, and build models on key metrics such as Customer Acquisition Cost (CAC).

Your Visual Reference of B2C Subscription Tool Integrations

The diagrams you see above come from our infographic with examples and explanations of B2C subscription stacks integrated through Segment. Get your own PDF below for future use, so you can quickly scan it and remind yourself of ideas for your own stack.

Download your copy of the B2C subscription stack infographic.

Join the Segment Startup Program, Build a Strong Stack, Grow Your B2C Subscription Business

Segment's Startup Program is here to give early-stage startups the tools necessary to build stacks like this and thrive. Eligible startups get $25k in Segment credits for up to two years, which can be used for Segment’s Team Plan.

Additionally, Segment is throwing in over $1 million in free marketing and analytics platforms like Amplitude and Amazon Web Services, on top of a number of heavy software discounts. You’ll even get access to level-up resources such as Segment’s Analytics Academy or Analytics office hours.

Eligible startups must have been incorporated less than two years ago and have raised no greater than $5 million in total funding.

Don’t wait any longer. Learn more about Segment’s one-of-a-kind Startup Program. And if you’d like a hand picking your tools along the way, feel free to use our WYSIWYG MarTech stack builder.

About the Author

Dan McGaw is the founder of McGaw.io, a MarTech speaker, and the co-founder of analytics tools such as UTM.io. He’s worked extensively with Segment implementations and led the creation of tools such as the Segment CSV importer.

Jim Young on October 28th 2021

To scale Growth, you need to define its purpose, set goals, structure collaboration, and master your customer data. Three experts share their insights.

Benjamin Yolken on October 26th 2021

At Segment, we use Apache Kafka extensively to store customer events and connect the key pieces of our data processing pipeline. Last year we open-sourced topicctl, a tool that we developed for safer and easier management of the topics in our Kafka clusters; see our previous blog post for more details.

Since the initial release of topicctl, we’ve been working on several enhancements to the tool, with a particular focus on removing its dependencies on the Apache ZooKeeper APIs; as described more below, this is needed for a future world in which Kafka runs without ZooKeeper. We’ve also added authentication on broker API calls and fixed a number of user-reported bugs.

After months of internal testing, we’re pleased to announce that the new version, which we’re referring to as “v1”, is now ready for general use! See the repo README for more details on installing and using the latest version.

In the remainder of this post, we’d like to go into more detail on these changes and explain some of the technical challenges we faced in the process.

Kafka, ZooKeeper, and topicctl

A Kafka cluster consists of one or more brokers (i.e. nodes), which expose a set of APIs that allow clients to read and write data, among other use cases. The brokers coordinate to ensure that each has the latest version of the configuration metadata, that there is a single, agreed-upon leader for each partition, that messages are replicated to the right locations, and so forth.

Original architecture

Historically, the coordination described above has been done via a distributed key-value store called Apache ZooKeeper. The latter system stores shared metadata about everything in the cluster (brokers, topics, partitions, etc.) and has primitives to support coordination activities like leader election.

ZooKeeper was not just used internally by Kafka, but also externally by clients as an interface for interacting with cluster metadata. To fetch all topics in the cluster, for instance, a client would hit the ZooKeeper API and read the keys and values in a particular place in the ZooKeeper data hierarchy. Similarly, updates to metadata, e.g. changing the brokers assigned to each partition in a topic, were done by writing JSON blobs into ZooKeeper with the expected format in the expected place.

Some of these operations could also be done through the broker APIs, but many could only be done via ZooKeeper.

Given these conditions, we decided to use ZooKeeper APIs extensively in the original version of topicctl. Although it might have been possible to provide some subset of functionality without going through ZooKeeper, this “mixed access” mode would have made the code significantly more complex and made troubleshooting connection issues harder because different operations would be talking to different systems.

Towards a ZooKeeper-less World

In 2019, a proposal was made to remove the ZooKeeper dependency from Kafka. This would require handling all coordination activities internally within the cluster (involving some significant architectural changes) and also adding new APIs so that clients would no longer need to hit ZooKeeper for any administrative operations.

The motivation behind this proposal was pretty straightforward: ZooKeeper is a robust system and generally works well for the coordination use cases of Kafka, but it can be complex to set up and manage. Removing it would significantly simplify the Kafka architecture and improve its scalability.

This proposal was on our radar when we originally created topicctl, but the implementation was so far off in the future that we weren’t worried about it interfering with our initial release. Recently, however, the first Kafka version that can run without ZooKeeper landed. We realized that we needed to embrace this new world so the tool would continue to work with newer Kafka versions.

At the same time, we got feedback both internally and externally that depending on ZooKeeper APIs for the tool would make security significantly harder. ZooKeeper does have its own ACL system, but managing this in parallel with the Kafka one is a pain, so many companies just block ZooKeeper API access completely for everything except the Kafka brokers. Many users would be reluctant to open this access up (rightfully so!) and thus the ZooKeeper requirement was blocking the adoption of the tool in many environments.

Given these multiple factors, removing the ZooKeeper requirement from topicctl became a high priority.

Removing the ZooKeeper requirement

In the original code for topicctl, all cluster admin access went through a single struct type, the admin client, which then used a private ZooKeeper client for fetching configs, updating topics, etc. This struct exposed methods that could be called by other parts of the tool; the golang code for triggering a leader election in the cluster, for instance, looked like the following (some details omitted for simplicity):

Note that the client in this case isn’t actually communicating with the Kafka brokers or using any Kafka APIs. It’s just writing some JSON into /admin/preferred_replica_election, which, by convention, is where the Kafka brokers will look to start the process of running a leader election.
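To make the ZooKeeper-based flow concrete, here is a minimal sketch in Go. It is not topicctl’s actual code: the `zkWriter` interface, the in-memory fake, and the function names are hypothetical, and the payload layout (`{"version":1,"partitions":[...]}`) is an assumption about the format the brokers expect at that path.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// zkWriter is a hypothetical stand-in for a ZooKeeper client; it is not the
// type used inside topicctl.
type zkWriter interface {
	Create(path string, data []byte) error
}

// memoryZK is an in-memory fake so the sketch runs without a ZooKeeper server.
type memoryZK struct {
	nodes map[string][]byte
}

func (m *memoryZK) Create(path string, data []byte) error {
	if _, exists := m.nodes[path]; exists {
		return fmt.Errorf("node %s already exists", path)
	}
	m.nodes[path] = data
	return nil
}

// electionPath is the znode named in the text above.
const electionPath = "/admin/preferred_replica_election"

type topicPartition struct {
	Topic     string `json:"topic"`
	Partition int    `json:"partition"`
}

// electionRequest assumes the JSON layout {"version":1,"partitions":[...]}.
type electionRequest struct {
	Version    int              `json:"version"`
	Partitions []topicPartition `json:"partitions"`
}

// runLeaderElection never talks to a broker: it only writes a JSON blob to
// the conventional path and relies on the brokers to notice it.
func runLeaderElection(zk zkWriter, topic string, partitions []int) error {
	req := electionRequest{Version: 1}
	for _, p := range partitions {
		req.Partitions = append(req.Partitions, topicPartition{Topic: topic, Partition: p})
	}
	data, err := json.Marshal(req)
	if err != nil {
		return err
	}
	return zk.Create(electionPath, data)
}

func main() {
	zk := &memoryZK{nodes: map[string][]byte{}}
	if err := runLeaderElection(zk, "events", []int{0, 1}); err != nil {
		panic(err)
	}
	fmt.Println(string(zk.nodes[electionPath]))
}
```

In this sketch a second request fails while the node still exists, which is one reason this style of "write a blob and wait" API is awkward for clients.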

Our first step was to convert the APIs exposed by this struct into a golang interface with two implementations: one that depended on ZooKeeper, i.e. using the code from our original version of the admin client, and a second that only used Kafka broker APIs.

So, the Client above became:

with the RunLeaderElection implementations becoming the following for the ZooKeeper and ZooKeeper-less versions, respectively:
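The overall shape of that refactor can be sketched as follows. This is an illustration under stated assumptions rather than the real topicctl code: the interface is trimmed to a single method, the two struct types and the `newAdminClient` constructor are hypothetical names, and the method bodies only print what a real implementation would do.

```go
package main

import (
	"context"
	"fmt"
)

// Client is a hypothetical, trimmed-down admin interface with one method.
type Client interface {
	RunLeaderElection(ctx context.Context, topic string, partitions []int) error
}

// zkAdminClient stands in for the ZooKeeper-backed implementation.
type zkAdminClient struct {
	zkAddr string
}

func (c *zkAdminClient) RunLeaderElection(ctx context.Context, topic string, partitions []int) error {
	// A real implementation would write the election request into ZooKeeper.
	fmt.Printf("zookeeper(%s): elect leaders for %s %v\n", c.zkAddr, topic, partitions)
	return ctx.Err()
}

// brokerAdminClient stands in for the implementation that only uses
// Kafka broker APIs.
type brokerAdminClient struct {
	brokerAddr string
}

func (c *brokerAdminClient) RunLeaderElection(ctx context.Context, topic string, partitions []int) error {
	// A real implementation would send an election request to a broker.
	fmt.Printf("broker(%s): elect leaders for %s %v\n", c.brokerAddr, topic, partitions)
	return ctx.Err()
}

// newAdminClient picks an implementation from the cluster settings: if a
// ZooKeeper address is configured, use it; otherwise go through the brokers.
func newAdminClient(zkAddr, brokerAddr string) Client {
	if zkAddr != "" {
		return &zkAdminClient{zkAddr: zkAddr}
	}
	return &brokerAdminClient{brokerAddr: brokerAddr}
}

func main() {
	ctx := context.Background()
	clients := []Client{
		newAdminClient("zk:2181", "kafka:9092"),
		newAdminClient("", "kafka:9092"),
	}
	for _, client := range clients {
		// Callers depend only on the interface, not on the backend.
		if err := client.RunLeaderElection(ctx, "events", []int{0, 1}); err != nil {
			panic(err)
		}
	}
}
```

Because the rest of the tool only sees `Client`, swapping backends requires no changes at the call sites.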

The next step was to fill out the details of the broker-based admin client so that it actually worked. topicctl was already using the excellent kafka-go library for its functionality that depended on broker APIs (e.g., tailing topics), so we wanted to use that here as well. Unfortunately, however, this library was designed primarily for reading and writing data, as opposed to metadata, so it only supported a subset of the admin-related Kafka API.

After doing an inventory of our client’s requirements, we determined that there were six API calls we needed that were not yet supported by kafka-go:

Our next step was to update kafka-go to support these! At first, it looked easy: this library already had a nice interface for adding new Kafka APIs, so all you had to do was create go structs to match the API message specs, and then add some helper functions to do the calls.

But, as often happens, we ran into a wrinkle: a new variant of the Kafka protocol had been recently introduced (described here) to make API messages more space-efficient. Although most of the APIs we needed had versions predating the update, a few only supported the new protocol format. To add all of the APIs we needed, we’d have to update kafka-go to support the new format.
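To give a feel for why the newer format is more space-efficient, here is a small Go sketch comparing two ways of encoding a string. It assumes the change in question is Kafka’s "flexible versions" encoding, in which a string’s length prefix is an unsigned varint of the length plus one instead of a fixed two-byte integer. The function names are hypothetical and this is not the kafka-go implementation; tagged fields and nullable strings are left out.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// encodeLegacyString writes a 2-byte big-endian length followed by the bytes.
// Strings longer than 32767 bytes are not handled in this sketch.
func encodeLegacyString(s string) []byte {
	buf := make([]byte, 2, 2+len(s))
	binary.BigEndian.PutUint16(buf, uint16(len(s)))
	return append(buf, s...)
}

// encodeCompactString writes len(s)+1 as an unsigned varint followed by the
// bytes. The +1 leaves the value 0 free to represent a null string.
func encodeCompactString(s string) []byte {
	var prefix [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(prefix[:], uint64(len(s))+1)
	out := make([]byte, 0, n+len(s))
	out = append(out, prefix[:n]...)
	return append(out, s...)
}

func main() {
	for _, s := range []string{"", "topic", "a-much-longer-topic-name"} {
		legacy := encodeLegacyString(s)
		compact := encodeCompactString(s)
		fmt.Printf("%q: legacy=%d bytes, compact=%d bytes\n", s, len(legacy), len(compact))
	}
}
```

The saving per string is small, but requests that carry many short strings and arrays add up, and a client has to know which encoding applies to each API version before it can parse a response.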

Thus, we first went through all of the protocol code in kafka-go, updating it to support both the old and new formats. The proposal linked above didn’t have 100% of the details we needed, so in several cases, we also had to consult the Kafka code to fully understand how newer messages were formatted. After much trial and error, we eventually got this code working and merged.

Once that was done, we were unblocked from adding the additional APIs, which we did in this change. Finally, we could go back to the topicctl code and fill out the implementation of the broker-based admin client.

Returning to the RunLeaderElection example from above, we now had something like:

The end result is that we were able to get topicctl working end-to-end with either the ZooKeeper-based implementation (required for older clusters) or the ZooKeeper-less one (for newer clusters), with only minimal changes in the other parts of the code.

Security updates

In addition to removing the ZooKeeper requirement from topicctl, we also got several requests to support secure communication between the tool and the brokers in a cluster. We didn’t include these in the original version because we don’t (yet) depend on these features internally at Segment, but they’re becoming increasingly important, particularly as users adopt externally hosted Kafka solutions like AWS MSK and Confluent Cloud.

We went ahead and fixed this, at least for the most common security mechanisms that Kafka supports. First, and most significantly, topicctl can now use TLS (called “SSL” for historical reasons in the Kafka documentation) to encrypt all communication between the tool and the brokers. 
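As a rough sketch of what enabling TLS involves on the client side, the Go snippet below builds a `tls.Config` using only the standard library. The function name, the hostname, and the choice of TLS 1.2 as the minimum version are assumptions made for illustration; this is not how topicctl is configured.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
)

// newBrokerTLSConfig is a hypothetical helper. If caPEM is empty, the system
// root certificates are used; otherwise only the supplied CA bundle is trusted.
func newBrokerTLSConfig(serverName string, caPEM []byte) (*tls.Config, error) {
	cfg := &tls.Config{
		ServerName: serverName,       // must match the broker certificate
		MinVersion: tls.VersionTLS12, // assumed floor, not a Kafka requirement
	}
	if len(caPEM) > 0 {
		pool := x509.NewCertPool()
		if !pool.AppendCertsFromPEM(caPEM) {
			return nil, errors.New("no valid certificates found in CA bundle")
		}
		cfg.RootCAs = pool
	}
	return cfg, nil
}

func main() {
	cfg, err := newBrokerTLSConfig("kafka.example.com", nil)
	if err != nil {
		panic(err)
	}
	// A real client would pass cfg to tls.Dial or to its Kafka library.
	fmt.Println("server name:", cfg.ServerName, "custom CA:", cfg.RootCAs != nil)
}
```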

In addition to TLS, we also added support for SASL authentication on these links. This provides a secure way for a client to present a username and password to the API; the permissions for each authenticated user can then be controlled in a fine-grained way via Kafka’s authorization settings.
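For a sense of what "presenting a username and password" looks like on the wire, here is a minimal Go sketch of the initial client message for the SASL PLAIN mechanism as defined in RFC 4616. Kafka supports several SASL mechanisms and the text above does not say which ones the tool uses, so PLAIN is assumed purely for illustration; the function name is hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// plainInitialResponse builds the SASL/PLAIN message:
//
//	[authzid] NUL username NUL password
//
// authzid may be empty, in which case the server derives it from the username.
func plainInitialResponse(authzid, username, password string) ([]byte, error) {
	for _, field := range []string{authzid, username, password} {
		if strings.ContainsRune(field, 0) {
			return nil, errors.New("SASL/PLAIN fields must not contain NUL bytes")
		}
	}
	if username == "" || password == "" {
		return nil, errors.New("username and password are required")
	}
	return []byte(authzid + "\x00" + username + "\x00" + password), nil
}

func main() {
	msg, err := plainInitialResponse("", "alice", "s3cret")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", msg)
}
```

PLAIN sends the password itself, so it is normally only used over a TLS-encrypted connection like the one described above.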

Testing and release

As we updated the internals of topicctl, we extended our unit tests to run through the core flows like applying a topic change under multiple conditions, e.g. using ZooKeeper vs. only using Kafka APIs. We also used docker-compose to create local clusters with different combinations of Kafka versions, security settings, and client settings to ensure that the tool worked as expected in all cases. 

Once this initial testing was done, we updated the internal tooling that Segment engineers use to run topicctl to use either the old version or the new one, depending on the cluster. In this way, we could roll out to newer, lower-risk clusters first, then eventually work up to the bigger, riskier ones. 

After several months of usage, we felt confident enough to use v1 for all of our clusters and deprecate the old version for both internal and external users of the tool.

Conclusion

topicctl v1 is ready for general use! You might find it a useful addition to your Kafka toolkit for understanding the data and metadata in your clusters, and for making config changes. Also, feel free to create issues in our GitHub repository to report problems or request features for future versions.

Kelly Kirwan on October 26th 2021

Big data is a big deal to manage. And data integrity is vital for business growth and customer trust. Learn what it is, why it’s crucial, and how to ensure it.

Jim Young on October 22nd 2021

From October 20-21, Segment joined SIGNAL, Twilio’s annual customer and developer conference, alongside 50,000+ developers, product leaders, enterprises, and startups.

Pablo Vidal Bouza on July 15th 2021

How Segment moved from traditional SSH bastion hosts to use AWS Systems Manager SSM to manage access to infrastructure.

Leif Dreizler on March 2nd 2021

Building customer-facing security features in partnership with dev teams helps you better serve your customers, unlocks additional revenue, and bidirectionally transfers knowledge between teams—a concept at the very core of DevSecOps.

Udit Mehta on January 20th 2021

Learn how we use AWS Step Functions for large-scale data orchestration.

Growth & Marketing

Nupur Bhade Vilas on October 20th 2021

Meet Twilio Engage: the first growth automation platform designed for the digital era.

Sam Gehret on July 29th 2021

A look at server-side activation as the new alternative to the third-party advertising pixel.

Sudheendra Chilappagari on February 18th 2021

Learn how to use Segment and Twilio Programmable Messaging to send a personalized SMS campaign.

Josephine Liu, Sherry Huang on June 9th 2021

Our latest feature, Journeys, empowers teams to unify touchpoints across the end-to-end customer journey.

Kate Butterfield on June 16th 2021

Get an inside look at the design process for Journeys.

Katrina Wong on March 31st 2021

With Segment, brands can leverage their first-party customer data to build deeper customer relationships.

Madelyn Mullen on August 17th 2020

Your business growth depends on empowering every team with good data. Introducing the Segment Data Council, a series of interviews with seasoned customer data experts who know how to build bridges across the organization and empower teams.

Madelyn Mullen on August 17th 2020

Imagine if your PMs had an overview of support tickets, billing issues, sales interactions, and users’ clickstreams—all unified and available via self-service. It would be the Holy Grail of data management. Listen to more in this Data Council episode.

Madelyn Mullen on August 17th 2020

Simply put, data governance leads to better automation. Listen to this Data Council episode to hear how Arjun Grama grew his customer data wrangling techniques to transform product lines at IBM and raise the bar on growth KPIs at Anheuser-Busch InBev.

Madelyn Mullen on August 17th 2020

What does it take for a data driven business case to excite stakeholders across an organization? Tune in to this Data Council episode for an insider perspective from Kurt Williams, Global Director of Customer Products at Anheuser-Busch InBev.
