For marketing and product teams in 2022, using a customer data platform (CDP) to pipe and clean data from their websites and web apps to other services has become a widely adopted industry standard. More specifically, Segment's adoption has skyrocketed among startups and later-stage companies, and I predict its growth will continue across the enterprise segment following its acquisition by Twilio.
What is Segment? What is a CDP?
Customer Data Platforms (CDPs) like Segment allow various apps (including your own applications) to pipe events (e.g. adding something to a shopping cart), customer data, and more to other applications such as analytics products, CRMs, and email apps. They help deliver on the holy grail of digital products and marketing: the single customer view.
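To make the "pipe events to other applications" idea concrete, here is a minimal sketch of the fan-out pattern a CDP implements. It is not Segment's actual code or API: the `EventRouter` class, the destination names, and the in-memory "inboxes" are all hypothetical stand-ins I made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Event = Dict[str, Any]


@dataclass
class EventRouter:
    """Toy stand-in for a CDP: one track() call fans out to every destination."""

    destinations: Dict[str, Callable[[Event], None]] = field(default_factory=dict)

    def add_destination(self, name: str, handler: Callable[[Event], None]) -> None:
        # Register a downstream tool (analytics, CRM, email, ...) by name.
        self.destinations[name] = handler

    def track(self, user_id: str, event: str, properties: Dict[str, Any]) -> Event:
        # Build one payload and hand the same payload to every destination.
        payload = {"user_id": user_id, "event": event, "properties": properties}
        for handler in self.destinations.values():
            handler(payload)
        return payload


# Hypothetical destinations: plain lists standing in for real SaaS tools.
analytics_inbox: List[Event] = []
crm_inbox: List[Event] = []

router = EventRouter()
router.add_destination("analytics", analytics_inbox.append)
router.add_destination("crm", crm_inbox.append)

# The app instruments the event once; the router delivers it everywhere.
router.track("user-123", "Product Added", {"sku": "SKU-1", "price": 19.99})
```

The point of the pattern is that your app only instruments each event once, and adding a new downstream tool means registering another destination rather than touching application code.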
Twilio Segment’s main features are:
Connections: Integrate web and mobile app data with a single API
In layman’s terms, Connections lets external and internal apps exchange data with each other. Protocols lets you clean up your data and gives your team a standard framework to organize and scale data collection across multiple apps. Personas gives your marketing and product teams a single customer view of prospects and users. Finally, the newest feature, Journeys, which looks to be a customer.io competitor, lets your product and marketing teams orchestrate events across different stages of your sales and marketing funnel.
Segment has built a great product, a seamless onboarding flow (copy and paste a simple JS script to get started), and hundreds of connectors that open up a universe of apps that marketers and product people need. Everything gets stored in Segment, which does a decent job of cleaning that data and piping it back into other applications. However, this approach can have many drawbacks over the long term.
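As a rough illustration of what "a standard framework to organize data collection" can mean in practice, here is a small sketch of validating events against a tracking plan. This is my own simplification, not how Protocols is implemented: the `TRACKING_PLAN` structure, the event names, and the `validate_event` helper are all assumptions.

```python
from typing import Any, Dict, List

# Hypothetical tracking plan: event name -> required property names and types.
TRACKING_PLAN: Dict[str, Dict[str, type]] = {
    "Product Added": {"sku": str, "price": float},
    "Order Completed": {"order_id": str, "total": float},
}


def validate_event(event: str, properties: Dict[str, Any]) -> List[str]:
    """Return a list of violations; an empty list means the event conforms."""
    spec = TRACKING_PLAN.get(event)
    if spec is None:
        # Events that nobody planned for are flagged rather than silently accepted.
        return [f"unplanned event: {event}"]

    violations: List[str] = []
    for name, expected_type in spec.items():
        if name not in properties:
            violations.append(f"missing property: {name}")
        elif not isinstance(properties[name], expected_type):
            violations.append(
                f"wrong type for {name}: expected {expected_type.__name__}"
            )
    return violations
```

An event that matches the plan returns an empty list, while an event missing `price` or one with an unknown name returns a human-readable violation that a team could block, log, or alert on.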
What's the issue with dumping everything into Segment?
Expensive. It can get costly, fast.
Data Redundancy. There's redundancy, especially if you have another data warehouse (which most later stage organizations do).
Stale data. Data can go stale: it isn't updated quickly, and how often it refreshes depends on your Segment plan.
Lack of ML fine-tuning. You can't apply proprietary models (this is mostly a problem if you have more specific ML needs or a great ML team).
Security risks. There are always third-party risks that add liability to your overall system, although Segment has a great security team and practice!
Introducing… a Warehouse-first approach
New entrants like Rudderstack are touting a "Warehouse-first" approach to solve many of these issues. Instead of dumping all of the data into a CDP, you dump the data into your own warehouse and use a service like Rudderstack to do the cleaning and routing of the data. This allows you to build data enrichments directly from the data warehouse, eliminates data redundancy, cuts down on third-party security risks, and reduces costs (because you store the data fewer times).
But what are the downsides of this approach?
Complexity. Implementing and maintaining a data warehouse requires a lot of overhead and can add complexity to your architecture. Also, no single alternative vendor replicates every part of Segment (e.g. Snowplow only does data collection), so you may need multiple tools to replicate Segment's functionality.
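Here is a minimal sketch of the warehouse-first flow described above: raw events land in the warehouse once, enrichments are computed with SQL on the warehouse itself, and only the derived traits are pushed downstream. An in-memory SQLite database stands in for a real warehouse, and the table layout, the `collect`/`build_traits`/`sync` helpers, and the `crm` list are all hypothetical. This is not Rudderstack's API.

```python
import sqlite3
from typing import Dict, List

# In-memory SQLite stands in for a real warehouse (e.g. Snowflake or BigQuery).
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE events (user_id TEXT, event TEXT, amount REAL)")


def collect(user_id: str, event: str, amount: float = 0.0) -> None:
    """Step 1: raw events land in the warehouse, stored once."""
    warehouse.execute("INSERT INTO events VALUES (?, ?, ?)", (user_id, event, amount))


def build_traits() -> List[Dict[str, object]]:
    """Step 2: enrichment is computed with SQL directly on warehouse data."""
    rows = warehouse.execute(
        """
        SELECT user_id, COUNT(*) AS orders, SUM(amount) AS lifetime_value
        FROM events
        WHERE event = 'Order Completed'
        GROUP BY user_id
        ORDER BY user_id
        """
    ).fetchall()
    return [{"user_id": u, "orders": n, "lifetime_value": v} for u, n, v in rows]


def sync(traits: List[Dict[str, object]], destination: List[Dict[str, object]]) -> None:
    """Step 3: push only the derived traits to a downstream tool (here, a list)."""
    destination.extend(traits)


collect("u1", "Order Completed", 40.0)
collect("u1", "Order Completed", 60.0)
collect("u2", "Product Added")
collect("u2", "Order Completed", 25.0)

crm: List[Dict[str, object]] = []
sync(build_traits(), crm)
```

Because the enrichment step is just a query against your own warehouse, this is also where a data team could swap in a proprietary model instead of a simple aggregate.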
Cost. Data warehouses aren't necessarily cheap. Some warehouses, like Snowflake, are more startup-friendly because of their pay-per-usage model while others have implementation fees or high startup costs that might not work for earlier stage companies.
Loss of flexibility. If your architecture or requirements are in flux (e.g. some parts of your stack might move, or your feature set or offering might change), picking a data warehouse prematurely could lock you in or cause headaches in the future.
What's the best approach? Should I move off Segment?
The best approach will depend on your specific use case, but I will go over some common ones below. Segment has a wide range of features and a wide range of end users that it caters to, so no one platform can replace everything that Segment does (at least today). However, there are some exciting new entrants in the space that give you more flexibility and control over your data, which could be an advantage if data is a core competency for your organization.
For early pre-PMF startups:
Need optionality for future changes
May have simpler integration needs (usually GA, PostHog, User Story, SendGrid, etc.)
Need a plug and play data collection tool
Limited resources for maintenance
Limited budget to pay for expensive tools
Need to track or pipe main events into a CRM or email marketing tool
Need to track or analyze events for simple data visualizations and running cohort analysis
If you want to maximize optionality and simplicity while keeping costs low, starting off with the Segment startup program is a great approach. Segment lets you add or integrate your own data warehouse later down the line (although you might need to pay extra for this feature), so you can defer that decision. You likely have very simple integration and schema requirements at this stage, so using the out-of-the-box tools that Segment provides probably saves you the most time and headache while you iterate towards product-market fit.
For scaling startups:
Core product functionality is somewhat set
Lots of integrations, some of them bespoke (e.g. piping data to many SaaS apps, custom data destinations, visualization tools, etc.)
Have a dedicated data team outside or adjacent to the engineering team
Segment’s annual costs run anywhere from 30% to 90% of an engineer’s salary
Have a data warehouse like Google BigQuery, Amazon Redshift, or Snowflake where the data team stores a bunch of data
Have a data scientist or team that needs to process and analyze the data
Check out Rudderstack to see if it’s a good fit to replace Segment or your current centralized CDP. Have a conversation with your product/marketing team to see if migration is feasible. In my opinion, this mostly applies to B2C companies that have a lot more events and where proprietary data analysis or ML is a core competency.
If you're a later-stage company where core product functionality is somewhat set, you have more complex integration requirements, and you have a dedicated data engineering or data science team, I'd consider checking out alternatives to Segment like Rudderstack or Metarouter. You probably already have a data warehouse or data lake and pipes running through internal and external systems, so running a self-hosted, open source Customer Data Platform might make sense for your team.
What are the most popular Data Warehouses in 2022?
Snowflake (mostly for enterprises)
Google BigQuery (startups that are on GCP)
Amazon Redshift (startups + enterprises that are on AWS)
Azure Synapse Analytics, formerly Azure SQL Data Warehouse (enterprises that are on Azure)
Databricks (mostly for enterprises)
In conclusion, evaluating Segment alternatives in 2022 will depend on your use case: which features of Segment your team is most dependent on, how far along your startup is, and how important data engineering and analysis are as a core competency for your company. If you're a pre-product-market-fit startup with simple data integration and customer data requirements, I suggest starting off with the Segment startup program. If you're a later-stage company with a data science or data engineering team that has foundational data warehouses, pipelines, and processes in place, I'd consider alternatives like Rudderstack, Metarouter, or Hull.io.