Extract data from Mixpanel API

 

mixpanel-postAthough I love Mixpanel, there are cases that you would like to extract data from it. Then load your data to a data warehouse like Amazon Redshift for further analysis. This post is a small overview off Mixpanel’s API and how to access and extract data from it. purpose of this guide is to help you define a process or pipeline .

About Mixpanel

For those that do not know Mixpanel, it helps you make your product better by measuring actions, instead of page views. Mixpanel gives you the ability to measure what people are doing in your app on iOS, Android, and web.

Extract data from Mixpanel

Mixpanel is an analytics as-a-service application. We usually think of it as place to see my data and not a place where I would get data from. Why? I may need to perform analysis that involves data from other sources.

Mixpanel collects data related to how your customers use your product. In case you need to have more sources you may:

  1. Enrich Mixpanel with data coming from other sources.
  2. Extract the data Mixpanel holds for you and load it on a data warehousing repository. This is what we are going to review here.

As we only need to pull data, we are going to work with the Data Export API. It’s purpose is to help you pull your data out of Mixpanel at any time.

mixpanel

As a web API, you can access it using  by using tools like CURL or Postman or Apirise or your favorite http client for the language or framework of your choice. Some options are the following:

Or you can use the libraries/SDKs that Mixpanel offers for the following languages:

As a RESTful API it offers the following resources that you can interact with:

Annotations

  • annotations– list the annotations for a specified date range.
  • create– create an annotation
  • update– update an annotation
  • delete– delete an annotation

Export

  • export– get a “raw dump” of tracked events over a time period

Events

  • events– get total, unique, or average data for a set of events over a time period
  • top– get the top events from the last day
  • names– get the top event names for a time period

Event Properties

  • properties– get total, unique, or average data from a single event property
  • top– get the top properties for an event
  • values– get the top values for a single event property

Funnels

  • funnels– get data for a set of funnels over a time period
  • list– get a list of the names of all the funnels

Segmentation

  • segmentation– get data for an event, segmented and filtered by properties over a time period
  • numeric– get numeric data, divided up into buckets for an event segmented and filtered by properties over a time period
  • sum– get the sum of a segment’s values per time unit
  • average– get the average of a segment’s values per time unit
  • Segmentation Expressions– a detailed overview of what a segmentation expression consists of

Retention

  • retention– get data about how often people are coming back (cohort analysis)
  • addiction– get data about how frequently people are performing events

People Analytics

  • engage– get data from People Analytics

More details here: https://mixpanel.com/docs/api-documentation/data-export-api

Get your data!

Let’s assume that we want to export our raw data from Mixpanel. To do so we’ll need to execute requests to the export endpoint. An example of a request that would get us back raw events from Mixapanel looks like this:

https://data.mixpanel.com/api/2.0/export/?from_date=2012-02-14&expire=1329760783&sig=bbe4be1e144d6d6376ef5484745aac45
&to_date=2012-02-14&api_key=f0aa346688cee071cd85d857285a3464&
where=properties%5B%22%24os%22%5D+%3D%3D+%22Linux%22&event=%5B%22Viewed+report%22%5D

The returned result is always in JSON serialization with one event per line sorted by increasing timestamp. It looks like the following sample:

{"event":"Viewed report",
"properties":{"distinct_id":"foo","time":1329263748,"origin":"invite", "origin_referrer":"http://mixpanel.com/projects/","$initial_referring_domain":"mixpanel.com", "$referrer":"https://mixpanel.com/report/3/stream/","$initial_referrer":"http://mixpanel.com/", "$referring_domain":"mixpanel.com","$os":"Linux","origin_domain":"mixpanel.com","tab":"stream", "$browser":"Chrome","Project ID":"3","mp_country_code":"US"}} 

Important: Data from the export API are updated every 24 hours, so you will always have access to the data from the previous day.

Schema & Load

After you extract all the information you need, you have to map it to the schema of your data warehouse repository and then load the data to it.

In order to load your data into a data warehouse you will have to follow its data model. The raw data extracted from Mixpanel should be mapped to a schema supported by the data store. Your data comes in JSON. JSON supports a smaller range of data types. We have to make sure that we have mapped your types into one of the datatypes that is supported by your data store. When you have concluded on the design you need to load your data on your data warehouse.

The design of the schema and loading of the data is of most importance as it can both affect the performance of your data warehouse cluster and the questions that you can answer. And of course this procedure differs in the case of Amazon Redshift or Azure SQL Data Warehouse or Google BigQuery.

Takeaways: Which option will work for you?

The way to proceed relies on the data you want to load and the requirements of your use case. Things can get even more complicated if you want to integrate data coming from different sources. You can:

  1. Write, host and maintain a flexible data infrastructure.
  2. Use a product like Blendo that can handle this kind of problems automatically for you. With Blendo you can quickly and safely move all your data from Mixpanel into data warehouse solutions like Amazon Redshift and start generating insights from your data.

Would you like to spread the word? Please share 🙂


Sources:

Advertisements

One thought on “Extract data from Mixpanel API

  1. Pingback: This week in API land #37 | Restlet - We Know About APIs

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s