Tableau Extracts–What / Why / How etc

What’s a Tableau Extract?
An extract in Tableau is a special type of database, optimised for use by Tableau. Its super fast, portable and a great way to handle large data sets. But how do you create one, and what creative ways are there to make use of them?
Firstly, how do you create and extract?
This bit is simple, assuming you are using Tableau Desktop. After you have connected to your data, go to the DATA menu and choose EXTRACT DATA – then accept the defaults on the dialog box (although more on this later). Tableau will as where you want to save the extract – choose any location to save the file, although Tableau will probably direct you towards ‘My Tableau Repository | Datasources’ which is just fine too!
Now wait for the extract to be created, how long you’ll wait depends on the database technology you’re using, network speed, data volumes that sort of thing.
You’ll know it’s done when the data source icon changes – it will have another database icon behind it, presumably representing a 'copy’, which is exactly what an extract is.
So, that’s how to create on (well the basics anyway) – but WHY would you want to?
Extract reason #1 – Performance
Extracts are FAST. They are awesomely fast with a million rows, and they only slow down to ‘amazingly fast’ with 100 Million rows.
But extracts are not just for big data sets. Once your familiar with the Tableau interface, even a 1 second delay when using the interface can be annoying – you get used to the speed and responsiveness quickly. I recommend using extracts for any data set which is anything other than super-responsive.
For example:
If you have 50,000 rows in Excel – its likely to be annoyingly slow. An extract solves this immediately.
If you’re using a Remote SQL server, create an extract over lunch, and you’ll have instant response times when you get back.
Extract reason #2 – Offline analysis
Extracts are files that you store on your PC (file extension .tde). So of course you can take one with you on holiday. If you just can’t sit by the pool without dreaming about histograms and scatterplots, then extracts could be for you…
They work on planes too.
Extract reason #3 – Accessing additional functionality
Some database technologies have restrictions that stop Tableau communicating with them as they would like. The most common problem caused by this (in my experience) is the absence of COUNT DISTINCT when using Excel as a data source.
To see this in action, connect to an Excel source, and then drag a dimension out using the RIGHT MOUSE BUTTON. When you drop this, you’ll get to choose from the following options:
Which is no good for me!! I’m trying to get COUNT DISTINCT…
But, if I change to an extract…
Now I get COUNT DISTINCT – which allows some really insightful visualisations to be built.
The same holds true if you are trying to get a MEDIAN value for a measure, this is not available from Excel, so you’ll need an extract.
FROM EXCEL FROM AN EXTRACT
Extract reason #4 – Creating packaged workbooks
Packaged workbooks are a fantastic bit of Tableau functionality, allowing you to created a distributable file which can be an interactive visualisation. This can be opened by users of the FREE Tableau Reader – which you can download here.
BUT – you can’t ‘package’ data which is held on remote servers, so you can’t package data from a SQL server, Oracle etc.
UNLESS you create an extract first. Then you’ll have no problems.
Extract reason #5 – Publishing to Tableau Public
You can’t use packaged workbooks without extract as described above, but neither can you use Tableau Public. If you want to publish data to the web, you’ll have to use an extract.
Extract reason #6 – Data security
Here’s a more subtle use case. Suppose you work at a hospital, and you are REALLY NOT ALLOWED to share patient level information – yet you want to create a packaged workbook with a data source which does contain this information. What do you do? Here’s where the extra functionality of the data extract dialog comes into play.
When you extract data, you’ll see this window. Notice the button at the bottom, this removes all the dimensions and measures which you have NOT used in ANY VIZ from the extract.
The extract you have then created can be packaged, safe in the knowledge that any fields you did not want to be visible are not even in the extract, so they can’t be accessed.
Using this same dialog, you can restrict the ROWS (rather than columns) which are included in your data set by using the filter section. Consider using a relative date filter for an extract which contains (say) the last 3 weeks of data).
Extract Reason #7 – Double aggregations
This is my personal favourite.
Suppose you have timesheet data which has one row per employee/day. You want to know the average hours recorded per MONTH.
To solve this one, you first need to aggregate the data by employee/month, and then produce a view which then averages the data by month (hence the double aggregation).
Extracts address this problem as they can be used to perform the first level of aggregation, providing a new data source over which to run the Average calculation.
This step is performed in the extract dialog window, by ‘Aggregating visible dimensions’ and then rolling up dates to month.
You can also use this feature to massively reduce the size of the extract file by '’Hiding’ dimensions in the data window before you take this step.
I’m sure there are many more reasons to use extracts – and that in the future we’ll see many more developments in this area from Tableau.
Please drop me a line if you have other uses you think should be on this list.


Reader Comments (3)
That was a great overview of some main point and factors of extracts with the Tableau Data Engine. The discussion on extracts in Tableau could go for quite a while. Here are a couple of other points in regards to extracts:
a. If you use Custom SQL, the query slowness custom SQL adds (because Tableau cannot write the most efficient queries when it is used) is removed.
b. When you publish to server, you can schedule the extract, so every interaction will not query a production database, and you can still get semi-live, or as live as you need it data.
c. It is important to note that the majority of the speed gain is because the extract is a columnar data store (like Vertica or Infobright), and no longer a relational data store. Also just about anything is faster and a better choice than the MS Jet Engine (what Tableau uses to connect to Text/Excel/Access files). Columnar data store are designed to produce fast results aggregating data.
d. if you create a calculated field that is not an aggregation (or use a parameter), you will want to Optimize the extract, so the result of the field for each row can be added to the extract to retain fast performance.
e. The one downside of the new Data Engine is the lack of ability to use pass-through/RAWSQL functions. These are only useful in a select few cases now because custom table calculations and data blending can do most of what is needed, so not a major issue.
f. an extract will attempt to retain some calculation logic of the original data source. Some databases would see "A"="a" as a true condition because their string comparison is not case sensitive, while other databases are, and when you create an extract, Tableau will attempt to honor that, at at least retain the values of your calculated fields. This is sometimes a source of confusion.
g. If you refresh an extract, and the original data source structure has changed, Tableau will update the extract with those changes. This could cause some calculated fields to break, because of field name changes, but you can also add additional columns to your data this way as well.
I could go on an on about the extract engine, in short, I love it, and use it as early and as often as I can.
Wow joe, great comment!! Longer and more interesting than my original post. But you're right, extracts have much more to them.
In my experience many people don't use them, or appreciate their value for day to day analysis. Hence the post.
Thanks again
Tom
Fantastic post! Most of my data extracts are around 10 mill records and this is perfect!