In Part II of this blog series, we looked at the detailed steps in designing the Network Chart in Tableau. In this post, we will look at the design steps for the Sankey chart.
We are using the superstore sample data that comes preinstalled with Tableau. However, there are certain steps we must take as we open and read the Excel file in Tableau
We start by selecting the sample dataset and opening it in Tableau. However, when we select the Excel file, we click on the small drop-down arrow, and open with Legacy Connection.
Then, click on the data drop down menu at the top of the screen and select 'Convert to Custom SQL.' This will display a new window, as shown below.
After selecting the custom SQL option, a new window will pop up, giving you the option of editing the SQL. Now we add a 'union all' clause. We will use first part of the union for our 'Product Categories' markers, and the second part for our 'Market' markers.
Select 'OK,' and the size of your data set should double.
The next few steps involve creating some calculated fields, beginning with a padding field. The idea behind this calculation is to create data densification so that we have multiple lines like we expect in the viz.
Right click on the calculated field you just created and select 'Create' and then 'Bins.' Change the size to 1 and give it a new name (Pad in our case). Hit OK and you should now see a new Dimension called 'Pad.'
Next we create a new calculation - 't'.
We then drag the calculated field 't' on the rows shelf and add 'Pad' to the level of detail. Right click on 't' from the columns shelf and change Compute Using to 'Pad' from 'Table Down.'
Create two calculated fields, Rank1 and Rank2, with the same formula:
Create a sigmoid function calculation as follows:
1/(1 + EXP(1) ^ -[t])
This is where we have used the calculated field 't' from before.
Create a calculation that gives the smooth gradient-like curve for the flow as follows:
[Rank1] + (([Rank2] - [Rank1]) * [Sigmoid])
Armed with all of the calculations, we can continue designing the chart. Drag 'Curve' to the rows shelf. Drag 'Category' and 'Market' to the level of detail. Right click on the 'Curve' calculation on the rows shelf. The following calculation window will open up:
At this point we are trying to build a nested calculation using the existing calculations. In the window that pops up, select 'Specific Dimensions' and then select all options, ordering the dimensions as shown below for 'Rank1' calculation:
Do the same for 'Rank2', except with a different order, as shown.
For 't' we use only Pad calculation for our nested calculation.
Right click on the 't' axis at the bottom and change the range to use 'Fixed' values from -7 to 7.
Do the same for the 'Curve' axis, changing the range from 0 to 1 instead of automatic. Also check the 'Reversed' scale option.
Hide the x and y axes. Change mark type to 'Line' and add Pad to the 'Path' shelf of the marks card. This will render smooth lines from the Categories to Region.
Next, size the lines based on the sales of the categories in the different markets. For this, create a calculated field as follows:
Then drag this calculated field to 'Size' on the marks card as shown. To calculate the field using the 'Pad' calculated field, right click on sales line size and select 'Pad' under Compute.
To color the charts, click on the category level of detail dimension. Change to color. Do the same for Market. Then, change the color palette to the desired choice.,
Create two bar charts on two different sheets − one with Products and one with Markets. For the category bar, use 'percent of sales' as the metric.
Color code them using the same colors as the Sankey chart.
Finally, create a dashboard with the three sheets laid out next to each other. You can choose to use containers or just float them around on the dashboard.
In the final part of this series, we will look at the steps for designing the Sunburst diagram.
P.S.Those math skills are still going to come in handy. Don't forget them just yet!