Azure Data Factory: On a Budget
Azure Data Factory (ADF) is a powerful cloud ETL tool capable of data transformation at scale. I have learnt (the hard way) that using the tool in certain ways can become very expensive. Here are a few cost-saving tips I have picked up along the way:
Use Data Flows Sparingly
Data Flows are powerful and can shift and transform large volumes of data, but they cost much more than Pipeline Activities. If you can, use a Pipeline Activity instead; they are more than sufficient for basic Copy Data tasks.
When a Data Flow is required, do as much of the work as you can with ordinary Pipeline Activities, and use the Data Flow activity to trigger the Data Flow only for the steps that genuinely need it.
Be careful with Data Flow Debug
Turning this mode on keeps the Data Flow compute running, allowing you to quickly preview the data at different stages in the data flow. It also eliminates the long start-up times of debug runs. But be warned, using it too much can be costly!
Avoid using it for normal Debug Runs
When debugging Pipelines, simply clicking Debug will automatically turn on Data Flow Debug for the entire duration of the pipeline. This allows Data Flow tasks to be executed quickly, but it also means you are paying for the compute while normal Pipeline Activities are executing.
Instead, you can click the dropdown on the Debug button and choose “Use Activity Runtime”.
This will then execute the pipeline without data flow debug, only spinning up compute when required for Data Flow tasks.
Minimise Use
This one is perhaps a little obvious, but be extra careful when you do need to use Data Flow Debug. Make sure you only turn it on when you need it, and turn it off again afterwards. It is also wise to set the time to live (TTL) to the minimum, so that if you do forget, it will turn itself off at some point.
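To get a feel for why the TTL matters, here is a rough Python sketch of what a forgotten debug session might cost. It assumes Data Flow compute is billed per vCore-hour and ignores any rounding or minimum charges. The rate below is a made-up placeholder, not a real Azure price, so look up the current figure for your region before trusting the numbers.

```python
def debug_session_cost(cores: int, hours_on: float, rate_per_vcore_hour: float) -> float:
    """Rough estimate: cores x hours running x price per vCore-hour."""
    if cores <= 0 or hours_on < 0 or rate_per_vcore_hour < 0:
        raise ValueError("cores must be positive; hours and rate must not be negative")
    return cores * hours_on * rate_per_vcore_hour


# Hypothetical placeholder rate, NOT a real Azure price.
PLACEHOLDER_RATE = 0.25

# Same 8-core debug cluster, forgotten with a 1 hour TTL versus a 4 hour TTL.
short_ttl = debug_session_cost(cores=8, hours_on=1, rate_per_vcore_hour=PLACEHOLDER_RATE)
long_ttl = debug_session_cost(cores=8, hours_on=4, rate_per_vcore_hour=PLACEHOLDER_RATE)

print(f"Forgotten with 1h TTL: {short_ttl:.2f}")
print(f"Forgotten with 4h TTL: {long_ttl:.2f}")
```

Whatever the real rate is, the idle cost scales directly with the TTL, which is why the shortest setting is the safest default.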
Check Data Flow Core Count
When using the ‘Data Flow’ activity in Pipelines, make sure you set the Core Count to the minimum (or as appropriate). It sometimes defaults to more cores than required. More Cores = More Cost!
This can be changed in the Settings of the Activity.
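If you keep your pipelines in source control, you can also check the core count without clicking through each activity. The Python sketch below scans a pipeline definition for Data Flow activities above a chosen core count. It assumes the exported JSON stores the setting under `typeProperties.compute.coreCount` on activities of type `ExecuteDataFlow`; the pipeline and activity names are invented for illustration, and the default threshold of 8 is my assumption for the smallest size. It only looks at top-level activities, so anything nested inside a ForEach or If Condition would need extra handling.

```python
import json


def oversized_data_flows(pipeline: dict, max_cores: int = 8) -> list:
    """Return (activity name, core count) for Data Flow activities above max_cores."""
    flagged = []
    activities = pipeline.get("properties", {}).get("activities", [])
    for activity in activities:
        if activity.get("type") != "ExecuteDataFlow":
            continue  # not a Data Flow activity
        compute = activity.get("typeProperties", {}).get("compute", {})
        cores = compute.get("coreCount")
        # A parameterised core count is stored as an expression, not an int; skip those.
        if isinstance(cores, int) and cores > max_cores:
            flagged.append((activity.get("name", "<unnamed>"), cores))
    return flagged


# Invented sample pipeline, shaped like an exported definition (assumed layout).
sample_pipeline = json.loads("""
{
  "name": "SamplePipeline",
  "properties": {
    "activities": [
      {"name": "CopyRawFiles", "type": "Copy", "typeProperties": {}},
      {"name": "TransformSales", "type": "ExecuteDataFlow",
       "typeProperties": {"compute": {"coreCount": 16, "computeType": "General"}}},
      {"name": "CleanCustomers", "type": "ExecuteDataFlow",
       "typeProperties": {"compute": {"coreCount": 8, "computeType": "General"}}}
    ]
  }
}
""")

for name, cores in oversized_data_flows(sample_pipeline):
    print(f"{name} is set to {cores} cores")
```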
Ensure All Pipelines Finish
When finishing up ADF dev work for the day, make sure you check for active runs in Monitor. Ensure you don’t have any long-running or failing pipelines stuck in a loop. These could continue for days, racking up a big cost!
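The end-of-day check can be partly automated. The Python sketch below takes a list of pipeline run records and picks out those still in progress after a chosen cut-off. It assumes each record carries `status` and `runStart` fields like the ones shown in Monitor, with `InProgress` as the active status. Fetching the records from your factory is left out, and the run data here is invented.

```python
from datetime import datetime, timedelta, timezone


def long_running(runs: list, now: datetime, max_age: timedelta = timedelta(hours=2)) -> list:
    """Return runs that are still in progress and started more than max_age ago."""
    stale = []
    for run in runs:
        if run.get("status") != "InProgress":
            continue  # finished, failed, or cancelled runs are not costing anything
        start_text = run.get("runStart")
        if not start_text:
            continue  # no start time recorded, nothing to compare against
        started = datetime.fromisoformat(start_text)
        if now - started > max_age:
            stale.append(run)
    return stale


# Invented sample records; field names are assumed, not taken from a real factory.
sample_runs = [
    {"runId": "run-1", "pipelineName": "NightlyLoad", "status": "InProgress",
     "runStart": "2024-01-01T01:00:00+00:00"},
    {"runId": "run-2", "pipelineName": "CopyOrders", "status": "Succeeded",
     "runStart": "2024-01-01T01:00:00+00:00"},
    {"runId": "run-3", "pipelineName": "QuickCopy", "status": "InProgress",
     "runStart": "2024-01-01T08:30:00+00:00"},
]

check_time = datetime(2024, 1, 1, 9, 0, tzinfo=timezone.utc)
for run in long_running(sample_runs, now=check_time):
    print(f"{run['pipelineName']} ({run['runId']}) has been running since {run['runStart']}")
```

Anything this flags is worth opening in Monitor and cancelling by hand if it should not still be running.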
Set Up a Budget
Set up a budget with Alerts on your Azure Subscription. This will help you keep costs within what you expect and avoid nasty surprises.