Azure Data Factory: On a Budget


Azure Data Factory (ADF) is a powerful cloud ETL tool capable of data transformation at scale. I have learnt (the hard way) that using the tool in certain ways can become very expensive. Here are some cost-saving tips I have picked up along the way:

Use Data Flows Sparingly

Data Flows are powerful and can move and transform large volumes of data, but they cost much more than Pipeline Activities. If you can, use a Pipeline Activity instead. They are more than sufficient for basic Copy Data tasks.

Example Pipeline

When a Data Flow is required, do as much of the work as you can with ordinary Pipeline Activities, and use the Data Flow activity to trigger the Data Flow only for the steps that need it.

Be careful with Data Flow Debug

Turning this mode on keeps Data Flow compute running, allowing you to quickly preview the data at different stages in the Data Flow. It also eliminates the long start-up times of debug runs. But be warned: you pay for that compute for as long as it stays on, so using it too much can be costly!

Avoid using it for normal Debug Runs

When debugging Pipelines, simply clicking Debug will automatically turn on Data Flow Debug for the entire duration of the pipeline run. This allows Data Flow tasks to be executed quickly, but it also means you are still paying for the compute while normal Pipeline Activities are executing.

Instead, you can click the dropdown on the Debug button and choose “Use Activity Runtime”.

Use Activity Runtime button

This will then execute the pipeline without data flow debug, only spinning up compute when required for Data Flow tasks.

Minimise Use

This one is perhaps a little obvious, but be extra careful when you do need to use Data Flow Debug. Make sure you only turn it on when you need it, and turn it off again afterwards. It is also wise to set the time to live (TTL) to the minimum, so that if you do forget, it will turn itself off at some point.
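To get a feel for why the TTL matters, here is a rough sketch of the arithmetic in Python. It assumes the debug compute is billed per vCore-hour for as long as the session is up, and that a forgotten session stays alive for the full TTL after you stop using it. The rate, core count and TTL values are placeholders for illustration, not real prices or settings, so check the Azure pricing page for your region.

```python
def debug_session_cost(cores: int, active_minutes: float, ttl_minutes: float,
                       rate_per_vcore_hour: float) -> float:
    """Estimate the cost of a debug session that is left to time out.

    Assumption: the session is billed for the time you actively used it
    plus the full TTL it sat idle before shutting itself down.
    """
    billed_hours = (active_minutes + ttl_minutes) / 60
    return cores * billed_hours * rate_per_vcore_hour


RATE = 0.25  # placeholder price per vCore-hour, NOT a real Azure rate
CORES = 8    # illustrative core count

# 30 minutes of real work, then the session is forgotten.
short_ttl = debug_session_cost(CORES, active_minutes=30, ttl_minutes=60,
                               rate_per_vcore_hour=RATE)
long_ttl = debug_session_cost(CORES, active_minutes=30, ttl_minutes=240,
                              rate_per_vcore_hour=RATE)

print(f"Short TTL: {short_ttl:.2f}")
print(f"Long TTL:  {long_ttl:.2f}")
```

With these placeholder numbers the same half hour of work costs 3.00 with the short TTL and 9.00 with the long one. The absolute figures are made up, but the ratio shows how much of the bill can be idle time.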

Check Data Flow Core Count

When using the ‘Data Flow’ activity in Pipelines, make sure you set the Core Count to the minimum (or whatever is appropriate for the workload). It sometimes defaults to more cores than required. More Cores = More Cost!

This can be changed in the Settings of the Activity:

Change core count
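If you keep your pipeline definitions as JSON (for example in a Git repository), you can also check the core count without opening each activity. The sketch below is a minimal example, assuming the exported pipeline JSON stores Data Flow activities with the type `ExecuteDataFlow` and the core count under `typeProperties.compute.coreCount`. The function name, the threshold and the sample pipeline are all made up for illustration, and it only looks at top-level activities, not ones nested inside loops or conditions.

```python
def oversized_data_flows(pipeline: dict, max_cores: int = 8) -> list[str]:
    """Return the names of top-level Data Flow activities above max_cores."""
    flagged = []
    activities = pipeline.get("properties", {}).get("activities", [])
    for activity in activities:
        if activity.get("type") != "ExecuteDataFlow":
            continue  # Copy, Lookup, etc. have no Data Flow core count
        compute = activity.get("typeProperties", {}).get("compute", {})
        cores = compute.get("coreCount")
        # Skip values that are not plain integers (e.g. dynamic expressions).
        if isinstance(cores, int) and cores > max_cores:
            flagged.append(activity.get("name", "<unnamed>"))
    return flagged


# Hypothetical pipeline definition, trimmed to the relevant fields.
sample_pipeline = {
    "name": "LoadSales",
    "properties": {
        "activities": [
            {"name": "CopyRawFiles", "type": "Copy", "typeProperties": {}},
            {
                "name": "CleanSales",
                "type": "ExecuteDataFlow",
                "typeProperties": {
                    "compute": {"coreCount": 8, "computeType": "General"}
                },
            },
            {
                "name": "TransformSales",
                "type": "ExecuteDataFlow",
                "typeProperties": {
                    "compute": {"coreCount": 16, "computeType": "General"}
                },
            },
        ]
    },
}

flagged = oversized_data_flows(sample_pipeline)
print(flagged)
```

Here only `TransformSales` is flagged, because it asks for more cores than the chosen threshold.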

Ensure All Pipelines Finish

When finishing up ADF development work for the day, make sure you check for active runs in Monitor. Ensure you don’t have any long-running or failing pipelines stuck in a loop. These could continue for days, racking up a big cost!
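The same end-of-day check can be scripted once you have a list of runs from somewhere (an export from Monitor, the REST API, or an SDK). The sketch below only shows the filtering step. `Run` is a hypothetical stand-in record, and the field names and the `InProgress` status value are assumptions modelled on what Monitor displays, so adjust them to match whatever your source returns.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Run:
    """Hypothetical stand-in for a pipeline run record."""
    pipeline_name: str
    status: str
    run_start: datetime


def long_running(runs, now, max_age=timedelta(hours=2)):
    """Return runs that are still in progress and older than max_age."""
    return [
        run for run in runs
        if run.status == "InProgress" and now - run.run_start > max_age
    ]


# Fixed timestamp so the example is repeatable; use datetime.now(timezone.utc)
# in real use.
now = datetime(2024, 1, 10, 18, 0, tzinfo=timezone.utc)
runs = [
    Run("LoadSales", "Succeeded", now - timedelta(hours=5)),
    Run("NightlyRefresh", "InProgress", now - timedelta(minutes=20)),
    Run("StuckLoop", "InProgress", now - timedelta(hours=9)),
]

for run in long_running(runs, now):
    print(f"{run.pipeline_name} has been running since {run.run_start:%H:%M}")
```

In this made-up data only `StuckLoop` is reported, since it is still in progress and well past the two-hour threshold.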

Set Up a Budget

Set up a budget with Alerts on your Azure Subscription. This will help you stay within your expected costs and avoid nasty surprises.

