Elevate Your Machine Learning Workflow: How to Use MLflow for Experiment Tracking and Model Management

Dipankar Medhi
May 18, 2023


Stay on top of your machine learning workflow with MLflow’s powerful experiment tracking and model management tools

Introduction — mlflow

Machine learning is an ever-evolving field, where new algorithms, models, and frameworks emerge frequently. This makes it essential to track and reproduce experiments so that model accuracy can be validated and improved. MLflow is an open-source platform that makes it easier to manage the machine learning lifecycle, from experimentation to deployment. It offers a unified interface for tracking experiments, packaging and deploying models, and managing the complete machine learning workflow.

With MLflow, data scientists can organize and compare their experiments, reproduce results, and share them with their teams. In this blog post, we will explore how to use MLflow for experiment tracking and model management through a diabetes prediction example, and how it can make the machine learning workflow more manageable and efficient. But before that, let us look at some important concepts of MLflow.

Components of mlflow

MLflow is a comprehensive platform that comprises several components, each designed to address different stages of the machine learning lifecycle. Here, we will take a closer look at each component and discuss how it benefits the machine learning workflow.

credit: mlflow.org

MLflow Tracking: This component allows data scientists to log and compare experiments, store artifacts and results, and visualize metrics in a web-based UI. With MLflow Tracking, users can reproduce results and share them with their team, which helps streamline collaboration and supports data-driven decisions.

MLflow Models: The MLflow Models component provides a standard format for packaging machine learning models. A saved model can include several “flavors”, for example a framework-specific flavor such as scikit-learn alongside a generic Python function flavor, so the same model can be loaded by different tools, deployed to various environments (including as a Docker image), and shared with others. A model logged during a run also records the ID of that run, which makes it easier to trace the model back to the parameters and metrics that produced it, and to reproduce and validate results.

Model Registry: The Model Registry is another important component of the platform, which helps data scientists manage models as they move toward production. It provides a central location for storing, organizing, and versioning models, with descriptions, tags, and stage transitions (for example, from Staging to Production) that make model sharing and collaboration between team members easier. Because each registered version links back to the run that produced it, its parameters and metrics remain easy to look up.

MLflow Projects: The MLflow Projects component provides a simple way to package code, data, and dependencies into a reusable and reproducible format. With MLflow Projects, users can easily share their code and reproduce results, even on different machines and environments. This component is particularly useful for managing large-scale projects and collaborative research.

By using MLflow, data scientists can focus on what they do best: developing and improving machine learning models, without worrying about the hassle of managing the entire machine learning lifecycle.

Installing mlflow

To get started with MLflow, the first step is to install it. It is recommended to create a virtual environment before installing the package to avoid conflicts with other packages. You can create a virtual environment using the command:

$ python3 -m venv venv

Activate the virtual environment using:

$ . venv/bin/activate

Once you have activated the virtual environment, you can install MLflow using the following command:

$ pip install mlflow

After installation, you can verify that it was successful by checking the MLflow version with the command:

$ mlflow --version
// output:
// mlflow, version 2.2.2

You should see the version number of MLflow displayed, which indicates that the installation was successful. Now you are ready to start using MLflow to manage your machine learning lifecycle!

Experiment Tracking

Experiment tracking is a critical part of the machine learning workflow that enables data scientists to keep track of the experiments they run and the results they produce.

Here are some examples of how Experiment Tracking can be used with MLflow:

Tracking model hyperparameters: One of the essential tasks in machine learning is tuning model hyperparameters to achieve the best performance. With MLflow, data scientists can track the values of hyperparameters used during training, making it easier to compare the results of different experiments and identify the best-performing model.

Visualizing experiment results: MLflow provides a web-based user interface for visualizing experiment results, which makes it easy for data scientists to compare the performance of different models. The user interface provides visualizations for metrics such as accuracy, loss, and validation scores, making it easier to identify trends and spot issues quickly.

Recording artifacts: Data scientists can use MLflow’s artifact logging functions (such as mlflow.log_artifact and mlflow.log_artifacts) to record and store artifacts such as trained models, input data, and output data. This helps keep track of the data used during experimentation and supports the reproducibility of results.

Collaborating with team members: Experiment Tracking in MLflow provides a centralized location for data scientists to collaborate on projects. Multiple users can access the same experiment, record their results, and discuss findings, making it easier to work as a team.

MLflow’s Experiment Tracking module is a powerful tool that enables data scientists to organize their experiments into runs and keep track of each run’s parameters, metrics, metadata, artifacts, and models. In addition to these details, MLflow automatically records extra information about each run, including the name of the source file or notebook, the code version (the git commit, when the code lives in a git repository), the start and end time, and the user who launched it.

This automatic logging of information is particularly useful for ensuring the reproducibility of results and keeping track of the various experiments conducted during the machine learning workflow. With this information, data scientists can easily compare and contrast different experiments and identify the best-performing models, all while ensuring that they have a complete record of the experiments conducted.

For example, a data scientist working on a natural language processing project can use MLflow’s Experiment Tracking module to keep track of parameters such as the learning rate and the number of epochs used during training. They can also record metrics such as accuracy and F1 score and store artifacts such as preprocessed data and trained models. With MLflow’s automatic logging of information, the data scientist can easily reproduce the experiment and track its progress over time, allowing for greater insight into the development of the machine learning model.

Now that we have an understanding of what experiment tracking entails, let’s continue with our example.

Importing all the necessary libraries

First, we need to import the necessary libraries:

import pandas as pd
import numpy as np

Importing data

Next, we load our dataset with pandas’ read_csv function, reading the "diabetes.csv" file into a dataframe called df:

df = pd.read_csv("diabetes.csv")

To gain more insight into our data, we can use the info method, which displays the number of rows and columns, the non-null count of each column, and the data types:

df.info()

# output
 #   Column                    Non-Null Count  Dtype
---  ------                    --------------  -----
 0   Pregnancies               768 non-null    int64
 1   Glucose                   768 non-null    int64
 2   BloodPressure             768 non-null    int64
 3   SkinThickness             768 non-null    int64
 4   Insulin                   768 non-null    int64
 5   BMI                       768 non-null    float64
 6   DiabetesPedigreeFunction  768 non-null    float64
 7   Age                       768 non-null    int64
 8   Outcome                   768 non-null    int64
dtypes: float64(2), int64(7)
memory usage: 54.1 KB

Prepare training and testing data

After loading and exploring the data, we can prepare our data for training and testing. We will consider the “Outcome” column as our target variable. We will split our data into features (X) and labels (y) using the following code:

X = df.drop(columns=["Outcome"]).values
y = df["Outcome"].values

Next, we will split our data into training and testing datasets using the train_test_split function from the sklearn library. We will use 70% of the data for training and 30% for testing, and we will set a random state for reproducibility. We can split our data using the following code:

from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y, test_size=0.3, random_state=10)
print("X train shape: {} and y train shape: {}".format(X_train.shape, y_train.shape))
print("X test shape: {} and y test shape: {}".format(X_test.shape, y_test.shape))

# output
X train shape: (537, 8) and y train shape: (537,)
X test shape: (231, 8) and y test shape: (231,)

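Lastly, we will treat zero values as missing and replace them with the mean of the non-zero values, using scikit-learn’s SimpleImputer class. We fit the imputer on the training data only and then use that fitted imputer to transform both the training and the testing data, so no information from the test set leaks into preprocessing. Keep in mind that this replaces zeros in every feature column, including Pregnancies, where zero is a legitimate value. We can replace zero values using the following code: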

from sklearn.impute import SimpleImputer
values_fill = SimpleImputer(missing_values=0, strategy="mean")
X_train = values_fill.fit_transform(X_train)  # learn column means from the training data
X_test = values_fill.transform(X_test)  # reuse the training means; do not refit on test data

By following these steps, we have prepared our data for training and testing our machine-learning model.
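Why fit on the training set only? This small sketch with made-up numbers shows that `SimpleImputer` stores the per-column means of the non-zero training values in `statistics_` and reuses them in `transform`, whereas refitting on the test set would compute different fill values from the test data itself.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical toy data where 0 marks a missing reading
X_train_toy = np.array([[2.0, 0.0], [4.0, 10.0], [0.0, 20.0]])
X_test_toy = np.array([[0.0, 0.0], [6.0, 30.0]])

imputer = SimpleImputer(missing_values=0, strategy="mean")
imputer.fit(X_train_toy)  # learns means of the non-zero training values
print(imputer.statistics_)  # per-column means: 3.0 and 15.0

# Test zeros are filled with the *training* means
filled_test = imputer.transform(X_test_toy)
print(filled_test)

# Refitting on the test set would use the test data's own means instead
refit_test = SimpleImputer(missing_values=0, strategy="mean").fit_transform(X_test_toy)
print(refit_test)
```

In the refit version the first test row is filled with values computed from the second test row, which is exactly the leakage the fit/transform split avoids.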

Model training — use mlflow for experiment tracking

First, we’ll start the MLflow UI with a SQLite database as its backend store, so that everything logged during our experiments is kept in one place. From the project directory, run:

$ mlflow ui --backend-store-uri sqlite:///mlflow.db

This serves the MLflow UI (by default at http://localhost:5000) on top of a SQLite database file, mlflow.db, which stores all runs, parameters, and metrics. We’ll also point our training code at the same database and set the experiment name to “diabetes-prediction-exp-1”.

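Next, we’ll import the necessary libraries and start the training process. We’ll use a Random Forest classifier, log the random state as a parameter, and log the accuracy score and root mean squared error (RMSE) as metrics. Note that on 0/1 labels the RMSE is simply the square root of the misclassification rate, so it carries the same information as the accuracy.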

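import mlflow

from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, mean_squared_error, confusion_matrix

# log to the same SQLite store that the MLflow UI is reading
mlflow.set_tracking_uri("sqlite:///mlflow.db")
mlflow.set_experiment("diabetes-prediction-exp-1")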

We’ll also save the trained model using pickle and log it as an artifact to MLflow.

import os
import pickle

with mlflow.start_run(run_name="diabetes-1.1"):
    random_state = 8
    mlflow.set_tag("developer", "dev-name")
    # hyperparameters of models or other parameters that change during experimentation
    mlflow.log_param("random_state", random_state)

    rfm = RandomForestClassifier(random_state=random_state)
    rfm.fit(X_train, y_train)
    y_pred = rfm.predict(X_test)

    # evaluation results are metrics, not parameters
    rmse = np.sqrt(mean_squared_error(y_test, y_pred))
    accuracy = accuracy_score(y_test, y_pred)
    mlflow.log_metric("rmse", rmse)
    mlflow.log_metric("accuracy", accuracy)

    print("RandomForest accuracy: {}".format(accuracy))

    # save the model locally, then attach the file to this run as an artifact
    os.makedirs("models", exist_ok=True)
    with open("models/model.pkl", "wb") as f:
        pickle.dump(rfm, f)
    mlflow.log_artifact(local_path="models/model.pkl", artifact_path="models_pickle")

A run will be created, which can be viewed at http://localhost:5000 under the diabetes-prediction-exp-1 experiment.

experiment run created on mlflow under experiments

If we click on the most recent run (the first one in the list), we can see what was logged: parameters, metrics, artifacts, and so on.

inside the experiment run

Searching for the best model

After training our initial model, we’ll use hyperopt to search for the best parameters to train our model.

from hyperopt import fmin, tpe, hp, STATUS_OK, Trials
from hyperopt.pyll import scope

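We’ll define an objective function that takes in the hyperparameters to be tested, trains a Random Forest with them, and logs the parameters, RMSE, and accuracy score to MLflow. It returns the RMSE as the loss that hyperopt minimizes.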

def objective(params):
    # quniform samples floats; scikit-learn expects an integer max_depth
    params = {**params, "max_depth": int(params["max_depth"])}
    with mlflow.start_run():
        mlflow.set_tag("model", "randomforest")
        mlflow.log_params(params)

        # train with the candidate hyperparameters proposed by hyperopt
        rfm = RandomForestClassifier(**params, random_state=random_state)
        rfm.fit(X_train, y_train)
        y_pred = rfm.predict(X_test)

        rmse = np.sqrt(mean_squared_error(y_test, y_pred))
        accuracy = accuracy_score(y_test, y_pred)
        mlflow.log_metric("rmse", rmse)
        mlflow.log_metric("accuracy", accuracy)

    # hyperopt minimizes "loss", so a lower RMSE counts as better
    return {'loss': rmse, 'status': STATUS_OK}

We’ll then use hyperopt to search for the best parameters by specifying the search space, the search algorithm (TPE), and the maximum number of evaluations. The output will be the best parameters found by hyperopt.

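search_space = {
    "n_estimators": hp.choice("n_estimators", [100, 200, 300, 400, 500, 600]),
    # scope.int casts the sampled float to an int, as scikit-learn requires
    "max_depth": scope.int(hp.quniform("max_depth", 1, 15, 1)),
    "criterion": hp.choice("criterion", ["gini", "entropy"]),
}

best_result = fmin(
    fn=objective,
    space=search_space,
    algo=tpe.suggest,
    max_evals=50,  # illustrative budget; the original value is not shown
    trials=Trials(),
)

print(f"Best: {best_result}")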

# output
# Best: {'criterion': 0, 'max_depth': 1.0, 'n_estimators': 5}

Finally, we’ll train the model again with the best parameters obtained in the previous step and use MLflow’s autolog feature to log all the default parameters and metrics to our dashboard.

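import mlflow.sklearn

# fmin reports hp.choice values as indices: criterion 0 -> "gini", n_estimators 5 -> 600
best_params = {"criterion": "gini", "max_depth": 1, "n_estimators": 600}

# log default parameters, training metrics and the fitted model automatically
mlflow.sklearn.autolog()

with mlflow.start_run():
    rfm = RandomForestClassifier(**best_params)
    rfm.fit(X_train, y_train)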

Dashboard example

Some example screenshots of the mlflow dashboard.

Latest autolog experiment run


In conclusion, MLflow offers a comprehensive platform for elevating your machine learning workflow. With its powerful experiment tracking and model management tools, data scientists can effectively organize, compare, and reproduce experiments to validate and improve model accuracy.

By using MLflow, data scientists can focus on developing and enhancing models without the burden of managing the entire lifecycle. With its user-friendly interface and automatic logging capabilities, MLflow empowers data scientists to make data-driven decisions, ensure reproducibility, and drive efficient and successful machine learning projects. Elevate your machine learning workflow with MLflow today!