Experiment tracking for LightGBM projects


LightGBM is a popular gradient boosting framework that places continuous values into discrete bins, leading to faster training and efficient memory use. It grows trees leaf-wise, unlike algorithms that grow them depth-wise. The algorithm handles missing values and categorical features by default. You can use LightGBM for binary classification, regression, and multiclass classification. The framework exposes core parameters you can tune to improve model performance, for example the type of boosting, the number of leaves, and the depth of the tree.

When working with LightGBM, you also want to generate some graphs such as the feature importance graph and graphs of training metrics. This article will look at how you can use Layer to track your LightGBM model development metadata.

Let's dive in.  

Install Layer

Layer provides an open-source SDK for tracking all your machine learning project metadata. It provides a hub for logging, displaying, comparing, and sharing datasets, models, and project documentation.

GitHub - layerai/sdk: Metadata store for Production ML

Furthermore, Layer provides seamless local and cloud integration. For example, you can run simple experiments locally and quickly move the execution to the cloud by adding a few lines of code to your project. This saves you the time you'd spend setting up and configuring servers. What's more, Layer gives you up to 30 hours per week of free GPU time. Layer will still track your project's metadata whether you use your local resources to run your project or run it on Layer Cloud.

Install Layer to start tracking your machine learning project's metadata.

pip install -U layer

Connect your script to Layer

Layer stores your ML metadata under your account. Let's import Layer and set up an account.
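Assuming the SDK is installed, connecting from a notebook takes one call (this is a setup step that requires a browser to complete):

```python
import layer

layer.login()  # prints a link; open it in a browser to authenticate
```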

Click the generated link to set up your account.

You can sign up with either Google or GitHub. Click Continue on the next screen to set up your account.

On the next page, you'll get a chance to set a unique username for your account.

After choosing a username and clicking Continue, you will be signed in to your account, and a code to authenticate the account will be generated.

Copy the code, paste it into the text box in your notebook, and press Enter to authenticate.

If you are not working in a notebook environment, you can obtain an API key from the developer settings page of your Layer account.

Use that API key to log in to your Layer account. The key gives you full access to your account. You should, therefore, keep it private.
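In a script, you can pass the key directly; a minimal sketch, assuming the SDK's `login_with_api_key` helper (the key value below is a placeholder, and this setup step needs a valid account):

```python
import layer

# Keep the key out of version control, e.g. read it from an environment variable
layer.login_with_api_key("YOUR_API_KEY")
```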

Create a Layer project

Layer encapsulates all ML metadata in a project. You can create multiple projects, and each project can have numerous experiments. Therefore, the first step is to initialize a project.
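Initializing a project is a single call; the project name below is arbitrary, and this setup step requires an authenticated Layer account:

```python
import layer

layer.init("lightgbm-air-quality")  # creates the project if it doesn't exist
```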

Once the project is created, the SDK will output a link that you can use to access your project.

All your project metadata will now be stored and visible on this page. Any project you create is private by default; to make it public, click the settings menu. Later in the article, we'll discuss how to add project documentation to this page.

Version datasets

Layer versions datasets so you don't need to repeat expensive processing steps. Once a dataset is saved, you can fetch it without repeating those steps. If you change the dataset, for example by adding new features, Layer automatically creates a new dataset version. As an example, let's take a look at the Zindi Air Quality dataset that is available on Layer. We can see that the project has three datasets.

Let's take a look at the train dataset. We can see:

  • The version number on the left.
  • Charts with data summary.
  • The number of columns in the data.
  • The size of the data.
  • Where the data was executed (locally, in this case).
  • When the data was created.
  • The person who created the data.  

Layer also shows Execution logs to make it easy to debug your data creation process. The Logged data tab will show all the metadata associated with your data. Use the layer.log() function to log any metadata you would like to track. Check out the link below to see how to add datasets to Layer.    

Add datasets | Layer documentation
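To sketch the idea, a dataset-building function can be registered with the SDK's `@dataset` decorator; the dataset name, source file, and logged fields below are illustrative, and running it requires an authenticated Layer account:

```python
import pandas as pd
import layer
from layer.decorators import dataset

@dataset("train")  # the dataset name on Layer
def build_train():
    df = pd.read_csv("train.csv")  # hypothetical local file
    layer.log({"rows": len(df)})   # any extra metadata you want to track
    return df

build_train()  # builds and versions the dataset on Layer
```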

Version models

Layer will automatically create a new model version every time you train your LightGBM model. You, therefore, don't need to do manual model versioning. The Use from Layer section shows how to import the model from Layer. It includes the model's versions so that you are always using the right version in your project.

Log model parameters

Once you have created your data, you can fetch it as a Pandas DataFrame or a PyTorch dataset.  

Layer provides the code snippet for fetching the data on the datasets page. For example, let's fetch the air quality dataset. We'll use it to develop a LightGBM model and see how we can track metadata with Layer.
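A sketch of fetching a dataset as a Pandas DataFrame; the dataset path below is illustrative, so copy the exact snippet from your own datasets page:

```python
import layer

# Path format: <account>/<project>/datasets/<name>
df = layer.get_dataset("layer/air-quality/datasets/train").to_pandas()
print(df.head())
```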

We can log model parameters in Layer by creating a function decorated by the @model decorator. The function should return one of the supported ML frameworks.
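A minimal sketch of such a function, assuming the SDK's `@model` decorator and an authenticated account; the model name, dataset path, and `target` column are placeholders:

```python
import lightgbm as lgb
import layer
from layer.decorators import model

@model("air-quality-lgbm")  # illustrative model name
def train():
    # Fetch the versioned dataset (illustrative path)
    df = layer.get_dataset("layer/air-quality/datasets/train").to_pandas()
    X, y = df.drop("target", axis=1), df["target"]  # "target" is a placeholder column
    params = {"objective": "regression", "num_leaves": 31, "learning_rate": 0.1}
    layer.log(params)  # Layer stores these parameters with the model version
    booster = lgb.train(params, lgb.Dataset(X, label=y), num_boost_round=100)
    return booster  # returning the trained model lets Layer version it

train()
```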

When you run this function, Layer will store the model parameters you have defined.  

Follow the link generated by Layer to see the parameters on the Layer UI.

Log test metrics

Once model training is complete, use Layer to log the test metrics. Later, you can compare different experiment runs using these metrics.    
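For example, inside the `@model`-decorated training function you might log an evaluation metric after training; this sketch assumes `booster`, `X_test`, and `y_test` exist from a held-out split:

```python
from sklearn.metrics import mean_squared_error

import layer

preds = booster.predict(X_test)  # predictions on the held-out split
layer.log({"MSE": mean_squared_error(y_test, preds)})
```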

Log charts

Layer supports logging of images and charts. In a LightGBM project, you may want to log the feature importance and the training metric charts.
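A sketch of logging the feature importance chart, assuming a trained `booster` and an authenticated account:

```python
import matplotlib.pyplot as plt
import lightgbm as lgb
import layer

lgb.plot_importance(booster, max_num_features=10)  # draws onto the current figure
layer.log({"Feature importance": plt.gcf()})       # log the matplotlib figure
plt.close()
```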

Log sample predictions

Layer allows logging of DataFrames. For example, we can log some sample predictions to Layer.
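A sketch of that, assuming `y_test` and `preds` from the evaluation step:

```python
import pandas as pd
import layer

sample = pd.DataFrame({"actual": list(y_test[:10]), "predicted": list(preds[:10])})
layer.log({"Sample predictions": sample})
```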

Use Layer trained model to make predictions

Copy the code snippet provided by Layer to start using your LightGBM model to make predictions immediately.
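The snippet looks roughly like the sketch below; the model path is illustrative, and `X_new` stands in for your new feature matrix:

```python
import layer

# Path format: <account>/<project>/models/<name>
booster = layer.get_model("layer/air-quality/models/air-quality-lgbm").get_train()
predictions = booster.predict(X_new)  # X_new: new feature matrix
```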

Compare different experiments

You can compare the performance of different LightGBM experiments by ticking the experiments in the left panel. Layer then compares all the metadata you logged in each run: the model's performance metrics, sample predictions, images, charts, and so on.

Project documentation using Layer

Layer creates and outputs the link to your project's page when you run layer.init. You can use this page to document your LightGBM project. Documentation is vital to make collaboration easy. For instance, you can send this page to your teammates, who can easily contribute to your project. It will also make it easy for you to return to this project. The page can also be used as a report detailing your project's performance.

Layer reports can be dynamic. You can link any Layer entity in your report. For example, you can show the comparison of different experiments in the report. This is done by creating a README.md file and pasting the link to that comparison. You can also copy and paste links to the project's visualizations in the report. Ensure that the README file is in the same directory as the notebook or Python file. Next, run layer.init again. When you do this, Layer will populate the contents of the README file on the project's page.  

Layer project sample documentation 

Final thoughts

This article has covered how you can perform experiment tracking for LightGBM projects. We have also seen how you can store the various metadata generated when creating LightGBM models. Specifically, we have covered:

  • Versioning datasets used for LightGBM models.
  • Versioning LightGBM model parameters.
  • Logging model test metrics.
  • Logging model sample predictions.
  • Versioning charts and images.
  • Fetching the LightGBM model for predictions.
  • Comparing different LightGBM experiments with Layer.
  • Documenting your LightGBM project using Layer.

Try Live LightGBM Notebook

Interested in discussing a use case for your organization? Book a slot below, and we'll show you how to integrate Layer into your existing ML code without breaking a sweat.

Book a demo

For more machine learning news, tutorials, code, and discussions, join us on Slack, Twitter, LinkedIn, and GitHub. Also, subscribe to this blog, so you don't miss a post.
