Data labeling is the act of adding context or insight to an image or other file. For problems that require specialized insight, this labeling is often done by a human, with the goal of building a model that can replace human labeling over time. The applications of this technology are almost endless; for example, labeling and annotation are required to solve object detection and image segmentation problems.
However, labeling data can be quite tricky without the right workflow. Files can be massive to store, and training sets with thousands or millions of data points are hard to manage. Coordinating the people working on labeling and annotation, and tracking the issues they raise, is also difficult without the right tooling. Furthermore, the labeled data must be of high quality to ensure strong performance when training models.
In this example, we will use Ango.ai to label a dataset of your choice and show you how to take advantage of training data that Ango's team has already labeled and made freely available for public use. Ango.ai provides a fully managed data labeling and annotation service, supporting DICOM, image, audio, video, document, and text data. Ango's data labeling tools are quite comprehensive, allowing users to add bounding boxes, rotated bounding boxes, nested classifications, polygons, points, and tables. Ango also runs the labeled data through a quality assurance pipeline to detect incorrect labels.
Layer helps you create production-grade ML pipelines with a seamless local-to-cloud transition. Once your data and model are loaded to Layer, you can quickly train and retrain your model with Layer’s scalable fabric computing resources and track the story of how your model came to life with semantic versioning, extensive artifact logging, and dynamic reporting. Any pipeline created on Layer is easily shareable, downloadable, and editable by colleagues who want to reproduce your analysis. The beauty of using Layer is that you don’t need to change how or where your team programs — you throw in a few lines of code in your existing analysis, and you can take advantage of Layer capabilities.
Once the data is labeled and annotated, you are ready to start training models on it. Ango and Layer are joining hands to make this seamless: you can now use data from Ango Hub to train machine learning models on Layer infrastructure.
Fetching the labeled dataset with Ango SDK
To start using Ango, begin by creating an account. Next, create a project and add some images to it.
You can now use the platform to label and annotate your dataset.
Once you are done with the annotations, you can export them as a JSON or YOLO annotation file.
Alternatively, you can download the file programmatically from Ango. You’ll need to install the Ango package for that.
pip install ango
You also need a project ID and an API key. The project ID can be obtained from the project URL, while the API key is generated on your Ango profile page.
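The two credentials can live as constants near the top of your script. The values below are illustrative placeholders, not real credentials:

```python
# Illustrative placeholders -- substitute your own values.
ANGO_API_KEY = "your-api-key"        # generated on your Ango profile page
ANGO_PROJECT_ID = "your-project-id"  # taken from the project URL
```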
Next, let’s define a helper function that will download the annotations and the images from Ango Hub. In this illustration, we’ll use the face classification dataset from Ango. The dataset is already annotated and ready to use.
Let’s now define a function that will use the above function to download the images into a Pandas DataFrame.
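The two helpers might look like the sketch below. The field names ("asset" for the image URL, "label" for the gender answer) are assumptions about the exported JSON's structure; adapt the keys to what your Ango project actually exports.

```python
import json

import pandas as pd
import requests


def load_annotations(path):
    """Read an exported Ango annotation file (JSON) into a list of records."""
    with open(path) as f:
        return json.load(f)


def build_dataframe(records):
    """Download each image and collect (image bytes, label) rows in a DataFrame.

    Assumes each record holds an image URL under "asset" and a gender
    label under "label" -- adjust the keys to match your export.
    """
    rows = []
    for rec in records:
        resp = requests.get(rec["asset"], timeout=30)
        resp.raise_for_status()
        rows.append({"image": resp.content, "label": rec["label"]})
    return pd.DataFrame(rows, columns=["image", "label"])
```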
In this example, we will use the images to build a model that predicts gender given a new image.
Training model on Layer using the dataset
Let's now look at how we can use Layer to train a face classification model on remote GPUs on Layer's infrastructure. Start by installing Layer:
pip install layer
Next, you need to authenticate your account and initialize a project. Your API key can be found under your account settings at app.layer.ai.
Split the DataFrame into a training and testing set. We’ll use 80% for training and the rest for testing.
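With scikit-learn (an assumption; any split utility works), the 80/20 split is a single call. The df below is a stand-in for the DataFrame of images and labels built from Ango's export:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Stand-in for the DataFrame of images and labels built earlier.
df = pd.DataFrame({"image": [b""] * 10, "label": ["male", "female"] * 5})

# 80% training, 20% testing; a fixed seed keeps the split reproducible.
train_df, test_df = train_test_split(df, test_size=0.2, random_state=42)
```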
Next, let’s create a function to resize the images and convert them to arrays as required by the deep learning model.
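A minimal version of that helper, assuming Pillow and NumPy; the 64x64 target size is illustrative and should match your network's input layer:

```python
import io

import numpy as np
from PIL import Image


def to_array(raw_bytes, size=(64, 64)):
    """Decode image bytes, resize, and scale pixel values to [0, 1]."""
    img = Image.open(io.BytesIO(raw_bytes)).convert("RGB").resize(size)
    return np.asarray(img, dtype="float32") / 255.0
```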
Training a model using GPUs on Layer is as simple as adding one line of code: @fabric("f-gpu-small"). The fabric decorator defines the type of environment you would like to train your model in; in this case, a GPU-enabled environment. Layer offers over 30 hours of free GPU compute per week. Every function that defines model training should be wrapped with the @model decorator. Inside the function, you can also log items that you'd like to track, such as:
- Model metrics
- Model parameters
- Tables
Passing the above function to layer.run() will train the model on remote GPUs on Layer infrastructure.
This Colab Notebook shows how to make predictions using the trained model. Models trained on Layer can be fetched and used to make predictions immediately.
We couldn't be happier to partner with Ango. Data scientists can now use Ango to create high-quality labeled data in multiple formats and train models on that data with Layer. We continue to push the envelope to offer data scientists the best platform for collaborative machine learning by partnering with like-minded companies in the industry.