Iterate over a project
In this article, we will learn how to iterate through a project with annotated data in Python. It is one of the most frequent operations in Supervisely apps and Python automation scripts.
Everything you need to reproduce this tutorial is on GitHub: source code, Visual Studio Code configuration, and a shell script for creating the venv.
In this guide we will go through the following steps:
Step 1. Get a demo project with labeled lemons and kiwis.
Step 2. Prepare .env files with credentials and the ID of a demo project.
Step 3. Run the Python script.
Step 4. Show possible optimizations.
1. Demo project
If you don't have any projects yet, go to the ecosystem and add the demo project 🍋 Lemons annotated to your current workspace.
2. .env files
Create a file at ~/supervisely.env with the credentials for your Supervisely account. Learn more about environment variables here. The content should look like this:
Create a second file, local.env, and place it in the same directory as main.py. This file will contain the values we are going to use in the Python script.
The reason why the Project ID variable has such a strange name, modal.state.slyProjectId, will be explained in later tutorials. Let's just keep it this way for now.
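To make the two files concrete, here is a minimal sketch that embeds hypothetical contents for both and reads them with a small standard-library parser. The variable names SERVER_ADDRESS and API_TOKEN are assumptions about what the credentials file holds, and every value is a placeholder. The tutorial's own script most likely loads these files with a dedicated library rather than a hand-written parser.

```python
# Hypothetical contents of ~/supervisely.env.
# The variable names are assumptions; the values are placeholders.
SUPERVISELY_ENV = '''
SERVER_ADDRESS="https://your-supervisely-instance.example"
API_TOKEN="<your-api-token>"
'''

# Hypothetical contents of local.env (the project ID is a placeholder).
LOCAL_ENV = '''
# ID of the demo project, change it to your own
modal.state.slyProjectId=12345
'''


def parse_env(text):
    """Parse KEY=VALUE lines, skipping blank lines and comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding quotes so "abc" and abc are read the same way.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


settings = {**parse_env(SUPERVISELY_ENV), **parse_env(LOCAL_ENV)}
project_id = int(settings["modal.state.slyProjectId"])
print(f"Server: {settings['SERVER_ADDRESS']}, project ID: {project_id}")
```

Note that the dotted name modal.state.slyProjectId is just an ordinary key once the file is parsed.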
3. Python script
This script illustrates only the basics. If your project is huge and has hundreds of thousands of images, downloading annotations one by one is inefficient. It is better to use batch (bulk) methods, which reduce the number of API requests and significantly speed up your code. Learn more in the optimizations section below.
To start debugging, you need to:
- Clone the repo
- Create venv by running the script create_venv.sh
- Change the value in local.env
- Check that you have the ~/supervisely.env file with correct values
Source code:
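A minimal sketch of the iteration pattern is given here: project, then datasets, then images, then one annotation download per image. To stay runnable without a server, it uses a small in-memory stand-in, FakeApi, which is hypothetical. Its method names (dataset.get_list, image.get_list, annotation.download_json) are assumed to mirror the layout of the Supervisely SDK, and the sample data is made up, so treat this as an illustration rather than the contents of main.py.

```python
from types import SimpleNamespace


class FakeApi:
    """Hypothetical in-memory stand-in for an API client (not the real SDK)."""

    def __init__(self, datasets, images, annotations):
        self.request_count = 0
        self._datasets = datasets        # project ID -> list of datasets
        self._images = images            # dataset ID -> list of images
        self._annotations = annotations  # image ID -> annotation JSON
        # The method names below are assumed to mirror the SDK's layout.
        self.dataset = SimpleNamespace(get_list=self._list_datasets)
        self.image = SimpleNamespace(get_list=self._list_images)
        self.annotation = SimpleNamespace(download_json=self._download_json)

    def _list_datasets(self, project_id):
        self.request_count += 1
        return self._datasets[project_id]

    def _list_images(self, dataset_id):
        self.request_count += 1
        return self._images[dataset_id]

    def _download_json(self, image_id):
        self.request_count += 1
        return self._annotations[image_id]


def iterate_project(api, project_id):
    """Visit every image of every dataset and count its labeled objects."""
    summary = []
    for dataset in api.dataset.get_list(project_id):
        for image in api.image.get_list(dataset.id):
            # One request per image: this is the part that scales poorly.
            ann_json = api.annotation.download_json(image.id)
            summary.append((dataset.name, image.name, len(ann_json["objects"])))
    return summary


# Made-up sample data: one dataset with two images.
api = FakeApi(
    datasets={1: [SimpleNamespace(id=10, name="ds1")]},
    images={10: [SimpleNamespace(id=100, name="image_a.jpg"),
                 SimpleNamespace(id=101, name="image_b.jpg")]},
    annotations={100: {"objects": [{"classTitle": "lemon"}, {"classTitle": "kiwi"}]},
                 101: {"objects": [{"classTitle": "lemon"}]}},
)

summary = iterate_project(api, project_id=1)
for dataset_name, image_name, object_count in summary:
    print(f"{dataset_name}/{image_name}: {object_count} objects")
print(f"Requests sent: {api.request_count}")
```

Because iterate_project only relies on those three methods, the same loop shape should carry over to a real client object, provided the assumed method names match.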
Output
The script above produces the following output:
4. Optimizations
The bottleneck of this script is in lines 27-28 of main.py, where the annotation for each image is downloaded with a separate request.
If you have 1M images in your project, your code will send 🟡 1M requests to download annotations. This is inefficient because every request pays the round-trip time (RTT), and the Supervisely database has to serve a large number of similar tiny requests.
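To get a feel for the cost, the time spent purely waiting on the network can be estimated as the number of sequential requests multiplied by the RTT. The 50 ms RTT below is an assumed figure for illustration, not a measurement of any Supervisely instance, and server-side processing time is ignored.

```python
def network_wait_seconds(num_requests, rtt_ms):
    """Lower bound on waiting time when requests are sent one after another."""
    return num_requests * rtt_ms / 1000


ASSUMED_RTT_MS = 50  # hypothetical round-trip time in milliseconds

wait = network_wait_seconds(1_000_000, ASSUMED_RTT_MS)
print(f"{wait:.0f} seconds, about {wait / 3600:.1f} hours of pure waiting")
```

Under that assumption, one million sequential requests spend roughly 14 hours waiting on round trips alone.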
It can be optimized by using the batch API method:
Supervisely API allows downloading annotations for multiple images in a single request. The code sample below sends ✅ 50x fewer requests, which significantly speeds up our original code:
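The batching idea can be sketched without a server: split the image IDs into chunks and issue one request per chunk. The helper batched and the class CountingClient below are hypothetical stand-ins that only count requests. The batch size of 50 is an assumption chosen to match the 50x figure, and the SDK's real batch method may have a different name and signature.

```python
def batched(items, batch_size=50):
    """Yield consecutive chunks of at most batch_size items."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


class CountingClient:
    """Hypothetical stand-in that counts requests instead of calling a server."""

    def __init__(self):
        self.requests = 0

    def download_one(self, image_id):
        self.requests += 1
        return {"image_id": image_id, "objects": []}

    def download_batch(self, image_ids):
        self.requests += 1  # one request, no matter how many IDs it carries
        return [{"image_id": image_id, "objects": []} for image_id in image_ids]


image_ids = list(range(1000))

# One request per image.
naive = CountingClient()
naive_anns = [naive.download_one(image_id) for image_id in image_ids]

# One request per chunk of 50 images.
bulk = CountingClient()
bulk_anns = []
for chunk in batched(image_ids):
    bulk_anns.extend(bulk.download_batch(chunk))

print(f"one by one: {naive.requests} requests, batched: {bulk.requests} requests")
```

Both approaches return the same annotations in the same order; only the number of round trips changes.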
The optimized version of the original script is in main_optimized.py.