CleanVision automatically detects common issues in image datasets, such as images that are blurry, over- or under-exposed, or (near) duplicates of one another, among others. This data-centric AI package is designed as a quick first step for any computer vision project: it finds problems in your dataset that you may want to address before applying machine learning.


To install the latest stable version (recommended):

$ pip install cleanvision

To install the bleeding-edge developer version:

$ pip install git+
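
To confirm which version ended up in your environment, the following is a minimal sketch using only the Python standard library. The helper name `installed_version` is hypothetical and not part of CleanVision; `cleanvision` is the distribution name taken from the pip command above.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("cleanvision")
    print(version if version else "cleanvision is not installed")
```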


Using CleanVision to audit your image data is as simple as running the code below:

from cleanvision.imagelab import Imagelab

# Specify path to folder containing the image files in your dataset
imagelab = Imagelab(data_path="FOLDER_WITH_IMAGES/")

# Automatically check for a predefined list of issues within your dataset
imagelab.find_issues()

# Produce a neat report of the issues found in your dataset
imagelab.report()
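
Before pointing CleanVision at a folder, it can help to confirm that the folder exists and actually contains image files. The sketch below is a pre-flight check written with the standard library only; it does not call CleanVision. The helper `count_images` is hypothetical, and the extension set is an assumption made for illustration, so the formats CleanVision itself accepts may differ.

```python
from pathlib import Path

# Assumed extensions, for illustration only; CleanVision's supported formats may differ.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".gif"}


def count_images(folder):
    """Count files under `folder` (recursively) whose suffix looks like an image."""
    root = Path(folder)
    if not root.is_dir():
        raise NotADirectoryError(f"{folder!r} is not a directory")
    return sum(
        1
        for path in root.rglob("*")
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )
```

For example, `count_images("FOLDER_WITH_IMAGES/")` returning 0 suggests the path is wrong or the files use an extension not in the assumed set.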

CleanVision diagnoses many types of issues, but you can also check for only specific issues:

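issue_types = {"light": {}, "blurry": {}}

# Check only for the specified issue_types
imagelab.find_issues(issue_types)

# Produce a report with only the specified issue_types
imagelab.report(issue_types.keys())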
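
If the issue names come from a list (for example, a config file or command-line arguments), the dictionary can be built programmatically. This is a minimal sketch: `make_issue_types` is a hypothetical helper, not part of CleanVision, and it assumes an empty dict per issue name is sufficient, as in the `issue_types` example above.

```python
def make_issue_types(names):
    """Map each issue name to an empty options dict, mirroring the literal above."""
    return {name: {} for name in names}


issue_types = make_issue_types(["light", "blurry"])
print(issue_types)  # {'light': {}, 'blurry': {}}
```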

More on how to get started with CleanVision: