In this lesson, I am going to show you how to create a Python program that finds duplicate photos.
As someone who takes a lot of photos, I constantly save the same image more than once by mistake, and I tend to give each copy a different name. Unfortunately, when you are checking photos manually, finding duplicate photos is not as easy as finding duplicate file names.
With Python, we can write a short program that compares the average pixel values of each image and returns the photos it identifies as duplicates.
Before we dive into the tutorial, you will need to install the Pillow library (pip install pillow), which is used for image manipulation and processing.
Source Code:
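import os
from PIL import Image, ImageStat

# Folder that holds the photos to check (an 'Images' folder in the working directory)
image_folder = os.path.join(os.getcwd(), 'Images')
# Keep only .jpg files; the extension check ignores case
image_files = [f for f in os.listdir(image_folder) if f.lower().endswith('.jpg')]

duplicate_files = []

for file_org in image_files:
    # Skip files that were already flagged as duplicates
    if file_org not in duplicate_files:
        with Image.open(os.path.join(image_folder, file_org)) as image_org:
            # Average pixel value of each band (for example R, G and B)
            pix_mean1 = ImageStat.Stat(image_org).mean

        for file_check in image_files:
            if file_check != file_org:
                with Image.open(os.path.join(image_folder, file_check)) as image_check:
                    pix_mean2 = ImageStat.Stat(image_check).mean

                # Identical band means are treated as a duplicate
                if pix_mean1 == pix_mean2:
                    duplicate_files.append(file_org)
                    duplicate_files.append(file_check)

# Remove repeated names while keeping their order
print(list(dict.fromkeys(duplicate_files)))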
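The program above reopens every photo for each comparison, which gets slow on a large folder. Here is a minimal sketch of a variation that reads each photo once and groups files by their pixel means. The helper names pixel_mean and group_duplicates are my own for illustration and are not part of Pillow. Keep in mind that matching means is only a heuristic: two different photos can, in rare cases, share the same averages.

```python
from collections import defaultdict


def pixel_mean(path):
    """Return the per-band pixel means of an image as a hashable tuple."""
    # Imported here so group_duplicates can also be used with other signature functions
    from PIL import Image, ImageStat
    with Image.open(path) as image:
        return tuple(ImageStat.Stat(image).mean)


def group_duplicates(paths, signature=pixel_mean):
    """Group paths that share a signature; return only groups of two or more."""
    groups = defaultdict(list)
    for path in paths:
        # Each file is read once; files with equal signatures land in the same list
        groups[signature(path)].append(path)
    return [files for files in groups.values() if len(files) > 1]


# Example usage (assumes an 'Images' folder, as in the program above):
# import os
# folder = os.path.join(os.getcwd(), 'Images')
# paths = [os.path.join(folder, f) for f in os.listdir(folder) if f.lower().endswith('.jpg')]
# for files in group_duplicates(paths):
#     print(files)
```

Each list that comes back holds the photos that share the same pixel means, so you can review one group at a time before deleting anything.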