Counting Files by Type in Folders and Subfolders
When managing large collections of files, it can be helpful to analyze and count the different file types within a directory, including its subfolders. This Python script automates the process and exports the results to an Excel file.
How the Script Works
The script performs the following tasks:
- Recursively scans all files inside a specified root folder.
- Counts the number of files by type (file extension).
- Categorizes counts both globally and per subfolder.
- Exports the results to an Excel file with two sheets:
- Total Counts: Summarizes the count of each file type.
- Folder Counts: Provides file type counts per folder.
Python Script for File Type Counting
import os
import pandas as pd
from collections import defaultdict
def count_files_by_type(root_folder):
total_counts = defaultdict(int)
folder_counts = defaultdict(lambda: defaultdict(int))
for folder_path, _, files in os.walk(root_folder):
for file in files:
file_ext = os.path.splitext(file)[1].lower()
if file_ext:
total_counts[file_ext] += 1
folder_counts[folder_path][file_ext] += 1
return total_counts, folder_counts
def export_to_excel(root_folder, total_counts, folder_counts, output_file):
writer = pd.ExcelWriter(output_file, engine='xlsxwriter')
# Total counts sheet
total_df = pd.DataFrame(list(total_counts.items()), columns=['File Type', 'Total Count'])
total_df.to_excel(writer, sheet_name='Total Counts', index=False)
# Folder-wise counts sheet
folder_data = []
for folder, counts in folder_counts.items():
for file_type, count in counts.items():
folder_data.append([folder, file_type, count])
folder_df = pd.DataFrame(folder_data, columns=['Folder Path', 'File Type', 'Count'])
folder_df.to_excel(writer, sheet_name='Folder Counts', index=False)
writer.close()
def main():
root_folder = os.path.dirname(os.path.abspath(__file__)) # Auto-detect script location
output_file = os.path.join(root_folder, 'file_type_report.xlsx')
total_counts, folder_counts = count_files_by_type(root_folder)
export_to_excel(root_folder, total_counts, folder_counts, output_file)
print(f'Report saved to {output_file}')
if __name__ == '__main__':
main()
Sample Output
After running the script, an Excel file (file_type_report.xlsx
) is generated, containing two sheets:
Total Counts Sheet
| File Type | Total Count |
|-----------|------------|
| .txt | 150 |
| .pdf | 35 |
| .jpg | 20 |
Folder Counts Sheet
| Folder Path | File Type | Count |
|-------------------------|----------|-------|
| /home/user/docs | .txt | 50 |
| /home/user/images | .jpg | 20 |
| /home/user/downloads | .pdf | 35 |
Enhancements
- The script now automatically detects the root folder based on where it is placed.
- Uses
os.walk()
to recursively count files within subdirectories. - Saves results in an Excel file with structured sheets.
This script is a powerful tool for quickly analyzing file distributions in large directories, making file management easier and more efficient.