Deep Learning for Object Recognition

Python · PyTorch · Deep Learning · Computer Vision · Segmentation · Classification · Data Augmentation

Overview

For the IAPR course, our group built a two-stage deep learning pipeline that segments individual chocolates in images and classifies each instance into one of 13 chocolate types. The training data was only weakly annotated, and the project imposed strict constraints: a limited parameter budget (~9.7M parameters in total across both models) and no pre-trained models.

My Role

Contributed to the architectural design of the two-stage pipeline (Attention U-Net for segmentation, custom CNN for classification).
Helped train the Attention U-Net (8M parameters) to produce precise binary segmentation masks of chocolates against diverse backgrounds, using a combined BCE and Dice loss (BCEDiceLoss).
Implemented the watershed algorithm for instance separation of touching chocolates.
Contributed to developing and training the custom 'FeatureExtractor' CNN (1.7M parameters) for classifying the segmented chocolate instances, using Label Smoothing Loss.
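
The two training losses named above can be sketched in PyTorch as follows. This is a minimal illustration, not the project's actual code: the equal BCE/Dice weighting, the Dice smoothing constant, and the label-smoothing factor of 0.1 are all assumptions, and `classification_loss` is a hypothetical name standing in for the project's Label Smoothing Loss.

```python
import torch
import torch.nn as nn


class BCEDiceLoss(nn.Module):
    """Weighted sum of binary cross-entropy and soft Dice loss (sketch)."""

    def __init__(self, bce_weight: float = 0.5, smooth: float = 1.0):
        super().__init__()
        self.bce = nn.BCEWithLogitsLoss()  # expects raw logits, not probabilities
        self.bce_weight = bce_weight       # assumed weighting, not from the project
        self.smooth = smooth               # avoids division by zero on empty masks

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        bce = self.bce(logits, targets)
        probs = torch.sigmoid(logits)
        # Sum over every dimension except the batch dimension.
        dims = tuple(range(1, probs.dim()))
        intersection = (probs * targets).sum(dim=dims)
        total = probs.sum(dim=dims) + targets.sum(dim=dims)
        dice = (2.0 * intersection + self.smooth) / (total + self.smooth)
        dice_loss = 1.0 - dice.mean()
        return self.bce_weight * bce + (1.0 - self.bce_weight) * dice_loss


# Cross-entropy with label smoothing for the 13-class classifier.
# The smoothing factor 0.1 is an assumed value.
classification_loss = nn.CrossEntropyLoss(label_smoothing=0.1)

# Toy usage: a batch of 2 single-channel 64x64 masks and 4 classifier outputs.
seg_logits = torch.randn(2, 1, 64, 64)
seg_targets = (torch.rand(2, 1, 64, 64) > 0.5).float()
seg_loss = BCEDiceLoss()(seg_logits, seg_targets)

cls_logits = torch.randn(4, 13)
cls_labels = torch.tensor([0, 3, 7, 12])
cls_loss = classification_loss(cls_logits, cls_labels)
```

The BCE term gives a stable per-pixel gradient, while the Dice term directly rewards mask overlap, which helps when foreground pixels are a small fraction of the image.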

Challenges

Operating under strict model parameter limits and without pre-trained networks.
Achieving robust segmentation and classification with weakly annotated data and class imbalance.
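
A parameter limit like this is easy to verify automatically during development. The sketch below counts trainable parameters across several models and compares the total against a budget; `count_parameters`, `within_budget`, and the toy layers are hypothetical and only illustrate the check, and the budget value passed in is an example rather than the course's actual limit.

```python
import torch.nn as nn


def count_parameters(model: nn.Module) -> int:
    """Number of trainable parameters in a model."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


def within_budget(models, budget: int):
    """Return (total trainable parameters, whether the total fits the budget)."""
    total = sum(count_parameters(m) for m in models)
    return total, total <= budget


# Toy stand-ins for the segmentation and classification networks.
toy_segmenter = nn.Conv2d(3, 16, kernel_size=3)  # 3*16*3*3 weights + 16 biases
toy_classifier = nn.Linear(16, 13)               # 16*13 weights + 13 biases

total, ok = within_budget([toy_segmenter, toy_classifier], budget=1_000)
```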

Outcomes

Successfully developed a pipeline that accurately identified and counted all 13 chocolate types.
Achieved a validation Dice score of ~0.98 for segmentation and an F1-score of ~0.97 for classification of the extracted regions.
The solution effectively handled various backgrounds and some object occlusions.
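
For reference, the two reported metrics can be computed as in the plain-Python sketch below, written over flat lists so the arithmetic is visible. The function names are hypothetical, and macro averaging of the F1-score is an assumption: the averaging scheme behind the reported figure is not stated here.

```python
def dice_score(pred, target):
    """Dice coefficient between two flat binary masks (lists of 0/1)."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        return 1.0  # both masks empty: treat as a perfect match
    return 2.0 * intersection / total


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over classes seen in labels or predictions."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy usage with made-up masks and labels.
example_dice = dice_score([1, 1, 0, 0], [1, 0, 0, 0])
example_f1 = macro_f1([0, 0, 1, 1, 2], [0, 0, 1, 2, 2])
```

Macro averaging weights every class equally, which makes it a natural choice under class imbalance; a weighted average would instead follow the class frequencies.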

Figures

Fig. 1: Architecture diagram of our two-stage pipeline with Attention U-Net for segmentation and CNN for classification

Fig. 2: Segmentation results showing the binary masks generated by our Attention U-Net model

Fig. 3: Training curves for the FeatureExtractor CNN showing loss and accuracy over epochs

Fig. 4: Example of chocolate classification results with detected instances and their predicted classes