| 
          
              CLIP Meets DINO for Tuning Zero-Shot Classifier using Unlabeled Image Collections
               
              Mohamed Fazli Imam, Rufael F Marew, Jameel Hassan, Mustansar Fiaz, Alham Fikri Aji, Hisham Cholakkal
               
              arXiv, 2024
               
              Vision Language models; Self-supervised learning
               
               paper
              
             | 
          
            
             
           | 
          
              Complex Video Reasoning and Robustness Evaluation Suite for Video-LMMs
               
              Uzair Khattak, Ferjad Naeem, Jameel Hassan, Muzammal Naseer, Federico Tombari, Fahad Shahbaz Khan, Salman Khan
               
              arXiv, 2024
               
              Multi-modal LLMs; Video Understanding
               
               paper
               
               website
              
             | 
          
            
             
           | 
          
              Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization
               
             Jameel Hassan, Hanan Gani, Noor Hussein, Uzair Khattak, Muzammal Naseer, Fahad Khan, Salman Khan
               
              NeurIPS, 2023
               
            Vision Language models; Test-time adaptation; Lightweight adaptation
               
               paper
               
               website
              
             | 
          
          
             
           | 
          
              Language aligned 3D instance segmentation
             
              Analyzed and established the potential of language supervision using CLIP for 3D point-cloud instance segmentation
                and its effectiveness in open-world settings.
             
            Vision Language models; 3D instance segmentation
             
             code
               
               poster
              
             | 
          
          
             
           | 
          
              Language as an adversary for CLIP
             
              Designed an adversarial attack on the CLIP model using language itself as an adversary.
             
            Vision Language models; Adversarial robustness
               
               code
             
             report
             
             poster
              
             | 
          
          
             
           | 
          
              YOLOngv8: From Imbalanced to Accurate Object Detection in Long Tailed iSAID Dataset
             
              Adapted the YOLOv8 model for long-tailed object detection on the iSAID aerial image dataset,
                using a prototype-based contrastive loss with dynamic calibration to handle the long-tailed nature of the dataset.
               
            Long-tailed distribution; Object detection
               
               code
              
             | 
          
          
             
           | 
          
              Parallel and Distributed ViT
               
              Designed a parallel and distributed implementation of the ViT-B/16 architecture
              that combines pipeline parallelism with a distributed data-parallel strategy.
               
            Parallel and distributed machine learning; Pipeline parallelism
               
               code
              
             | 
            
          
             
           | 
          
              Video Face Restoration
             
              [Discontinued due to resource constraints.] Attempted an architectural design that lifts 2D image face restoration
              models to video restoration while retaining the original image restoration model.
               
            Video restoration
               
              
             |