Domenic Curro

MSc Computer Science

  • Master's Thesis: Pose-Aware Embedding Networks and Multi-Modal Image-Language Retrieval

    Inspired by recent work in human pose metric learning, this thesis explores a family of pose-aware embedding networks designed for image similarity retrieval. Circumventing the need for direct human joint localization, a series of CNN embedding networks are trained to respect a variety of Euclidean and language-primitive metric spaces. Querying with imagery alone has limitations, so this thesis proposes a multi-modal image-language embedding space, extending the current model to allow for language-primitive queries. This additional language mode improves retrieval quality by 3% to 14% under the hit@k metric. Finally, two approaches are constructed to address partial language-primitive queries: the first generates maximally likely descriptors, and the second exploits the network’s tendency to factorize the embedding space into (mostly) linearly separable sub-spaces. These two approaches improve recall by 13% and 17% over the provided baselines.
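
    For readers unfamiliar with hit@k, the sketch below shows one common way to compute it: a query counts as a hit if at least one relevant item appears among its top k retrieved results. This is an illustrative assumption rather than the thesis's exact evaluation code; the function names `hit_at_k` and `mean_hit_at_k` and the toy data are hypothetical.

```python
from typing import Hashable, Sequence, Set


def hit_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> int:
    """Return 1 if any of the top-k ranked items is relevant, else 0."""
    return int(any(item in relevant for item in ranked[:k]))


def mean_hit_at_k(
    rankings: Sequence[Sequence[Hashable]],
    relevants: Sequence[Set[Hashable]],
    k: int,
) -> float:
    """Average hit@k over all queries (assumes at least one query)."""
    hits = [hit_at_k(r, rel, k) for r, rel in zip(rankings, relevants)]
    return sum(hits) / len(hits)


# Hypothetical toy data: two queries, each with a ranked result list
# and a set of items considered relevant for that query.
rankings = [["a", "b", "c"], ["d", "e", "f"]]
relevants = [{"c"}, {"x"}]
print(mean_hit_at_k(rankings, relevants, k=3))
```

    Definitions of hit@k vary between papers, so the reported numbers should be read against the definition given in the thesis itself.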

    Paper Slides Data Code
  • Building Usage Profiles Using Deep Neural Nets

    To improve software quality, one needs to build test scenarios resembling the usage of a software product in the field. This task is rendered challenging when a product’s customer base is large and diverse. In this scenario, existing profiling approaches, such as operational profiling, are difficult to apply. In this work, we consider publicly available video tutorials of a product to profile usage. Our goal is to construct an automatic approach to extract information about user actions from instructional videos. To achieve this goal, we use a Deep Convolutional Neural Network (DCNN) to recognize user actions. Our pilot study shows that a DCNN trained to recognize user actions in video can classify five different actions in a collection of 236 publicly available Microsoft Word tutorial videos (published on YouTube). In our empirical evaluation, we report a mean average precision of 94.42% across all actions. This study demonstrates the efficacy of DCNN-based methods for extracting software usage information from videos. Moreover, this approach may aid in other software engineering activities that require information about customer usage of a product.
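
    As a rough illustration of the reported metric, the sketch below computes mean average precision as the mean of per-class average precision, where each class's predictions are ranked by score. It assumes the non-interpolated definition of average precision; the paper may use a different variant, and the names `average_precision`, `mean_average_precision`, and the toy scores are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP: mean of precision at the rank of each positive."""
    # Rank examples by descending classifier score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, idx in enumerate(order, start=1):
        if labels[idx]:
            hits += 1
            precision_sum += hits / rank
    # By convention here, a class with no positives contributes 0.
    return precision_sum / hits if hits else 0.0


def mean_average_precision(
    per_class: Dict[str, Tuple[List[float], List[int]]]
) -> float:
    """Mean of per-class AP values (assumes at least one class)."""
    aps = [average_precision(s, l) for s, l in per_class.values()]
    return sum(aps) / len(aps)


# Hypothetical toy data: scores and binary ground-truth labels per action class.
per_class = {
    "action_a": ([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]),
    "action_b": ([0.9, 0.8, 0.7, 0.6], [1, 1, 0, 0]),
}
print(mean_average_precision(per_class))
```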

    Paper Poster Slides Data Code
  • PyTorch Canny Edge Detector

    I was bored, so I made a Canny Edge Detector for the GPU using PyTorch.

  • Watson Analytics for Social Media Internship: OASIS

    Modern social media analytics tools analyze only the text of posts and discard the rest. Leveraging modern research, we (a small team of three devs and one business member) were tasked with: 1) demonstrating that the entirety of a post is more valuable than the text alone, and 2) generating actionable social media insights by automatically discovering important moments.

  • RU CP 8302 course project: DeepTutorial

    How do I set my default font in Microsoft Word? You can go watch a tutorial, but the author will drone on and on before they get to what you’re looking for. In this work, we demonstrate that a CNN can automatically find the key moment of interest in a video, using simple classification.

  • RU CP 8206 course project: Road Segmentation

    Being able to detect road for safe navigation in autonomous cars is an important problem in computer vision. Road scene segmentation is often applied in autonomous driving and pedestrian detection [7]. In this project, we apply deep learning techniques to train a convolutional neural network to segment road from non-road in images. We will be using the LeNet network as our model of choice [6], as defined in Caffe’s example MNIST Challenge solution.

  • UofT CSC 2515 Course Project: SVHN

    In this paper, we attempt to obtain results similar to the state of the art using a well-known and very simple Convolutional Neural Network architecture to classify, and further to detect, house numbers from street-level photos provided by the Street View House Number (SVHN) dataset. We also introduce an 11th class to the SVHN dataset, background, to aid in the problem of detection.

  • G.T. Miami

    An attempt to recreate the magic of Grand Theft Auto I (before school got in the way…). The game is written in C++, using the SDL graphics engine and Box2D physics engine.

  • Other game projects

    A list of other game projects I have worked on in my spare time.