CSE498, Collaborative Design, Spring 2020
Computer Science and Engineering
Michigan State University

General Motors (GM) is a multinational automotive manufacturer headquartered in Detroit, Michigan. GM is ranked #13 on the Fortune 500 for total revenue and is the largest auto manufacturer headquartered in the United States.

Maintaining strong information security is a priority for GM because leaked sensitive information can compromise asset security and communication privacy. Publicly visible credentials give unauthorized parties an opportunity to infiltrate GM assets and access private communication networks.

Our Open Source Intel system automates the discovery of security threats by collecting and analyzing information from various public code repositories on the internet such as GitHub, GitLab and Bitbucket.

Leaks of confidential intellectual property (IP), such as GM usernames, API keys and code snippets, are displayed on a user-friendly web application. When a threat is discovered, relevant information about the IP leak is shown so that GM employees can act quickly to mitigate it.

A machine learning service gives each discovered leak a confidence score. If a threat is assigned a high enough confidence score, employees are notified via text message and/or email.
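The notification rule described above can be sketched as a simple threshold filter. This is only an illustration under stated assumptions: the 0.8 cutoff, the Leak fields, and the function names are hypothetical, since the actual threshold and data model are not given here, and the real system would hand the formatted alert to a text message or email service.

```python
from dataclasses import dataclass

# Hypothetical cutoff; the actual threshold is not stated in this description.
NOTIFY_THRESHOLD = 0.8


@dataclass
class Leak:
    """Illustrative record for one discovered leak (field names are assumptions)."""
    repo_url: str
    kind: str          # e.g. "api_key", "username", "code_snippet"
    confidence: float  # score in [0, 1] assigned by the machine learning service


def leaks_to_notify(leaks, threshold=NOTIFY_THRESHOLD):
    """Return the leaks whose score meets the threshold, highest score first."""
    flagged = [leak for leak in leaks if leak.confidence >= threshold]
    return sorted(flagged, key=lambda leak: leak.confidence, reverse=True)


def format_alert(leak):
    """Build the message body that a text or email notifier could send."""
    return (
        f"[Open Source Intel] Possible {leak.kind} leak at "
        f"{leak.repo_url} (confidence {leak.confidence:.2f})"
    )
```

Keeping the threshold as a parameter would let employees trade off missed leaks against notification volume without retraining the model.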

Open Source Intel automates the currently manual process of discovering the warning signs of a leak and drastically increases employee effectiveness by letting them focus on threat mitigation instead of threat discovery.

The Python data collection pipeline is orchestrated using Celery, with pipeline data stored temporarily in Redis and repository code processed using PyDriller. A trained scikit-learn machine learning model assigns a confidence score to each hit discovered by the pipeline. Open Source Intel stores its data in a PostgreSQL database, which then feeds the Python Django web application for display.
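The flow of a hit through the pipeline can be sketched with plain Python functions standing in for the real components. Everything here is an assumption made for illustration: the regular expressions, the fixed scores, and the function names are hypothetical. In the system described above, Celery would orchestrate the stages, PyDriller would supply the lines added in each commit, a trained scikit-learn model would replace score_hit, and the results would be written to PostgreSQL.

```python
import re

# Hypothetical patterns; this description does not say which signals are used.
PATTERNS = {
    "api_key": re.compile(
        r"api[_-]?key\s*[:=]\s*['\"][A-Za-z0-9]{16,}['\"]", re.IGNORECASE
    ),
    "username": re.compile(
        r"user(name)?\s*[:=]\s*['\"]\w+['\"]", re.IGNORECASE
    ),
}


def scan_added_lines(added_lines):
    """Stage 1: find candidate hits in (line_number, text) pairs from a commit diff."""
    hits = []
    for line_no, text in added_lines:
        for kind, pattern in PATTERNS.items():
            if pattern.search(text):
                hits.append({"kind": kind, "line": line_no, "text": text.strip()})
    return hits


def score_hit(hit):
    """Stage 2: placeholder scores standing in for the trained model."""
    return 0.9 if hit["kind"] == "api_key" else 0.5


def run_pipeline(added_lines):
    """Chain the stages and return scored hits ready to be stored and displayed."""
    return [dict(hit, confidence=score_hit(hit)) for hit in scan_added_lines(added_lines)]
```

For example, passing the added lines of one commit to run_pipeline returns a list of dictionaries, each holding the hit kind, line number, matched text, and confidence score.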