Files and Media are one of the more juicy targets to look for when planning a penetration test. For companies that publish things to the web on a regular basis, there is constantly information that is overlooked and should not have been sent out of the organization. I have found things like email distribution lists, Internal only email addresses perfect for phishing, personnel information, client communications, etc. Dont forget public facing FTP servers. They always seem to have something juicy hidden in them.
Documents.html is a tool that allows you to take a search term related to your target, and search for various file types associated with the term. The term should be something as unique as possible, but still related to the target: company name, platform, application, client, etc. Perform multiple searches for various terms for the best coverage.
http://www.uvrx.com/ - The most comprehensive online file storage search engine. They have individual search engines for badongo, Mediafire, Zshare, 4shared and taringa. They also provide a search all function that searches filefactory, depositfiles, easy-share, sharedzilla, sendspace, yousendit, letitbit, drop, sharebee, rapidspread, and many others.
PowerMeta - PowerMeta searches for publicly available files hosted on various websites for a particular domain by using specially crafted Google, and Bing searches. It then allows for the download of those files from the target domain. After retrieving the files, the metadata associated with them can be analyzed by PowerMeta. Some interesting things commonly found in metadata are usernames, domains, software titles, and computer names.
goofile - Use this tool to search for a specific file type in a given domain.
FilePhish - A simple OSINT Google query builder for fast and easy document and file discovery.
MetaFinder - Search for documents in a domain through Search Engines (Google, Bing and Baidu). The objective is to extract metadata
https://libgen.rs/ - This is the largest free library in human history. Giving the world free access to over 84 million scholarly journals, over 6.6 million academic and general-interest books, over 2.2 million comics, and over 381 thousand magazines. Commonly referred to as "Libgen" for short. Libgen has zero regard for copyright.
https://sci-hub.se/ - A "shadow library" that provides free access to millions of research papers and books by bypassing paywalls. SciHub has zero regard for copyright.
https://the-eye.eu/public - An open directory data archive dedicated to the long-term preservation of any and all data including websites, books, games, software, video, audio, other digital-obscura and ideas. Currently hosts over 140TB of data for free.
https://eyedex.org - A searchable index of the-eye.eu. Much faster than manually digging through subfolders or using Google dorks.
https://doaj.org - Search over 16,000 journals, over 6.5 million articles in 80 different languages from 129 different countries.
https://www.slideshare.net/ - Allows users to upload content including presentations, infographics, documents, and videos. Users can upload files privately or publicly in PowerPoint, Word, PDF, or OpenDocument format
While having other functionality Michael Bazzell's Images.html and Videos.html tool helps search for image terms across multiple platforms. Looking for faces of employees? Maybe a picture of their security badge you can copy? Image of a target you can extract metadata from later? Start with a good image search. Google is hard to beat for this but there are other platforms that can lead to some interesting discoveries.
*Note - this tool is only to find images associated with a search term. If you have an image and you would like to find out more information about it, that will be discussed under the Forensics section.
Looking for easy creds? Linked data? Password hash? Breaches can be a trove for low hanging fruit for those targeting those not diligent with their cyber hygiene. Often times, the credentials found in large data breaches will turn into password lists such as the infamous rockyou.txt password list that came from a sizeable breach in 2009.
The below tools and links can be used to parse data in known data breaches and leaks, or be used for detection and alert for the presence of credentials when new breach data is reported.
Paste sites like Pastebin have recently changed their ability to be parsed. Pastebin itself has removed the ability to to search its pastes. However, with a bit of clever google dorking, you can still search for breach data by submitting your search along with "insite:pastebin.com"
https://www.dehashed.com/ - Premium but well worth it, Breach data site. Can search by multiple types of indicators like email, IP, address, domain, even password.
Scylla - One of the greatest breach parsing tools available.
https://leak-lookup.com/ - Leak-Lookup allows you to search across thousands of data breaches to stay on top of credentials that may have been compromised, allowing you to proactively stay on top of the latest data leaks with ease. AKA Citadel
https://breachdirectory.org - Search via email address, username or phone number to see censored passwords. They also provide the full password as a SHA-1 hash, which can easily be cracked.
https://mypwd.io/ - A tool for monitoring leaked passwords for accounts linked to emails. Actually shows you the leaked passwords.
https://doxbin.org - A document sharing and publishing website which invites users to contribute personally identifiable information (PII), or a "dox" of any person of interest. It previously operated on the darknet as a TOR hidden service.
Firefox Monitor - Great tool for searching if your accounts have been found in a breach and can alert you when new breaches are discovered and parsed.
pwd query - Check if your passwords have been compromised from a data leak...
Analysis Information Leak framework - AIL is a modular framework to analyze potential information leaks from unstructured data sources like pastes from Pastebin or similar services or unstructured data streams.
breach-parse - A tool for parsing breached passwords by The Cyber Mentor. Repo also contains large breach data collections.
Ah the gold mine of git repositories. So at the time of writing this, we are still in the golden age of security ignorance in coding. DevSecOps has not yet fully caught on, and software engineers everywhere post up this tid-bits of insecure code for storage later, or post a bit if their config file on a forum asking for help. Little did they realize that in that bit of the config file, they accidentally posted their creds! These are a few examples of the fun things we can find when checking code repositories. Now searching for these is usually limited to the context of a penetration test against an organization where you know they have software engineers bust creating the next great thing.
There are many great options out there for code repositories, but there are 4 that are the gold standard for checking.
Gitrob - Gitrob is a tool to help find potentially sensitive files pushed to public repositories on Github.
Git all secrets - Clone different gits and automatically scan them for secrets.
Truffle Hog - Searches through git repositories for secrets, digging deep into commit history and branches. This is effective at finding secrets accidentally committed.
gitleaks - This package contains a SAST tool for detecting hardcoded secrets like passwords, API keys, and tokens in git repos. Gitleaks aims to be the easy-to-use, all-in-one solution for finding secrets, past or present, in your code.
GitDorker - A Python program to scrape secrets from GitHub through usage of a large repository of dorks.