Data exposure through paste sites

The Underestimated Cyber Threats Lurking in Paste Sites

Paste sites are simple online services that allow users to store and share plain text. These web applications first gained popularity with developers who needed an easy way to share code snippets. Over time, they have grown into a ubiquitous tool used for various purposes, ranging from collaborative document editing to data storage. Among the multitude of paste sites, Pastebin is perhaps the most famous, but there are more than 50 similar services available, each with its own unique features.

A crucial feature of paste sites is their ability to generate either public or private ‘pastes’. Public pastes are visible to anyone, indexed by search engines, and often listed in the site’s public directory. On the other hand, private pastes are only accessible to those with the direct URL and are not indexed by search engines or listed publicly.

Despite their usefulness for legitimate purposes, paste sites have also become a hotbed for cybercriminal activities. Their public, anonymous nature and the ability to share large amounts of data quickly and easily make them attractive for a variety of malicious uses.

Cyber Criminals and Paste Sites

Cybercriminals leverage paste sites in a multitude of ways. Here are some examples:

1. Advertise Services: Just as a legitimate business would advertise their services, cybercriminals use paste sites to offer their illicit services, such as DDoS attacks for hire, trading in stolen data, or providing hacking tools.

2. Data Dumping Grounds: After successful cyber-attacks, criminals often dump stolen data onto paste sites. This could be anything from user credentials and credit card information to more sensitive personal or corporate data. For example, in 2014, a large number of Snapchat usernames and phone numbers appeared on Pastebin after a major data breach.

3. Communication Channels: Hacktivist groups like Anonymous have been known to use paste sites to disseminate information about their actions and future plans.

Employees Accidentally Exposing Data

Unfortunately, it’s not just cybercriminals who are causing problems with paste sites. Employees can also pose a significant risk by accidentally exposing sensitive company data. Here are two common scenarios:

1. Storing Data Temporarily: An employee might use a paste site to temporarily store a piece of data. They may intend to create a private paste but accidentally make it public or forget to delete the paste after use.

2. Sharing Code Snippets: Developers often use paste sites to share code snippets. If they’re not careful, they might inadvertently include sensitive information, such as API keys or database credentials, in the shared code.
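
One way to reduce the second risk is to check a snippet for obvious secrets before it is pasted anywhere. The following is a minimal sketch, not a complete secret scanner: the pattern set, the function name find_secrets and the sample values are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical, deliberately small pattern set; real scanners use many more rules.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "private_key_block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "credential_assignment": re.compile(
        r"(?i)\b(api[_-]?key|secret|token|passw(?:or)?d|pwd)\b\s*[:=]\s*['\"]?[^\s'\"]{8,}"
    ),
    "url_with_password": re.compile(r"[a-z][a-z0-9+.-]*://[^\s:/@]+:[^\s@/]+@", re.I),
}


def find_secrets(snippet):
    """Return (line_number, pattern_name) pairs for lines that look like secrets."""
    findings = []
    for line_number, line in enumerate(snippet.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_number, name))
    return findings


# Made-up snippet a developer might be about to share.
sample = '''import requests
API_KEY = "sk_test_1234567890abcdef"
db = "postgres://app:hunter2secret@db.internal:5432/prod"
print("hello")
'''

for line_number, name in find_secrets(sample):
    print(f"line {line_number}: possible {name}")
```

For the sample above, lines 2 and 3 are flagged. A check like this only catches well-known shapes of credentials, so it complements rather than replaces a review of what is being shared.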

The Importance of Monitoring Paste Sites

Given the high potential for sensitive data exposure and the rampant misuse by cybercriminals, monitoring paste sites should be a critical part of any organization’s threat intelligence strategy.

By keeping an eye on these sites using our paste site monitoring, organizations can:

1. Detect Data Leaks: If an employee accidentally exposes data or a cybercriminal dumps stolen data, early detection can help minimize the damage. By discovering the leak early, the organization can take immediate remedial action, such as changing passwords, informing affected parties, and reinforcing security measures.

2. Gain Insight into Potential Threats: Monitoring cybercriminal activities on paste sites can provide valuable insights into emerging threats. For instance, if a criminal advertises a new type of attack, companies can preemptively strengthen their defenses.

3. Track Hacktivist Actions: If an organization finds itself in the crosshairs of a hacktivist group, monitoring paste sites can provide insights into the group’s plans, enabling proactive defensive measures.

Monitoring Paste Sites in Kaduu

You can monitor paste sites in Kaduu in two different ways:

  • Using a Google based Search Approach (https://deepweb.leak App)
  • Using a direct Paste Site Scraping Approach (https://control.leak App)

Google Based Search Approach

The technique we use in deepweb.leak app differs slightly from the one in control.leak app, so the two will catch different results. This platform lets you use custom Google queries to find your keyword in combination with paste sites, and it also has a direct API connection to Pastebin. On the result page, the “sources” entry shows which technique was used to retrieve each result.

Crawler Based Approach

We use a simple HTTP crawler for the following sites and index them in a database on a daily basis:

  1. http://codepad.org
  2. https://dumpz.org
  3. https://gist.github.com
  4. https://kpaste.net
  5. https://pastebin.com
  6. http://pastebin.fr
  7. https://pastebin.pl
  8. http://paste.org.ru
  9. https://paste.opensuse.org

These pages publish their latest pastes on their website, allowing us to index them.
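
As a rough illustration of this approach, the sketch below extracts paste links from a "latest pastes" listing page and adds unseen pastes to an index. It is not Kaduu's crawler: the link pattern, the host pastes.example, the dict used in place of a database and all function names are assumptions made for the example, and the network call is replaced by a stub so the example runs offline.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import Request, urlopen

# Assumption: paste links on the listing page look like "/aB3dE9xQ".
# Real sites differ, and a pattern this loose can also match navigation links.
PASTE_PATH = re.compile(r"^/[A-Za-z0-9]{6,12}$")


class PasteLinkParser(HTMLParser):
    """Collects href values of <a> tags that look like paste links."""

    def __init__(self):
        super().__init__()
        self.paths = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href and PASTE_PATH.match(href) and href not in self.paths:
            self.paths.append(href)


def extract_paste_urls(listing_html, base_url):
    """Return absolute URLs of the pastes linked from a listing page."""
    parser = PasteLinkParser()
    parser.feed(listing_html)
    return [urljoin(base_url, path) for path in parser.paths]


def fetch(url, timeout=10):
    """Download a page as text. Not called in the offline example below."""
    request = Request(url, headers={"User-Agent": "paste-monitor-sketch"})
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def index_new_pastes(listing_html, base_url, index, fetch_page=fetch):
    """Add pastes missing from `index` (a dict of url -> text); return the new URLs."""
    new_urls = []
    for url in extract_paste_urls(listing_html, base_url):
        if url not in index:
            index[url] = fetch_page(url)
            new_urls.append(url)
    return new_urls


# Offline example with canned HTML and a stub instead of a real download.
listing = (
    '<ul><li><a href="/aB3dE9xQ">untitled</a></li>'
    '<li><a href="/about">About</a></li>'
    '<li><a href="/Zz81kLm2">config</a></li></ul>'
)
index = {}
new = index_new_pastes(listing, "https://pastes.example", index,
                       fetch_page=lambda url: "body of " + url)
print(new)
```

Running the same listing through index_new_pastes a second time returns an empty list, because both pastes are already in the index. A scheduled job that repeats this step per site is one plausible way to keep such an index current.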

Please enter your search term under the navigation item “pastebin”. For example, you could search for pwd AND jpmorgan, and you will see all results that contain BOTH search terms.

In general, we recommend that you start by monitoring your company name and domain. If your company is called bank365 and your domain is bank365.com, you could create separate queries for both words. Of course, you can monitor anything that seems to be a valuable asset (a patent name, a brand or a person).
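
The recommendation above can be turned into a small list of starter queries. The helper below is only a sketch: build_queries and its parameters are hypothetical names, and the queries use nothing beyond the plain keyword, @domain and AND forms described under Search Syntax.

```python
def build_queries(company, domain, extra_terms=()):
    """Build starter monitoring queries for a company name and its domain.

    Hypothetical helper: it only assembles query strings and does not
    talk to any search service.
    """
    queries = [
        company,       # the company name as a keyword
        domain,        # the domain as a keyword
        "@" + domain,  # email addresses on the domain
    ]
    # Combine each extra term with the company name, e.g. "pwd AND bank365".
    queries += [f"{term} AND {company}" for term in extra_terms]
    return queries


for query in build_queries("bank365", "bank365.com", extra_terms=("pwd",)):
    print(query)
```

For bank365 this prints bank365, bank365.com, @bank365.com and pwd AND bank365, one per line. Each string can then be entered as its own search.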

Search Syntax

On the Kaduu search page you can search in a database of indexed pastebin documents. Usually pastebin-like websites are used to share code snippets, logs, stack traces, and other pieces of technical information. These text pieces may contain sensitive information related to your organization.

The index is updated every minute using automated crawlers.

Available Fields:

Field        Details
createdAt    Creation date & time.
publishedAt  Publish date & time.
text         Paste text (default field).
url          Paste document URL.
title        Paste title.
sourceId     Source ID, where the paste has been found.

Detailed Syntax:

Query                                                   Details
test                                                    Search pastes containing test as a separate word or as part of another word (delimited by punctuation characters). The following will match: test@gmail.com, test.love@mail.com, god_test@nice.org, “this is a test data”, hey@test.org, bye@test-data.org.
test.com                                                Search pastes containing test.com as a separate word or as part of another word (delimited by punctuation characters). The following will match: boss@test.com, hr@this-is-test.com, test.com, data.test.com, super-test.com.
john@test.com                                           Search pastes containing the email john@test.com. The search will only match that exact email and nothing else.
@test.com                                               Search pastes containing emails on the test.com domain.
test AND sourceId:158dd4b2-7672-3492-95f6-019479cb4552  Search pastes containing test, in the source with ID 158dd4b2-7672-3492-95f6-019479cb4552.
"bank hack"~2                                           Search pastes using a proximity search. The matching paste should contain the word bank, followed by the word hack within a distance of 2 words.
quick brown                                             Search for quick or brown in paste text. This is the equivalent of the quick OR brown search query.
quick OR brown                                          Search for quick or brown in paste text. The OR keyword is case-sensitive. This is the equivalent of the quick brown search query.
quick AND brown                                         Search for quick and brown; the paste should have both. The AND keyword is case-sensitive.
quick AND NOT brown                                     Search for pastes containing quick and not brown. The AND and NOT keywords are case-sensitive.
quick -brown                                            Search for pastes containing quick and not containing brown. This is the equivalent of the quick AND NOT brown query.
createdAt:2020-03-05                                    Search for pastes indexed on 5th of March, 2020.
createdAt:[2019-01-01 TO 2020-01-01]                    Search for pastes created between 1st of January, 2019 and 1st of January, 2020.
createdAt:[* TO 2020-01-01]                             Search for pastes created until 1st of January, 2020.
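
To make the matching behavior of the test and test.com entries more concrete, the sketch below approximates it locally: text is split into words at punctuation, and a query matches when its words appear consecutively in the paste. This is an illustration only, not Kaduu's implementation. The function names are hypothetical, case-insensitive matching is an assumption, and the exact-email and @domain queries are not modelled.

```python
import re


def tokenize(text):
    """Lowercase the text and split it on anything that is not an ASCII letter or digit."""
    return [token for token in re.split(r"[^0-9a-z]+", text.lower()) if token]


def matches(query, paste_text):
    """Return True if the query's tokens appear consecutively in the paste's tokens."""
    query_tokens = tokenize(query)
    paste_tokens = tokenize(paste_text)
    if not query_tokens:
        return False
    width = len(query_tokens)
    return any(
        paste_tokens[start:start + width] == query_tokens
        for start in range(len(paste_tokens) - width + 1)
    )


# Examples taken from the syntax entries above.
for text in ["test@gmail.com", "god_test@nice.org", "bye@test-data.org"]:
    print("test", "->", text, matches("test", text))
for text in ["boss@test.com", "hr@this-is-test.com", "super-test.com"]:
    print("test.com", "->", text, matches("test.com", text))

# A word that merely starts with "test" is not delimited by punctuation.
print("test", "->", "testing@mail.com", matches("test", "testing@mail.com"))
```

Under these assumptions every listed example for test and test.com matches, while testing@mail.com does not match test, because testing is a single word rather than test delimited by punctuation.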