The term ‘data mining’ was coined in the 90s, but the practice has a rather lengthy and controversial past, especially as society entered the Digital Age. Since its inception, data mining has become a cornerstone of machine learning and statistics. Make no mistake: the process of discovering and extracting patterns from data existed long before the term ‘data mining’ was coined. However, the need for data mining has grown dramatically as a result of advances in computing resources and the sheer volume of data being produced.
What is Data Mining?
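As mentioned above, the objective of data mining is to discover and extract patterns from data. In the commercial settings discussed here, that usually means collecting data about users and analysing it to draw conclusions about those users as a group.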
To collect and analyse the data, a range of methods is used, drawing on statistical techniques, artificial intelligence and machine learning. Some methods group together records that are similar, while others detect relationships between records or variables. For example, grouping customers by their location and spending habits may reveal that customers in some areas are more likely to buy certain products than customers in other areas.
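The grouping idea can be sketched in a few lines of Python. This is only an illustration: the purchase records, area names and the helper functions `spend_by_area` and `top_category` are all made up for the example, and real systems would typically use clustering algorithms over far larger datasets.

```python
from collections import defaultdict

# Hypothetical purchase records: (customer area, product category, amount spent).
purchases = [
    ("north", "garden", 40.0),
    ("north", "garden", 60.0),
    ("north", "electronics", 20.0),
    ("south", "electronics", 90.0),
    ("south", "electronics", 110.0),
    ("south", "garden", 10.0),
]

def spend_by_area(records):
    """Sum the amount spent per product category within each area."""
    totals = defaultdict(lambda: defaultdict(float))
    for area, category, amount in records:
        totals[area][category] += amount
    return totals

def top_category(records):
    """For each area, return the category with the highest total spend."""
    totals = spend_by_area(records)
    return {area: max(cats, key=cats.get) for area, cats in totals.items()}

print(top_category(purchases))
```

With this toy data, the grouping suggests that garden products sell best in the "north" area and electronics in the "south" area, which is the kind of regional pattern described above.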
Other methods focus on identifying patterns within the data to make estimations and predictions about future events. These approaches use structures such as decision trees, in which each branch represents a possible outcome of a test on the data, as well as models that learn rules from previous examples and apply them step by step until a decision is reached. A conventional example would be using a person’s credit history and credit score to estimate the probability that they will repay a debt or loan.
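To make the credit example concrete, here is a minimal sketch of the simplest possible decision tree, one with a single branch (sometimes called a decision stump). It assumes a made-up history of credit scores and repayment outcomes; the data and the function names `best_threshold` and `repay_rate` are hypothetical, and real credit models weigh many more factors.

```python
# Hypothetical lending history: (credit score, whether the loan was repaid).
history = [
    (520, False), (580, False), (610, False), (640, True),
    (650, False), (690, True), (720, True), (760, True),
]

def best_threshold(records):
    """Find the score cut-off that misclassifies the fewest past loans.

    The rule being learned is: predict "repaid" when score >= threshold.
    Returns the threshold and the number of past loans it gets wrong.
    """
    best, best_errors = None, len(records) + 1
    for threshold in sorted({score for score, _ in records}):
        errors = sum((score >= threshold) != repaid for score, repaid in records)
        if errors < best_errors:
            best, best_errors = threshold, errors
    return best, best_errors

def repay_rate(records, threshold):
    """Share of past borrowers at or above the threshold who repaid."""
    above = [repaid for score, repaid in records if score >= threshold]
    return sum(above) / len(above)

threshold, errors = best_threshold(history)
print(threshold, errors, repay_rate(history, threshold))
```

On this toy history the learned branch is "score of 640 or more", and 4 of the 5 past borrowers on that branch repaid, giving an estimated repayment probability of 0.8 for a new applicant in the same range.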
Ultimately, the quantity and quality of the data collected determine how relevant the resulting predictions and analyses can be, and with enough good data the scope is considerable. This has been made possible by relentless innovation in computer processing, which has facilitated the shift from manual analysis towards automated data mining.
How is Data Mining being used?
Data is being collected at an ever-increasing rate. Information about almost anything done online is stored in databases and mined to help businesses with things such as marketing campaigns and customer relationships. You have probably never realised it, but there are thousands of different ways your data is mined. Even something as small as connecting to a department store’s Wi-Fi can have a huge impact on the data that company can gather about you.
People often talk about Instagram showing them an advertisement in their feed for something they were interested in, but they have no idea how Instagram knows to show that specific advertisement. This is a classic example of the way your data is being mined and harvested to create targeted campaigns that appeal to your interests. According to its data policy, Instagram, as well as its parent company, Facebook (now Meta), uses data mining to consider the posts you like and the people you follow, as well as your interactions with third-party apps and websites, to show advertisements that aren’t generic but tailored to you. Admittedly, terms and conditions are long-winded and, sometimes, very hard to read. However, they are actually super important in helping you understand what kind of data a company is collecting from you. It may include things that you didn’t even realise they could access, such as location-related information, device information (cookies and phone attributes, for example), your ISP, your contacts and much more.
The retail industry also mines data from its large customer databases. These hold significant insights about customers, such as transactions and profiles. Transactions reveal product popularity, preferences, and even whether customers in certain areas are more likely to buy certain products than customers elsewhere. Customer profiles are crucial as they provide key data about the type of audience being attracted, whether geographic or demographic. From this, retail companies are able to optimise their marketing campaigns to target certain customers, while also forecasting sales and improving their customer relationships by delivering a personal touch based on the mined data.
Mining algorithms also help the banking sector extract useful information from the billions of transactions within the financial system. This gives banks a better understanding of their customer base and allows them to notify a customer when, based on that customer’s previous transactions, they suspect fraudulent activity on the account. Overall, data mining gives a bird’s eye view of market risks and of how to get the best returns on investments.
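One simple way to picture the fraud check is to compare a new transaction with the customer's own spending history. The sketch below flags an amount that lies more than a chosen number of standard deviations from the customer's average. The amounts, the function name `is_suspicious` and the `z_limit` cut-off are assumptions for illustration only; banks use far richer signals than the amount alone.

```python
from statistics import mean, stdev

def is_suspicious(past_amounts, new_amount, z_limit=3.0):
    """Flag a transaction whose amount is far from the customer's usual spend.

    Returns True when the new amount is more than z_limit sample standard
    deviations away from the mean of the past amounts.
    """
    if len(past_amounts) < 2:
        return False  # not enough history to judge
    mu = mean(past_amounts)
    sigma = stdev(past_amounts)
    if sigma == 0:
        # Every past amount is identical, so any different amount stands out.
        return new_amount != mu
    return abs(new_amount - mu) / sigma > z_limit

# Hypothetical card history for one customer.
past = [12.5, 30.0, 22.0, 18.0, 25.5, 15.0]

print(is_suspicious(past, 24.0))   # an ordinary purchase
print(is_suspicious(past, 950.0))  # far outside the usual range
```

Here the first purchase is in line with the customer's history and is not flagged, while the second is far outside it and would trigger a notification.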
Data mining also holds a place within areas of science such as genetics, bioinformatics and medicine. Mining DNA sequences has helped researchers map the relationships between variations in DNA sequences and a person’s likelihood of developing a common disease, such as cancer. It has therefore proven valuable in diagnosing, preventing and treating disease. In addition, mining algorithms have supported the automated analysis of clinical trial data to extract relevant biomedical information.
The dark side of Data Mining
With regard to the way data mining has been applied to individuals, questions have been raised about privacy and ethics as well as the protections in place for consumers. In addition, the source data might be unreliable or recorded in many different formats, which results in incomplete or inaccurate conclusions.
The ethics surrounding data mining have become a contentious issue. Data mining became a “buzzword” when it was revealed that Cambridge Analytica had obtained data about more than 50 million Facebook users, harvested through a third-party app, to mine for political purposes. The scandal drew public attention to data mining and to the wider ethical problems of acquiring data through nefarious means and collecting personal data without authorisation. If a person is unaware of what data is being collected, then they have no opportunity to consent to its collection and usage. Privacy comes into play when the acquired data, even if it is anonymised, is enough to identify specific individuals. For example, in 2011, Walgreens was sued for selling information about prescriptions to data mining companies, which then provided the mined data to pharmaceutical companies.
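The point about anonymised data still identifying individuals can be illustrated with a small check. The sketch below assumes a made-up dataset in which names have been removed but postcode area, birth year and gender remain; any combination that appears only once points to a single person. The records and the function name `unique_combinations` are hypothetical.

```python
from collections import Counter

# Hypothetical "anonymised" records: names removed, but
# (postcode area, birth year, gender) retained.
records = [
    ("AB1", 1984, "F"),
    ("AB1", 1984, "F"),
    ("AB1", 1990, "M"),
    ("CD2", 1975, "F"),
    ("CD2", 1975, "F"),
    ("CD2", 1962, "M"),
]

def unique_combinations(rows):
    """Return the attribute combinations that occur exactly once."""
    counts = Counter(rows)
    return [combo for combo, n in counts.items() if n == 1]

for combo in unique_combinations(records):
    print("Only one person matches:", combo)
```

In this toy dataset two of the six records are unique, so anyone who already knows a person's postcode area, birth year and gender could pick out that person's record despite the missing name.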
Ultimately, while data mining is a crucial technology for many industries, companies need to think about the legality of the technology, the ethics of how it is being used, and the privacy of the users whose data is being collected. In the wake of scandals such as the one involving Cambridge Analytica and Facebook, we need to think about whether any of our digital actions will ever be private and how legislation must move forward to protect the rights of consumers.