Big data is a term applied to data sets whose size or type is beyond the ability of traditional relational databases to capture, manage and process the data with low latency.
Big data can also be defined as a large amount of data that requires new technologies and architectures to make it possible to extract value from it through capture and analysis.
• Big Data is similar to small data, but bigger in size.
• Because the data is bigger, it requires different approaches: techniques, tools and architecture.
• It aims to solve new problems, or old problems in a better way.
• Big Data generates value from the storage and processing of very large quantities of digital information that cannot be analyzed with traditional computing techniques.
Big Data analytics is the process of collecting, organizing and analyzing large sets of data (called Big Data) to discover patterns and other useful information. It can help organizations better understand the information contained within the data and identify the data that is most important to the business and to future business decisions. Analysts working with Big Data typically want the knowledge that comes from analyzing the data. The process of converting large amounts of unstructured raw data, retrieved from different sources, into a data product useful for organizations forms the core of Big Data analytics. Big Data is difficult to collect, store, maintain, analyze and visualize.
The term Big Data refers to a huge volume of data that cannot be stored or processed by traditional data storage or processing units. Big Data is generated at a very large scale, and many multinational companies process and analyse it to uncover insights and improve their business.
Data Volume: The word "Big" in Big Data itself refers to volume. At present the existing data is measured in petabytes and is expected to grow to zettabytes in the near future. Data volume measures the amount of data available to an organization, which does not necessarily have to own all of it as long as it can access it.
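To give a sense of the gap between petabytes and zettabytes, the short sketch below uses the decimal (SI) definitions of the units. The 50 PB figure is a made-up example for illustration, not a measurement.

```python
# Decimal (SI) storage units, expressed in bytes.
PETABYTE = 10**15
ZETTABYTE = 10**21

def petabytes_per_zettabyte() -> int:
    """Return how many petabytes fit in one zettabyte."""
    return ZETTABYTE // PETABYTE

def to_petabytes(num_bytes: int) -> float:
    """Convert a raw byte count to petabytes."""
    return num_bytes / PETABYTE

# Hypothetical organization with access to 50 PB of data.
example_bytes = 50 * PETABYTE
print(petabytes_per_zettabyte())    # 1000000
print(to_petabytes(example_bytes))  # 50.0
```

One zettabyte is therefore a million petabytes, which is why growth on this scale calls for different storage architectures rather than simply larger disks.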
Data Velocity: Velocity in Big Data deals with the speed of the data coming from various sources. This characteristic is not limited to the speed of incoming data; it also covers the speed at which the data flows and is aggregated.
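One simple way to put a number on velocity is to count events over a sliding time window. The sketch below is only an illustration under assumed names: the class `SlidingWindowRate`, the 10-second window and the sample timestamps are all hypothetical, and real streaming systems do this at far larger scale.

```python
from collections import deque

class SlidingWindowRate:
    """Track how many events arrived during the last `window_seconds`."""

    def __init__(self, window_seconds: float) -> None:
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def record(self, timestamp: float) -> None:
        """Register one event; timestamps must arrive in non-decreasing order."""
        self._timestamps.append(timestamp)
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        # Drop events that have fallen out of the window (now - window, now].
        while self._timestamps and self._timestamps[0] <= now - self.window_seconds:
            self._timestamps.popleft()

    def rate(self, now: float) -> float:
        """Events per second over the window ending at `now`."""
        self._evict(now)
        return len(self._timestamps) / self.window_seconds


meter = SlidingWindowRate(window_seconds=10.0)
for t in [0.0, 1.0, 2.0, 9.0, 12.0]:
    meter.record(t)
print(meter.rate(now=12.0))  # 0.2 (two events, at 9.0 and 12.0, remain in the window)
```

Keeping only the timestamps inside the window means memory use depends on the arrival rate, not on the total amount of data seen so far.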
Data Variety: Data variety is a measure of the richness of the data representation: text, images, video, audio, etc. The data being produced does not belong to a single category; it includes not only traditional data but also semi-structured data from sources such as web pages, web log files, social media sites, e-mail and documents.
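As a rough illustration of variety, the sketch below reads three differently shaped inputs (a fixed-column CSV row, a JSON log line and a piece of free text) into simple records. The function names, field names and sample inputs are assumptions made up for this example.

```python
import csv
import io
import json

def from_csv_row(line: str) -> dict:
    """Traditional, structured input: a CSV row with a fixed, known column order."""
    user, amount = next(csv.reader(io.StringIO(line)))
    return {"kind": "structured", "user": user, "amount": float(amount)}

def from_json_log(line: str) -> dict:
    """Semi-structured input: a JSON log line whose fields may vary per event."""
    event = json.loads(line)
    return {"kind": "semi-structured", "user": event.get("user"), "fields": sorted(event)}

def from_free_text(text: str) -> dict:
    """Free text: kept raw, with only a word count derived from it."""
    return {"kind": "unstructured", "text": text, "words": len(text.split())}

records = [
    from_csv_row("alice,19.99"),
    from_json_log('{"user": "bob", "page": "/home", "status": 200}'),
    from_free_text("Great service, will order again"),
]
for record in records:
    print(record["kind"])
```

Each source needs its own parsing logic, and the resulting records do not share a single schema, which is the practical difficulty that variety introduces.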
Data Value: Data value measures the usefulness of data in making decisions. Data science is exploratory and useful for getting to know the data, but “analytic science” encompasses the predictive power of big data. Users can run queries against the stored data, deduce important results from the filtered data, and rank it according to the dimensions they require. These reports help them find the business trends according to which they can change their strategies.
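The query, filter and rank steps described above can be sketched in a few lines. The `sales` records and the `query` helper are hypothetical, chosen only to illustrate the idea; in practice the data would live in a data store and the query would run there.

```python
from operator import itemgetter

# Hypothetical sales records; in practice these would come from a data store.
sales = [
    {"region": "north", "product": "A", "revenue": 1200},
    {"region": "south", "product": "B", "revenue": 300},
    {"region": "north", "product": "C", "revenue": 800},
    {"region": "east", "product": "A", "revenue": 950},
]

def query(records, predicate, rank_by, top_n=None):
    """Filter records with `predicate`, then rank them by one dimension (descending)."""
    matched = [r for r in records if predicate(r)]
    ranked = sorted(matched, key=itemgetter(rank_by), reverse=True)
    return ranked if top_n is None else ranked[:top_n]

top_north = query(sales, lambda r: r["region"] == "north", rank_by="revenue")
print([r["product"] for r in top_north])  # ['A', 'C']
```

Changing `rank_by` or the predicate gives a different view of the same data, which is how one stored data set can answer several business questions.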
Complexity: Complexity measures the degree of interconnectedness (possibly very large) and interdependence in big data structures, such that a small change (or combination of small changes) in one or a few elements can yield very large changes, a small change that ripples across or cascades through the system and substantially affects its behavior, or no change at all.
The challenges in Big Data are usually the real implementation hurdles which require immediate attention.
Any implementation without handling these challenges may lead to the failure of the technology implementation and some
1 Privacy and Security
This is one of the most important challenges with Big Data; it is sensitive and has conceptual, technical as well as legal significance.
• The personal information of a person, when combined with external large data sets, leads to the inference of new facts about that person. These facts may be private, and the person might not want the data owner or anyone else to know about them.
• Information about people is collected and used in order to add value to the business of the organization. This is done by creating insights into their lives of which they are unaware.
• Another important consequence would be social stratification, where a literate person would take advantage of Big Data predictive analysis while the underprivileged would be easily identified and treated worse.
• Big Data used by law enforcement will increase the chances that certain tagged people suffer adverse consequences without the ability to fight back, or even the knowledge that they are being discriminated against.
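The first point above, inferring new facts about a person by combining data sets, can be illustrated with a small linkage sketch. All records, names and the choice of `zip`, `birth_year` and `sex` as linking attributes are invented for this example; it shows the general mechanism, not any particular real incident.

```python
# Hypothetical "anonymized" records: names removed, other attributes kept.
anonymized_health = [
    {"zip": "12345", "birth_year": 1980, "sex": "F", "diagnosis": "asthma"},
    {"zip": "12345", "birth_year": 1975, "sex": "M", "diagnosis": "diabetes"},
    {"zip": "67890", "birth_year": 1980, "sex": "F", "diagnosis": "migraine"},
]

# Hypothetical external list that carries names with the same attributes.
public_directory = [
    {"name": "Jane Doe", "zip": "12345", "birth_year": 1980, "sex": "F"},
    {"name": "John Roe", "zip": "12345", "birth_year": 1975, "sex": "M"},
]

QUASI_IDENTIFIERS = ("zip", "birth_year", "sex")

def link(private_rows, public_rows, keys=QUASI_IDENTIFIERS):
    """Join two data sets on shared attributes; return uniquely matched people."""
    index = {}
    for row in private_rows:
        index.setdefault(tuple(row[k] for k in keys), []).append(row)
    inferred = {}
    for person in public_rows:
        matches = index.get(tuple(person[k] for k in keys), [])
        if len(matches) == 1:  # a unique match re-identifies the record
            inferred[person["name"]] = matches[0]["diagnosis"]
    return inferred

print(link(anonymized_health, public_directory))
# {'Jane Doe': 'asthma', 'John Roe': 'diabetes'}
```

Neither data set reveals a named person's diagnosis on its own; the new fact appears only when the two are combined, which is why removing names alone may not be enough to protect privacy.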
2 Data Access and Sharing of Information
If the data in a company's information systems is to be used to make accurate decisions in time, it must be available in an accurate, complete and timely manner. This makes the data management and governance process somewhat complex, adding the necessity to make data open and available to government agencies in a standardized manner, with standardized APIs, metadata and formats, leading to better decision making, business intelligence and productivity improvements. Expecting companies to share data with each other is unrealistic because of the need to get an edge in business. Sharing data about their clients and operations threatens the culture of secrecy and competitiveness.
3 Analytical Challenges
The main challenging questions are:
• What if data volume gets so large and varied and it is not known how to deal with it?
• Does all data need to be stored?
• Does all data need to be analyzed?
• How to find out which data points are really important?
• How can the data be used to best advantage?
Big data brings with it some huge analytical challenges. The analysis to be done on this huge amount of data, which can be unstructured, semi-structured or structured, requires a large number of advanced skills. Moreover, the type of analysis needed depends highly on the results to be obtained, i.e. the decisions to be made. This can be done using one of two techniques: either incorporate massive data volumes in the analysis or determine upfront which Big Data is relevant.
4 Human Resources and Manpower
Since Big Data is still young and an emerging technology, it needs to attract organizations and young people with diverse new skill sets. These skills should not be limited to technical ones but should also extend to research, analytical, interpretive and creative ones. These skills need to be developed in individuals, which requires organizations to hold training programs. Moreover, universities need to introduce curricula on Big Data to produce employees skilled in this area.
5 Technical Challenges
1 Fault Tolerance: With the arrival of new technologies like cloud computing and Big Data, the intention is always that whenever a failure occurs, the damage done should stay within an acceptable threshold, rather than the whole task having to begin again from scratch. Fault-tolerant computing is extremely hard, involving intricate algorithms.
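One common way to avoid restarting a task from scratch is checkpointing: progress is saved periodically so that a rerun can resume from the last saved point. The sketch below is a minimal single-machine illustration, not how any particular Big Data framework implements it. The function `run_with_checkpoint`, the JSON checkpoint file and the `fail_at` parameter (which simulates a crash) are all assumptions made for this example.

```python
import json
import os
import tempfile

def run_with_checkpoint(items, checkpoint_path, fail_at=None):
    """Sum `items`, saving progress after each one so a rerun can resume.

    `fail_at` is a hypothetical knob that simulates a crash at that index.
    Returns (total, number_of_items_processed_in_this_run).
    """
    state = {"next_index": 0, "total": 0}
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            state = json.load(f)

    processed = 0
    for i in range(state["next_index"], len(items)):
        if fail_at is not None and i == fail_at:
            raise RuntimeError(f"simulated failure at item {i}")
        state["total"] += items[i]
        state["next_index"] = i + 1
        # Write to a temp file first, then rename, so a crash mid-write
        # cannot leave a corrupt checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(state, f)
        os.replace(tmp_path, checkpoint_path)
        processed += 1
    return state["total"], processed


data = [1, 2, 3, 4, 5]
path = os.path.join(tempfile.mkdtemp(), "checkpoint.json")
try:
    run_with_checkpoint(data, path, fail_at=3)      # crashes after 3 items
except RuntimeError:
    pass
total, processed = run_with_checkpoint(data, path)  # resumes at item 3
print(total, processed)  # 15 2
```

After the simulated failure, only the two remaining items are processed, so the damage is limited to the work done since the last checkpoint. Checkpointing after every item is the simplest choice here; saving less often trades more repeated work after a failure for less I/O overhead.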
2 Scalability: The scalability issue of Big Data has led towards cloud computing, which now aggregates multiple disparate workloads with varying performance goals into very large clusters. This requires a high level of resource sharing, which is expensive and also brings various challenges, such as how to run and execute various jobs so that the goal of each workload is met cost-effectively.
3 Quality of Data: Collecting and storing a huge amount of data comes at a cost. More data, if used for decision making or predictive analysis in business, can lead to better results, but only when the data is relevant and of good quality.
4 Heterogeneous Data: Unstructured data covers almost every kind of data being produced, from social media interactions and recorded meetings to PDF documents, fax transfers, emails and more. Working with unstructured data is cumbersome and costly. Converting all this unstructured data into structured data is also not feasible. Structured data is organized in a highly mechanized and manageable way and integrates well with databases, whereas unstructured data is completely raw and unorganized.
Big Data has numerous advantages for society, science and technology. How much benefit it brings depends on how it is used. Some of the advantages are described below:
• Understanding and Targeting Customers
This is one of the biggest and most publicized areas of big data use today. Here, big data is used to better understand customers and their behaviors and preferences.
• Understanding and Optimizing Business Processes
Big data is also increasingly used to optimize business processes. Retailers are able to optimize their stock based on predictions generated from social media data, web search trends and weather forecasts. One particular business process that is seeing a lot of big data analytics is supply chain or delivery route optimization. HR business processes are also being improved using big data analytics.
• Improving Security and Law Enforcement
Big data is applied heavily in improving security and enabling law enforcement. It has been revealed that the National Security Agency (NSA) in the U.S. uses big data analytics to foil terrorist plots (and maybe spy on us). Others use big data techniques to detect and prevent cyber-attacks. Police forces use big data tools to catch criminals and even predict criminal activity, and credit card companies use big data to detect fraudulent transactions.
• Improving Healthcare and Public Health
The computing power of big data analytics enables us to decode entire DNA strings in minutes and will allow us to find new cures and better understand and predict disease patterns. Just think of what happens when all the individual data from smart watches and wearable devices can be applied to millions of people and their various diseases. The clinical trials of the future won’t be limited by small sample sizes but could potentially include everyone.
• Optimizing Machine and Device Performance
Big data analytics helps machines and devices become smarter and more autonomous. For example, big data tools are used to operate Google’s self-driving car: a Toyota Prius fitted with cameras, GPS, powerful computers and sensors so that it can drive safely on the road without human intervention. Big data tools are also used to optimize energy grids using data from smart meters. We can even use big data tools to optimize the performance of computers and data warehouses.
• Financial Trading
High-Frequency Trading (HFT) is an area where big data finds a lot of use today. Here, big data algorithms are used to make trading decisions. The majority of equity trading now takes place via data algorithms that increasingly take into account signals from social media networks and news websites to make buy and sell decisions in split seconds.