Encyclopedia of Marketing. Big Data and Blockchain: A Breakthrough in Data Analysis. Big Data: Examples of Use

Volkova Julia Sergeevna, 4th-year student, Financial University under the Government of the Russian Federation, Kaluga branch, Kaluga

Big data in the modern world

Abstract. The article is devoted to the introduction of big data technologies in modern society. The main characteristics of big data are examined, and the principal areas of application, such as banking, retail, the private and public sectors, and even everyday life, are considered. The study reveals shortcomings in the use of big data technologies and points to the need to develop regulatory frameworks for their use. Keywords: big data, banks, banking sector, retail, private sector, public sector.

As information technology penetrates ever more areas of modern society, the requirements for its ability to handle tasks involving huge amounts of data grow as well. There is information that cannot be processed by traditional methods; it includes structured data, media and random objects. Existing technologies cope with the analysis of the first kind, but the analysis of the second and third remains largely an unsolved problem. Studies show that the volumes of media data, such as video surveillance footage, aerial photography and digital medical information, and of random objects stored in numerous archives and clouds, grow year after year.

The study of big data is the subject of works by both foreign and Russian scientists: James Manyika, Michael Chui, V.V. Toporkov, V.I. Budzko. A substantial contribution to the study of this technology is made by major global companies such as McKinsey & Company, CNews Analytics, SAP, Oracle, IBM, Microsoft, Teradata and many others, which process and analyze data and build software and hardware systems based on big data. According to a McKinsey Institute report, "Big data is a data set whose size exceeds the capabilities of typical database software tools for capturing, storing, managing and analyzing data." In essence, the concept of big data means working with information of huge volume and diverse composition that is constantly updated and located in different sources, in order to increase operational efficiency, create new products and enhance competitiveness. The consulting company Forrester gives a brief and fairly clear formulation: "Big data brings together techniques and technologies that extract meaning from data at the extreme limit of practicality."

Today the big data sector is characterized by the following features:
• Volume: the accumulated database represents a large amount of information.
• Velocity: the growing rate of data accumulation (90% of all information has been collected over the past two years).
• Variety: the ability to process structured and unstructured information of various kinds simultaneously.
Marketing experts like to add their own "V"s here: some speak of Veracity (reliability of the data), others add that big data technology must certainly bring the business Value.

It is expected that by 2020 the accumulated amount of information on the planet will be doubling every two years. The abundance of data prompts the desire to use it for analysis and forecasting, and such colossal volumes require appropriate technologies. Today companies have to handle amounts of data so huge that they are hard even to imagine; traditional databases cannot cope with this task, which creates the need to introduce big data technologies. A comparison of big data and traditional databases is given below; it is based on the studies of V.I. Budzko and the Moscow Exchange.

Table 1. Comparative characteristics of big data and traditional databases

Area of application: traditional databases cover one or several subject areas, whereas big data technologies are used across an extensive range, from identifying customer preferences to risk analysis.
Nature of the data: traditional databases hold strictly structured data, whereas big data means large arrays with a complex, inhomogeneous and/or uncertain structure.
Data storage and processing model: traditional databases rely on centralized storage and processing, whereas big data relies on a distributed model.

Thus, traditional databases cover only one or a few subject areas, and those areas must contain structured data. Big data, by contrast, is applied across a broad range of fields to huge arrays of information with a complex structure. According to a CNews Analytics study, presented in Figure 1, the Russian market is embracing big data, which reflects the growing maturity of companies. Many firms are moving to big data technologies because of the volumes of data they process: more than 44% already generate about 100 terabytes, and for 13% these volumes exceed 500 terabytes.

Fig. 1. Volumes of information processed in companies

Such volumes cannot be handled by traditional databases, so these companies see the move to big data not simply as a way of processing huge volumes, but also as a way of increasing competitiveness, raising customers' loyalty to their products and attracting new customers. The most active adopters of such solutions are banks, telecom operators and retail; their shares are shown in Figure 2. The number of companies that use or are ready to use big data in transport, energy and industry is also noticeable. The first examples of big data use appeared in the public sector.

Fig.2. Sectoral structure of the use of big data

As for Western governments, by various estimates the digital economy accounts for 3% to 21% of GDP in the G20 countries. The Russian public sector has not yet achieved significant results in working with big data. Today in Russia such technologies are mostly of interest to commercial enterprises: retail chains, banks and telecommunications companies. According to estimates of Russian industry associations, the volume of the digital economy in Russia is only about 1 trillion rubles, roughly 1.5% of GDP. Nevertheless, Russia has huge potential for growth of the digital economy. Despite the short lifetime of the big data sector, there are already estimates of the effective use of these technologies based on real examples. Banks today store on average about 3.8 petabytes of data and use big data technologies to address specific tasks:
• credit card usage data;
• deposit data;
• credit data;
• customer acquisition data;
• customer retention data.
Banks report that since they started using big data technologies they have been able to attract new customers, interact better with both new and existing customers, and maintain their loyalty. In 2015 CNews Analytics conducted a survey among the thirty largest Russian banks by total assets to find out which big data technologies they use and for what purposes. Compared with the 2014 survey, the number of TOP30 banks reporting the use of big data technologies increased, but this change is largely due to changes in the composition of the TOP30. Figure 3 compares the 2015 survey with that of 2014, according to the survey by A. Kiryanova.

Fig. 3. Use of big data by the TOP30 Russian banks

According to IBS estimates, 80% of the banks that responded positively implement Big Data Appliances, hardware and software systems for storing and processing data. These solutions usually act as an analytical or transactional repository, whose main advantage is good performance when working with large data volumes. Nevertheless, the practice of using big data in Russian banks is only taking shape. The reason for such slow adoption in Russia is the caution of customers towards new technologies: they are not confident that big data technology will solve their problems in full. The American market is a different matter: banks there have already accumulated about 1 exabyte of data, which can be compared with 275 billion MP3 recordings. The number of sources from which information comes is extensive; among the classic ones are:
• data on customers' visits to bank branches;
• customers' social network activity;
• credit card transactions, and others.

Retailers use big data to analyze buyer behaviour, design routes through the sales floor, arrange goods correctly, plan procurement and, ultimately, increase sales. In online stores the sales mechanism itself is built on big data: users are offered products based on previous purchases and their personal preferences, information about which is collected, for example, from social networks. In both cases the analysis of big data helps to reduce costs, increase customer loyalty and reach a larger audience. As companies' trading potential grows, traditional databases cease to meet the rising business requirements, and the system can no longer provide adequate management accounting. Moving to big data, new technologies make it possible to optimize merchandise management, achieve data relevance and processing efficiency, assess the consequences of management decisions and quickly generate management reports. The total amount of accumulated data exceeds 100 exabytes; Walmart alone uses big data to process 2.5 petabytes of data per hour. With big data technologies, operating profitability increases by 60%; moreover, according to Hadoop statistics, after the introduction of big data, analytics performance grows to the processing of 120 algorithms, and profit grows by 7-10%.

In Russian retail, big data is only beginning to gain momentum, since the volume of information processed differs greatly from that abroad: it is, for example, 18 times smaller than in China, and the entire turnover of data produced by Russian online retail is 4.5 times smaller than that of the Amazon store alone. At the same time, the number of online stores in Russia that use big data is under 40 thousand, whereas in Europe there are more than 550 thousand such stores. This characterizes the Russian retail market as still developing and not yet fully formed.

As for our daily life, big data technologies are used in it in ways we hardly ever think about. Shazam, a music service, processes about 1 million tracks a day, roughly 1.5-2 petabytes, worldwide, and on this basis music producers predict the popularity of artists. Big data is also used to process credit card information by companies such as MasterCard and Visa: MasterCard processes 65 billion transactions a year made with 1.9 billion cards at 32 million merchants in order to predict trading trends. Every day people around the world write about 19 terabytes of data on social networks such as Twitter and Facebook.
They upload and process photos, write and send messages, and so on. Infrastructure also uses big data technologies, from trolleybuses to mines and rockets. For instance, in the London Underground the turnstiles record about 20 million passes every day; analysis based on big data technologies identified ten different epicentres, which is taken into account in the further development of the underground. Undoubtedly, the variety and volume of data arising from all kinds of interactions form a powerful base for business in building and refining forecasts, identifying patterns, assessing efficiency, and so on. However, there are also drawbacks that must be competently taken into account. Despite the explicit and potential advantages of using big data, its use has shortcomings, which are primarily connected with the large amounts of information, the different methods of accessing it, and the often insufficient resources for ensuring information security in organizations. The problems associated with the use of big data are presented in Figure 4.

Fig. 4. Problems of using big data: there is no special legislation on Big Data anywhere in the world; data must be anonymized to protect its sources; companies must be confident that all data security requirements are monitored and maintained; implementing Big Data may create or reveal previously confidential information.

All these problems lead many companies to delay the introduction of big data technologies, since working with third parties raises the risk of disclosing insider information that a company would not reveal using only its own resources. In my opinion, the most important step towards the full introduction of big data technologies must be the legislative one. There are now laws limiting the collection, use and storage of certain types of personal data, but they do not cover big data in full, so special legislation should exist for it. To keep up with rapidly changing and new laws, companies must carry out an initial inventory of the relevant regulatory legal acts and update this list on a regular basis. However, despite the shortcomings listed above, the experience of Western companies shows that big data technologies help to successfully solve both modern business tasks, such as increasing competitiveness, and objectives directly related to people's lives. Russian companies are already on the way to introducing big data technologies both in production and in the public sector, since the amount of information almost doubles every year. Over time, many areas of our life will change under the influence of big data.

References
1. Budzko V.I. High-availability systems and Big Data // Big Data in the National Economy, 2013. P. 16-19.
2. Korotkova T. "EMC Data Lake 2.0 - the transition to big data analytics and the digital economy" http://bigdata.cnews.ru/news/line/20151203_emc_data_lake_20_pomozhet_perejti_k_analitike
3. Kiryanova A. "Big data has not become mainstream in Russian banks" http://www.cnews.ru/news/top/bolshie_dannye_ne_stali_mejnstrimom
4. CNews "Infographics: Big data has come to Russia" http://bigdata.cnews.ru/articles/infografika_bolshie_dannye_prishli_v_rossiyu
5. CNews "Infographics: How retail uses big data" http://bigdata.cnews.ru/articles/infografika_kak_roznitsa_ispolzuet
6. CNews "Infographics: Big Data technologies" http://bigdata.cnews.ru/articles/big_data_v_zhizni_cheloveka
7. CNews "Infographics: what big data can do in banks" http://bigdata.cnews.ru/articles/infografika_chto_mogut_bolshie_dannye
8. Moscow Exchange "Analytical review of Big Data" http://habrahabr.ru/company/moex/blog/256747/
9. Big data (Big Data) http://www.tadviser.ru/index.php/statimateschet_data_(big_data)
10. Big Data - the electricity of the XXI century http://bit.samag.ru/archive/article/1463
11. McKinsey Global Institute "Big data: The next frontier for innovation, competition, and productivity" (June 2011).

In the Russian-speaking environment both the term Big Data and the phrase "big data" are used; the Russian term is a calque of the English one. Big data has no strict definition and no clear boundary: is it 10 terabytes or 10 megabytes? The name itself is very subjective. The word "big" is like "one, two, many" in primitive tribes.

However, there is a well-established view that big data is a set of technologies designed to perform three operations. First, to process volumes of data that are large compared with "standard" scenarios. Second, to work with rapidly arriving data in very large volumes; that is, there is not just a lot of data, but it keeps growing. Third, to work with structured and poorly structured data in parallel and in different aspects. Big data assumes that the algorithms receive a stream of not always structured information as input and that more than one idea can be extracted from it.

A typical example of big data is the information coming from large physical experimental facilities, which produce huge amounts of data and do so continuously. Such an installation constantly outputs large volumes of data, and scientists use them to solve many tasks in parallel.

Big data appeared in the public space because it came to affect almost everyone, not just the scientific community, where such tasks had been solved for a long time. Big Data technologies entered the public sphere when the matter concerned a very specific number: the number of inhabitants of the planet. Seven billion people are gathering in social networks and other projects that aggregate people: YouTube, Facebook, VKontakte, where the number of users is measured in billions and the number of operations they perform simultaneously is huge. The data stream in this case consists of user actions, for example the data of the same YouTube hosting flowing over the network in both directions. Processing means not only interpretation but also the ability to handle each of these actions correctly, that is, to put it in the right place and make the data quickly available to every user, since social networks do not tolerate waiting.

Much of what relates to big data and to the approaches used to analyze it has actually existed for quite a long time: for example, processing images from surveillance cameras, where we are dealing not with a single picture but with a data stream, or robot navigation. All of this has existed for decades; it is simply that data-processing tasks now touch a much larger number of people and ideas.

Many developers are accustomed to working with static objects and thinking in terms of states. In the big data paradigm things are different: you must be able to work with an incessant data stream, and this is an interesting task that affects more and more areas.

In our lives, more and more devices and programs are beginning to generate large amounts of data - for example, the Internet of Things.

Things now generate huge information flows. The police system "Stream" collects information from all cameras and allows cars to be found using this data. Fitness bracelets, GPS trackers and other devices serving the tasks of people and businesses are increasingly coming into use.

The Moscow Department of Information Technology is hiring a large number of data analysts, because a great deal of statistics about people accumulates, and it is multi-criteria (that is, statistics on a very large number of criteria are collected about each person and each group of people). Regularities and trends have to be found in this data. Such tasks require mathematicians with an IT education, because ultimately the data is stored in structured DBMSs, and one must be able to query them and retrieve information.

Previously we did not consider big data a problem for the simple reason that there was nowhere to store it and no networks to transfer it. When these capabilities appeared, the data immediately filled the volume provided to it. But however much bandwidth and storage capacity expand, there will always be sources, say physical experiments or experiments on modelling the airflow around a wing, that produce more information than we can transmit. According to Moore's law, the performance of modern parallel computing systems grows steadily, and so do the speeds of data networks. However, data must also be saved to and retrieved from storage media (hard disks and other types of memory) quickly, and this is another task in processing big data.

Preface

Big Data is a fashionable current term that appears at almost all professional conferences on data analysis, predictive analytics, intelligent data analysis (data mining) and CRM. The term is used in fields where working with qualitatively large volumes of data is relevant and where the rate of data flowing into the organizational process is constantly increasing: economics, banking, manufacturing, marketing, telecommunications, web analytics, medicine, etc.

Along with the rapid accumulation of information, data analysis technologies are also developing at a rapid pace. If a few years ago it was only possible, say, to segment customers into groups with similar preferences, it is now possible to build models for each customer in real time, analyzing, for example, their movement around the Internet in search of a specific product. The consumer's interests can be analyzed, and in accordance with the constructed model suitable advertising or specific offers can be shown. The model can also be tuned and rebuilt in real time, which was unthinkable a few years ago.

In the field of telecommunications, for example, technologies have been developed to determine the physical location of cell phones and their owners, and it seems that the idea described in the 2002 science-fiction film "Minority Report", in which advertising displayed in shopping centres takes account of the interests of specific people passing by, will soon become reality.

At the same time, there are situations where enthusiasm for new technologies can lead to disappointment. For example, sparse data (Sparse Data), which gives important insight into reality, is sometimes much more valuable than Big Data describing mountains of often non-essential information.

The purpose of this article is to clarify and reflect on the new possibilities of Big Data and to illustrate how the STATISTICA analytical platform from StatSoft can help use Big Data efficiently to optimize processes and solve problems.

How big is Big Data?

Of course, the correct answer to this question is: "it depends..."

In modern discussions the concept of Big Data is described as data whose volume is on the order of terabytes.

In practice (if we are talking about gigabytes or terabytes), such data is easy to store and manage using "traditional" databases and standard equipment (database servers).

The STATISTICA software uses multi-threaded technology for data access (reading), transformation and the building of predictive (and scoring) models, so such data samples can be analyzed easily and do not require specialized tools.

In some current StatSoft projects, samples of about 9-12 million rows are processed. Multiply them by 1,000 parameters (variables) collected and organized in a data warehouse for building risk or predictive models, and such a file will be "only" about 100 gigabytes in size. This is, of course, not a small data store, but its size does not exceed the capabilities of standard database technology.

The STATISTICA product line for batch analysis and building scoring models (STATISTICA Enterprise), the real-time solutions (STATISTICA Live Score), and the analytical tools for creating and managing models (STATISTICA Data Miner, Decisioning) scale easily across several servers with multi-core processors.

In practice this means that a speed of analytical models (for example, forecasts of credit risk, probability of fraud, reliability of equipment components, etc.) sufficient for making operational decisions can almost always be achieved using standard STATISTICA tools.

From large amounts of data to Big Data

As a rule, discussion of Big Data centres on data warehouses (and on analysis based on such warehouses) whose volume is much larger than just a few terabytes.

In particular, some data warehouses can grow to a thousand terabytes, i.e. to a petabyte (1,000 terabytes = 1 petabyte).

Beyond petabytes, the accumulation of data can be measured in exabytes; for example, it is estimated that in 2010 the manufacturing sector worldwide accumulated a total of 2 exabytes of new information (Manyika et al., 2011).

There are industries where data is collected and accumulated very intensively.

For example, in a production environment such as a power plant, a continuous stream of data is sometimes generated for tens of thousands of parameters every minute or even every second.

In addition, over the past few years so-called "smart grid" technologies have been introduced, allowing utilities to measure the electricity consumption of individual households every minute or every second.

For applications of this kind, in which data must be stored for years, the accumulated data is classified as Extremely Big Data.

The number of Big Data applications in the commercial and public sectors, where the amount of data in storage can be hundreds of terabytes or petabytes, is also growing.

Modern technologies make it possible to track people and their behaviour in various ways. For example, when we use the Internet, shop in online stores or large chains such as Walmart (according to Wikipedia, Walmart's data warehouse is estimated at more than 2 petabytes), or move around with our mobile phones switched on, we leave a trace of our actions that leads to the accumulation of new information.

Various forms of communication, from simple phone calls to uploading information via social networking sites such as Facebook (according to Wikipedia, 30 billion items of information are exchanged every month), or sharing video on sites like YouTube (which claims that 24 hours of video are uploaded every minute; see Wikipedia), generate huge amounts of new data every day.

Similarly, modern medical technologies generate large amounts of data related to the provision of medical care (images, video, real-time monitoring).

So, the classification of data volumes can be represented as follows:

Large data sets: from 1000 megabytes (1 gigabyte) to hundreds of gigabytes

Huge data sets: from 1,000 gigabytes (1 terabyte) to several terabytes

Big Data: from several terabytes to hundreds of terabytes

Extremely Big Data: from 1,000 to 10,000 terabytes = from 1 to 10 petabytes

Big Data Tasks

There are three types of tasks associated with Big Data:

1. Storage and management

Data volumes of hundreds of terabytes or petabytes cannot easily be stored and managed with traditional relational databases.

2. Unstructured information

Most Big Data is unstructured. That is: how can text, video, images, etc. be organized?

3. Big Data Analysis

How can unstructured information be analyzed? How, on the basis of Big Data, can one compile simple reports and build and deploy in-depth predictive models?

Storage and management of Big Data

Big Data is usually stored and organized in distributed file systems.

In general terms, the information is stored on several (sometimes thousands of) hard drives on standard computers.

The so-called "map" (MAP) keeps track of where (on which computer and/or disk) each specific piece of information is stored.

To ensure fault tolerance and reliability, each piece of information is usually stored several times, for example three times.

So, for example, suppose you have collected individual transactions from a large retail chain. Detailed information about each transaction is stored on different servers and hard drives, and the "map" (MAP) indexes exactly where the information about each transaction is kept.

Using standard hardware and open-source software for managing this distributed file system (for example, Hadoop), it is relatively easy to implement reliable petabyte-scale data warehouses.
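To make the "map plus replication" idea concrete, here is a minimal, self-contained sketch in Python. It is not how Hadoop/HDFS is actually implemented; the node names, the replication factor and the hashing scheme are illustrative assumptions only.

```python
import hashlib

# Toy illustration of the "map" described above: a central index recording
# on which nodes each block of a file is stored, with threefold replication.
# This is a simplified sketch, not the actual Hadoop/HDFS design.

NODES = [f"node-{i:02d}" for i in range(12)]   # hypothetical cluster of 12 machines
REPLICATION = 3                                # each block is kept three times

block_map = {}  # (file_name, block_no) -> list of nodes holding a replica

def place_block(file_name, block_no):
    """Choose REPLICATION distinct nodes for a block and record them in the map."""
    h = int(hashlib.md5(f"{file_name}:{block_no}".encode()).hexdigest(), 16)
    replicas = [NODES[(h + i) % len(NODES)] for i in range(REPLICATION)]
    block_map[(file_name, block_no)] = replicas
    return replicas

def locate_block(file_name, block_no):
    """Ask the map where a block lives, e.g. before scheduling computation near it."""
    return block_map[(file_name, block_no)]

if __name__ == "__main__":
    # A "transactions" file split into 4 blocks, as in the retail example above.
    for block in range(4):
        print(block, place_block("retail_transactions.csv", block))
    print("block 2 is on:", locate_block("retail_transactions.csv", 2))
```

Losing any single node still leaves two replicas of every block, which is the point of the replication factor mentioned above.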

Unstructured information

Most of the information collected in a distributed file system consists of unstructured data, such as text, images, photos or videos.

This has its advantages and disadvantages.

The advantage is that the ability to store big data allows "all the data" to be saved without worrying about which part of it will be relevant for later analysis and decision-making.

The disadvantage is that in such cases, the subsequent processing of these huge data arrays is required to extract useful information.

Although some of these operations may be simple (for example, simple calculations, etc.), others require more complex algorithms that should be specifically designed for efficient operation on a distributed file system.

One top manager once told StatSoft that he had "spent a fortune on IT and data storage but still hadn't started making money", because he had not thought about how best to use the data to improve the core business.

So, while the amount of data can grow exponentially, the ability to extract information and act on it is limited and will asymptotically approach a limit.

It is important that methods and procedures for building and updating models, as well as for automating decision-making, are developed alongside the storage systems, so that such systems are useful and profitable for the enterprise.

Big Data Analysis

This is the really big problem of analyzing unstructured Big Data: how to analyze it usefully. Far less has been written about this question than about data storage and Big Data management technologies.

There are a number of questions that should be considered.

Map-Reduce

When analyzing hundreds of terabytes or petabytes of data, it is not possible to extract the data to some other place for analysis (for example, to STATISTICA Enterprise Analysis Server).

Transferring the data over communication channels to a separate server or servers (for parallel processing) would take too much time and require too much traffic.

Instead, analytical calculations must be performed physically close to the place where the data is stored.

The Map-Reduce algorithm is a model of distributed computing. It works as follows: the input data is distributed to the worker nodes (individual nodes) of the distributed file system for preliminary processing (the map step), and the pre-processed data is then folded together, or merged (the reduce step).

Thus, to compute a grand total, say, the algorithm calculates intermediate sums in parallel on each node of the distributed file system and then adds up these intermediate values.
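The following sketch runs the same two steps locally with Python's multiprocessing instead of a real cluster; the shards and the process pool merely stand in for data nodes, so treat it as an illustration of the map and reduce steps rather than a Hadoop implementation.

```python
from functools import reduce
from multiprocessing import Pool

# Map step: each "node" sums only the shard of data it holds.
def node_partial_sum(shard):
    """Work done locally on one node: a partial sum of that node's data."""
    return sum(shard)

if __name__ == "__main__":
    # Pretend the transaction amounts are already sharded across 4 nodes.
    shards = [
        [12.5, 3.0, 8.25],
        [100.0, 0.75],
        [42.0, 7.0, 1.0, 9.5],
        [5.5],
    ]
    with Pool(processes=len(shards)) as pool:
        partial_sums = pool.map(node_partial_sum, shards)      # map step
    total = reduce(lambda a, b: a + b, partial_sums, 0.0)      # reduce step
    print(partial_sums, "->", total)
```

Only the small per-node sums travel to the reducer, which is exactly why the computation can stay physically close to where the data is stored.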

There is a huge amount of information on the Internet about how various calculations, including for predictive analytics, can be performed using the Map-Reduce model.

Simple Statistics, Business Intelligence (BI)

For compiling simple BI reports there are many open-source products that allow sums, means, proportions and the like to be calculated using Map-Reduce.

Thus, obtaining accurate counts and other simple statistics for reports is very easy.

Predictive modelling, in-depth statistics

At first glance it may seem that building predictive models on a distributed file system is much harder, but this is not the case. Consider the preliminary stages of data analysis.

Data preparation. Some time ago StatSoft carried out a series of large and successful projects involving very large data sets describing the operating parameters of a power plant. The purpose of the analysis was to improve the plant's efficiency and reduce emissions (Electric Power Research Institute, 2009).

It is important that, even though the data sets can be very large, the information contained in them has a much lower dimensionality.

For example, while data is accumulated every second or every minute, many parameters (gas and furnace temperatures, flows, damper positions, etc.) remain stable over long intervals. In other words, the data recorded every second is largely a repetition of the same information.

Thus, it is necessary to carry out "smart" aggregation of the data, obtaining for modelling and optimization data that contains only the necessary information about the dynamic changes affecting the power plant's efficiency and emissions.
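A small sketch of what such "smart" aggregation might look like in practice, under assumed column names and thresholds (the one-hour sensor trace below is synthetic and purely illustrative): per-second readings are first averaged per minute, and then only the minutes where the value actually moved are kept for modelling.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
idx = pd.date_range("2024-01-01", periods=3600, freq="s")      # one hour of 1 Hz data
# Mostly flat furnace temperature with short dynamic episodes every 15 minutes.
furnace_temp = 850 + np.where(np.arange(3600) % 900 < 30, rng.normal(25, 5, 3600), 0.0)
df = pd.DataFrame({"furnace_temp": furnace_temp}, index=idx)

# 1) Aggregate to one-minute means: removes the second-level repetition.
per_minute = df.resample("1min").mean()

# 2) Keep only the minutes where the value moved by more than a threshold,
#    i.e. the dynamic changes that matter for the efficiency/emissions model.
changed = per_minute[per_minute["furnace_temp"].diff().abs() > 1.0]

print(len(df), "raw rows ->", len(per_minute), "minutes ->", len(changed), "dynamic minutes")
```

The 3,600 raw rows collapse to a handful of informative ones, which mirrors the reduction in dimensionality described above.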

Text classification and data pre-processing. Let us illustrate once more how large data sets can contain far less useful information.

For example, StatSoft took part in text mining projects analyzing tweets that reflect how satisfied passengers are with airlines and their services.

Although a large number of relevant tweets were extracted every day, the sentiments expressed in them were quite simple and monotonous. Most messages are complaints and brief one-sentence reports of a "bad experience". Moreover, the number and "strength" of these sentiments are relatively stable over time and across specific issues (for example, lost baggage, poor food, flight cancellations).

Thus, reducing the actual tweets to sentiment scores (estimates) using text mining methods (for example, those implemented in STATISTICA Text Miner) yields a much smaller data volume, which can then easily be combined with existing structured data (actual ticket sales, or frequent-flyer information). The analysis makes it possible to divide customers into groups and study their characteristic complaints.
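As a toy illustration of that reduction (this is not STATISTICA Text Miner; the word lists, dates and sales figures are made up), each tweet is collapsed to a single sentiment score, the scores are aggregated by day, and the result is joined to a hypothetical structured series of ticket sales.

```python
import pandas as pd

NEG = {"lost", "late", "cancelled", "bad", "rude"}
POS = {"great", "friendly", "comfortable", "thanks"}

def score(text):
    """Crude lexicon-based sentiment: positive words minus negative words."""
    words = text.lower().split()
    return sum(w in POS for w in words) - sum(w in NEG for w in words)

tweets = pd.DataFrame({
    "date": ["2024-05-01", "2024-05-01", "2024-05-02"],
    "text": ["bad experience, lost baggage", "friendly crew, thanks", "flight cancelled again"],
})
tweets["sentiment"] = tweets["text"].apply(score)
daily_sentiment = tweets.groupby("date")["sentiment"].mean()

# Hypothetical structured data to join with, e.g. daily ticket sales.
sales = pd.Series({"2024-05-01": 1200, "2024-05-02": 950}, name="tickets_sold")
combined = pd.concat([daily_sentiment, sales], axis=1)
print(combined)
```

Thousands of raw tweets per day thus shrink to one row per day that lines up directly with the structured sales data.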

There are many tools for carrying out this kind of data aggregation in a distributed file system, which makes this analytical process easy to perform.

Building models

Often the task is to quickly build accurate models for data stored in a distributed file system.

There are Map-Reduce implementations of various data mining / predictive analytics algorithms suitable for large-scale parallel data processing in a distributed file system (which can be supported using the StatSoft STATISTICA platform).

However, just because you have processed a very large amount of data, can you be confident that the resulting model is really more accurate?

In fact, it is most likely more convenient to build models on small segments of the data in a distributed file system.

As a recent Forrester report puts it: "Two plus two equals 3.9 - and that is usually good enough" (Hopkins & Evelson, 2011).

Statistical and mathematical accuracy mean that a linear regression model with, say, 10 predictors, built on a properly drawn probability sample of 100,000 observations, will be as accurate as a model built on 100 million observations.
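A quick numerical illustration of that sampling claim, scaled down so it runs in memory (the data here is synthetic and the sizes are assumptions, not the figures from the report): ordinary least squares coefficients estimated on a 100,000-row random sample come out almost identical to those estimated on the full array.

```python
import numpy as np

rng = np.random.default_rng(42)
n_full, n_sample, p = 1_000_000, 100_000, 10

X = rng.normal(size=(n_full, p))
true_beta = rng.uniform(-2, 2, size=p)
y = X @ true_beta + rng.normal(scale=1.0, size=n_full)

def ols(X, y):
    """Ordinary least squares via numpy's least-squares solver."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta

beta_full = ols(X, y)                                     # fit on all rows
idx = rng.choice(n_full, size=n_sample, replace=False)    # probability sample
beta_sample = ols(X[idx], y[idx])                         # fit on the sample

print("max |difference| between coefficient estimates:",
      np.abs(beta_full - beta_sample).max())
```

The printed difference is typically in the third decimal place, i.e. negligible for practical decision-making.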

The constant acceleration of data growth is an integral part of modern reality. Social networks, mobile devices, data from measuring devices and business information are just a few of the sources capable of generating gigantic data arrays.

The term Big Data has now become quite common, yet few realize how quickly and profoundly technologies for processing large data arrays are changing the most varied aspects of society. Changes are taking place in many spheres, generating new problems and challenges, including in the field of information security, where essential aspects such as confidentiality, integrity and availability must come to the fore.

Unfortunately, many modern companies adopt Big Data technology without creating a proper infrastructure to ensure the reliable storage of the huge data arrays they collect and keep. On the other hand, blockchain technology, which is designed to solve this and many other problems, is developing rapidly.

What is Big Data?

In essence, the definition of the term lies on the surface: "big data" means the management and analysis of very large amounts of data. Taken more broadly, it is information that cannot be processed by classical methods because of its large volume.

The term Big Data itself appeared relatively recently. According to the Google Trends service, the active growth in the popularity of the term falls at the end of 2011:

The first products and solutions directly related to processing big data appeared as early as 2010. By 2011 most of the largest IT companies, including IBM, Oracle, Microsoft and Hewlett-Packard, were actively using the term Big Data in their business strategies, and information technology market analysts gradually began active research into the concept.

Today the term has gained considerable popularity and is actively used in a wide variety of fields. However, one cannot say with confidence that Big Data is a fundamentally new phenomenon; on the contrary, large data sources have existed for many years. In marketing these include databases of customer purchases, credit histories, lifestyles, etc. Over the years, analysts have used such data to help predict future customer needs, evaluate risks, shape consumer preferences, etc.

Currently, the situation has changed in two aspects:

- more sophisticated tools and methods for analyzing and combining different data sets have appeared;
- analysis tools have been supplemented by many new data sources, driven by the widespread transition to digital technologies as well as by new methods of collecting and measuring data.

Researchers predict that Big Data technologies will be used most actively in manufacturing, healthcare, trade, public administration and a variety of other fields and industries.

Big Data is not a particular data array but a set of methods for processing it. The decisive characteristic of big data is not only its volume but also the other categories that characterize the labour-intensive processes of data processing and analysis.

The source data for processing may include, for example:

- logs of the behavior of Internet users;
- Internet of things;
- social media;
- meteorological data;
- digitized books of the largest libraries;
- GPS signals from vehicles;
- information on transactions of banks' customers;
- data on the location of mobile network subscribers;
- information on purchases in large retail networks, etc.

Over time the amount of data and the number of sources grow continuously, and against this background new methods of information processing appear and existing ones improve.

Basic principles of Big Data:

- Horizontal scalability: data arrays can be huge, so a big data processing system must expand dynamically as volumes grow.
- Fault tolerance: even if some pieces of equipment fail, the system as a whole must remain operational.
- Data locality: in large distributed systems, data is spread across a significant number of machines; wherever possible, and in order to save resources, data is processed on the same server on which it is stored.

Stable adherence to all three principles, and hence high efficiency in storing and processing big data, requires new breakthrough technologies such as, for example, blockchain.

Why do you need big data?

The scope of Big Data is constantly expanding:

- Big data can be used in medicine: a patient's diagnosis can be established not only from an analysis of the case history, but also taking into account the experience of other doctors, information about the environmental situation in the patient's area of residence and many other factors.
- Big Data technologies can be used to organize the movement of unmanned vehicles.
- By processing large data arrays, faces can be recognized in photo and video materials.
- Big Data technologies can be used by retailers: trading companies can actively use data arrays from social networks to configure advertising campaigns targeted as precisely as possible at a particular consumer segment.
- The technology is actively used in organizing election campaigns, including for analyzing political preferences in society.
- Big Data technologies are relevant for revenue assurance (RA) solutions, which include inconsistency-detection tools and in-depth data analysis, making it possible to identify in a timely manner probable losses or distortions of information that could reduce financial results.
- Telecommunications providers can aggregate big data, including geolocation data; this information may in turn be of commercial interest to advertising agencies, which can use it to show targeted local advertising, as well as to retailers and banks.
- Big data can play an important role in deciding whether to open a retail outlet in a particular location, based on data about the presence of a strong flow of target customers.

Thus, the most obvious practical application of Big Data technology is in marketing. Thanks to the development of the Internet and the spread of all kinds of communication devices, behavioural data (such as the number of calls, shopping habits and purchases) become available in real time.

Big data technology can also be used effectively in finance, in sociological research and in many other areas. Experts argue that all these uses of big data are only the visible part of the iceberg, since these technologies are used on a much larger scale in intelligence and counterintelligence, in military affairs and in everything usually described as information warfare.

In general, the sequence of work with Big Data consists of collecting data, structuring the information received using reports and dashboards, and then formulating recommendations for action.

Let us briefly consider the use of Big Data technologies in marketing. As is well known, information is the marketer's main tool for forecasting and drawing up a strategy. Big data analysis has long been used successfully to determine the target audience and the interests, demand and activity of consumers. In particular, big data analysis makes it possible to show an advertisement (based on the RTB auction model, Real Time Bidding) only to consumers who are interested in the product or service.

The use of Big Data in marketing allows businesses to:

- get to know their consumers better and attract a similar audience on the Internet;
- evaluate the degree of customer satisfaction;
- understand whether the service offered meets expectations and needs;
- find and implement new ways of increasing customer trust;
- create projects that are in demand, etc.

For example, the Google Trends service can show a marketer the forecast of seasonal demand for a particular product and the fluctuations and geography of clicks. By comparing this information with the statistics collected by the corresponding plugin on your own website, you can draw up a plan for distributing the advertising budget, specifying the month, region and other parameters.

According to many researchers, the success of the Trump election campaign lay precisely in segmentation and the use of Big Data. The team of the future US president was able to divide the audience correctly, understand its desires and show exactly the message that voters wanted to see and hear. Thus, according to Irina Belysheva of Data-Centric Alliance, Trump's victory was made possible largely by a non-standard approach to internet marketing based on Big Data, psychological and behavioural analysis and personalized advertising.

Political strategists and Trump's marketers used a specially developed mathematical model that made it possible to analyze the data of all US voters in depth and to systematize them, achieving ultra-precise targeting not only by geography but also by voters' intentions, interests, psychotypes, behavioural characteristics and so on. The marketers then organized personalized communication with each group of citizens based on its needs, moods, political views, psychological features and even skin colour, with a message tailored to almost every individual voter.

As for Hillary Clinton, her campaign used "time-tested" methods based on sociological data and standard marketing, dividing the electorate only into formally homogeneous groups (men, women, African Americans, Latin Americans, poor, rich, etc.).

As a result, the winner was the one who appreciated the potential of the new technologies and methods of analysis. It is notable that Hillary Clinton's campaign spending was twice that of her opponent:

Data: Pew Research

Big problems of using Big Data

In addition to high cost, one of the main factors holding back the introduction of Big Data in various fields is the problem of selecting the data to be processed: that is, determining which data must be extracted, stored and analyzed, and which should be disregarded.

Another Big Data problem is ethical. In other words, a natural question arises: can such data collection (especially without the user's knowledge) be considered a violation of privacy?

It is no secret that the information stored by the Google and Yandex search engines allows these IT giants to constantly improve their services, make them convenient for users and create new interactive applications. To do this, the search engines collect user data on activity on the Internet, IP addresses, geolocation, interests and online purchases, personal data, e-mail messages, etc. All this makes it possible to display contextual advertising in line with the user's behaviour on the Internet. Users are not usually asked for their consent, and they are given no choice about which information about themselves to provide. That is, by default everything is gathered into Big Data and then stored on the sites' data servers.

This leads to the next important problem, concerning the security of data storage and use. For example, is the analytical platform to which consumers automatically transmit their data safe? In addition, many business representatives note a shortage of highly qualified analysts and marketers who can handle large amounts of data effectively and use them to solve specific business tasks.

Despite all the difficulties of introducing Big Data, business intends to increase its investment in this area. According to Gartner research, the industries leading in Big Data investment are media, retail, telecom, the banking sector and service companies.

Prospects for the interaction of blockchain technology and Big Data

Integration of blockchain with Big Data has a synergistic effect and opens up a wide range of new possibilities, including allowing businesses to:

- gain access to detailed information on consumer preferences, on the basis of which you can build detailed analytical profiles for specific suppliers, goods and product components;
- integrate detailed data on transactions and statistics of consumption of certain groups of goods by various categories of users;
- receive detailed analytical data on supply and consumption chains, control the loss of products during transportation (for example, weight loss due to drying and evaporation of certain types of goods);
- counteract product counterfeiting and increase the effectiveness of combating money laundering and fraud, etc.

Access to detailed data on the use and consumption of goods will largely unlock the potential of Big Data technology to optimize key business processes and reduce regulatory risks, and will open up new opportunities for monetization and for creating products that best match current consumer preferences.

As is well known, representatives of the largest financial institutions are taking an interest in blockchain. In the opinion of Oliver Bussmann, IT manager of the Swiss financial holding UBS, blockchain technology is able to "reduce transaction processing time from several days to several minutes".

The potential for analyzing data from the blockchain with the help of Big Data technology is huge. Distributed ledger technology ensures the integrity of information as well as reliable and transparent storage of the entire transaction history. Big Data, in turn, provides new tools for effective analysis, forecasting and economic modelling and, accordingly, opens up new opportunities for making more balanced management decisions.

The tandem of blockchain and Big Data can be used successfully in healthcare. As is known, imperfect and incomplete data on a patient's health greatly increase the risk of an incorrect diagnosis and incorrectly prescribed treatment. Critically important data on the health of medical institutions' clients should be protected as far as possible, be immutable and verifiable, and not be subject to any manipulation.

Information in the blockchain meets all of these requirements and can serve as high-quality, reliable source data for in-depth analysis using new Big Data technologies. In addition, with the help of the blockchain, medical institutions would be able to exchange reliable data with insurance companies, judicial bodies, employers, scientific institutions and other organizations that need medical information.

Big Data and Information Security

In a broad sense, information security means protecting information and its supporting infrastructure from accidental or deliberate negative impacts of natural or artificial origin.

In the field of information security Big Data faces the following challenges:

- problems of protecting data and ensuring its integrity;
- the risk of outside interference and leaks of confidential information;
- improper storage of confidential information;
- the risk of losing information, for example as a result of malicious actions;
- the risk of misuse of personal data by third parties, etc.

One of the main problems of big data that blockchain is intended to solve lies in the field of information security. Provided its basic principles are observed, distributed ledger technology can guarantee the integrity and accuracy of data, and thanks to the absence of a single point of failure, blockchain ensures the stable operation of information systems. Distributed ledger technology can help solve the problem of trust in data and also make universal data exchange possible.

Information is a valuable asset, which means that ensuring the main aspects of information security must come to the fore. To hold their own in the competitive struggle, companies must keep up with the times, which means they cannot ignore the potential opportunities and benefits that blockchain technology and Big Data tools bring.

It was predicted that the total global volume of data created and replicated in 2011 could amount to about 1.8 zettabytes (1.8 trillion gigabytes), about 9 times more than was created in 2006.

More complex definition

Nevertheless, "big data" implies more than just the analysis of huge amounts of information. The problem is not that organizations create huge amounts of data, but that most of it is in formats that fit poorly with the traditional structured database format: web logs, video recordings, text documents, machine code or, for example, geospatial data. All of this is stored in many different repositories, sometimes even outside the organization. As a result, a corporation may have access to a huge amount of its data yet lack the tools needed to establish relationships between these data and draw meaningful conclusions from them. Add to this the fact that data is now updated more and more often, and you get a situation in which traditional methods of information analysis cannot keep up with the huge volumes of constantly updated data, which ultimately opens the way for big data technologies.

Best definition

In essence, the concept of big data implies working with information of huge volume and diverse composition, very frequently updated and located in different sources, with the aim of increasing operational efficiency, creating new products and enhancing competitiveness. The consulting company Forrester gives a brief formulation: "Big data brings together techniques and technologies that extract meaning from data at the extreme limit of practicality."

How big is the difference between business analytics and big data?

Craig Baty, executive director of marketing and director of technology at Fujitsu Australia, has pointed out that business analysis is a descriptive process of analyzing the results a business has achieved over a certain period, whereas the processing speed of big data makes the analysis predictive, able to offer the business recommendations for the future. Big data technologies also make it possible to analyze more data types than business analytics tools, allowing the focus to go beyond structured data stores.

Matt Slocum of O'Reilly Radar believes that although big data and business analytics have the same goal (finding answers to a question), they differ from each other in three respects.

  • Big data is designed to handle larger amounts of information than business analytics, and this, of course, fits the traditional definition of big data.
  • Big data is designed to process information that arrives and changes more quickly, which means deep exploration and interactivity. In some cases results are generated faster than a web page loads.
  • Big data is designed to process unstructured data, the uses of which we are only beginning to explore once we have managed to set up its collection and storage, and we need algorithms and interactive capability to make it easier to find the trends contained in these arrays.

According to the white paper published by Oracle, "Oracle Information Architecture: An Architect's Guide to Big Data", when working with big data we approach information differently than when conducting business analysis.

Working with big data is not like the usual business intelligence process, where simply adding up known values produces a result: for example, adding up paid invoices gives the sales volume for the year. When working with big data, the result is obtained in the course of cleaning the data through successive modelling: first a hypothesis is put forward, then a statistical, visual or semantic model is built, the validity of the hypothesis is checked against it, and then the next hypothesis is put forward. This process requires the researcher either to interpret visual patterns, or to compose interactive knowledge-based queries, or to develop adaptive "machine learning" algorithms capable of obtaining the desired result. The lifetime of such an algorithm can be quite short.

Methods for analyzing big data

There are many diverse methods for analyzing data arrays, based on tools borrowed from statistics and computer science (for example, machine learning). The list does not claim to be complete, but it reflects the approaches most in demand in various industries. It should also be understood that researchers continue to work on creating new techniques and improving existing ones. In addition, some of the methods listed are not necessarily applied exclusively to big data and can be used successfully on smaller arrays (for example, A/B testing, regression analysis). Of course, the more voluminous and diversified the array analyzed, the more accurate and relevant the results obtained.

A/B testing. A technique in which a control sample is compared in turn with other samples. It makes it possible to identify the optimal combination of indicators for achieving, for example, the best consumer response to a marketing offer. Big data makes it possible to run a huge number of iterations and thus obtain a statistically reliable result.
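As a minimal sketch of how such a comparison is typically evaluated (a two-proportion z-test with a normal approximation; the conversion counts below are invented for illustration):

```python
from math import erf, sqrt

def ab_test(conv_a, n_a, conv_b, n_b):
    """Return the z statistic and two-sided p-value for H0: equal conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)                 # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))   # standard error
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))    # two-sided tail
    return z, p_value

if __name__ == "__main__":
    # Variant A: 480 conversions out of 10,000 views; variant B: 560 out of 10,000.
    z, p = ab_test(480, 10_000, 560, 10_000)
    print(f"z = {z:.2f}, p = {p:.4f}")   # a small p suggests B really converts better
```

With big data the samples are large enough that even small differences in response rates become statistically distinguishable.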

Association Rule Learning. A set of techniques to identify relationships, i.e. Associative rules between variables in large data arrays. Used B. data Mining..

Classification. A set of techniques that allows you to predict the behavior of consumers in a specific market segment (making decisions on the purchase, outflow, consumption volume, etc.). Used B. data Mining..

Cluster Analysis.. The statistical method of classifying objects by groups by detection of non-known common features. Used B. data Mining..

Crowdsourcing.. Methods for collecting data from a large number of sources.

Data Fusion and Data Integration. A set of techniques that allows you to analyze the comments of users of social networks and compare with the results of real-time sales.

Data Mining.. A set of techniques that allows you to determine the most susceptible to the progressable product or service category of consumers, identify the features of the most successful employees, predict a behavioral model of consumers.

Ensemble Learning. In this method, many predicative models are involved at the expense of which the quality of predictions made.

Genetic Algorithms. In this technique, possible solutions are represented as "chromosomes" that can be combined and mutated. As in the process of natural evolution, the fittest individual survives.
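A toy sketch of the mechanism, with an invented fitness goal: bit-string "chromosomes" undergo selection, crossover and mutation, and the fittest survive each generation.

```python
# A minimal genetic algorithm over bit strings.
import random

random.seed(0)
GENES, POP, GENERATIONS = 20, 30, 40

def fitness(chrom):
    return sum(chrom)                      # toy goal: maximize the number of ones

population = [[random.randint(0, 1) for _ in range(GENES)] for _ in range(POP)]
for _ in range(GENERATIONS):
    population.sort(key=fitness, reverse=True)
    survivors = population[:POP // 2]      # selection of the fittest
    children = []
    while len(children) < POP - len(survivors):
        a, b = random.sample(survivors, 2)
        cut = random.randrange(1, GENES)
        child = a[:cut] + b[cut:]          # crossover
        i = random.randrange(GENES)
        child[i] = 1 - child[i]            # mutation
        children.append(child)
    population = survivors + children

print("best fitness:", fitness(max(population, key=fitness)))
```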

Machine Learning. A field of computer science (for which the name "artificial intelligence" became entrenched historically) that pursues the goal of creating self-learning algorithms based on the analysis of empirical data.

Natural Language Processing (NLP). A set of techniques, borrowed from computer science and linguistics, for recognizing natural human language.

Network Analysis. A set of methods for analyzing the links between nodes in networks. Applied to social networks, it makes it possible to analyze the relationships between individual users, companies, communities, etc.
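A minimal sketch over a hypothetical friendship graph: degree centrality highlights the most connected users. The names and edges are invented.

```python
# Degree centrality of users in a small hypothetical social graph.
import networkx as nx

G = nx.Graph()
G.add_edges_from([
    ("anna", "boris"), ("anna", "vera"), ("anna", "dmitry"),
    ("boris", "vera"), ("dmitry", "elena"),
])

centrality = nx.degree_centrality(G)
for user, score in sorted(centrality.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{user}: {score:.2f}")
```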

Optimization. A set of numerical methods for redesigning complex systems and processes to improve one or more indicators. It helps in making strategic decisions, for example on the composition of the product line to launch to market, investment analysis, and so on.
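A hedged sketch of one such numerical method: choosing the product mix that maximizes profit under resource constraints, via linear programming. All numbers are invented.

```python
# Product-mix optimization by linear programming.
from scipy.optimize import linprog

# Maximize 40*x1 + 30*x2  ->  minimize the negated objective
c = [-40, -30]                         # profit per unit of products 1 and 2
A_ub = [[2, 1],                        # hours of labour per unit
        [1, 3]]                        # kg of raw material per unit
b_ub = [100, 90]                       # available labour hours and material

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None), (0, None)])
print("optimal quantities:", res.x, "profit:", -res.fun)
```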

Pattern Recognition. A set of techniques with self-learning elements for predicting a consumer behavioral model.

Predictive Modeling. A set of techniques that makes it possible to create a mathematical model of a predefined, probable scenario of events. For example, analyzing a CRM database for conditions that may prompt subscribers to change provider.

Regression. A set of statistical methods for identifying the relationship between changes in a dependent variable and one or more independent variables. It is often used for forecasting and prediction. Used in Data Mining.
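A minimal sketch with invented figures: a least-squares line relating advertising spend to sales, which can then be used for a simple forecast.

```python
# Simple linear regression with NumPy's least-squares fit.
import numpy as np

ad_spend = np.array([10, 20, 30, 40, 50], dtype=float)     # thousand rubles
sales = np.array([25, 41, 62, 79, 102], dtype=float)       # thousand units

slope, intercept = np.polyfit(ad_spend, sales, deg=1)
print(f"sales ~ {slope:.2f} * spend + {intercept:.2f}")
print("forecast for spend=60:", slope * 60 + intercept)
```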

Sentiment Analysis. Techniques for assessing consumer sentiment, based on methods for recognizing natural human language. They make it possible to extract from the general information flow the messages related to the subject of interest (for example, a consumer product), and then to estimate the polarity of a judgment (positive or negative), the degree of emotionality, and so on.
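A toy sketch of the simplest, lexicon-based variant: a tiny hand-made polarity dictionary scores messages mentioning a product. Real systems use far richer language models; the lexicon and messages here are invented.

```python
# Lexicon-based sentiment scoring of short messages.
POSITIVE = {"good", "great", "love", "excellent", "fast"}
NEGATIVE = {"bad", "slow", "hate", "broken", "awful"}

def polarity(text):
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

messages = [
    "love this phone, great camera",
    "battery is awful and support is slow",
]
for m in messages:
    print(polarity(m), "-", m)
```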

Signal Processing. A set of techniques borrowed from radio engineering, which pursues the goal of recognizing a signal against background noise and analyzing it further.
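A minimal sketch of the idea: a noisy sine wave is smoothed with a moving-average filter so the underlying signal stands out from the noise. The signal and window size are arbitrary choices.

```python
# Moving-average smoothing of a noisy signal.
import numpy as np

rng = np.random.default_rng(2)
t = np.linspace(0, 2 * np.pi, 200)
noisy = np.sin(t) + rng.normal(scale=0.3, size=t.size)

window = 11
kernel = np.ones(window) / window
smoothed = np.convolve(noisy, kernel, mode="same")   # simple low-pass filter
print("noise std before:", np.std(noisy - np.sin(t)).round(3),
      "after:", np.std(smoothed - np.sin(t)).round(3))
```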

Spatial Analysis. A set of methods, partly borrowed from statistics, for analyzing spatial data: terrain topology, geographical coordinates, the geometry of objects. In this case, geographic information systems (GIS) often serve as the source of big data.
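A small sketch of one building block of such analysis: the haversine formula gives the great-circle distance between two points from their geographic coordinates. The coordinates below are approximate and given only as an example.

```python
# Great-circle distance between two geographic points (haversine formula).
from math import radians, sin, cos, asin, sqrt

def haversine(lat1, lon1, lat2, lon2, radius_km=6371.0):
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * radius_km * asin(sqrt(a))

# Approximate distance between the centres of Moscow and Kaluga
print(f"{haversine(55.7558, 37.6173, 54.5293, 36.2754):.1f} km")
```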

  • Revolution Analytics (based on the R language for mathematical statistics).

Of particular interest in this list is Apache Hadoop, open-source software that over the past five years has been proven as a data analyzer by most stock trackers. As soon as Yahoo opened the Hadoop code to the open-source community, a whole line of Hadoop-based products appeared in the IT industry. Almost all modern big data analysis tools provide integration with Hadoop. Their developers include both startups and well-known global companies.

Market solutions for managing big data

Big Data Platforms (BDP) as a means of combating digital hoarding

The ability to analyze big data is perceived as a benefit, and unambiguously so. But is it really? What can unrestrained data accumulation lead to? Most likely to what domestic psychologists, in relation to a person, call pathological hoarding, syllogomania or, figuratively, "Plyushkin syndrome". In English, the vicious passion to collect everything indiscriminately is called hoarding (from the English hoard, "stockpile"). According to the classification of mental illnesses, hoarding is classed as a mental disorder. In the digital era, digital hoarding is added to this; both individual people and entire enterprises and organizations may suffer from it.

World and Russian market

Big Data Landscape - Main suppliers

All the leading IT companies have shown interest in tools for collecting, processing, managing and analyzing big data, which is quite natural. First, they encounter this phenomenon directly in their own business; second, big data opens excellent opportunities for developing new market niches and attracting new customers.

Many startups have appeared on the market that build a business on processing huge data arrays. Some of them use ready-made cloud infrastructure provided by major players such as Amazon.

Theory and practice of big data in industries

History of development

2017

TmaxSoft forecast: the next "wave" of Big Data will require upgrading the DBMS

Enterprises know that the huge amounts of data they accumulate contain important information about their business and clients. If a company can successfully apply this information, it will have a significant advantage over competitors and will be able to offer better products and services than they do. However, many organizations still cannot use big data effectively because their legacy IT infrastructure is unable to provide the necessary storage capacity, data exchange processes, utilities and applications required to process and analyze large arrays of unstructured data and extract valuable information from them, TmaxSoft indicated.

In addition, the increase in processing power required to analyze ever-growing data volumes may demand significant investment in an organization's outdated IT infrastructure, as well as additional support resources that could otherwise be used to develop new applications and services.

On February 5, 2015, the White House published a report discussing how companies use "big data" to set different prices for different buyers, a practice known as "price discrimination" or "differentiated (personalized) pricing". The report describes the benefits of "big data" for both sellers and buyers, and its authors conclude that many of the problematic issues arising from the advent of big data and differentiated pricing can be resolved within the framework of existing anti-discrimination legislation and consumer protection laws.

The report notes that at this time there are only isolated facts indicating how companies use big data in the context of individualized marketing and differentiated pricing. This information shows that sellers use pricing methods that can be divided into three categories:

  • study of the demand curve;
  • steering and differentiated pricing based on demographic data; and
  • targeted behavioral marketing (behavioral targeting) and individualized pricing.

Studying the demand curve: to clarify demand and study consumer behavior, marketers often conduct experiments in which clients are randomly assigned one of two possible price categories. "Technically, these experiments are a form of differentiated pricing, since their consequence is different prices for customers, even if they are 'non-discriminatory' in the sense that all customers have the same probability of 'getting' the higher price."

Steering: this is the practice of presenting products to consumers based on their belonging to a specific demographic group. Thus, a computer company's website may offer the same laptop to different types of customers at different prices, set on the basis of information they report about themselves (for example, depending on whether the user represents a government body, a scientific or commercial institution, or is a private individual) or of their geographic location (for example, determined by the computer's IP address).

Targeted behavioral marketing and individualized pricing: in these cases, buyers' personal data are used for targeted advertising and individualized pricing of certain products. For example, online advertisers use data on users' internet activity, collected by advertising networks and through third-party cookies, to target the delivery of their promotional materials. On the one hand, such an approach allows consumers to receive advertising of goods and services of interest to them; on the other, it may raise concerns among consumers who do not want certain types of their personal data (such as information about visits to sites related to medical and financial matters) to be collected without their consent.

Although targeted behavioral marketing is widespread, there is relatively little evidence of individualized pricing in the online environment. The report suggests that this may be because the corresponding methods are still being developed, or because companies are in no hurry to use individualized pricing (or prefer to keep quiet about it), perhaps fearing a negative reaction from consumers.

The authors of the report believe that "for the individual consumer, the use of big data is undoubtedly associated with both potential benefits and risks." While recognizing that problems of transparency and discrimination arise when big data is used, the report at the same time asserts that existing anti-discrimination and consumer protection laws are sufficient to resolve them. However, the report also emphasizes the need for "continuous control" in cases where companies use confidential information in an opaque manner or in ways not covered by the existing regulatory framework.

This report continues the White House's efforts to study the use of "big data" and discriminatory pricing on the Internet, and the resulting consequences for American consumers. It was previously reported that the White House Big Data Working Group published its report on this issue in May 2014. The Federal Trade Commission (FTC) also considered these issues at its September 2014 seminar on discrimination related to the use of big data.

2014

Gartner dispels myths about "big data"

In an analytical note in the fall of 2014, Gartner listed a number of myths about big data common among IT managers, along with their refutations.

  • Everyone is implementing big data processing systems faster than us

Interest in big data technologies is high: 73% of organizations surveyed by Gartner analysts this year are already investing in relevant projects or planning to. But most of these initiatives are still at the earliest stages, and only 13% of respondents have already implemented such solutions. The most difficult thing is to determine how to extract income from big data and decide where to start. Many organizations get stuck at the pilot stage because they cannot tie the new technology to specific business processes.

  • We have so much data that there is no need to worry about small mistakes in them

Some IT managers believe that small flaws in the data do not affect the overall results of analyzing huge volumes. When there is a lot of data, each individual error affects the result less, analysts note, but there are also more errors. In addition, much of the analyzed data is external, of unknown structure or origin, so the probability of errors grows. Thus, in the world of big data, quality is actually much more important.

  • Big data technology will eliminate the need for data integration

Big data promises the ability to process data in its original format, with the schema formed automatically as the data is read. It is believed that this will make it possible to analyze information from the same sources using multiple data models. Many believe it will also allow end users to interpret any data set at their own discretion. In reality, most users often need the traditional approach with a ready-made schema, where the data is formatted appropriately and there are agreements on the level of integrity and on how it should relate to the usage scenario.

  • It makes no sense to use a data warehouse for complex analytics

Many administrators of information management systems believe there is no point in spending time creating a data warehouse, given that complex analytical systems use new data types. In fact, many complex analytics systems use information from a data warehouse. In other cases, new types of data must be additionally prepared for analysis in big data processing systems; decisions must be made about the suitability of the data, the principles of aggregation and the required level of quality, and such preparation can take place outside the warehouse.

  • Data lakes will replace data warehouses

In reality, vendors mislead customers by positioning data lakes as a replacement for data warehouses or as critical elements of analytical infrastructure. The underlying data lake technologies lack the maturity and breadth of functionality inherent in data warehouses. Therefore, leaders responsible for data management should wait until data lakes reach the same level of development, Gartner believes.

Accenture: 92% of those who implemented big data systems are satisfied with the result

Among the main advantages of big data, the respondents named:

  • "Search for new sources of income" (56%),
  • "Improving customer experience" (51%),
  • "New products and services" (50%) and
  • "The influx of new customers and the preservation of older loyalty" (47%).

When introducing the new technologies, many companies faced traditional problems. For 51% the stumbling block was security, for 47% the budget, for 41% a lack of the necessary personnel, and for 35% the difficulty of integration with the existing system. Almost all the companies surveyed (about 91%) plan to solve the personnel shortage by hiring big data specialists.

Companies are optimistic about the future of big data technologies. 89% believe they will change business as much as the Internet did. 79% of respondents noted that companies that do not use big data will lose their competitive advantage.

However, the respondents disagreed about what exactly should be considered big data. 65% of respondents believe it is "large data files", 60% are sure it is "advanced analytics and analysis", and 50% that it is "data visualization tools".

Madrid spends 14.7 million euros on the management of big data

In July 2014, it became known that Madrid would use big data technologies to manage urban infrastructure. The project cost is 14.7 million euros, and the solutions being implemented will be based on technologies for analyzing and managing big data. With their help, the city administration will manage the work with each service provider and pay accordingly, depending on the level of service.

This concerns the administration's contractors, who monitor the condition of streets, lighting, irrigation and green spaces, clean the territory, and remove and process garbage. During the project, 300 key performance indicators of urban services were developed for specially appointed inspectors, on the basis of which 1.5 thousand different checks and measurements will be carried out daily. In addition, the city will begin using an innovative technological platform called Madrid Inteligente (MiNT), Smart Madrid.

2013

Experts: the peak of the fashion for Big Data

At this time, every vendor in the data management market, without exception, is developing technologies for Big Data management. This new technological trend is also actively discussed by the professional community: developers, industry analysts and potential consumers of such solutions.

As DataSift found, as of January 2013 the wave of discussion around "big data" had exceeded all imaginable proportions. After analyzing the number of mentions of Big Data on social networks, DataSift calculated that in 2012 the term was used about 2 billion times in posts created by about 1 million different authors around the world. This is equivalent to 260 posts per hour, with mentions peaking at 3,070 per hour.

Gartner: every second IT director is ready to spend money on Big Data

After several years of experiments with Big Data technologies and the first implementations, in 2013 the adoption of such solutions will increase significantly, Gartner predicted. Researchers surveyed IT leaders around the world and found that 42% of respondents had already invested in Big Data technologies or planned to make such investments within the next year (data for March 2013).

Companies are forced to spend money on big data processing technologies, since the information landscape is rapidly changing and demands new approaches to information processing. Many companies have already realized that large data arrays are critically important, and working with them makes it possible to achieve benefits unavailable with traditional information sources and processing methods. In addition, constant attention to the topic of "big data" in the media fuels interest in the relevant technologies.

Frank Buytendijk, Vice President at Gartner, even called on companies to temper their ardor, as some are concerned that they are lagging behind competitors in adopting Big Data.

"It's not necessary to worry about the opportunity to implement ideas on the basis of" big data "technologies are actually endless," he said.

According to Gartner, by 2015, 20% of Global 1000 companies will adopt a strategic focus on "information infrastructure".

In anticipation of the new opportunities that big data processing technologies will bring, many organizations are already organizing the collection and storage of various kinds of information.

For educational and government organizations, as well as industrial companies, the greatest potential for business transformation lies in combining accumulated data with so-called dark data; the latter includes email messages, multimedia and other similar content. According to Gartner, it is those who learn to handle data from the most varied sources of information who will gain the advantage.

Cisco Survey: Big Data will help increase IT budgets

During the study (spring 2013), called the Cisco Connected World Technology Report and conducted in 18 countries by the independent analytical company InsightExpress, 1,800 college students and the same number of young professionals aged 18 to 30 were interviewed. The survey was conducted to determine the readiness of IT departments to implement Big Data projects and to get an idea of the related issues, technological gaps and strategic value of such projects.

Most companies collect, record and analyze data. Nevertheless, the report says, in connection with Big Data many companies face a number of complex business and information technology problems. For example, 60 percent of respondents acknowledge that Big Data solutions can improve decision-making processes and increase competitiveness, but only 28 percent stated that they are already obtaining real strategic advantages from the accumulated information.

More than half of the surveyed IT managers believe that Big Data projects will help increase IT budgets in their organizations, since there will be higher requirements for technology, personnel and professional skills. At the same time, more than half of the respondents expect such projects to increase IT budgets in their companies as early as 2012. 57 percent are confident that Big Data will increase their budgets over the next three years.

81 percent of respondents said that all (or at least some) Big Data projects will require the use of cloud computing. Thus, the spread of cloud technologies may affect the speed of adoption of Big Data solutions and the business value of those solutions.

Companies collect and use data of a wide variety of types, both structured and unstructured. These are the sources from which the survey participants receive data (Cisco Connected World Technology Report):

Almost half (48 percent) of IT managers predict that the load on their networks will double over the next two years. (This is especially characteristic of China, where 68 percent of respondents hold this view, and Germany, with 60 percent.) 23 percent of respondents expect network load to triple over the next two years. At the same time, only 40 percent of respondents declared their readiness for an explosive growth in network traffic.

27 percent of respondents recognized that they need better IT policies and information security measures.

21 percent need to expand bandwidth.

Big Data opens up new opportunities for IT departments to create value and form close relationships with business units, making it possible to increase revenue and strengthen the company's financial position. Big Data projects make IT divisions a strategic partner of the business units.

According to 73 percent of respondents, it is the IT department that will become the main driver of implementing the Big Data strategy. At the same time, respondents believe, other departments will also be involved in implementing this strategy. First of all, this concerns the finance (24 percent of respondents), research (20 percent), operations (20 percent) and engineering (19 percent) departments, as well as marketing (15 percent) and sales (14 percent).

Gartner: To manage big data, millions of new jobs are needed.

World IT spending will reach $3.7 trillion in 2013, which is 3.8% more than spending on information technology in 2012 (the year-end forecast is $3.6 trillion). The big data segment will develop at much higher rates, the Gartner report says.

By 2015, 4.4 million jobs in the field of information technology will be created to serve big data, of which 1.9 million jobs will be in the United States. Moreover, each such job will entail the creation of three additional jobs outside the IT sphere, so that in the United States alone about 6 million people will be working to support the information economy over the next four years.

According to Gartner experts, the main problem is that the industry does not have enough talent for this: both the private and public educational systems, in the United States for example, are unable to supply the industry with a sufficient number of qualified personnel. As a result, only one in three of the new IT jobs mentioned will be filled.

Analysts believe that the role of training qualified IT personnel should be taken on directly by the companies that are in dire need of them, since such employees will be their ticket to the new information economy of the future.

2012

The first skepticism about "big data"

Analysts at Ovum and Gartner suggest that for big data, a fashionable topic in 2012, the time of liberation from illusions may be coming.

The term "big data", at this time, as a rule, denote the ever-growing amount of information coming in operational mode from social media, from networks of sensors and other sources, as well as a growing range of tools used for data processing and identifying important business based on them. - Tencies.

"Because of the hype (or despite it) regarding the idea of \u200b\u200blarge data, manufacturers in 2012 with a great hope looked at this tendency," said Tony Bayer, Ovum analyst.

Baer said that DataSift conducted a retrospective analysis of mentions of big data in