Artificial intelligence has been hailed as the new industrial revolution. Just as its deployment is being predicted across every sector of society, the data that sustains it has become a crucial issue. Without abundant, reliable, ethical, analysable and secure data, the future of artificial intelligence is impossible to envisage. The data war has well and truly been declared!
A first vital step: building data sets
Today, we produce more data than ever before in our history. Every second, nearly 40,000 Google searches are carried out and 2 million emails are sent. In 2019, global web traffic will exceed 2 zettabytes per year. Yet despite advances in cognitive computing and processing power, exploiting all this data remains largely out of reach for companies.
One of the first issues to address for the development of AI is therefore the collection and annotation of large data sets. Today, such data sets are still a rare resource. Insufficient volumes of real existing data, limited collection capacities, the difficulty of turning raw data into usable data: the reasons for this scarcity vary from one sector of activity to another.
In France, several initiatives aim to address this problem. For example, as part of the rollout of the French national AI strategy, the state has just issued a call for expressions of interest, open until November 2018, to support sector-specific or cross-sector initiatives for pooling data for the development of AI solutions.
The difficulty of turning raw data into usable data
Artificial intelligence is, first and foremost, tied to the quality of the data its applications use. Despite the development of data-warehousing and processing tools such as Hadoop and Spark, analysing data, whether structured or not, remains complex.
Companies must work out how to transform the data available to them into usable insights. Indeed, before data can feed artificial intelligence systems, it must be cleaned: format errors corrected, duplicates tracked down across databases, and raw data converted into usable data.
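The cleaning steps above can be sketched in a few lines. This is a minimal illustration, not a production pipeline: the record layout, the `clean_records` helper and the email-based deduplication rule are all hypothetical assumptions made for the example.

```python
import re

def clean_records(raw_records):
    """Normalise fields, flag format errors, and drop duplicate records."""
    email_re = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")
    seen = set()
    clean, errors = [], []
    for rec in raw_records:
        email = rec.get("email", "").strip().lower()
        name = rec.get("name", "").strip().title()
        if not email_re.match(email):   # format error: set aside for review
            errors.append(rec)
            continue
        if email in seen:               # duplicate across sources: skip
            continue
        seen.add(email)
        clean.append({"name": name, "email": email})
    return clean, errors

raw = [
    {"name": "  alice dupont ", "email": "Alice@example.com"},
    {"name": "Alice Dupont", "email": "alice@example.com"},  # duplicate
    {"name": "Bob", "email": "not-an-email"},                # format error
]
clean, errors = clean_records(raw)
print(len(clean), len(errors))  # 1 1
```

In a real system each rule (normalisation, validation, deduplication) would be driven by the data model of the systems involved, but the shape of the work is the same: raw records in, usable records and a rejection queue out.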
The complexity of information systems, made up of many applications and heterogeneous databases, does not simplify the task. This is why companies encounter even greater difficulties in moving their data from operational to analytical systems.
One example is data collection within the information systems of public and private health establishments. This is a sector where artificial intelligence promises great advances, but where the variety of data formats requires several operations before AI can fulfil its promise. Extracting data, converting it and restoring it in the desired format constitute, in this sector, a challenge to be met before artificial intelligence can become a true ally of health professionals.
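To make the conversion step concrete, here is a small sketch under an assumed scenario: dates arrive from different establishments in several formats and must be restored in a single target format (ISO 8601). The list of input formats is hypothetical; a real integration would be driven by the actual source systems.

```python
from datetime import datetime

# Hypothetical input formats observed across source systems
INPUT_FORMATS = ("%d/%m/%Y", "%Y-%m-%d", "%d %b %Y")

def to_iso_date(value: str) -> str:
    """Try each known input format and restore the date as ISO 8601."""
    for fmt in INPUT_FORMATS:
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"Unrecognised date format: {value!r}")

print(to_iso_date("03/07/2018"))  # 2018-07-03
print(to_iso_date("2018-07-03"))  # 2018-07-03
```

The same pattern (enumerate known source representations, normalise to one target representation, reject what cannot be recognised) applies to units, codes and identifiers, not just dates.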
Making security a priority to protect the data used
The accumulation of data on cloud servers, and its exposure to fraudsters, can prove fatal for companies and their artificial intelligence systems. This is why identifying the processes that invite manipulation is a key step in improving data reliability.
To illustrate, consider the challenges faced by players in the insurance world.
According to an Accenture study of nearly 600 insurers and sector experts in 25 countries, including France, one third of the insurers surveyed have been subjected to practices such as bot fraud or the falsification of sensor or location data, while another third believe they have been targeted by such an attack without being able to verify it.
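One standard defence against the falsified sensor readings mentioned above is to sign data at the point of collection so that tampering is detectable downstream. The sketch below uses Python's standard `hmac` module; the key and the payload format are hypothetical assumptions for the example, and real key management would be far more involved.

```python
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical key for the example

def sign(payload: bytes) -> str:
    """Attach an HMAC-SHA256 tag so tampering with stored data is detectable."""
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()

def verify(payload: bytes, tag: str) -> bool:
    """Constant-time comparison of the recomputed tag against the stored one."""
    return hmac.compare_digest(sign(payload), tag)

tag = sign(b"sensor=42.0;location=Paris")
print(verify(b"sensor=42.0;location=Paris", tag))   # True
print(verify(b"sensor=99.9;location=Paris", tag))   # False: payload tampered
```

Integrity tags do not prevent fraud at the source, but they ensure that data altered after collection no longer passes verification, which is exactly the reliability property AI systems depend on.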
At a time when data is the crux of artificial intelligence development, it is all the more crucial to be able to protect it effectively.
The need for authentic and ethical data
Many concerns have also emerged about the authenticity and ethics of artificial intelligence and Big Data. How can data be protected to guarantee its authenticity? What data can systems use without being unethical?
Several examples come to mind, such as the white rectangles stuck on a stop sign that can lead a deep-learning model to see only a speed limit sign in its place. There are also the artificial intelligence systems used by some American courts to predict the risk of reoffending, which draw on information based on defendants' skin colour. In other words, an artificial intelligence becomes racist because of the data integrated into its model.
Even today, human oversight and machine teaching remain vital for determining what data should be identified and used. This is an essential step for steering what algorithms learn from usable data within ethical and legal boundaries.