Cloudera: “The Greatest Challenge is not in Grabbing the Data and Storing it, but in Understanding and Managing it”

We interviewed Juan Carlos Sánchez de la Fuente, Cloudera’s regional vice president for Spain and Portugal, who reviews the current state of data management, the company’s commitment to open source and the revolution that the industry is experiencing with generative AI.

Cloudera, a software company founded in 2008, has become a benchmark in the field of corporate data management. The company was created by ex-employees of Google, Yahoo, Facebook and Oracle with the mission of developing an enterprise distribution of the Apache Hadoop framework, an open-source software system for storing and processing large data sets.

In other words, the company has been built on open-source solutions from the very beginning. Since then it has diversified its offering and added new capabilities, both organically and through acquisitions and mergers, to shape what is now its flagship product: Cloudera Data Platform, which combines on-premises and cloud capabilities to facilitate hybrid and multi-cloud data management in organisations.

Juan Carlos Sánchez de la Fuente, Cloudera’s regional vice president for Spain and Portugal, spoke to us about all this and more. The executive has had a long career in the technology sector, working at companies such as Oracle, Tata Consultancy Services and ServiceNow, and has headed Cloudera’s subsidiary in the Iberian region for almost five years.

Here is the interview with Sánchez de la Fuente:

– To get into the subject, how is Cloudera’s business in Spain going this year?

Cloudera closes its fiscal year on 31 January, so the first half of the year will close on 31 July. As you know, we don’t offer specific business figures, but I can tell you that this year has been positive. H1 has been a good six months and we are in line with our objectives. In addition, we have gained new clients and we are adding more people to our staff.

In addition, the last quarter of the year is usually very important because projects from previous months are being finalised.

All of this comes with caution: we are seeing initiatives in the market, but many companies have also returned to tight control of operating expenses (OPEX), although there is money to invest as capital expenditure (CAPEX).

Private companies, within that caution, have money to invest in new initiatives, especially those related to artificial intelligence. There are still few projects in production and it is expected that this area will gradually develop more and more.

In public administration, a very important sector for us, we see movement, albeit slow, because changes in government teams are holding up projects that are six or seven months old and whose tenders are still pending award.

These budgets are assured, so for us, the year is an exciting one.

– What impact are the EU’s NextGen funds having? There is a lot of money on paper, but implementation is slowing down a lot. What is your view on this?

It is true that these funds created additional budgets, but in these times of political change in the European Union there may be some adjustments and decisions that remain to be seen. In any case, companies and public organisations will move forward with projects and ideas that they have been developing for a long time thanks to these funds.

In my view, the important thing is that these organisations get real value out of their data, without rushing to spend the funds, so that projects are not carried out and then left in a drawer.

– …This is something that has even more impact if we talk about the data that public organisations handle, where the initial designs and architectures are fundamental so that later all this information can be processed correctly and help citizens more efficiently…

I totally agree. There are people who are very capable of making the right decisions, but they must be given space, time, autonomy and capacity to be able to decide. There are funds, but public administrations have to put them in the right places.

At Cloudera, it is our responsibility to accompany and advise them throughout this cycle and provide them with a differential value to make these projects a success. To do this, we have to offer them flexibility.

– The private sector is also very important for you, in which areas are you focusing more?

Historically, the areas in which Cloudera has most presence are Banking and Insurance. Here we work with the main organisations in this transformation that has been taking place over the last few years.

But there are other areas where we are tackling very important projects: Industry 4.0 and everything to do with supply chains and automotive; consumer companies and the entire life cycle of manufacturing processes; and the retail sector, which has always been very advanced in data analytics from the point of view of the end consumer.

Juan Carlos Sánchez de la Fuente, at Cloudera’s offices in Paseo de la Castellana, where we conducted the interview.

In this sector, all kinds of artificial intelligence and advanced analytics projects are increasingly being implemented to increase sales, both at supermarket level and in processing and plant management.

Finally, I would like to point out sectors such as Energy or Utilities, where Spain has leading companies at European level and with a very consolidated global position. These are innovative and increasingly data-driven companies that make decisions based on their data but still have a long way to go in this field.

– What are the main challenges that all these organisations are currently facing?

They are the challenges that we have been facing historically, mainly the capacity to manage all the information they handle. It is an increasingly complex problem due to the explosion of data that we are experiencing during this decade.

In the past, a lot of data was discarded, but today, thanks to artificial intelligence, it is possible to process it and extract great value. IoT devices, 5G networks and a long list of other sources are generating really valuable information, but we have to learn how to process and structure it. And this is the main problem for organisations.

In a study we conducted some time ago, we found that organisations are only using a third of their data because they are not able to manage all that information.

The big challenge is not to take the data and store it, but to take it, understand it and manage it. We are talking about structured and unstructured data, such as videos. There is a lot of valuable information in video that can now be analysed and extracted, something that was impossible with the technology of the past.

– This is where Cloudera’s technology comes into play. How do you respond to these challenges?

First of all, I can tell you about one of our cornerstones, the development of an open and truly hybrid platform. What does it mean for a platform to be truly hybrid? It means that customers don’t have to worry about where to host the data, how to host it, what infrastructure to use… they can focus on processing that data and achieving the highest quality. Cloudera helps them to create this data ecosystem regardless of where the client decides to host it, whether in a private environment, in the cloud or in multicloud.

We provide the tools to interconnect all sources and build data interoperability. Many organisations have built siloed systems that are not able to talk to each other; our platform makes that communication possible.

What’s more, our customers don’t have to work solely with Cloudera – far from it. Being an open platform, we also connect with other data platforms to achieve a unified repository, so that customers can have full control and visibility of all information from a single dashboard.

A second point I would like to stress is that Cloudera is a true hybrid platform, which means that we do not perform any transformation, replication, copying or linking of data unless it is strictly necessary. Our platform works with the data at its source independently of where it is stored.

In this sense, I like to talk about the Open Data Lakehouse, which is something like a fusion of all the best of the architectures that have been appearing over the years to manage information: Data Warehouse, Data Lake and Data Mesh.

Through Apache Iceberg, which is open source, our platform is able to extract the benefits of these architectures and apply advanced analytics across all data repositories without the need to move the data.

Over time, everything indicates that the approach taken by Apache Iceberg is the right one and the one the industry is adopting, with Cloudera being one of the main contributors to this open-source project.

– What is Cloudera’s vision of artificial intelligence? You have been implementing it in your platform for years, but now we are facing the boom of generative AI. What can you tell us about this great revolution?

Yes, we have been working on AI-based initiatives for a long time, but a year and a half ago there was an explosion around the concept of generative AI that has meant a paradigm shift when it comes to tackling certain projects.

The advances that have taken place are enormous and we have, once again, adopted open source to provide our clients with existing generative AI capabilities, but with the important premise of working in environments where corporate information is not shared, unlike what happens with the popular ChatGPT.

For us, data is an organisation’s most precious asset, so we are bringing generative AI to the data, not the other way around. We provide the tools so that corporate data doesn’t have to be moved or shared with anyone. Organisations that want to make use of these technologies can incorporate existing large language models (LLMs) and open-source tools into our platform so that they are combined with the information without a single bit leaving the corporate environment.

– Another challenge facing organisations is talent acquisition and management. How does Cloudera tackle this problem?

Indeed, it is a very important challenge. Cloudera advocates open-source software, something that opens up a range of possibilities when you consider how hard it is to find skilled professionals for proprietary technologies that require specific training.

For us, betting on Open Source from the beginning is the only way to find professionals and resources. Those who are trained in free software know that they will be able to apply it to any field and in practically any company.

Even so, talent management is complicated. We have to focus a lot on continuous training and that is what we pass on to our clients, helping them to achieve this continuity. We have a training plan and we are also working with governmental organisations such as the SEPE or the Community of Madrid to help train talent. In the private sector, we have specific partners in Spain with a high training capacity with whom we work to develop these training plans.