Outdated data centre infrastructure is preventing the full use of AI applications, says Sven Breuner from VAST.
Generative AI and Large Language Models (LLMs) have already shown their potential. So far, however, LLMs are largely confined to routine tasks such as drafting business reports or restating information that is already known. The true promise of AI will be fulfilled when machines can replicate the process of discovery by capturing, synthesising and learning from data. AI could then reach, in a matter of days, a level of specialisation that used to take decades to build.
Companies that want to benefit from the potential of AI need a data platform that simplifies data management and processing in a standardised stack. The next generation of AI infrastructure must provide parallel file access, GPU-optimised performance for training neural networks and inference on unstructured data, and a global namespace that covers hybrid multi-cloud and edge environments.
Infrastructure compromises can be overcome
Technical compromises have so far prevented AI applications from processing and understanding data collections from global infrastructures in real time. A modern data platform must cover the entire spectrum of natural data – unstructured and structured types such as videos, images, free text, data streams and instrument data. The aim is to process data from all over the world in real time against a global data corpus. This closes the gap between event-driven and data-driven architectures and enables the following:
- Accessing and processing data in any private data centre or large public cloud.
- Understanding natural data by embedding a queryable semantic layer into the data itself.
- Computing on data continuously and recursively in real time, so that understanding evolves with every interaction.
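The idea of a queryable semantic layer can be sketched in miniature: each unstructured item carries an embedding vector stored alongside it, and queries rank items by vector similarity. The catalogue, file names and hand-made 3-dimensional vectors below are purely illustrative assumptions; a real platform would use a learned embedding model over far higher-dimensional vectors.

```python
import math

# Toy "semantic layer": each unstructured item carries an embedding
# vector stored alongside it, so the data itself becomes queryable by
# meaning. (Hand-made 3-d vectors stand in for a learned model.)
CATALOG = {
    "turbine_vibration.csv": (0.9, 0.1, 0.0),
    "maintenance_report.txt": (0.8, 0.3, 0.1),
    "holiday_photo.jpg": (0.0, 0.2, 0.9),
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def semantic_query(query_vec, catalog, top_k=1):
    """Return the top_k item names whose embeddings are most similar to the query."""
    ranked = sorted(catalog, key=lambda name: cosine(query_vec, catalog[name]),
                    reverse=True)
    return ranked[:top_k]

# A query embedded near "machinery sensor data" in the same toy space:
print(semantic_query((1.0, 0.2, 0.0), CATALOG))  # ['turbine_vibration.csv']
```

The point of the sketch is that the query never names a file or a schema; it describes meaning, and the layer resolves it against embeddings stored with the data.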
A modern distributed systems architecture based on the Disaggregated Shared-Everything (DASE) approach creates the data foundation for deep learning by eliminating trade-offs in performance, capacity, scale, simplicity and resilience. This makes it possible to train models on all of a company’s data. When complemented with logic, machines can continuously and recursively enrich and understand data from the natural world.
Global data storage, database and AI computing engine
Capturing and delivering data from the natural world requires a scalable storage architecture for unstructured data that does not require storage tiering. An enterprise NAS platform with file and object storage interfaces will meet the needs of today’s powerful AI computing architectures, big data and HPC platforms.
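What "file and object storage interfaces" over one namespace means can be shown with a deliberately simplified sketch: the same bytes are reachable both by a POSIX-style path and by an S3-style bucket/key lookup, with no copy in between. The class and its layout are hypothetical illustrations, not VAST's implementation.

```python
import pathlib
import tempfile

class UnifiedStore:
    """Toy single-namespace store exposing file and object interfaces.

    Illustrative only: a file path and an object bucket/key both resolve
    to the same underlying bytes, without tiering or duplication.
    """

    def __init__(self, root):
        self.root = pathlib.Path(root)

    # File interface: write by relative POSIX-style path.
    def write_file(self, relpath, data: bytes):
        p = self.root / relpath
        p.parent.mkdir(parents=True, exist_ok=True)
        p.write_bytes(data)

    # Object interface: GET by bucket and key, backed by the same files.
    def get_object(self, bucket, key):
        return (self.root / bucket / key).read_bytes()

store = UnifiedStore(tempfile.mkdtemp())
store.write_file("training/images/cat.jpg", b"pixel-data")
# The object written via the file interface is readable as bucket/key:
assert store.get_object("training", "images/cat.jpg") == b"pixel-data"
```

In practice this matters because GPU training pipelines often read via file semantics while ingest and web-scale tooling speak object APIs; a unified namespace serves both without an ETL copy step.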
There have long been indications that flash technology, which is far more efficient than older storage media in terms of energy and space, would be indispensable for AI. For a long time, however, flash was too expensive to deploy at scale. That has now fundamentally changed: with rising energy costs and sustainability requirements, flash can demonstrate its economic efficiency as well. This has been crucial in creating the basis for deep learning for companies that want to train models on their own data sets.
Structuring unstructured natural data
With a semantic database layer that is natively integrated into the data platform, unstructured natural data can be given structure. By simplifying how structured data is handled, the trade-off between transactions (capturing and cataloguing natural data in real time) and analytics (correlating that data in real time) can be resolved. This combines the properties of a database, a data warehouse and a data lake in a simple, distributed, standardised management system. An AI-capable database of this type is designed for fast data ingest and fast queries at any scale, extending real-time analysis from the event stream all the way to the archive.
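The claim that one engine can serve both transactional capture and real-time analytics can be illustrated with a toy example, using `sqlite3` purely as a stand-in for the distributed database the article describes: events are ingested as they arrive and aggregated immediately from the same table, with no copy into a separate warehouse.

```python
import sqlite3

# Toy stand-in for a unified transactional/analytical store.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE events (sensor TEXT, value REAL)")

# Transactional path: capture event records as they arrive.
con.executemany("INSERT INTO events VALUES (?, ?)",
                [("s1", 1.0), ("s1", 3.0), ("s2", 2.0)])
con.commit()

# Analytical path: aggregate the same table immediately, no ETL step.
rows = con.execute(
    "SELECT sensor, AVG(value) FROM events GROUP BY sensor ORDER BY sensor"
).fetchall()
print(rows)  # [('s1', 2.0), ('s2', 2.0)]
```

In a classic stack, the `INSERT` side would land in an OLTP database and the `AVG` side would run hours later in a warehouse; collapsing both onto one store is the trade-off the article says such a platform resolves.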
As a further element of an AI data platform, a global namespace makes it possible to store, retrieve and process data from any location with high performance, while maintaining strict consistency across every access point. Such a data platform can therefore span local data centres, edge environments and the large public cloud platforms that dominate the market. The transition from AI hype to a tangible AI revolution thus moves a decisive step closer.
Sven Breuner is Field CTO International at VAST Data.