Within the vast world of big data, it is worth pausing on two types of data repository. They differ in purpose, architecture, and the benefits they bring to the companies or institutions that use them. These are data lakes and data warehouses.
What is a ‘Data Lake’?
A data lake is a repository that stores data (for example, all the data from a given sector) that may or may not be structured and is not ordered in any particular way. It is, for instance, a fairly common storage system in the research world.
The main characteristic of this type of repository, then, is that data is stored raw and unprocessed. In short, it must be given structure before valid conclusions can be drawn from it.
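To make the idea of "raw, unprocessed" storage concrete, here is a minimal sketch in Python of what is often called schema-on-read. The record contents, the field names, and the read_scores function are all hypothetical, chosen only for illustration: the records are kept exactly as received, and structure (which records matter, which types the fields have) is imposed only at the moment someone asks a question.

```python
import json

# Hypothetical raw records landing in a lake: stored exactly as received,
# with no shared schema (fields differ from record to record).
RAW_LAKE = [
    '{"source": "survey", "patient_id": 17, "score": "8"}',
    '{"source": "sensor", "device": "a1", "temp_c": 36.6}',
    '{"source": "survey", "patient_id": 42, "score": "6", "note": "follow-up"}',
]


def read_scores(raw_records):
    """Schema-on-read: structure is imposed only when the data is queried.

    Records that do not fit the current question are skipped here,
    but they remain untouched in the lake for future questions.
    """
    scores = {}
    for line in raw_records:
        record = json.loads(line)
        if record.get("source") != "survey":
            continue  # not relevant to this particular query
        # Types are decided at read time: "score" was stored as text.
        scores[record["patient_id"]] = int(record["score"])
    return scores


print(read_scores(RAW_LAKE))  # {17: 8, 42: 6}
print(len(RAW_LAKE))          # 3: nothing was discarded from the lake
```

The design choice to note is that the "sensor" record is ignored by this query but never deleted, which is exactly what keeps a lake flexible and also what makes it easy to draw wrong conclusions if the reader of the data does not filter it carefully.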
‘Data warehouse’: When data is put into order
The word "warehouse" (a depot or store) suggests a kind of organization not glimpsed in the lake concept, and that is indeed the case. The main difference between a data lake and a data warehouse is that the latter offers its users structured data, segmented according to their particular needs.
What are they for?
A company or institution can use either kind of repository, a lake or a warehouse, to extract relevant information. However, we must consider their differences, advantages, and disadvantages.
- The data lake preserves all the stored data, whether or not it is useful to the user.
- The data warehouse stores only the data that has been requested, in the form in which it was requested.
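The warehouse behaviour described in the second point, often called schema-on-write, can be sketched in Python as follows. This is an illustrative assumption, not how any particular warehouse product works: SurveyRow stands in for a hypothetical table whose columns were agreed in advance, and load_into_warehouse is a hypothetical loader that keeps only the requested fields, converts them to the agreed types, and rejects anything that does not fit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SurveyRow:
    """Hypothetical warehouse table: its schema is agreed in advance."""
    patient_id: int
    score: int


def load_into_warehouse(incoming):
    """Schema-on-write: data is shaped and filtered as it is stored.

    Only the requested fields are kept, converted to the agreed types;
    records that do not fit the model are rejected rather than stored.
    """
    table = []
    rejected = 0
    for record in incoming:
        try:
            table.append(SurveyRow(patient_id=int(record["patient_id"]),
                                   score=int(record["score"])))
        except (KeyError, ValueError):
            rejected += 1  # does not match the model, so it is not stored
    return table, rejected


incoming = [
    {"patient_id": 17, "score": "8", "note": "dropped: not requested"},
    {"device": "a1", "temp_c": 36.6},
    {"patient_id": 42, "score": "6"},
]
table, rejected = load_into_warehouse(incoming)
print(table)     # [SurveyRow(patient_id=17, score=8), SurveyRow(patient_id=42, score=6)]
print(rejected)  # 1
```

Because the structure is fixed before anything is stored, every row that reaches the table is clean and consistent, but the "note" field and the sensor record are lost, and answering a new kind of question would mean changing the schema first.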
Who uses data lakes and data warehouses?
From a company that wants to measure its impact on social networks, to a network of research centres that needs massive amounts of data, to health institutions looking for ways to improve their services: all of them, and many other organizations, use data lakes and data warehouses to draw conclusions that help them move forward.
Advantages and disadvantages
Data lakes lack structure and can produce erroneous results if they are not organized or managed correctly. Ideally, whoever uses them to extract information knows how to filter it. On the positive side, however, they are very malleable and flexible data sets. They do not discard any type of information (which we may not need now but might in the future), and they adapt to new circumstances.
By contrast, data warehouses organize information according to the user and their needs, always in the same way, and discard everything that is not worth storing (based on cost-benefit). The information they offer is slower to obtain than in a data lake but undoubtedly much more precise. However, when a change needs to be made, a data warehouse adapts poorly: it is inflexible and unsuitable when the user wants quick answers.