- Type: Webinar
- Location: Online Event
- Date: 26-10-2022
A data lake is an architectural pattern rather than a specific platform: it is built around big data repositories and a schema-on-read approach. A data lake stores large amounts of raw, often unstructured data in object storage such as Amazon S3 without structuring it in advance, while retaining the flexibility to apply ETL or ELT transformations later as needs evolve. This makes it ideal for companies that need to analyze constantly changing data or very large data sets.
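To make the schema-on-read idea concrete, here is a minimal Python sketch that reads raw, newline-delimited JSON events straight from S3 and shapes them only at query time. The bucket name, prefix, and field names are hypothetical, and the boto3 client assumes AWS credentials are already configured.

```python
import json

import boto3  # assumes AWS credentials are configured

# Hypothetical bucket and prefix; events were written as-is, with no schema.
BUCKET = "my-data-lake"
PREFIX = "events/2022/10/"

s3 = boto3.client("s3")

def read_events(bucket: str, prefix: str):
    """Schema-on-read: parse and shape raw objects only when they are queried."""
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"]
            for line in body.iter_lines():  # newline-delimited JSON
                record = json.loads(line)
                # The "schema" is applied here, on read, not on write:
                yield {
                    "user_id": record.get("user_id"),        # tolerate missing fields
                    "event_type": record.get("event_type"),
                    "ts": record.get("timestamp"),
                }

for event in read_events(BUCKET, PREFIX):
    print(event)
```

Because the projection lives in the reader rather than the writer, new fields or sources can land in the bucket at any time without breaking ingestion; only the queries that care about them need to change.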
A data lake architecture is simply the combination of tools used to build and operationalize this approach: event processing tools, ingestion and transformation pipelines, and analytics and query engines. As the examples below show, there are many different combinations of these tools you can use to build your data lake, depending on the specific skills and tools available in your organization.
Design Principles and Best Practices for Building Data Lakes
Schema Visibility: Ingested data should be understood in terms of its schema, sparsely populated fields, and metadata properties for each data source. Gaining this visibility on read, rather than trying to enforce it on write, helps you build your ETL pipeline on the most accurate and available data, avoiding many problems later; a profiling sketch follows below.
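As a rough illustration of what that visibility could look like in practice, the following Python sketch profiles a batch of ingested records and reports each field's observed types and sparsity. The sample records and field names are invented for the example; a real pipeline would run this against data read from the lake.

```python
from collections import Counter, defaultdict

def profile_source(records):
    """Report observed fields, inferred types, and sparsity for one data source."""
    total = 0
    field_counts = Counter()
    field_types = defaultdict(set)
    for record in records:  # each record is a parsed JSON event (a dict)
        total += 1
        for field, value in record.items():
            field_counts[field] += 1
            field_types[field].add(type(value).__name__)
    for field, count in field_counts.most_common():
        sparsity = 1 - count / total
        print(f"{field}: types={sorted(field_types[field])}, "
              f"present in {count}/{total} records (sparsity {sparsity:.0%})")

# Toy sample standing in for newly ingested events:
sample = [
    {"user_id": 1, "event_type": "click", "referrer": "news"},
    {"user_id": 2, "event_type": "view"},
    {"user_id": 3, "event_type": "click", "referrer": None},
]
profile_source(sample)
```

Running a profile like this per source surfaces sparse fields (here, `referrer` appears in only two of three records) and type drift before an ETL job hard-codes assumptions about them.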