Tue 4 February 2020 | 6:00 pm - 9:00 pm
Rival 3, Tel Aviv
Have you ever wanted to hear about practical solutions to scaling problems from companies that eat big data for breakfast?
Join us for the first *free* meetup in the series about Scale, Performance and Data.
Our talented engineers will share how they built creative solutions to common problems when the out-of-the-box tools couldn’t cope with the scale.
4.2.20 | 3 Rival st Tel Aviv | 2nd floor
*Parking is available around the block (with Pango, free from 19:00)
Scaling up Data Lake infrastructure with CDC and Hive
As systems become more complex, solutions that centralize, monitor, maintain, and query big data without interfering with operational databases become increasingly important. At Yotpo, we work with a wide range of microservices and databases, so transferring data and exposing it within a centralized data lake is crucial.
We wanted near real-time access to our databases, which would enable us to run analytics (e.g. Apache Spark jobs), monitor changes in data, update search indexes, measure data quality, and trigger event-based actions within our ecosystem.
The idea behind a Change Data Capture (CDC) architecture is to track changes in data so that actions can be taken using the changed data. In this session, Irena Reznikov Levi, Data Engineer @Yotpo, will talk about how we implemented a CDC solution, along with data discovery tools, to create a fresh and well-documented data lake at Yotpo.
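For readers new to CDC, the idea can be illustrated with a minimal Python sketch. This is not Yotpo's implementation: the `ChangeEvent` shape, its field names, and the `apply_changes` helper are hypothetical, and a real pipeline would read such events from a database log rather than from an in-memory list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeEvent:
    """Hypothetical change record: what happened to which row."""
    op: str                      # "insert", "update", or "delete"
    key: int                     # primary key of the affected row
    row: Optional[dict] = None   # new row contents (None for deletes)


def apply_changes(table: dict, events: list) -> dict:
    """Replay change events, in order, onto a copy of a table snapshot."""
    replica = dict(table)  # leave the original snapshot untouched
    for event in events:
        if event.op in ("insert", "update"):
            replica[event.key] = event.row
        elif event.op == "delete":
            replica.pop(event.key, None)
        else:
            raise ValueError(f"unknown operation: {event.op}")
    return replica


# Assumed example data: a two-row snapshot and three captured changes.
snapshot = {1: {"name": "alice"}, 2: {"name": "bob"}}
events = [
    ChangeEvent("update", 1, {"name": "alice b."}),
    ChangeEvent("delete", 2),
    ChangeEvent("insert", 3, {"name": "carol"}),
]
replica = apply_changes(snapshot, events)
```

Replaying only the changes keeps the copy up to date without re-reading the whole operational table, which is what makes near real-time access practical.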
What Kind of Data do Marketers Actually Use?
You want to share data with your marketing department, but how do you know what’s actually useful to them? In this talk, Aliza Polkes, Copywriter & Editor @Yotpo, will explain what kind of data can be used in marketing campaigns, and how data engineers and marketers can work together to understand each other’s needs.
Scaling the build @Taboola
Taboola rolls out a major release every day and deploys it to 7 data centers. We deal with 350 builds a day, 10.4M lines of code, 40K unit tests to run, and 45 deployments of unique services a day. The build started as one Maven execution on a single machine that took ~25 hours to build JARs, RPMs, and Docker images and run all unit tests. Roy Arnon, Tech Lead @Taboola, will share how he created a scaled build system, including maintaining an impressive build data lake in BigQuery for monitoring and tracking.
Hope to see you there!