Data Science is all about breaking new ground so that businesses can answer their most urgent questions. Pioneering massively parallel, data-intensive analytic processing, we aim to develop a whole new approach to generating meaning and value from petabyte-scale data sets, and to shape brand-new methodologies, tools, statistical methods and models.
What’s more, we collaborate with leading academics, industry experts and highly skilled engineers to equip our customers to generate sophisticated new insights from the biggest of big data.
Join us to do the best work of your career and make a profound social impact as a Principal Data Engineer - ML Ops on our Data Engineering team in Brazil (Remote).
What you’ll achieve
This position will merge DevOps, software engineering, data science, and data engineering to deploy machine learning models that can optimize business and customer experiences at scale.
You will use your technical knowledge and software development skills to build technology-centric solutions that accelerate the development of Artificial Intelligence and Machine Learning capabilities across the company.
You will take the debate out of what can be launched and when. The models we build are essential ingredients in supplying our customers and sales teams with relevant product recommendations, and they drive creative breakthroughs for our business.
You and your team are critical to our success!
You will:
Develop highly available, highly scalable applications that will be used by both internal and external customers;
Feature engineering: stitch together and aggregate multiple large data sets, working with data scientists to shape them into the desired format (see the first sketch after this list);
Optimize both queries and architecture to support big data sets;
Data pipelining: once the initial data sets are prepared, run them through a further pipeline to prepare them for modelling;
Here you will create the features that make Machine Learning / Artificial Intelligence algorithms work (e.g. translating text into categorical variables; see the second sketch after this list);
Expect to work with datasets with billions of rows and thousands of columns;
Adhere to DevSecOps practices to protect underlying data / infrastructure assets;
Utilize a range of applicable technologies across the entire Model Lifecycle (e.g., data science packages, statistical and machine learning techniques, distributed computing, Big Data, CI/CD).
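To give a purely illustrative flavour of the feature-engineering bullet above, here is a minimal PySpark sketch of a stitch-and-aggregate step; the paths, table names, columns and aggregations are invented for this example and are not part of the role description:

```python
# Minimal, hypothetical sketch of stitching and aggregating large data sets.
# All paths, table names and columns below are illustrative assumptions.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("feature-engineering-sketch").getOrCreate()

orders = spark.read.parquet("s3://example-bucket/orders")        # hypothetical source
customers = spark.read.parquet("s3://example-bucket/customers")  # hypothetical source

# Stitch the data sets together, then aggregate to one row per customer,
# the shape a data scientist typically wants for modelling.
features = (
    orders.join(customers, on="customer_id", how="inner")
    .groupBy("customer_id")
    .agg(
        F.count("order_id").alias("order_count"),
        F.sum("order_total").alias("lifetime_value"),
        F.max("order_date").alias("last_order_date"),
    )
)

features.write.mode("overwrite").parquet("s3://example-bucket/features/customers")
```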
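Similarly, here is a minimal sketch of the data-pipelining example of translating text into categorical variables, assuming pandas and scikit-learn 1.2+; the column names, keyword rules and categories are invented for illustration:

```python
# Minimal, hypothetical sketch of turning free text into categorical variables.
# Column names, rules and categories are illustrative assumptions only.
import pandas as pd
from sklearn.preprocessing import OneHotEncoder  # sparse_output needs scikit-learn >= 1.2

df = pd.DataFrame({
    "ticket_text": [
        "laptop battery drains fast",
        "server unreachable after patch",
        "monitor flickers on boot",
    ]
})

# A toy rule-based mapping from raw text to a categorical variable;
# in practice this step might be a trained text classifier.
def categorize(text: str) -> str:
    if "server" in text:
        return "infrastructure"
    if any(word in text for word in ("battery", "monitor", "laptop")):
        return "hardware"
    return "other"

df["category"] = df["ticket_text"].map(categorize)

# One-hot encode the categorical variable so ML algorithms can consume it.
encoder = OneHotEncoder(sparse_output=False, handle_unknown="ignore")
encoded = encoder.fit_transform(df[["category"]])
print(encoder.get_feature_names_out())
print(encoded)
```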