AI Jobs DB

The fastest and lightest job board for artificial intelligence / machine learning engineers, specialists, and scientists to find a new career

Big Data Architect at SITA

We are looking for a passionate and experienced big data engineer with strong Python, Scala, Spark, or equivalent experience to join our expanding Data Science & AI Center of Excellence (CoE), a team of data science and AI experts. The Data Engineer will collect, store, process, and analyze very large data sets. The Data Engineer must also have exceptional analytical skills, with fluency in platforms and tools such as AWS architecture and MySQL, and strong Python, Spark, Scala, Java, and T-SQL programming skills. He or she must also be technologically adept, demonstrating strong computer science skills. The candidate must additionally be capable of developing databases using SSIS packages, pipeline orchestration, T-SQL, MSSQL, and Spark scripts.

Your responsibilities will cover: 

As part of the Data Science & AI team, you will contribute to the development of our data environment by integrating and evaluating a large number of internal and external big data sources. You will work closely with both our technology experts and our data science experts to build and support the CoE in data science and AI operations.

  • Develop and maintain data pipelines (a minimal orchestration sketch follows this list);
  • Develop and maintain cloud data architecture on AWS/Azure;
  • Take ownership of and responsibility for data quality, with consideration of efficiency, performance, and cost;
  • Design, model, develop, and maintain data sets to be used for data science and AI;
  • Be responsible for the design and ongoing development of pipelines, ETL, data lakes, etc.;
  • Assess, recommend, and support the implementation of new data technologies;
  • Develop and maintain state-of-the-art data models for the CoE, leveraging multiple data sources;
  • Identify and correct data quality issues and reinforce end-to-end data governance;
  • Evaluate and integrate a variety of data sources, including third-party data;
  • Analyze, parse, extract, and integrate data from structured and unstructured datasets;
  • Gather and process raw data at scale and build the master data foundation for ML and AI projects;
  • Create and maintain various datasets using complex data transformations in both batch and real-time modes;
  • Implement event-based and status-based rule engines;
  • Design and maintain machine learning and AI serving infrastructure;
  • Work closely with internal partners (technology, machine learning, and business experts).
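
For illustration only, here is a minimal sketch of the kind of orchestrated batch pipeline referenced in the first bullet above, assuming Apache Airflow 2.x (one of the pipelining frameworks listed in the skills section below). The DAG name, task IDs, and the extract/transform/load callables are hypothetical placeholders, not SITA code.

    # A minimal sketch of a daily batch pipeline, assuming Apache Airflow 2.x.
    # The dag_id, task ids, and callables below are hypothetical placeholders.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def extract():
        """Pull raw data from a source system (placeholder)."""

    def transform():
        """Normalize and enrich the raw data (placeholder)."""

    def load():
        """Write the curated data set to the data lake (placeholder)."""

    with DAG(
        dag_id="daily_batch_pipeline",
        start_date=datetime(2021, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        t_extract = PythonOperator(task_id="extract", python_callable=extract)
        t_transform = PythonOperator(task_id="transform", python_callable=transform)
        t_load = PythonOperator(task_id="load", python_callable=load)

        # Run the three steps in sequence: extract -> transform -> load.
        t_extract >> t_transform >> t_load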

EXPERIENCE, KNOWLEDGE & SKILLS:

  • Bachelor's or master's degree in computer science, or equivalent experience;
  • Design, model, develop, and maintain big data sets to be used for BI, data science, and AI;
  • Be responsible for the design and ongoing development of pipelines and ETL;
  • Strong experience with cloud-based big data platforms (AWS, Azure, or Google Cloud);
  • 5+ years of professional experience in data pipeline development, orchestration, and maintenance;
  • Strong experience in data engineering processes, developing scripts to integrate and normalize third-party data using APIs as well as web scraping/crawling (BeautifulSoup, Scrapy, Selenium, etc.);
  • Strong experience with most of the following technologies: batch processing (Spark or MapReduce), stream processing (Spark Streaming or similar), event processing (Kafka, RabbitMQ, etc.), NoSQL databases (Elasticsearch, MongoDB, Cassandra, etc.), visualization tools (Power BI, GCP Data Lab, etc.), containerization (Docker), microservice architectures, and pipelining frameworks (Luigi and/or Airflow, etc.);
  • Strong knowledge of computer science fundamentals: data structures, algorithms, programming languages, distributed systems, and information retrieval;
  • Experience writing automated unit and functional tests;
  • Strong knowledge of UNIX environment including shell scripting;
  • Experience with versioning tools (Git);
  • Experience working in an Agile development environment;
  • Excellent communication skills.
  • Experience deploying AWS databases, AWS EMR, EC2, AWS data lakes, and AWS ML services;
  • Experience implementing REST API calls and authentication (see the sketch after this list);
  • Experience working with agile project management methodologies.
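
For illustration only, a minimal sketch of the kind of authenticated REST API call mentioned in the bullet above, assuming the Python requests library and bearer-token authentication; the endpoint URL, token, and response field names are hypothetical placeholders.

    # A minimal sketch of an authenticated REST call used to pull third-party
    # data; the URL, token, and response fields are hypothetical placeholders.
    import requests

    API_URL = "https://api.example.com/v1/records"  # hypothetical endpoint
    API_TOKEN = "REPLACE_ME"  # in practice, read from a secret store, not source code

    def fetch_page(page: int) -> dict:
        """Fetch one page of results as JSON, raising on HTTP errors."""
        response = requests.get(
            API_URL,
            headers={"Authorization": f"Bearer {API_TOKEN}"},
            params={"page": page, "page_size": 100},
            timeout=30,
        )
        response.raise_for_status()
        return response.json()

    if __name__ == "__main__":
        first_page = fetch_page(1)
        print(f"Retrieved {len(first_page.get('results', []))} records")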

BONUS POINTS: 

  • Experience in the air transportation industry;
  • Experience building data models and data architecture for data science;
  • Experience with other programming languages (Java, C, C++);
  • Knowledge of machine learning or deep learning;
  • Spark programming (AWS Databricks preferred);
  • Python, Java, and SQL;
  • Knowledge of AWS or Azure cloud (data platform technologies);
  • Experience building and managing high-volume, high-traffic, GDPR-compliant solutions;
  • Strong experience with Scala or Hive;
  • Experience with geospatial and unstructured data (text, image, video);
  • Experience working with data science teams.

EDUCATION & QUALIFICATIONS

- Degree in a technical discipline (e.g., Computer Science, Engineering, Mathematics, etc.) or sufficient work experience to demonstrate proficiency at this level.

Please let the company know you found this position via aijobsdb.com so we can keep providing you with quality jobs.

by Tsutomu Narushima