In my current role, I am working on hydrating a data lake with supply chain data from various data sources.
Beyond the development tasks, I am actively involved in designing the future data architecture to enable advanced analytics workloads as well as traditional reporting.
Some of the services/tools I have used: Lambda, Glue, Step Functions, Batch, SNS, SQS, DynamoDB, RDS, EC2, Docker, CodeStar, DMS, CDK, CloudFormation, and Microsoft Power Automate.
Working at Cognizant Softvision has given me the opportunity to shift towards cloud technologies and DevOps tools such as AWS and Terraform.
The project's day-to-day tasks involved developing and maintaining data pipelines that ingested and processed data from relational databases, streaming sources, and flat files into an enterprise data lake. Lambda, Glue, DynamoDB, DAX, Kinesis Data Firehose, and CodePipeline were the bread and butter, while the CI/CD infrastructure was managed with Terraform.
I worked in close contact with the infrastructure team, spotting bugs and suggesting new features for the existing code base, and developed and maintained code according to domain best practices (unit tests, SemVer, SOLID).
Apart from development, I have supported the operations team by debugging and tuning intensive PySpark data processing jobs on top of Palantir Foundry.
Alongside this, I have worked on developing an internal Django web app.
On the organisational side, I have conducted interviews and assessed candidates for the Big Data team.
I was responsible for multiple clients of the company.
My daily tasks included building traditional data warehouses from scratch, extending and maintaining existing ones, developing frontend applications and ETL flows, supporting clients' developers, testing, and writing software documentation.
Depending on each client's technologies and methodologies, I made use of the Microsoft BI suite (SSIS, SSAS, SSRS, SQL Server, Power BI), Oracle databases, Oracle APEX, SAP Data Services, MongoDB, Git, Data Vault, and dimensional modelling.
The data stored in SAP HANA databases was prepared according to business rules using R scripts and ETL mappings created in Informatica PowerCenter.
I was responsible for the ETL processes of the Vendor Master Data department.
My main duty was to develop, maintain, and improve data quality rules in the Informatica PowerCenter suite, along with preparing the data required for Tableau dashboards, in whose design I was also partly involved.