~$ init portfolio — Jan 2026

Building enterprise-scale Data & AI systems.

Data & AI Engineer with 7+ years shipping production platforms on AWS, Azure & GCP. Snowflake · Databricks · LLMs · Agentic AI · Real-time streaming — end to end.

location bangalore, india
github arijitroy003
linkedin sudo-kill

experience

4 roles

Enterprise platforms, GenAI products, and high-scale data engineering across fintech, e-commerce & SaaS.

Red Hat Apr 2024 — Present
Senior Software Engineer — Data & AI
Bengaluru, India · Platform Engineering
  • Built self-service GitOps data mesh platform (Snowflake, dbt, Fivetran, Airflow, K8s) — migrated from legacy Redshift/Starburst, achieving $100k+ annual cost reduction.
  • Implemented release automation & CI tools saving 1000+ lead-engineer hours annually; deployed 100+ ASCA & PIA compliant data products with cost/usage monitoring.
  • Championed data governance with Atlan; drove full 0→1→10 product lifecycle from technical enablement through production adoption.
  • Collaborated on MCP servers for Data Analytics Agents, Agentic AI frameworks, and AI observability tooling with Langfuse.
GitOps · Snowflake · dbt-Core · AWS · Golang · Kubernetes · Python · Fivetran · Airflow · Atlan · GitLab CI/CD
Beem Nov 2023 — Mar 2024
Senior Data Engineer — Financial Services
Remote, US
  • Built LLM-powered Data & AI platform serving 50M+ users for personal finance management with integrated data governance.
  • Contributed to investor pitches securing $16k Databricks funding with $24k in future commitments.
  • Engineered ETL workflows processing 500 GB daily clickstream & telemetry data.
Databricks · AWS S3 · Python · MongoDB · Mixpanel · Metabase
Tata Digital (Tata Neu) Aug 2021 — Oct 2023
Senior Software Engineer — E-Commerce & Retail
Bengaluru, India · Strategic Initiatives
  • Built Conversational AI (voice + text) data warehousing platform serving 120M+ users, processing 500M events daily.
  • Developed GenAI product search & recommendation engine using Azure OpenAI, Mistral, LangChain, embeddings, and vector DBs (Chroma, Milvus).
  • Led 6-engineer team migrating to Delta Lake; owned 15+ customer-facing pipelines — reduced latency 75% & cost 80%.
  • Engineered voice-call analysis system using deep learning, speech-to-text, and custom voice ML models.
  • Built API monitoring microservices for AI chatbot across 12 Indic languages with real-time 24×7 dashboards.
PySpark · Databricks · Delta Lake · Azure OpenAI · LangChain · Scala · ADLS · Kusto
Gnosis Lab Jun 2019 — May 2021
Founding Engineer — NASSCOM 10K Startups
Kolkata, India
  • Built AI bot for a SaaS platform automating social media marketing on Instagram & Twitter using serverless architecture.
  • Developed full MEAN Stack application and delivered 50+ backend APIs for a Learning Management System.
Python · AWS Lambda · TensorFlow · Angular · MongoDB · DynamoDB · Docker · OpenCV

education

Jadavpur University
Master of Computer Applications
Kolkata, IN · Distributed Systems / Cloud Computing
Aug 2018 — Jun 2021
GPA 8.81 / 10
Ramakrishna Mission Vidyamandira
B.Sc. in Computer Science
Howrah, IN
Aug 2015 — May 2018
GPA 8.49 / 10

projects

selected work

Production deployments, research & freelance builds.

[freelance · 2021]
Doctor — Clinical Management Software
Production-ready Android app & web portal for appointment bookings and doctor practice management. Integrated payment processing via Razorpay and analytics via BigQuery.
React Native · Firebase · GCP · BigQuery · Razorpay
[research · 2017]
Offline File System Search
Document indexer for offline information retrieval, built under the supervision of Dr. Dwaipayan Roy at the Information Retrieval Lab, Indian Statistical Institute, Kolkata.
Python · NLP · Linux · Information Retrieval
[red hat · 2024–present]
GitOps Data Mesh Platform
Self-service data mesh enabling 100+ compliant data products with automated governance, cost monitoring, and full CI/CD. Delivered $100k+ annual cost savings vs. legacy stack.
Snowflake · dbt · Kubernetes · GitOps · Airflow
[tata neu · 2022–2023]
GenAI Product Search Engine
LLM-powered product search & recommendation system using embeddings and vector databases, serving 120M+ users on the Tata Neu e-commerce platform.
Azure OpenAI · LangChain · Milvus · Chroma · Mistral
[tata neu · 2021–2023]
Conversational AI Data Platform
Voice & text chatbot data warehousing platform processing 500M daily events. Includes deep-learning voice analysis and real-time monitoring across 12 Indic languages.
PySpark · Delta Lake · Deep Learning · Speech-to-Text
[beem · 2023–2024]
LLM-Powered Personal Finance Platform
AI-driven finance management product serving 50M+ users. Built end-to-end ETL for 500 GB/day telemetry and contributed to the funding deck securing seed investment.
Databricks · LLMs · AWS S3 · MongoDB

Skills

Core engineering, data infrastructure, AI/LLM, and DevOps

Core Languages & APIs
Python · Golang · SQL · T-SQL · JavaScript · TypeScript · Shell · Scala · REST APIs · Microservices · Git
Cloud & Infrastructure
AWS · Azure · GCP · Kubernetes · Docker · Terraform · GitOps · CI/CD Pipelines · Container Orchestration · OAuth / SAML 2.0
Data Engineering
Snowflake · Databricks · PySpark · Delta Lake · dbt · Data Mesh · Airflow · Kafka · ETL / ELT · Real-time Streaming · Fivetran · Atlan · Data Modeling · DataOps · Distributed Compute
AI / LLM Engineering
LangChain · Vector DBs (Milvus, Chroma, Qdrant) · Embeddings · OpenAI · Claude · Mistral · Azure OpenAI · Agentic AI · MCP Servers · AI Observability · MLOps · TensorFlow · PyTorch · NLP
Databases & Storage
MongoDB · DynamoDB · NoSQL · ADLS · S3 · BigQuery · Firebase · Kusto
Research & Extras
Information Retrieval · Distributed Systems · Big Data · Cloud Computing · DSA · OOP · Tech Blogging · Chess · Linux Ricing

blog

writings

Articles and technical deep-dives.