Krishna Kartik Darsipudi

Python | Data Science | Data Engineering

Profile

Automation Engineer with a flair for Data Science


About me

I am a master's student at the NYU Center for Data Science with experience in building data pipelines, managing Kubernetes/VMware/Docker infrastructure, and creating dashboards. My current interests lie at the intersection of data engineering and data science.


Details

Name:
Krishna Kartik Darsipudi
Location:
New York, USA

Experiences


Education

New York University

Sep 2021 - May 2023

Master of Science in Data Science
Responsible Data Science - Fairness as equality of opportunity, Disparate Treatment, Disparate Impact, Pre-existing/Technical/Emergent Bias in computer systems, Differential Privacy, Synthetic Data, Query Sensitivity
Machine Learning - Gradient Descent, Regularization, Support Vector Machines, Bayesian Methods, Random Forest, AdaBoost, Gradient Boosting
Big Data - Relational Databases, MapReduce, Hadoop Distributed File System, Spark, Column-Oriented Storage, Dask, Similarity Search, Recommender Systems
New York, USA

Vellore Institute of Technology

Jul 2015 - Apr 2019

B.Tech in Computer Science and Engineering
Vellore, India


Career

Dell Technologies

Jun 2022 - Aug 2022

Graduate Intern

  • Explored data virtualization to integrate data sources and allow their querying as a single data lake
  • Automated the installation of Trino, a data virtualization tool, on a 3-node Kubernetes cluster using Helm and Docker
  • Helped the team justify using the enterprise version of Trino over the open-source version
Remote, USA

New York University

Feb 2022 - May 2022

Graduate Research Assistant

  • Performed exploratory analysis of over 500,000 job postings scraped from Indeed.com to build a recommendation engine based on topic modeling of job descriptions
  • Combined generated topics with unique skills from job descriptions to generate Apriori association rules
  • Wrote unit tests for all custom Python libraries and functions
New York, USA

Texas Instruments

Jul 2019 - Aug 2021

VMware Administrator/Automation Engineer

  • Built a Python and Oracle SQL data pipeline to collect Icinga and SCOM monitoring data from 20,000 servers and predict unhealthy areas in a multi-OS environment, visualized in a Splunk dashboard
  • Expedited bi-annual and monthly server OS patching with Python and SQL Server by automating schedules and approvals and applying patches asynchronously via the Cherwell REST API, saving 2,000 hours per year
  • Optimized resource utilization by creating a single-pane-view dashboard in Spotfire, backed by a Python and Oracle SQL ETL pipeline that cleaned, pre-processed, and collated data from productivity tools such as JIRA, Cherwell, and SharePoint
  • Configured converged infrastructures for Autostore Product Distribution Centers using VMware vCenter and vRealize Operations tools
Bangalore, India

Texas Instruments

Jan 2019 - Jun 2019

IT Intern

  • Automated the creation of custom URL abbreviations via a self-service portal called ShortURL
  • Migrated 10,000 URL abbreviations from Windows 2008 to Windows 2012 Operating Systems
Bangalore, India

Technical Skills


Languages

  • Python
  • PowerShell
  • C/C++
  • VB
  • ASP.NET
  • R

Database and Big Data Tools

  • SQL Server
  • Oracle
  • MySQL
  • MariaDB
  • PostgreSQL
  • Hadoop
  • Spark
  • MapReduce

Developer Tools

  • Git
  • GitHub
  • Bitbucket
  • VS Code
  • Windows Subsystem for Linux
  • PyCharm

Infrastructure Tools

  • Docker
  • VMware
  • Spotfire
  • Splunk
  • JIRA
  • REST API
  • Icinga
  • Cherwell
  • Kubernetes
  • Apache Superset
  • SCOM

Projects


New York University

Spring 2022

Movie Recommender System on MovieLens Dataset

  • Built an alternating least squares (ALS) recommendation model with a test precision of 0.039
  • Performed a qualitative analysis of the model by visualizing the learned item and user representations via UMAP on a dataset with 27 million ratings

New York University

Spring 2022

Nutritional Label for Patient Survival Prediction

  • Built an interpretability tool to validate an automated decision system (ADS) that predicts patient survival with an AUC score of 0.9
  • Analyzed the dataset of 130,000 ICU visits for missing information and biases with respect to protected categories
  • Advised against the use of this ADS due to the high number of false negatives
  link to code

New York University

Fall 2021

Analysis of NYC Airbnb Listings

  • Summarised the causal relationship between features of listings such as prices and ratings using t-tests
  • Examined the correlation between the location of listings and their price
  • Built and tuned a Random Forest regression model to predict the price of listings with an R² value of 0.83

Vellore Institute of Technology

Spring 2018

Predicting Football Match Winner

  • Predicted football match winners using Recurrent Neural Networks with LSTM cells, reaching 90% accuracy
  • Presented a paper at the Science Engineering and Technology conference at VIT, India
  link to code

Vellore Institute of Technology

Spring 2018

Online News Popularity

  • Analysed Gradient Boosting Machine, Random Forest and XGBoost models to interpret the popularity of a news article
  link to code

Vellore Institute of Technology

Fall 2017

Influential User Detection in Twitter

  • Used the Twitter API to collect real-time tweets and measure their influence by calculating their spread of communication
  link to code

Contact