Hello, I'm
Kpodjro
Data Scientist / Machine Learning Engineer
Machine Learning Engineer with hands-on experience in developing and optimising large language models (LLMs) and applied AI systems. Currently completing a one-year work-study placement in Data Science and Generative AI, where I design and deploy end-to-end machine learning solutions in a real-world, production-oriented environment.
My background includes strong expertise in data mining, NLP, question-answering systems and language model fine-tuning; during a recent internship, I achieved a 23% improvement in predictive performance. I have also led research-oriented projects.
I am currently pursuing a Master’s degree in Data Science at Université Paris Cité, specialising in machine learning algorithm development, advanced data analysis and decision support systems.
URSSAF Caisse Nationale
October 2025 – September 2026
Technologies:
Python, PyTorch, LangChain, LangGraph, Docker, Kubernetes, GitLab, PostgreSQL, OpenSearch, OVH Cloud
ATTIJARIWAFA BANK
March 2024 – August 2024
Technologies:
Mistral 7B, LangChain, PyTorch, Streamlit, VS Code, Hugging Face, MongoDB (NoSQL), Jira
MLView Consulting
August 2023 – September 2023
Technologies:
LangChain, Hugging Face, Llama 2, TensorFlow, VS Code, Colab, Git & GitHub, SQL
2024 - 2026
MSc in Machine Learning for Data Science
Paris Cité University, Paris, France
Reinforcement Learning & Graph Learning, Recommendation Systems, Clustering & Dimensionality Reduction, NLP, Data Visualization, Time Series, Mixture Models, Vertex AI (GCP)
2021 – 2024
Engineering Degree in Software and Intelligent Systems
Abdelmalek Essaadi University, Tangier, Morocco
Machine Learning, Deep Learning, AI Methodology, Computer Vision, Data Mining, Inferential Statistics, BI, UML, Informed Search Algorithms, Design Patterns
2019 – 2021
Associate's Degree in Mathematics, Computer Science & Physics
Hassan 1st University, Settat, Morocco
Algebra, Numerical & Complex Analysis, Statistics, C/C++, Mathematical Optimization, Databases, Graph Theory
Discover my
Research project (in a team of 4 students):
10/2025 - 12/2025
Abstract: This research explores semi-supervised learning under extreme label scarcity. Several approaches are studied and compared, including supervised baselines, CatGAN (MLP and CNN variants), and Regularized Information Maximization (RIM). The results demonstrate that combining CatGAN with convolutional architectures significantly improves classification performance by leveraging unlabeled data and maximizing information content in the discriminator’s predictions.
Main tasks:
Research project (in a team of 4 students):
10/2024 - 05/2025
Abstract: This project enhances movie recommendations by using LLMs (Gemini 1.5, Mistral) to enrich user and item profiles. The enriched profiles significantly improve the accuracy of LightGCN, MLP, and Matrix Factorization models by mitigating data sparsity and enabling more nuanced personalization. The work focuses on responsible integration and acknowledges challenges such as bias and cost.
Main tasks:
Research project (in a team of 4 students):
01/2025 - 02/2025
Abstract: This paper explores the use of TreeTagger to accurately identify the different functions of the word "that" in English, such as conjunction, relative pronoun, determiner, or adverb. We first evaluate pre-trained models from the BNC and Penn corpora, then re-train TreeTagger with specific labels derived from the Brown corpus to enhance accuracy. Comparisons with Stanza and UDPipe are presented. The main findings demonstrate that re-training with the Brown corpus significantly improves the tool’s performance and its ability to distinguish between the various uses of "that".
Main tasks:
Explore My
Python, Java, R, C/C++, PL/SQL
Anaconda, Eclipse, VS Code, RStudio
pandas, NumPy, statsmodels, scikit-learn, PySpark
scikit-learn, TensorFlow, Keras, PyTorch
Supervised, Unsupervised, Reinforcement, Ensemble Learning
CNN, RNN, LSTM, ANN, TensorFlow, Keras, GNN
OpenCV, Tesseract, KerasCV, Pillow
BERT, KerasNLP, Llama 2, Mistral
Statistical Modeling, Dashboard Development
MySQL, PostgreSQL, MongoDB, Oracle, Hive
HTML5 & CSS3, Streamlit, Flask, FastAPI, Shiny, Angular (Beginner)
Scrapy, BeautifulSoup, Selenium, pytrends
Git & GitHub
Docker
Airflow, cron (Linux), AWS, GCP, Vertex AI
GanttProject, Jira
Scientific Document Preparation
Data Visualization, Dashboard Development
Browse my
Feature extraction (FFT, spectral features), LLE, t-SNE, K-Means, Hierarchical Clustering, Spectral Clustering, Fuzzy K-Means
PySpark, Spark Streaming, MLlib, Apache Beam, Big Data
Python, NLP, scikit-learn, UMAP, Sentiment Analysis
scikit-learn, TensorFlow, Keras, Matplotlib, SQL
scikit-learn, seaborn, XGBoost, LightGBM
Kafka Streams, PySpark, scikit-learn, Flask, Angular, Docker, SQL
scikit-learn, OpenCV, Flask
LangChain, Streamlit, FAISS, Llama 2
My
Date of issue: 09/2022
Issuing organization: Huawei
Date of issue: 10/2023
Issuing organization: OpenCV University
Date of issue: 10/2023
Issuing organization: NASA Space Challenge
Date of issue: 09/2023
Issuing organization: Kaggle
Date of issue: 10/2023
Issuing organization: Kaggle
Get in touch
Copyright © 2026 Kpodjro KPATOUKPA. All Rights Reserved.