LLM System Development
The goal of this project was to develop a chatbot built on a Large Language Model (LLM) to
replace an older chatbot system. The work on this project included analyzing and preparing
datasets for fine-tuning LLMs, fine-tuning and evaluating LLMs, prompt engineering, backend
development with FastAPI, containerization with Docker, setting up a vector database for
Retrieval Augmented Generation, setting up a MongoDB database for keeping historical context,
and using zero-shot classification transformers for evaluating the LLM output.
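The retrieval step of Retrieval Augmented Generation can be illustrated with a minimal sketch. It is not the project's implementation: the bag-of-words `embed` function is a toy stand-in for a real embedding model, the in-memory list stands in for the vector database, and the documents and function names are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: lowercase bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def build_prompt(query, documents, k=2):
    """Prepend the retrieved documents to the question as context for the LLM."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Context:\n{context}\n\nQuestion: {query}"


# Hypothetical knowledge-base snippets, for illustration only.
docs = [
    "refunds are processed within five business days",
    "our support line is open on weekdays",
    "shipping is free for orders over fifty euros",
]
```

Calling `build_prompt("how long do refunds take", docs, k=1)` would place the refund snippet ahead of the question, which is the text an LLM would then receive.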
Chatbot Development
The goal of this project was to develop a chatbot that would lessen the workload of human
agents doing the same job. The project included analyzing KPIs with Tableau and SQL,
analyzing textual data with Python, and fine-tuning transformer models.
Time Series Anomaly Detection
The goal of this project was to develop a model for detecting anomalies in time series data,
which would be used in a monitoring system. The project included analyzing and visualizing data
with Pandas, statsmodels and Seaborn, and building ML models for anomaly detection using
Scikit-learn.
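As a rough illustration of the idea, the sketch below flags points that deviate strongly from a trailing window of recent values. This rolling z-score is a deliberately simple stand-in, not the Scikit-learn models built in the project; the function name, window size, threshold, and sample series are all assumptions.

```python
from statistics import mean, stdev


def rolling_zscore_anomalies(values, window=5, threshold=3.0):
    """Return indices whose value lies more than `threshold` standard
    deviations from the mean of the preceding `window` values."""
    anomalies = []
    for i in range(window, len(values)):
        history = values[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: any change at all counts as a deviation.
            if values[i] != mu:
                anomalies.append(i)
            continue
        if abs(values[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Made-up monitoring series with one obvious spike at index 7.
series = [10.0, 10.2, 9.9, 10.1, 10.0, 10.3, 9.8, 25.0, 10.1, 10.0]
print(rolling_zscore_anomalies(series))
```

A trailing window keeps the check usable in a monitoring setting, since each point is scored using only values that were already observed.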
Detecting Malware in Android Applications using XGBoost
Paper available here
The omnipresence of Android devices and the amount of sensitive information kept on them make
detecting malware in Android applications crucial. In this paper, we examined the efficacy of
machine learning models for malware detection in Android applications, and developed and
compared several XGBoost models, each with a distinct feature set. We used the F1 score,
precision, recall, confusion matrices, and precision-recall curves to compare the models.
Accuracy was not considered, since it would require a balanced dataset to be meaningful. One
of the models we developed, which used all the available features in the dataset, had
encouraging results with high precision and recall.
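To make the choice of metrics concrete, the sketch below computes precision, recall, and the F1 score from binary predictions and contrasts them with accuracy. The labels are made up for illustration and are not data from the paper; the function names are hypothetical.

```python
def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 means malware."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_f1(y_true, y_pred):
    """Return (precision, recall, f1), using 0.0 where a ratio is undefined."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up skewed sample: 95 benign apps, 5 malware apps,
# and a classifier that labels everything as benign.
y_true = [0] * 95 + [1] * 5
y_pred = [0] * 100

print(accuracy(y_true, y_pred))             # 0.95
print(precision_recall_f1(y_true, y_pred))  # (0.0, 0.0, 0.0)
```

On this skewed sample the classifier misses every malware app yet still scores 95% accuracy, while precision, recall, and F1 all drop to zero.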
DistilBERT and RoBERTa Models for Identification of Fake News
Paper available here
The goal of this project was to fine-tune two transformer models, namely DistilBERT and RoBERTa,
and compare their effectiveness in fake news detection. Both models were trained on a labelled
dataset of news articles and evaluated on two datasets, comparing their performance in terms of
accuracy, precision, recall and F1-score. The results of the experiments showed that both models
performed well, with RoBERTa achieving slightly better results overall. This project resulted in a
paper that was published at the MIPRO Convention in Croatia.
logs2graphs: Data-driven graph representation and visualization of log data
Paper available here
The goal of this project was to develop a system which could create graph representations of
system logs. These graph representations could then be used in log anomaly detection, log
prediction, and log-guided root cause analysis. In this paper, we present
logs2graphs, an open-source system for the creation and visualization of such graph
representations of log messages, which is compatible with several publicly available log sources
and expandable to other log sources.
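One simple way to picture a graph representation of logs is a directed graph whose nodes are message templates and whose weighted edges count how often one template directly follows another. The sketch below illustrates that idea only; it is not the logs2graphs algorithm, and the masking rule, function names, and sample messages are assumptions.

```python
import re
from collections import Counter


def to_template(message):
    """Mask decimal and hex numbers so similar messages share one template.
    This masking rule is a hypothetical simplification of log parsing."""
    return re.sub(r"\b(0x[0-9a-fA-F]+|\d+)\b", "<*>", message)


def build_transition_graph(messages):
    """Return a Counter mapping (source_template, target_template) edges
    to the number of times the target directly follows the source."""
    templates = [to_template(m) for m in messages]
    return Counter(zip(templates, templates[1:]))


# Made-up log lines, for illustration only.
logs = [
    "Connection opened from 10",
    "User 42 logged in",
    "User 42 logged out",
    "Connection opened from 11",
    "User 7 logged in",
]

for (src, dst), count in build_transition_graph(logs).items():
    print(f"{src} -> {dst} (x{count})")
```

Edges with unusually low counts, or transitions never seen before, are the kind of signal such a graph could feed into anomaly detection.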