Projects


Tokenization in Turkish and Finnish

Scraped Turkish and Finnish text to study how tokenization behaves in agglutinative languages. Evaluated the results using Word2Vec and Named Entity Recognition datasets.


A Neural Network Informed Study on the Gay Voice

An ongoing project with CUNY Queens to identify linguistic features that correspond to both self-reported and listener-perceived sexuality in Gen Z youth.


Generating Language Families using Word Similarity

2024 summer project for the Tufts Engineering with AI Camp: compared translations of the UDHR (Universal Declaration of Human Rights) using Levenshtein distance to "classify" languages into families.
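
For readers unfamiliar with the metric, here is a minimal sketch of a Levenshtein-based comparison. It is not the project's actual code: the function names, the length normalization, and the toy SAMPLES word list (the word for "night" in five languages, standing in for UDHR text) are all assumptions made for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions,
    or substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer length (an assumed normalization)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def nearest_language(samples: dict, target: str) -> str:
    """Name of the language whose sample is closest to the target's sample."""
    others = [name for name in samples if name != target]
    return min(
        others,
        key=lambda name: normalized_distance(samples[target], samples[name]),
    )


# Toy stand-in for UDHR text: the word for "night", lowercased.
SAMPLES = {
    "English": "night",
    "German": "nacht",
    "Dutch": "nacht",
    "Spanish": "noche",
    "Italian": "notte",
}

if __name__ == "__main__":
    for language in SAMPLES:
        print(language, "->", nearest_language(SAMPLES, language))
```

Grouping each language with its nearest neighbor is only a rough proxy for family membership, which is presumably why "classify" is in quotes: spelling similarity reflects orthography and borrowing as well as shared ancestry.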