NLP and Transformer Models
Description
Here, I focused on natural language processing and transformer models. I first evaluated GPT‑2 models of different sizes on the CoLA dataset for grammatical acceptability, showing that larger models performed better and that averaging log probabilities per token, which normalizes for sentence length, gave more reliable results. I then compared GPT‑2 with BERT, finding that BERT consistently outperformed GPT‑2 thanks to its bidirectional architecture, even without fine‑tuning.
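The acceptability scoring rests on treating GPT‑2 as a language model and averaging the log probability it assigns to each token of a sentence. Below is a minimal sketch of that idea using the Hugging Face transformers and datasets libraries; the model checkpoint, the use of the CoLA validation split, and the threshold‑based decision rule are illustrative assumptions, not the exact setup from the project.

```python
# Sketch: score CoLA sentences with GPT-2 by average per-token log-probability.
# "gpt2" can be swapped for "gpt2-medium", "gpt2-large", etc. to compare model sizes.
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast
from datasets import load_dataset

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").to(device).eval()

@torch.no_grad()
def avg_log_prob(sentence: str) -> float:
    """Mean log-probability per token under the language model."""
    ids = tokenizer(sentence, return_tensors="pt").input_ids.to(device)
    # With labels equal to the inputs, the model returns the mean cross-entropy
    # over predicted tokens, so its negative is the average token log-probability.
    loss = model(ids, labels=ids).loss
    return -loss.item()

cola = load_dataset("glue", "cola", split="validation")
scores = [avg_log_prob(ex["sentence"]) for ex in cola]
# Illustrative decision rule (assumed): mark a sentence as acceptable if its
# average log-probability clears a threshold chosen on held-out data.
```

Because the score is averaged over tokens, longer sentences are not automatically penalized the way they would be by a raw summed log probability.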
Finally, I fine‑tuned BERT large on CoLA for sequence classification, which significantly boosted performance and achieved about 83.6% accuracy. My conclusion was that while model size helps, architecture is even more important, and fine‑tuning makes BERT the most effective approach for this task.
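For the fine‑tuning step, the sketch below shows how BERT‑large can be fine‑tuned on CoLA for sequence classification with the Hugging Face Trainer. The hyperparameters (batch size, learning rate, epochs, sequence length) and the accuracy metric are illustrative defaults, not the exact configuration that produced the 83.6% figure.

```python
# Sketch: fine-tune BERT-large on CoLA for binary acceptability classification.
import numpy as np
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          TrainingArguments, Trainer)

model_name = "bert-large-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Tokenize the CoLA sentences; labels (0 = unacceptable, 1 = acceptable) are kept as-is.
cola = load_dataset("glue", "cola")
def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True,
                     padding="max_length", max_length=64)
cola = cola.map(tokenize, batched=True)

def accuracy(eval_pred):
    logits, labels = eval_pred
    preds = np.argmax(logits, axis=-1)
    return {"accuracy": (preds == labels).mean()}

args = TrainingArguments(
    output_dir="bert-large-cola",   # assumed output path
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    num_train_epochs=3,
)
trainer = Trainer(model=model, args=args,
                  train_dataset=cola["train"],
                  eval_dataset=cola["validation"],
                  compute_metrics=accuracy)
trainer.train()
print(trainer.evaluate())  # reports validation accuracy after fine-tuning
```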