---
title: "sentiment"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{sentiment}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning = FALSE,
  message = FALSE
)
```
## Why two lexicons?
aesopR ships with two sentiment-joined datasets so you can run sentiment workflows immediately (and reproducibly) without triggering downloads or interactive prompts:
- Bing: categorical labels (positive, negative)
- AFINN: integer scores from -5 (most negative) to +5 (most positive)
Because the lexicon matches are precomputed and stored in the package, the datasets are lightweight for teaching and safe for non-interactive environments (like automated checks).
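To make "sentiment-joined" concrete, the sketch below matches a handful of tokens against a tiny lexicon with base R's merge(). The tokens, the lexicon entries, and the object names (toy_tokens, toy_lexicon, toy_joined) are made up for illustration; this is not the actual aesopR build pipeline.

```r
# Toy tokens: one row per word, tagged with the fable it came from (made-up data)
toy_tokens <- data.frame(
  fable_id = c("f01", "f01", "f01", "f02", "f02"),
  word     = c("fox", "sour", "happy", "wolf", "cruel"),
  stringsAsFactors = FALSE
)

# Toy lexicon in the Bing style: a word and a categorical label (made-up entries)
toy_lexicon <- data.frame(
  word      = c("sour", "happy", "cruel"),
  sentiment = c("negative", "positive", "negative"),
  stringsAsFactors = FALSE
)

# Inner join: only words that appear in the lexicon survive
toy_joined <- merge(toy_tokens, toy_lexicon, by = "word")
toy_joined
```

Words with no lexicon entry (here, fox and wolf) drop out of the result, so a sentiment-joined dataset covers only part of the original text.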
```
library(aesopR)
```
A quick refresher: what’s in the sentiment datasets?
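- aesops_bing: fable words matched to the Bing lexicon (columns used below: fable_id, word, sentiment)
- aesops_afinn: fable words matched to the AFINN lexicon (columns used below: fable_id, word, value)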
## Sentiment summaries across the entire corpus
Bing: overall counts
```
table(aesops_bing$sentiment)
```
AFINN: overall score distribution
```
summary(aesops_afinn$value)
```
*Teaching note: Ask students what it would mean for a fable to be “positive” or “negative.” Is sentiment the same thing as moral tone? Often it is not.*
## Which fables are “most positive” or “most negative”?
Bing: net sentiment per fable (positive minus negative)
```
bing_tab <- with(
  aesops_bing,
  table(fable_id, sentiment)
)
# Some fables may not have both labels present; guard with checks
pos <- if ("positive" %in% colnames(bing_tab)) bing_tab[, "positive"] else rep(0, nrow(bing_tab))
neg <- if ("negative" %in% colnames(bing_tab)) bing_tab[, "negative"] else rep(0, nrow(bing_tab))
bing_net <- pos - neg
bing_net <- sort(bing_net)
# Most negative
head(bing_net, 10)
# Most positive
tail(bing_net, 10)
```
AFINN: mean score per fable
```
afinn_mean <- tapply(aesops_afinn$value, aesops_afinn$fable_id, mean)
afinn_mean <- sort(afinn_mean)
# Lowest mean scores
head(afinn_mean, 10)
# Highest mean scores
tail(afinn_mean, 10)
```
### Discussion prompts:
- Do Bing and AFINN “agree” on the extremes?
- What might explain disagreement?
- Are fables with “negative” language necessarily pessimistic?
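One way to approach the first prompt is to line up the two per-fable scores and compute a rank correlation. The sketch below uses small made-up vectors in place of bing_net and afinn_mean; the values, the fable IDs, and the object names are assumptions for illustration only.

```r
# Made-up per-fable scores standing in for bing_net and afinn_mean
toy_bing_net   <- c(f01 = -3, f02 = 1, f03 = 4, f04 = -1, f05 = 2)
toy_afinn_mean <- c(f03 = 1.5, f01 = -1.2, f05 = -0.4, f02 = 0.3, f04 = -0.8)

# Align on fable IDs present in both vectors (by name, not by position)
shared <- intersect(names(toy_bing_net), names(toy_afinn_mean))
bing_aligned  <- toy_bing_net[shared]
afinn_aligned <- toy_afinn_mean[shared]

# Spearman rank correlation: do the two lexicons order the fables similarly?
agreement <- cor(bing_aligned, afinn_aligned, method = "spearman")
agreement

# Fables where the two lexicons disagree on the sign of the score
disagree <- shared[sign(bing_aligned) != sign(afinn_aligned)]
disagree
```

A high rank correlation alongside a few sign disagreements is a useful starting point for discussion: students can read the flagged fables and look for the words that pull the two lexicons apart.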
## A story-driven deep dive: “The Fox and the Grapes”
Bing: which words are driving the sentiment?
```
# Base R subsetting, so no extra packages are needed
fox_bing <- subset(aesops_bing, fable_id == "005")
table(fox_bing$sentiment)
# Most frequent negative/positive words in this fable (up to 10 each)
head(sort(table(fox_bing$word[fox_bing$sentiment == "negative"]), decreasing = TRUE), 10)
head(sort(table(fox_bing$word[fox_bing$sentiment == "positive"]), decreasing = TRUE), 10)
```
AFINN: strongest scored words
```
fox_afinn <- subset(aesops_afinn, fable_id == "005")
summary(fox_afinn$value)
afinn_ordered <- fox_afinn[order(fox_afinn$value), c("word", "value")]
head(afinn_ordered, 8)
tail(afinn_ordered, 8)
```
### Teaching note:
Invite students to read the moral and discuss whether “sentiment” captures:
- the moral lesson
- the emotional arc
- the narrator’s perspective
- irony/rationalization (“sour grapes”)
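The "emotional arc" item can be made tangible with a running total of AFINN-style values. The sketch below assumes the rows are listed in narrative order, which this vignette does not state for aesops_afinn; the words and scores are made up.

```r
# Made-up scored words, assumed to be listed in the order they occur in the story
toy_arc <- data.frame(
  word  = c("hungry", "fine", "tempting", "failed", "sour"),
  value = c(-2, 2, 1, -2, -2)
)

# Running total of scores: rises and falls as the story moves along
toy_arc$running <- cumsum(toy_arc$value)
toy_arc

# Position in the sequence where the running score peaks
peak_position <- which.max(toy_arc$running)
peak_position
```

Even in this toy case the running total ends below where it started, while the moral a reader takes away depends on irony that no word-level score records.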
## Extending to other lexicons (optional)
You can extend sentiment or emotion analysis using additional lexicons in the tidytext/textdata ecosystem.
*Important practical note: Some lexicons (e.g., NRC) may require textdata to download them on demand. The download can trigger an interactive prompt, which cannot be answered in non-interactive environments (e.g., automated checks). For this reason, aesopR ships with Bing and AFINN only, and treats other lexicons as an optional extension.*
### Example: using another tidytext lexicon (if available)
This is an example pattern (not required for aesopR to work).
Some lexicons will be immediately available; others may require textdata.
```
if (requireNamespace("tidytext", quietly = TRUE)) {
  # e.g., "loughran" can be requested via tidytext::get_sentiments()
  # (designed for financial contexts; included here to illustrate extension).
  # Depending on your tidytext version, this call may go through textdata.
  lex <- tidytext::get_sentiments("loughran")
  head(lex)
}
```
If you plan to use lexicons that download through textdata, consider doing so in an interactive session and caching results in your own analysis project, rather than relying on on-demand downloads during automated runs.
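A minimal sketch of that caching pattern, using only base R's saveRDS() and readRDS(). The helper cached_lexicon(), the stand-in toy_fetch(), and the file name are hypothetical; in a real project the fetch function would wrap whichever lexicon call you ran interactively.

```r
# Hypothetical helper: return a cached lexicon if one exists, otherwise
# call fetch() once and store the result for later runs
cached_lexicon <- function(path, fetch) {
  if (file.exists(path)) {
    return(readRDS(path))
  }
  lex <- fetch()
  saveRDS(lex, path)
  lex
}

# Stand-in for an interactive download; a real project might pass
# function() tidytext::get_sentiments("nrc") here instead
toy_fetch <- function() {
  data.frame(
    word      = c("joy", "fear"),
    sentiment = c("positive", "negative"),
    stringsAsFactors = FALSE
  )
}

# Use a temporary file for the demo and start from a clean slate
cache_file <- file.path(tempdir(), "toy_lexicon.rds")
unlink(cache_file)

first  <- cached_lexicon(cache_file, toy_fetch)  # calls toy_fetch() and writes the file
second <- cached_lexicon(cache_file, toy_fetch)  # reads the file; toy_fetch() is not called
```

Storing the cache inside your own analysis project (rather than tempdir(), used here only to keep the demo tidy) means automated runs never need to reach the download step.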
## Wrap-up: methods takeaways
These workflows are intentionally simple so they can serve as teaching tools.
Key methods questions to emphasize:
- What does the lexicon measure?
- What does it not measure (context, negation, sarcasm, narrative framing)?
- How do pre-processing choices change results?
- What would validation look like (human coding, triangulation, sensitivity checks)?
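The negation point in the list above can be shown in a few lines. The sketch below uses a made-up two-word lexicon and a hypothetical helper, score_words(), that sums the scores of any matching words; it is a toy illustration of word-level lookup, not how aesopR's datasets were built.

```r
# Made-up two-word lexicon in the AFINN style
toy_scores <- c(good = 3, bad = -3)

# Hypothetical helper: lower-case the sentence, split it on non-letters,
# and sum the scores of any words found in the lexicon
score_words <- function(sentence, lexicon) {
  words <- strsplit(tolower(sentence), "[^a-z]+")[[1]]
  sum(lexicon[words[words %in% names(lexicon)]])
}

plain   <- score_words("The harvest was good", toy_scores)
negated <- score_words("The harvest was not good", toy_scores)
c(plain = plain, negated = negated)
```

Both sentences receive the same score because the lookup never sees "not", which is the kind of blind spot a validation exercise with human coders would surface.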