---
title: "sentiment"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{sentiment}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  warning = FALSE,
  message = FALSE
)
```

## Why two lexicons?

aesopR ships with two sentiment-joined datasets so you can run sentiment workflows immediately (and reproducibly) without triggering downloads or interactive prompts:

-   Bing: categorical labels (positive, negative)

-   AFINN: integer scores from -5 to +5

Because the lexicon joins are precomputed and stored with the package, nothing is fetched at run time. This keeps the datasets lightweight for teaching and safe for non-interactive environments (like automated checks).
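To make "sentiment-joined" concrete, here is a minimal sketch of the general pattern: an inner join between tokenized text and a lexicon. All data below are invented, the `toy_*` names are hypothetical, and this is not necessarily how aesopR builds its datasets.

```r
# Hypothetical tokens: one row per word per fable (invented for illustration)
toy_tokens <- data.frame(
  fable_id = c("001", "001", "001", "002", "002"),
  word     = c("fox", "sour", "grapes", "kind", "lion")
)

# Tiny stand-ins for a categorical and a numeric lexicon (invented entries)
toy_bing_lex  <- data.frame(word = c("sour", "kind"),
                            sentiment = c("negative", "positive"))
toy_afinn_lex <- data.frame(word = "kind", value = 2)

# merge() performs an inner join by default, so unmatched words drop out
toy_bing_joined  <- merge(toy_tokens, toy_bing_lex,  by = "word")
toy_afinn_joined <- merge(toy_tokens, toy_afinn_lex, by = "word")

toy_bing_joined
toy_afinn_joined
```

In this sketch, words absent from a lexicon disappear from the joined result, so the two joined tables can cover different words from the same fable.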

```         
library(aesopR)
```

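A quick refresher: what’s in the sentiment datasets?

-   aesops_bing: the examples below use its `fable_id`, `word`, and `sentiment` columns

-   aesops_afinn: the examples below use its `fable_id`, `word`, and `value` columns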

## Sentiment summaries across the entire corpus

Bing: overall counts

```         
table(aesops_bing$sentiment)
```

AFINN: overall score distribution

```         
summary(aesops_afinn$value)
```
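Raw counts are often easier to discuss as shares of all matched words. A minimal sketch on invented data (`toy_bing` is hypothetical and only mimics the `sentiment` column of `aesops_bing`):

```r
# Hypothetical stand-in for aesops_bing (all values invented)
toy_bing <- data.frame(
  fable_id  = c("001", "001", "002", "002", "002"),
  word      = c("happy", "cruel", "kind", "greedy", "foolish"),
  sentiment = c("positive", "negative", "positive", "negative", "negative")
)

bing_counts <- table(toy_bing$sentiment)
bing_share  <- prop.table(bing_counts)  # each count divided by the total

round(bing_share, 2)
```

The same two calls should work on `aesops_bing$sentiment`, assuming that column holds the Bing labels as in the chunk above.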

*Teaching note: Ask students what it would mean for a fable to be “positive” or “negative.” Is sentiment the same thing as moral tone? Often it is not.*

## Which fables are “most positive” or “most negative”?

Bing: net sentiment per fable (positive minus negative)

```         
bing_tab <- with(
  aesops_bing,
  table(fable_id, sentiment)
)

# table() already fills in 0 for a fable that lacks one label;
# these checks cover a label that is missing from the whole dataset
pos <- if ("positive" %in% colnames(bing_tab)) bing_tab[, "positive"] else rep(0, nrow(bing_tab))
neg <- if ("negative" %in% colnames(bing_tab)) bing_tab[, "negative"] else rep(0, nrow(bing_tab))

bing_net <- pos - neg
bing_net <- sort(bing_net)

# Most negative
head(bing_net, 10)

# Most positive
tail(bing_net, 10)
```

AFINN: mean score per fable

```         
afinn_mean <- tapply(aesops_afinn$value, aesops_afinn$fable_id, mean)
afinn_mean <- sort(afinn_mean)

# Lowest mean scores
head(afinn_mean, 10)

# Highest mean scores
tail(afinn_mean, 10)
```
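A mean can rest on very few scored words, so it may help to report the number of matched words and the total next to it. A minimal sketch on invented data (`toy_afinn` is hypothetical and only mimics the `fable_id` and `value` columns of `aesops_afinn`):

```r
# Hypothetical stand-in for aesops_afinn (all values invented)
toy_afinn <- data.frame(
  fable_id = c("001", "001", "001", "002"),
  word     = c("good", "awful", "fine", "terrible"),
  value    = c(2, -3, 1, -4)
)

# Group the scores by fable, then summarise each group three ways
by_fable <- split(toy_afinn$value, toy_afinn$fable_id)
afinn_summary <- data.frame(
  fable_id  = names(by_fable),
  n_words   = vapply(by_fable, length, integer(1)),
  total     = vapply(by_fable, sum, numeric(1)),
  mean      = vapply(by_fable, mean, numeric(1)),
  row.names = NULL
)

afinn_summary[order(afinn_summary$mean), ]
```

Here the lowest mean belongs to a fable with a single scored word, which is the kind of case worth flagging before calling a fable "most negative".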

### Discussion prompts

-   Do Bing and AFINN “agree” on the extremes?

-   What might explain disagreement?

-   Are fables with “negative” language necessarily pessimistic?
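One way to put a number on "agreement" is a rank correlation between the two per-fable scores. The sketch below uses invented named vectors (`toy_bing_net` and `toy_afinn_mean` are hypothetical) and Spearman's correlation, which is one reasonable choice among several:

```r
# Hypothetical per-fable scores, named by fable_id (all values invented)
toy_bing_net   <- c("001" = -2,   "002" = 1,   "003" = 3,   "004" = 0)
toy_afinn_mean <- c("001" = -1.5, "002" = 0.5, "003" = 2.0, "005" = 1.0)

# Compare only fables that both lexicons scored
shared <- intersect(names(toy_bing_net), names(toy_afinn_mean))

# Spearman compares rankings, so the different scales do not matter
rho <- cor(toy_bing_net[shared], toy_afinn_mean[shared], method = "spearman")
rho
```

The same pattern should apply to `bing_net` and `afinn_mean` from the chunks above, assuming both are named by `fable_id`.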

## A story-driven deep dive: “The Fox and the Grapes”

Bing: which words are driving the sentiment?

```         
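# Base-R subsetting, so the example does not depend on dplyr being attached
fox_bing <- aesops_bing[aesops_bing$fable_id == "005", ]
table(fox_bing$sentiment)

# Most frequent negative/positive words in this fable
# (head() avoids the NA padding that [1:10] gives when fewer than 10 words match)
head(sort(table(fox_bing$word[fox_bing$sentiment == "negative"]), decreasing = TRUE), 10)
head(sort(table(fox_bing$word[fox_bing$sentiment == "positive"]), decreasing = TRUE), 10)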
```

AFINN: strongest scored words

```         
# Base-R subsetting, so the example does not depend on dplyr being attached
fox_afinn <- aesops_afinn[aesops_afinn$fable_id == "005", ]
summary(fox_afinn$value)

afinn_ordered <- fox_afinn[order(fox_afinn$value), c("word", "value")]
head(afinn_ordered, 8)
tail(afinn_ordered, 8)
```

### Teaching note

Invite students to read the moral and discuss whether “sentiment” captures:

-   the moral lesson

-   the emotional arc

-   the narrator’s perspective

-   irony/rationalization (“sour grapes”)

## Extending to other lexicons (optional)

You can extend sentiment or emotion analysis using additional lexicons in the tidytext/textdata ecosystem.

*Important practical note: Some lexicons (e.g., NRC) may need to be downloaded on demand through textdata, which can prompt for confirmation. Such prompts cannot be answered in non-interactive environments (e.g., automated checks). For this reason, aesopR ships with Bing and AFINN only, and treats other lexicons as an optional extension.*

### Example: using another tidytext lexicon (if available)

This is an example pattern (not required for aesopR to work).

Some lexicons will be immediately available; others may require textdata.

```         
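if (requireNamespace("tidytext", quietly = TRUE) && interactive()) {
  # e.g., "loughran" is available via tidytext::get_sentiments()
  # (designed for financial contexts; included here to illustrate extension).
  # Depending on your tidytext version, this call may fetch the lexicon
  # through textdata, so it is guarded to run only in interactive sessions.
  lex <- tidytext::get_sentiments("loughran")
  head(lex)
}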
```

If you plan to use lexicons that download through textdata, consider doing so in an interactive session and caching results in your own analysis project, rather than relying on on-demand downloads during automated runs.
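A minimal sketch of that caching pattern follows. `cached_lexicon()` is a hypothetical helper, not part of aesopR or tidytext; it takes any zero-argument `fetch` function, so the demonstration can run offline with a stand-in fetcher.

```r
# Hypothetical helper: fetch a lexicon once, then reuse the saved copy.
# `fetch` is any zero-argument function returning a data frame, for example
# function() tidytext::get_sentiments("nrc") run in an interactive session.
cached_lexicon <- function(path, fetch) {
  if (file.exists(path)) {
    return(readRDS(path))  # cache hit: no download, no prompt
  }
  lex <- fetch()
  dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
  saveRDS(lex, path)
  lex
}

# Demonstration with an invented two-word lexicon
demo_path <- file.path(tempdir(), "lexicons", "demo.rds")
unlink(demo_path)  # start from an empty cache
demo_fetch <- function() {
  data.frame(word = c("good", "bad"), sentiment = c("positive", "negative"))
}

lex_first  <- cached_lexicon(demo_path, demo_fetch)  # fetches and saves
lex_second <- cached_lexicon(
  demo_path,
  function() stop("fetch should not run on a cache hit")
)  # reads the saved copy instead
```

In a real project the path would point somewhere persistent (a `data/` folder, say) rather than `tempdir()`, and the lexicon's license terms would still apply to the saved copy.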

## Wrap-up: methods takeaways

These workflows are intentionally simple so they can serve as teaching tools.

Key methods questions to emphasize:

-   What does the lexicon measure?

-   What does it not measure (context, negation, sarcasm, narrative framing)?

-   How do pre-processing choices change results?

-   What would validation look like (human coding, triangulation, sensitivity checks)?
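As one concrete sensitivity check, the sketch below flips the score of any lexicon word that directly follows a negator and compares the totals. The tokens, scores, and negator list are all invented for illustration. The approach needs tokens in reading order, which the joined datasets may not preserve, so treat it as a classroom exercise and not a recommended method.

```r
# Hypothetical ordered tokens from one sentence (invented)
tokens <- c("the", "fox", "was", "not", "happy",
            "but", "the", "crow", "was", "proud")

toy_lexicon <- c(happy = 1, proud = 1)  # invented scores, not taken from AFINN
negators    <- c("not", "no", "never")  # a deliberately short, invented list

# Look up each token; words outside the lexicon score 0
scores <- unname(toy_lexicon[tokens])
scores[is.na(scores)] <- 0

# A scored word counts as negated when the token just before it is a negator
prev_token <- c("", head(tokens, -1))
negated    <- prev_token %in% negators & scores != 0

naive_total    <- sum(scores)
adjusted_total <- sum(ifelse(negated, -scores, scores))

c(naive = naive_total, negation_adjusted = adjusted_total)
```

A gap between the two totals gives students a concrete example of what a word-level lexicon cannot see.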