library(gt)
library(dplyr)    # for %>% and filter()
library(stringr)  # for str_detect()

og_df <- read.csv("/Users/hannahusadi/Downloads/AI_Use_by_Companies S&P500/core.csv")
#colnames(og_df)
# Check the data type of each column
#str(og_df)
I found my data on the Emerging Technology Observatory (ETO) website through the “Data is Plural” newsletter. The Private-Sector AI Indicators dataset tracks AI-related research, patents, and hiring across global companies, using proprietary methods from ETO and CSET (the Center for Security and Emerging Technology) to analyze diverse data sources. The data was collected in May 2024.
What does the dataset look like?
The data has 691 rows, each representing one company, and 63 columns. The variables span both qualitative and quantitative metrics, covering company metadata, research, workforce, and patents.
# Count how many columns there are of each data type
data_types <- sapply(og_df, class)
summary_table <- table(data_types)
# Create a summary sentence
summary_sentence <- paste(
  "The dataset contains", ncol(og_df), "columns, including",
  paste(summary_table, names(summary_table), collapse = ", "), "variables."
)
# Print the summary
cat(summary_sentence)
The dataset contains 63 columns, including 15 character, 44 integer, 4 numeric variables.
# Filter for S&P 500 companies only
df <- og_df %>% filter(str_detect(Groups, "S&P 500"))
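As a quick sanity check (assuming, as above, that membership is tagged in the `Groups` column), we can compare row counts before and after the filter to confirm the subset is smaller than the full dataset and non-empty:

```r
library(dplyr)
library(stringr)

# Hypothetical miniature stand-in for og_df, just to illustrate the filter logic;
# the real dataset has 691 rows and a Groups column with comma-separated tags.
og_df_demo <- data.frame(
  company = c("Alpha Corp", "Beta Inc", "Gamma LLC"),
  Groups  = c("S&P 500, Fortune 500", "Private", "S&P 500")
)

df_demo <- og_df_demo %>% filter(str_detect(Groups, "S&P 500"))

nrow(og_df_demo)  # rows before filtering
nrow(df_demo)     # rows tagged as S&P 500 members
```

The same `nrow(og_df)` vs. `nrow(df)` comparison on the real data tells us how many of the 691 companies are S&P 500 members before we go further.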
That’s a lot of variables! Unorganized, the data is like a ball of yarn: lots of threads, lots of colors.