4 C. elegans plate experiment

In this chapter, I investigate how different compounds and concentrations affect the number of offspring produced by C. elegans. I inspect the Excel data, correct the column data types, normalize the offspring counts to the negative control, and set up a dose-response analysis, using ggplot2 to visualize the results. My objective is to gain insight into how the compounds and their concentrations influence the number of offspring, and to demonstrate working with a real experimental dataset from import to graphs. This dataset was provided by J. Louter (INT/ILC).

4.1 Importing and inspecting the data

Right away, I noticed ‘controlVehicleA’ among the experimental conditions (expType). It resembles ‘controlPositive’ but contains less ethanol, which likely indicates that the experimental compounds were dissolved in ethanol and that this vehicle control captures the effect of the solvent alone. The workbook also contains additional sheets with input lists, change logs, and examples. I’m unsure about the difference between the ‘ExpLookup’ and ‘Input example’ sheets.

The experiment tested three compounds: 2,6-diisopropylnaphthalene, decane, and naphthalene. The positive control used a solution of 1.5% ethanol in S-medium, while the negative control used only S-medium without any added compound.

First, the Excel file is read in using the {readxl} package:

library(readxl)
library(here)

# Read the Excel file into a tibble
excel_data_elegans <- read_excel(
  here::here("Raw_data/celegans/CE.LIQ.FLOW.062_Tidydata (1).xlsx"))

The first step is to examine the data. The table below is rendered interactively with the reactable() function: defaultPageSize = 5 shows five rows per page and compact = TRUE reduces the row padding, so the whole dataset can be browsed page by page.

library(reactable)

# Display the data as an interactive, paged table
reactable(excel_data_elegans, defaultPageSize = 5, compact = TRUE)

Next, I inspect the data types of the columns RawData, compName, compConcentration, and expType.

library(dplyr)
library(purrr)

# Inspect the data types of the RawData, compName, compConcentration and expType columns
excel_data_elegans %>%
  select(RawData, compName, compConcentration, expType) %>%
  map(class)

## $RawData
## [1] "numeric"
##
## $compName
## [1] "character"
##
## $compConcentration
## [1] "character"
##
## $expType
## [1] "character"

4.2 Checking and correcting the datatypes

The expected data types based on the experimental description are:

RawData: numeric (number of offspring)
compName: factor (name of the compound)
compConcentration: numeric (concentration of the compound)
expType: factor (experiment type)

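The inspection shows that RawData is already numeric, but compConcentration was read as character instead of numeric, and compName and expType are character rather than factor. The conversions below correct this. Note that as.numeric() does not stop at entries it cannot parse; it turns them into NA and only issues a warning, so the converted column is worth checking for unexpected missing values: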

# Convert compConcentration to numeric
excel_data_elegans$compConcentration <- as.numeric(excel_data_elegans$compConcentration)

# Convert compName to a factor (levels sorted alphabetically)
excel_data_elegans$compName <- as.factor(excel_data_elegans$compName)

# Convert expType to a factor, keeping the levels in order of first appearance
excel_data_elegans$expType <- factor(excel_data_elegans$expType,
                                     levels = unique(excel_data_elegans$expType))

# Check the data types of the converted columns
excel_data_elegans %>% select(compConcentration, compName, expType) %>% map(class)

## $compConcentration
## [1] "numeric"
##
## $compName
## [1] "factor"
##
## $expType
## [1] "factor"
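Because as.numeric() replaces unparseable text with NA instead of failing, it helps to see which original strings would be lost. The sketch below uses a hypothetical vector (the value "n/a" is an assumption for illustration, not something found in this dataset) and compares the values before and after conversion.

```r
# Hypothetical raw concentration values as they might be read from Excel
# (illustrative only; not taken from the dataset)
raw_conc <- c("0", "0.01", "1.5", "10", "n/a")

# as.numeric() turns strings it cannot parse into NA and only warns
conc <- suppressWarnings(as.numeric(raw_conc))

# Original strings that were lost in the conversion
lost <- raw_conc[is.na(conc) & !is.na(raw_conc)]

print(conc)
print(lost)
```

On the real data, the same comparison only works if it is run on the character column before it is overwritten, for example by converting into a new variable first.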

# check levels of "compName" and "expType"
excel_data_elegans %>% select(
  compName, expType) %>% map(levels)

## $compName
## [1] "2,6-diisopropylnaphthalene" "decane"
## [3] "Ethanol"                    "naphthalene"
## [5] "S-medium"
##
## $expType
## [1] "experiment"      "controlPositive" "controlNegative" "controlVehicleA"
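The two conversions above order the levels differently: as.factor() sorts the levels (alphabetically, in the current locale), while factor(x, levels = unique(x)) keeps the order in which the values first appear, which is why expType starts with "experiment". The level order determines the order of legend entries and, in model fits such as lm(), which level serves as the baseline for treatment contrasts. A small sketch with a hypothetical vector of experiment types:

```r
# Hypothetical experiment types in the order they appear in a sheet
exp_type <- c("experiment", "controlPositive", "controlNegative",
              "experiment", "controlVehicleA")

# as.factor(): levels sorted alphabetically
lev_sorted <- levels(as.factor(exp_type))

# factor(levels = unique()): levels kept in order of first appearance
lev_appearance <- levels(factor(exp_type, levels = unique(exp_type)))

# relevel(): move a chosen level to the front, e.g. the negative control
# as baseline; the remaining levels keep their relative order
lev_baseline <- levels(relevel(factor(exp_type, levels = unique(exp_type)),
                               ref = "controlNegative"))

print(lev_sorted)
print(lev_appearance)
print(lev_baseline)
```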

4.3 Scatterplot with log10 and jitter

Figure 4.1 shows the relationship between compound concentration and offspring count (RawData). The x-axis shows the log10 of the concentration, which is useful when the tested concentrations differ by orders of magnitude; a small offset (+ 0.001) is added before taking the logarithm so that a concentration of 0, if present, does not become -Inf. Jitter (width = 0.1 on the log10 scale) reduces overlap between replicate points at the same concentration. Together, these adjustments make trends and the spread of the counts per compound easier to see.
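The effect of the offset can be checked directly. In this sketch the concentrations are hypothetical, with 0 standing for a sample without compound; log10(0) is -Inf, whereas log10(0 + 0.001) is -3, which places such samples at a separate position to the left of the tested concentrations. The value 0.001 is arbitrary and only behaves well as long as it is much smaller than the lowest nonzero concentration.

```r
# Hypothetical concentrations in nM; 0 stands for a sample without compound
conc <- c(0, 0.01, 0.1, 1, 10)

raw_log <- log10(conc)              # first element is -Inf
offset_log <- log10(conc + 0.001)   # 0 now maps to -3

print(raw_log)
print(round(offset_log, 3))
```

An alternative is to keep the concentrations untransformed and use ggplot2's scale_x_log10(), which labels the axis in the original units, although zero concentrations still need separate handling there.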

library(ggplot2)
library(plotly)

# Plot offspring counts against log10 concentration
scatterplot1 <- ggplot(excel_data_elegans,
                       aes(x = log10(compConcentration + .001), y = RawData)) +
  geom_point(aes(color = compName, shape = expType),
             size = 1.3,
             alpha = 0.8,
             # jitter along x only, so the counts themselves are not shifted
             position = position_jitter(width = 0.1, height = 0),
             na.rm = TRUE) +
  labs(x = "Log10 of compound concentration (nM)", y = "Offspring count",
       title = "C. elegans offspring count by compound concentration") +
  theme_minimal() +
  theme(legend.position = "right",
        axis.title = element_text(size = 12, face = "bold"),
        axis.text = element_text(size = 10),
        legend.text = element_text(size = 10),
        legend.title = element_text(size = 12, face = "bold"),
        plot.title = element_text(size = 14, face = "bold"),
        panel.background = element_rect(fill = "white")) +
  scale_color_brewer(palette = "Set1")

# Convert to an interactive plotly object
scatterplot1_plotly <- ggplotly(scatterplot1)

# Show the plotly object
scatterplot1_plotly

Figure 4.1: Scatterplot of the offspring count of C. elegans for three compounds and the controls at different concentrations (log10 x-axis, jittered points).

Positive and negative controls

The positive control for this experiment (controlPositive) exposes the C. elegans nematodes to a condition with a known effect on the outcome being measured, which shows that the assay is able to detect an effect. In this experiment, that is 1.5% ethanol in S-medium, which is expected to lower the offspring count compared with the negative control.

The negative control for this experiment (controlNegative) exposes the nematodes to a condition that should not affect their reproduction: a neutral, untreated solution without any bioactive substance. In this experiment, that is S-medium alone.

4.4 Statistical analysis

To analyze the effect of compound concentration on offspring count and to compare the compounds’ dose-response curves (for example through their IC50, the concentration that reduces the offspring count by 50%), I’ll perform a stepwise statistical analysis. First, I’ll load the dataset and split it so that the concentrations of each compound can be analyzed separately. Next, I’ll visualize the responses per compound and check whether the data are normally distributed and whether the variances are similar across groups. I’ll then run an analysis of variance (ANOVA) to test for significant differences between groups and, if the ANOVA is significant, a post hoc test to identify which groups differ. Finally, I’ll draw conclusions about the effect of compound concentration from these results.
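The testing steps of this plan can be sketched with base R functions: shapiro.test() for normality per group, bartlett.test() for equal variances, aov() for the one-way ANOVA and TukeyHSD() as post hoc test. The data frame toy and its values are hypothetical and only illustrate the workflow; they are not the experiment's data, and Bartlett's test is one possible choice for the variance check, not necessarily the one used later.

```r
# Hypothetical offspring counts for one compound at three concentrations
# (illustrative values only, not the experiment's data)
toy <- data.frame(
  conc = factor(rep(c("0", "0.1", "10"), each = 5)),
  offspring = c(85, 88, 84, 90, 86,
                80, 83, 79, 85, 81,
                50, 55, 48, 52, 53)
)

# 1. Normality per group (Shapiro-Wilk; needs at least 3 values per group)
normality_p <- tapply(toy$offspring, toy$conc,
                      function(x) shapiro.test(x)$p.value)

# 2. Homogeneity of variances across groups (Bartlett's test)
variance_p <- bartlett.test(offspring ~ conc, data = toy)$p.value

# 3. One-way ANOVA; extract the p-value of the concentration effect
fit <- aov(offspring ~ conc, data = toy)
anova_p <- summary(fit)[[1]][["Pr(>F)"]][1]

# 4. Post hoc pairwise comparisons, only if the ANOVA is significant
if (anova_p < 0.05) {
  tukey <- TukeyHSD(fit)
  print(tukey)
}

print(normality_p)
print(variance_p)
```

If the normality or equal-variance checks clearly fail, a non-parametric test such as kruskal.test() could take the place of the ANOVA.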

4.5 Normalizing the ‘controlNegative’ data and plotting

To judge the relative increase or decrease in offspring count compared with the baseline (the negative control), the data are normalized: every count is divided by the mean count of the negative control, so that the negative control averages exactly 1 and all other values are expressed relative to it. For this dataset, the negative-control mean is 85.9. A revised scatterplot (Figure 4.2) was then made from the normalized data with the same settings as before.
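The normalization is a single division, which a small base-R sketch makes concrete. The column names mirror the chapter's, but the data frame df and its values are hypothetical; na.rm = TRUE is added as a precaution against missing counts.

```r
# Hypothetical mini-dataset with the same column names as the chapter
df <- data.frame(
  expType = c("controlNegative", "controlNegative", "controlNegative",
              "experiment", "experiment"),
  RawData = c(84, 86, 88, 43, NA)
)

# Mean count of the negative control (the baseline)
baseline <- mean(df$RawData[df$expType == "controlNegative"], na.rm = TRUE)

# Express every count as a fraction of the baseline
df$normalized <- df$RawData / baseline

# Sanity check: the negative control now averages 1
check <- mean(df$normalized[df$expType == "controlNegative"], na.rm = TRUE)

print(baseline)  # 86
print(df)
```

Here the experimental count of 43 becomes 0.5, i.e. half the offspring of the negative control, while a missing count stays missing.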

# Filter the data for controlNegative
controlNeg <- excel_data_elegans %>% filter(expType == "controlNegative")

# Calculate the mean of RawData for controlNegative
controlNeg_mean <- mean(controlNeg$RawData) # outcome: 85.9

# Normalize RawData by dividing each value by the controlNegative mean
excel_data_elegans <- excel_data_elegans %>%
  mutate(normalized = RawData / controlNeg_mean)

# Check: after normalization, the controlNegative mean should be 1
controlNegative_mean <- excel_data_elegans %>%
  filter(expType == "controlNegative") %>%
  pull(normalized) %>%
  mean()
controlNegative_mean

## [1] 1

# Display selected columns with normalized data
excel_data_elegans %>% 
  select(compName, expType, compConcentration, RawData, normalized) %>% 
  reactable(defaultPageSize = 5)


# Redraw the scatterplot using the normalized data
scatterplot_normalised <- ggplot(excel_data_elegans,
                                 aes(x = log10(compConcentration + .001), y = normalized)) +
  geom_point(aes(color = compName, shape = expType),
             size = 1.3,
             alpha = 0.8,
             # jitter along x only, so the normalized values are not shifted
             position = position_jitter(width = 0.1, height = 0),
             na.rm = TRUE) +
  labs(x = "Log10 of compound concentration (nM)",
       y = "Normalized offspring count (negative control = 1)",
       title = "Normalized C. elegans offspring count by compound concentration") +
  theme_minimal() +
  theme(legend.position = "right",
        axis.title = element_text(size = 12, face = "bold"),
        axis.text = element_text(size = 10),
        legend.text = element_text(size = 10),
        legend.title = element_text(size = 12, face = "bold"),
        plot.title = element_text(size = 14, face = "bold"),
        panel.background = element_rect(fill = "white")) +
  scale_color_brewer(palette = "Set1")

# Convert to an interactive plotly object
scatterplotnorm_plotly <- ggplotly(scatterplot_normalised)

# Show the plotly object
scatterplotnorm_plotly

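Figure 4.2: Scatterplot of the normalized offspring count of C. elegans (negative control = 1) at different compound concentrations. The positive control for this experiment is controlPositive; the negative control is controlNegative.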

Normalizing the data for ‘controlNegative’ so that its average is exactly 1, and expressing all other values relative to it, offers several advantages:

It ensures comparability by establishing a fixed reference point for assessing the effects of other compounds.
It removes the absolute scale of the counts, which can differ between experiments (for example between plates or batches), so results from experiments that are each normalized to their own negative control can be compared more directly.
It simplifies interpretation: values above 1 indicate more offspring than in the negative control, while values below 1 indicate fewer.
Redrawing the graphs with normalized data makes it easier to visualize and compare the effects of the different compounds and to identify trends and patterns.
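On the normalized scale, the IC50 mentioned in the statistical plan corresponds to the concentration at which the mean normalized offspring count falls to 0.5. A proper estimate comes from fitting a dose-response model, for example a four-parameter log-logistic curve with the drc package. As a rough first look, the sketch below instead interpolates linearly on the log10 concentration scale between the two tested concentrations that bracket 0.5. The function name rough_ic50 and the response values are hypothetical.

```r
# Rough IC50 by linear interpolation on log10(concentration).
# Assumes `conc` (all > 0) and `response` (mean normalized counts) are
# sorted by increasing concentration and that the response falls through 0.5.
rough_ic50 <- function(conc, response, target = 0.5) {
  stopifnot(length(conc) == length(response), all(conc > 0))
  # index of the first concentration whose response is at or below the target
  i <- which(response <= target)[1]
  if (is.na(i) || i == 1) return(NA_real_)  # target not bracketed by the data
  x <- log10(conc[c(i - 1, i)])
  y <- response[c(i - 1, i)]
  # straight line through the two bracketing points, solved for y = target
  x_target <- x[1] + (target - y[1]) * (x[2] - x[1]) / (y[2] - y[1])
  10^x_target
}

# Hypothetical mean normalized responses for one compound
conc <- c(0.01, 0.1, 1, 10, 100)
resp <- c(1.02, 0.95, 0.80, 0.40, 0.10)
ic50 <- rough_ic50(conc, resp)
print(ic50)
```

Here 0.5 lies a quarter of the way between 0.80 at 1 nM and 0.40 at 10 nM on the response scale, i.e. at log10 concentration 0.75, which is about 5.6 nM. For the chapter's data, the per-concentration means of a single compound could first be computed with, for example, aggregate(normalized ~ compConcentration, data = subset(excel_data_elegans, compName == "decane"), FUN = mean), keeping only nonzero concentrations.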