Setting up the environment#
1. Starting file#
To follow along with this tutorial, you will need access to this website, as well as a single tarball (filename: japan_eDNA_bootcamp_files.tar.gz). The website contains all the code we will be using, while the zipped file contains all the data necessary to successfully execute the code.
To start, place the tarball (filename: japan_eDNA_bootcamp_files.tar.gz) into an empty directory you will use for this workshop. For consistency, let’s name this new folder bootcamp
and place it on your Desktop.

Fig. 7 : The starting screen for this workflow (note: this worflow was developed on MacOS, slight differences might be experienced on Windows or Linux)#
2. RStudio#
2.1 Introduction#
During this course, we will be using the RStudio Desktop application, the most widely-used Integrated Development Environment (IDE) for working with R, to analyse and visualise eDNA metabarcoding data. RStudio provides a powerful and user-friendly interface for:
writing and executing R code
data wrangling
statistical analysis
publication-quality figures
reproducible research
While prior R experience will be beneficial, we aim to keep things simple and start with the essentials!
2.2 The RStudio interface#
The RStudio interface is divided into 4 primary panels, each serving a specific purpose in data analysis. Since we will be using all of them during this bootcamp, let’s quickly introduce each of them.
Source Editor (Top-left)
Purpose: Write, edit, and execute scripts
Key features:
Tabbed documents for
.R
scripts, notebooks, or metadata files and dataframes.Syntax highlighting for debugging.
Run button to execute code line-by-line or in chunks.
Console and Terminal (Bottom-left)
Purpose:
Console: Directly interact with R to execute commands and view raw output.
Terminal: Indirectly interact with the shell environment to access command-line interface (CLI) programs.
Key features:
>
prompt for live coding.Tab-completion for functions and filenames.
Up-arrow to recall previous commands.
Environment and History (Top-right)
Environment:
List all loaded objects, such as dataframes and variables
Click any object to view it as a spreadsheet in the Source Editor panel.
History:
Review all executed commands (can be saved to a script).
Files, Plots, Packages, and Help (Bottom-right)
Files: Navigate project directories.
Plots: Visualise and export graphs as PDF/PNG for publications.
Packages: Check, update, or install R packages needed for analysis.
Help: Search documentation for R packages, essential for troubleshooting.

Fig. 8 : The starting screen for RStudio (note: the RStudio interface is highly customisable, so your version might look slightly different)#
3. Folder structure & Starting files#
3.2 Folder structure#
Once the tarball is unzipped, we can have a look at the folder structure we will use during the workshop. For any project, it is important to organise your various files into subfolders. Throughout this course, we will be navigating between folders to run specific lines of code. It may seem confusing or tedious at first, especially as we will only be dealing with a few files during this workshop. However, remember that for many projects you can easily generate hundreds of intermediary files. So, it is best to start with good practices from the beginning.
As we did earlier, we can use the list.files()
function to print all the files in the directory. We can provide the parameter include.dirs = TRUE
to also print out all the subdirectories within our current working directory.
list.files(include.dirs = TRUE)
Output
> list.files(include.dirs = TRUE)
[1] "0.metadata" "1.raw" "2.trimmed"
[4] "3.filtered" "4.results" "5.refdb"
[7] "6.phylotree" "japan_eDNA_bootcamp_files.tar.gz"
We can see that RStudio prints the list of files and folders in columns to span the width of the Console window. However, sometimes it is easier to print results line-by-line. We can use the writeLines()
function to accomplish this.
writeLines(list.files(include.dirs = TRUE))
Output
> writeLines(list.files(include.dirs = TRUE))
0.metadata
1.raw
2.trimmed
3.filtered
4.results
5.refdb
6.phylotree
japan_eDNA_bootcamp_files.tar.gz
3.3 Starting files#
While most folders are currently empty, as we will populate them with files during our analysis, some folders contain the necessary starting files for the analysis. We can provide one extra parameter to the previous code, i.e., recursive = TRUE
, to print out all subdirectories and files within them.
writeLines(list.files(recursive = TRUE, include.dirs = TRUE))
Output
> writeLines(list.files(recursive = TRUE, include.dirs = TRUE))
0.metadata
0.metadata/sample_metadata.txt
1.raw
1.raw/Akakina_1_S103_R1_001.fastq.gz
1.raw/Akakina_1_S103_R2_001.fastq.gz
1.raw/Akakina_2_S104_R1_001.fastq.gz
1.raw/Akakina_2_S104_R2_001.fastq.gz
1.raw/Akakina_3_S105_R1_001.fastq.gz
1.raw/Akakina_3_S105_R2_001.fastq.gz
1.raw/Asan_1_S108_R1_001.fastq.gz
1.raw/Asan_1_S108_R2_001.fastq.gz
1.raw/Asan_2_S109_R1_001.fastq.gz
1.raw/Asan_2_S109_R2_001.fastq.gz
1.raw/Asan_3_S110_R1_001.fastq.gz
1.raw/Asan_3_S110_R2_001.fastq.gz
1.raw/Ashiken_1_S124_R1_001.fastq.gz
1.raw/Ashiken_1_S124_R2_001.fastq.gz
1.raw/Ashiken_2_S125_R1_001.fastq.gz
1.raw/Ashiken_2_S125_R2_001.fastq.gz
1.raw/Ashiken_3_S126_R1_001.fastq.gz
1.raw/Ashiken_3_S126_R2_001.fastq.gz
1.raw/Edateku_1_S132_R1_001.fastq.gz
1.raw/Edateku_1_S132_R2_001.fastq.gz
1.raw/Edateku_2_S133_R1_001.fastq.gz
1.raw/Edateku_2_S133_R2_001.fastq.gz
1.raw/Edateku_3_S134_R1_001.fastq.gz
1.raw/Edateku_3_S134_R2_001.fastq.gz
1.raw/FieldControl_112122_S106_R1_001.fastq.gz
1.raw/FieldControl_112122_S106_R2_001.fastq.gz
1.raw/FieldControl_112222_S143_R1_001.fastq.gz
1.raw/FieldControl_112222_S143_R2_001.fastq.gz
1.raw/FieldControl_112422_S141_R1_001.fastq.gz
1.raw/FieldControl_112422_S141_R2_001.fastq.gz
1.raw/FilterControl_112122_S107_R1_001.fastq.gz
1.raw/FilterControl_112122_S107_R2_001.fastq.gz
1.raw/FilterControl_112222_S144_R1_001.fastq.gz
1.raw/FilterControl_112222_S144_R2_001.fastq.gz
1.raw/FilterControl_112422_S142_R1_001.fastq.gz
1.raw/FilterControl_112422_S142_R2_001.fastq.gz
1.raw/Gamo_1_S111_R1_001.fastq.gz
1.raw/Gamo_1_S111_R2_001.fastq.gz
1.raw/Gamo_2_S112_R1_001.fastq.gz
1.raw/Gamo_2_S112_R2_001.fastq.gz
1.raw/Gamo_3_S113_R1_001.fastq.gz
1.raw/Gamo_3_S113_R2_001.fastq.gz
1.raw/Heda_1_S135_R1_001.fastq.gz
1.raw/Heda_1_S135_R2_001.fastq.gz
1.raw/Heda_2_S136_R1_001.fastq.gz
1.raw/Heda_2_S136_R2_001.fastq.gz
1.raw/Heda_3_S137_R1_001.fastq.gz
1.raw/Heda_3_S137_R2_001.fastq.gz
1.raw/Imaizaki_1_S121_R1_001.fastq.gz
1.raw/Imaizaki_1_S121_R2_001.fastq.gz
1.raw/Imaizaki_2_S122_R1_001.fastq.gz
1.raw/Imaizaki_2_S122_R2_001.fastq.gz
1.raw/Imaizaki_3_S123_R1_001.fastq.gz
1.raw/Imaizaki_3_S123_R2_001.fastq.gz
1.raw/Kurasaki_1_S117_R1_001.fastq.gz
1.raw/Kurasaki_1_S117_R2_001.fastq.gz
1.raw/Kurasaki_2_S118_R1_001.fastq.gz
1.raw/Kurasaki_2_S118_R2_001.fastq.gz
1.raw/Kurasaki_3_S119_R1_001.fastq.gz
1.raw/Kurasaki_3_S119_R2_001.fastq.gz
1.raw/Sakibaru_1_S114_R1_001.fastq.gz
1.raw/Sakibaru_1_S114_R2_001.fastq.gz
1.raw/Sakibaru_2_S115_R1_001.fastq.gz
1.raw/Sakibaru_2_S115_R2_001.fastq.gz
1.raw/Sakibaru_3_S116_R1_001.fastq.gz
1.raw/Sakibaru_3_S116_R2_001.fastq.gz
1.raw/Sani_1_S97_R1_001.fastq.gz
1.raw/Sani_1_S97_R2_001.fastq.gz
1.raw/Sani_2_S98_R1_001.fastq.gz
1.raw/Sani_2_S98_R2_001.fastq.gz
1.raw/Sani_3_S99_R1_001.fastq.gz
1.raw/Sani_3_S99_R2_001.fastq.gz
1.raw/Taen_1_S129_R1_001.fastq.gz
1.raw/Taen_1_S129_R2_001.fastq.gz
1.raw/Taen_2_S130_R1_001.fastq.gz
1.raw/Taen_2_S130_R2_001.fastq.gz
1.raw/Taen_3_S131_R1_001.fastq.gz
1.raw/Taen_3_S131_R2_001.fastq.gz
1.raw/Tatsugo_1_S100_R1_001.fastq.gz
1.raw/Tatsugo_1_S100_R2_001.fastq.gz
1.raw/Tatsugo_2_S101_R1_001.fastq.gz
1.raw/Tatsugo_2_S101_R2_001.fastq.gz
1.raw/Tatsugo_3_S102_R1_001.fastq.gz
1.raw/Tatsugo_3_S102_R2_001.fastq.gz
1.raw/Yadon_1_S138_R1_001.fastq.gz
1.raw/Yadon_1_S138_R2_001.fastq.gz
1.raw/Yadon_2_S139_R1_001.fastq.gz
1.raw/Yadon_2_S139_R2_001.fastq.gz
1.raw/Yadon_3_S140_R1_001.fastq.gz
1.raw/Yadon_3_S140_R2_001.fastq.gz
2.trimmed
3.filtered
4.results
5.refdb
5.refdb/coi_blast.ndb
5.refdb/coi_blast.nhr
5.refdb/coi_blast.nin
5.refdb/coi_blast.nog
5.refdb/coi_blast.nos
5.refdb/coi_blast.not
5.refdb/coi_blast.nsq
5.refdb/coi_blast.ntf
5.refdb/coi_blast.nto
5.refdb/coi_clean.txt
5.refdb/names.dmp
5.refdb/nodes.dmp
5.refdb/nucl_gb.accession2taxid.gz
5.refdb/taxdb.btd
5.refdb/taxdb.bti
5.refdb/taxonomy4blast.sqlite3
6.phylotree
japan_eDNA_bootcamp_files.tar.gz
The output shows that we have a single sample metadata file, 90 zipped sequence files, and a formatted reference database. We will provide more details on each file in the following sections.
For now, let’s save and close the current R script. We can save it as navigation.R
in the current working directory bootcamp
.
Now that we have everything set up, we can get started with the bioinformatic processing of our data!