<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>R on R (for ecology)</title><link>https://www.rforecology.com/category/r/</link><description>Recent content in R on R (for ecology)</description><generator>Hugo -- gohugo.io</generator><language>en-us</language><copyright>© HabitU Lab, LLC and R for Ecology {year}</copyright><lastBuildDate>Thu, 23 Mar 2023 08:45:39 +0000</lastBuildDate><atom:link href="https://www.rforecology.com/category/r/index.xml" rel="self" type="application/rss+xml"/><item><title>Top five(ish) sources of ecological data</title><link>https://www.rforecology.com/post/top-five-ish-sources-of-ecological-data/</link><pubDate>Thu, 23 Mar 2023 08:45:39 +0000</pubDate><guid>https://www.rforecology.com/post/top-five-ish-sources-of-ecological-data/</guid><description>&lt;p>As you&amp;rsquo;re learning R, it can be hard to come up with data sets that you can practice with. Though many of us have our own data, those might not always be in the best format to do what we want. Our own data are often messy and require a lot of recoding and reformatting. Wouldn&amp;rsquo;t it be nice if we could download clean data sets that we could work with? Luckily, there are a number of resources out there - you just have to know where to look!&lt;/p>
&lt;p>In this tutorial, I discuss the following data sets:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/00Index.html" target="_blank" rel="noopener">Data sets that come with R&lt;/a>&lt;/li>
&lt;li>The &lt;a href="https://knb.ecoinformatics.org/data" target="_blank" rel="noopener">Knowledge Network for Biocomplexity&lt;/a>&lt;/li>
&lt;li>The &lt;a href="https://portal.edirepository.org/nis/" target="_blank" rel="noopener">Environmental Data Initiative&lt;/a>&lt;/li>
&lt;li>The &lt;a href="https://data.neonscience.org/data-products/explore" target="_blank" rel="noopener">National Ecological Observatory Network&lt;/a>&lt;/li>
&lt;li>The &lt;a href="https://www.gbif.org/" target="_blank" rel="noopener">Global Biodiversity Information Facility&lt;/a>&lt;/li>
&lt;/ul>
&lt;p>I also mention the &lt;a href="https://obis.org/" target="_blank" rel="noopener">Ocean Biodiversity Information System&lt;/a>, &lt;a href="https://search.dataone.org/data" target="_blank" rel="noopener">DataOne&lt;/a>, and the Central Michigan University Library website&amp;rsquo;s &lt;a href="https://libguides.cmich.edu/lifesciencedata/ecological" target="_blank" rel="noopener">list of resources&lt;/a>.&lt;/p>
&lt;img src="https://www.rforecology.com/ecodat_image1.png" alt="Image of several icons showing different habitats on Earth. These icons are surrounding and pointing to the R for Ecology logo. The image says 'Where to find ecological data'" style="width:400px;"/>
&lt;h2 id="1-basic-data-sets-in-r">1) Basic data sets in R&lt;/h2>
&lt;p>One of the first places you can look for practice data sets is within R itself.&lt;/p>
&lt;p>R comes with some standard data sets that you can view if you type &lt;code>data()&lt;/code> into the console. These data sets range from describing the survival of Titanic passengers to describing the locations of earthquakes off the island of Fiji. They are wide-ranging and fun to explore, but most of them are not explicitly ecological.&lt;/p>
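&lt;p>The same list can also be pulled into R as a matrix, which makes it easy to search or filter from code. This is a minimal base-R sketch that uses only the bundled &lt;code>datasets&lt;/code> package:&lt;/p>

```r
# data(package = ...) returns an object whose $results element is a
# character matrix with columns "Package", "LibPath", "Item", and "Title"
builtin = data(package = "datasets")$results

# Data set names and their one-line descriptions
head(builtin[, c("Item", "Title")])

# Is a given data set available?
"Loblolly" %in% builtin[, "Item"]

# Inspect its structure before using it (height, age, and seed source)
str(Loblolly)
```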
&lt;p>Some common ecological data sets that you might use are &lt;code>iris&lt;/code>, &lt;code>PlantGrowth&lt;/code>, and &lt;code>Loblolly&lt;/code>. I find these data sets useful when I&amp;rsquo;m trying to do something quick, like testing how a new function works. Because these data sets are so straightforward, I can usually predict what the output should be, so I can tell whether or not the function worked correctly. I also use these data sets as examples in the blog posts that I write. They&amp;rsquo;re great teaching tools because they&amp;rsquo;re fairly simple and easy to understand.&lt;/p>
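&lt;p>Here is a sketch of that workflow. Suppose you have written a small helper that averages a column by group. &lt;code>group_mean()&lt;/code> below is a hypothetical function made up for this example. The check compares it against base R&amp;rsquo;s &lt;code>aggregate()&lt;/code> on &lt;code>PlantGrowth&lt;/code>, where the answer is easy to verify:&lt;/p>

```r
# Hypothetical helper we want to test: mean of a numeric column per group
group_mean = function(df, value, group) {
  # tapply() splits df[[value]] by df[[group]] and applies mean() to each piece
  tapply(df[[value]], df[[group]], mean)
}

# PlantGrowth: dried plant weight for a control and two treatment groups
res = group_mean(PlantGrowth, "weight", "group")
res

# Cross-check against base R's aggregate(); if the helper is correct,
# both approaches agree for every group (same factor-level order)
ref = aggregate(weight ~ group, data = PlantGrowth, FUN = mean)
all.equal(as.numeric(res), ref$weight)
```

&lt;p>If &lt;code>all.equal()&lt;/code> returns &lt;code>TRUE&lt;/code>, the helper agrees with the reference calculation. If it doesn&amp;rsquo;t, the small, familiar data set makes it easy to spot where the two differ.&lt;/p>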
&lt;p>These data sets are not really intended to be used to conduct your own research; they are primarily used for practice and demonstration purposes.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/ecodat_image2.png" alt="Some data sets that come with R, like &amp;ldquo;ChickWeight&amp;rdquo;, &amp;ldquo;Nile&amp;rdquo;, &amp;ldquo;Orange&amp;rdquo;, and &amp;ldquo;Titanic&amp;rdquo;.">&lt;/p>
&lt;h2 id="2-the-knowledge-network-for-biocomplexityhttpsknbecoinformaticsorgdata">2) &lt;a href="https://knb.ecoinformatics.org/data" target="_blank" rel="noopener">The Knowledge Network for Biocomplexity&lt;/a>&lt;/h2>
&lt;h3 id="introduction-and-how-to">Introduction and how-to&lt;/h3>
&lt;p>The Knowledge Network for Biocomplexity (KNB) is an international repository of ecological data sets that have been uploaded by scientists to facilitate environmental research. Many of these data sets are also associated with published papers.&lt;/p>
&lt;p>You can search data sets in a variety of ways. On the left side, you can filter the data based on different attributes (e.g., author, year, taxon, geographic location). On the right side, you can look for data sets by location by navigating the handy world map and clicking on the different squares.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/ecodat_image3.png" alt="Image of three panel search screen for the KNB data portal. The left side shows ways you can filter your search. The middle panel shows search results. The right side is an interactive world map with data sets grouped into a geographic grid.">&lt;/p>
&lt;p>When you click on a data set, you&amp;rsquo;re taken to a page where you can download all the associated files. The heading at the top is also the citation for the data package, so it&amp;rsquo;s easy to correctly attribute the work. If you&amp;rsquo;re using a public data set and publishing something (even if just in a blog post or an example), it&amp;rsquo;s a good idea to cite the data set.&lt;/p>
&lt;div class="alert alert-note">
&lt;div>
Published data sets are often identified by their DOI, or &amp;ldquo;digital object identifier&amp;rdquo;. This is a unique, persistent ID assigned to each published object. If you add the DOI string to the end of &amp;ldquo;&lt;a href="https://doi.org/">https://doi.org/&lt;/a>&amp;rdquo; (e.g., &lt;a href="https://doi.org/10.5063/F1FN14M4">https://doi.org/10.5063/F1FN14M4&lt;/a>), you&amp;rsquo;ll get a URL that takes you to the publication.
&lt;/div>
&lt;/div>
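&lt;p>DOIs are often written with a &lt;code>doi:&lt;/code> prefix. A small, hypothetical helper like the one below strips that prefix and builds the resolver URL:&lt;/p>

```r
# Hypothetical helper: turn a DOI into a https://doi.org/ URL.
# Accepts DOIs with or without a leading "doi:" prefix (any capitalization).
doi_to_url = function(doi) {
  doi = sub("^doi:", "", trimws(doi), ignore.case = TRUE)
  paste0("https://doi.org/", doi)
}

doi_to_url("10.5063/F1FN14M4")       # "https://doi.org/10.5063/F1FN14M4"
doi_to_url("doi:10.5063/F1FN14M4")   # same URL

# To open the page in your browser:
# utils::browseURL(doi_to_url("10.5063/F1FN14M4"))
```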
&lt;p>&lt;img src="https://www.rforecology.com/ecodat_image4.png" alt="Image showing dataset page. The heading is a citation for the data set. The page includes download links for individual files. You can also click the &amp;ldquo;download all&amp;rdquo; button to download all files associated with the data package.">&lt;/p>
&lt;p>This page also includes the metadata for the data set to make it easier to navigate and understand the data you&amp;rsquo;re downloading.&lt;/p>
&lt;div class="alert alert-note">
&lt;div>
All good data sets come with &lt;em>metadata&lt;/em>, or data that describes the data set of interest. When you download a data set that was collected by someone else, it&amp;rsquo;s usually hard to tell what each column means, how it was collected, and what its units are. Luckily, metadata helps us figure out how a data set is organized and how we might want to use it. If a data set doesn&amp;rsquo;t come with metadata, then it&amp;rsquo;s very difficult to use and understand the data, rendering it almost useless.
&lt;/div>
&lt;/div>
&lt;p>For example, this data set by Haas-Desmarais et al. (2021) comes with great metadata for each file included in the data package. The &amp;ldquo;observations_complete.csv&amp;rdquo; file contains several variables, listed on the left side of the page. The authors have defined each variable for us, so we know that the variable &amp;ldquo;actual_time&amp;rdquo; is the time recorded by the camera and is not necessarily accurate to the real time. The metadata also tells us the format and unit of each measurement.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/ecodat_image5.png" alt="Image of metadata for the file called &amp;ldquo;observations complete&amp;rdquo;. There is a list of variables on the left side. The screen shows the description for the variable &amp;ldquo;actual_time&amp;rdquo;, which is described as the &amp;ldquo;Time observed by the camera. Note this is not accurate to actual time&amp;rdquo;. The metadata also shows the measurement type as &amp;ldquo;dateTime&amp;rdquo;, and tells us how the data is formatted in the .csv file.">&lt;/p>
&lt;h3 id="takeaways-and-application">Takeaways and application&lt;/h3>
&lt;p>One of the great things about KNB data sets is that there&amp;rsquo;s often a published journal article associated with them (usually linked in the metadata). This allows you to put the data set in the context of the research, and can give you an idea of how you might be able to manipulate the data as you&amp;rsquo;re practicing your R skills. Maybe reading the article will even raise some questions for you that you might want to explore.&lt;/p>
&lt;p>Sometimes the data sets also come with associated R scripts or R Markdown documents that contain the analysis for the paper. This provides a great learning tool where you can see how other scientists conducted their analyses and try to reproduce them.&lt;/p>
&lt;p>You can also download data from the KNB through R, using &lt;a href="https://cran.r-project.org/web/packages/dataone/dataone.pdf" target="_blank" rel="noopener">the &lt;code>dataone&lt;/code> package&lt;/a> (developed on GitHub as &lt;code>rdataone&lt;/code>). However, I usually like to download data directly from the site so I can first familiarize myself with the data set.&lt;/p>
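&lt;p>For reference, here is a rough sketch of the &lt;code>dataone&lt;/code> route. It is wrapped in a function so nothing is downloaded until you call it. It assumes the package is installed and that you are online. The argument names follow the package&amp;rsquo;s documentation as I understand it, so check &lt;code>?getDataPackage&lt;/code> before relying on them. The example identifier is the DOI from the note above and may not be the exact package you want:&lt;/p>

```r
# Sketch (assumes dataone is installed and you have internet access):
# fetch a KNB data package without immediately downloading its data files.
get_knb_package = function(identifier) {
  # Client for the production DataONE network, pointed at the KNB member node
  d1c = dataone::D1Client("PROD", "urn:node:KNB")
  # lazyLoad = TRUE with limit = "0MB" should retrieve the package metadata
  # while deferring the download of the data objects themselves
  dataone::getDataPackage(d1c, identifier = identifier,
                          lazyLoad = TRUE, limit = "0MB", quiet = FALSE)
}

# Example call (not run here):
# pkg = get_knb_package("doi:10.5063/F1FN14M4")
```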
&lt;h2 id="3-the-environmental-data-initiativehttpsportaledirepositoryorgnis">3) &lt;a href="https://portal.edirepository.org/nis/" target="_blank" rel="noopener">The Environmental Data Initiative&lt;/a>&lt;/h2>
&lt;h3 id="introduction-and-how-to-1">Introduction and how-to&lt;/h3>
&lt;p>One of my favorite places to download ecological data is the Environmental Data Initiative (EDI) data portal. The EDI archives a lot of environmental data from publicly funded research. Its specialty is that it is the primary archive for data from &lt;a href="https://lternet.edu/about/" target="_blank" rel="noopener">Long-Term Ecological Research (LTER) sites&lt;/a> in the United States. This means that the EDI will often have several years&amp;rsquo; worth of data for a given data set, making it a great resource for examining long-term trends. For example, the EDI hosts data for a project called &amp;ldquo;EcoTrends&amp;rdquo;, a large synthesis effort that aggregates ecological data at yearly or monthly time scales. The aim of the project is to make long-term ecological data easier to access, analyze, and compare among research sites to evaluate global change. All the EcoTrends data are organized into a common, clean format (which might make them good practice for making plots in R).&lt;/p>
&lt;p>As with the KNB, you can browse data in the EDI portal in a number of ways - you can search by LTER site, or based on keywords that the data creators associated with their data set. Some especially useful methods might be to look for data by discipline, by ecosystem, or by organism.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/ecodat_image6.png" alt="Image of page where you can browse data by keyword or research site. Groupings include organizational units, disciplines, events, measurements, methods, processes, substances, substrates, ecosystems, and organisms.">&lt;/p>
&lt;p>You can also browse data sets by their package identifier, which groups data sets by LTER site or by a specific project (e.g., EcoTrends or the PaleoEcological Observatory Network). Examples of package identifier names include &amp;ldquo;edi&amp;rdquo;, &amp;ldquo;ecotrends&amp;rdquo;, or &amp;ldquo;knb-lter-arc&amp;rdquo;. These codes, in combination with strings of numbers, are used within the EDI to uniquely identify each data set.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/ecodat_image7.png" alt="Page where you can browse data by package identifier. The identifiers are just listed and linked.">&lt;/p>
&lt;p>The EDI also has an advanced search tool, where you can specify several attributes like geographic location, temporal scale, research site, authors, taxon, etc.&lt;/p>
&lt;p>Once you&amp;rsquo;ve decided on a data set and clicked on it, you&amp;rsquo;ll be taken to a page that summarizes the data package. This page provides basic information like the authors, publication date, citation, abstract, and spatial coverage. There will also be a link to download the data and a link to view the full metadata. As with the KNB, some data sets come with R scripts that you can run and learn from.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/ecodat_image8.png" alt="Page for an example data set describing hourly and daily climatologies for VCR LTER weather stations from 1989 to 2021.">&lt;/p>
&lt;h3 id="takeaways-and-application-1">Takeaways and application&lt;/h3>
&lt;p>Something &lt;em>really&lt;/em> neat that the EDI provides on each data package page is a code generator that will read in the data for you and format it appropriately. The EDI will generate code for several different languages, like MATLAB, Python, R, and SAS. We are of course interested in the &amp;ldquo;R&amp;rdquo; and &amp;ldquo;tidyr&amp;rdquo; options.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/ecodat_image9.png" alt="Page where you can see links for code generation. You can choose from MatLab, Python, R, SAS, SPSS, and tidyr. There are arrows pointing to the R and tidyr options, highlighting them.">&lt;/p>
&lt;p>The code under the &amp;ldquo;R&amp;rdquo; option will read in the data as a data frame, while the code under the &amp;ldquo;tidyr&amp;rdquo; option will read in the data as a tibble, using packages from the &lt;code>tidyverse&lt;/code> (check out our post here [LINK] for a rundown on the differences between data frames and tibbles). You can either download an .R file with the code already written, or you can copy and paste the code into your own file.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/ecodat_image10.png" alt="Page where you can download or copy and paste R code to import the data files. Boxes highlight the ways you can implement the code.">&lt;/p>
&lt;p>Again, the EDI hosts numerous data sets with long-term measurements (some on the scale of decades!), making it really useful for examining long-term trends.&lt;/p>
&lt;br>
&lt;hr>
&lt;p>&lt;strong>Quick note from Luka:&lt;/strong> But what do you do with your data once you have it? If you are still a beginner with R, then I encourage you to check out my full course on The Basics of R (for ecologists). I designed the course to take away the stress of learning R by leading you through a self-paced curriculum that makes R easy and painless. I&amp;rsquo;m confident this course will give you all the essentials you need to feel comfortable working with your own data in just a few weeks. Just click below 👇 to start the course and see what you think!&lt;/p>
&lt;a href="https://coaching.rforecology.com/the-basics-of-r-for-ecologists-enroll?utm_source=blog&amp;utm_medium=bottom_button&amp;utm_campaign=rforecology_blog/" target="_blank">
&lt;img src="https://www.rforecology.com/basics_of_r_thumb.jpg" alt="A landscape made of numbers on the left, and to the right is the R for Ecology logo with 'the basics of R (for ecologists)' written below." style="width:400px; box-shadow: 2px 2px 15px #252a2a; border-radius: 10px;"/>
&lt;/a>
&lt;br>
&lt;p>Or, if you already feel solid with the basics, take your data visualization to the next level with my Introduction to Data Visualization with R (for ecologists) where I teach you everything you need to create professional and publication-quality figures in R. 👇&lt;/p>
&lt;a href="https://coaching.rforecology.com/intro-to-dataviz-in-r-for-ecologists-enroll?utm_source=blog&amp;utm_medium=bottom_button&amp;utm_campaign=rforecology_blog/" target="_blank">
&lt;img src="https://www.rforecology.com/dataviz_thumb.png" alt="A few example data visualizations on the left and to the right there is a landscape made of numbers, with the text 'Intro to Data Visualization with R' written on top. Below that is the R for Ecology logo" style="width:400px; box-shadow: 2px 2px 15px #252a2a; border-radius: 10px;"/>
&lt;/a>
&lt;hr>
&lt;br>
&lt;h2 id="4-national-ecological-observatory-networkhttpsdataneonscienceorgdata-productsexplore">4) &lt;a href="https://data.neonscience.org/data-products/explore" target="_blank" rel="noopener">National Ecological Observatory Network&lt;/a>&lt;/h2>
&lt;h3 id="introduction-and-how-to-2">Introduction and how-to&lt;/h3>
&lt;p>The next resource I&amp;rsquo;m going to discuss is the National Ecological Observatory Network (NEON), which is a network of field sites across the United States at which several types of ecological data are regularly collected in terrestrial and aquatic environments.&lt;/p>
&lt;p>The network is designed so that the U.S. is divided into 20 ecological/climatic domains. Almost every domain has terrestrial and aquatic field sites, which are often placed in close proximity to one another to allow for analysis of linkages across these ecosystems. NEON collects remotely-sensed data, observational data, and data via automatic sensors (e.g., meteorological towers), with the idea that these data will be collected over many, many years. These data are also standardized across NEON sites. As a result, NEON data covers a broad spatial and temporal extent, allowing us to collect and compare certain measurements across the entire U.S. and over long periods of time.&lt;/p>
&lt;p>When you&amp;rsquo;re looking for NEON data, you can search for data in one of two ways.&lt;/p>
&lt;p>The first way is to look for data by site or location through the interactive map on NEON&amp;rsquo;s homepage. This is more of an exploratory approach, where you can zoom in on different parts of the map. The table beneath the map shows you what field sites and plots are visible. If you want to look at a site&amp;rsquo;s data, you can just click &amp;ldquo;Explore Data&amp;rdquo; under the site name, and you&amp;rsquo;ll be taken to NEON&amp;rsquo;s data archive page.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/ecodat_image22.png" alt="Image of data exploration by location. A map of the United States is shown above a table. Icons in the map show locations of field sites and correspond to information in the table. The &amp;ldquo;Explore Data&amp;rdquo; and &amp;ldquo;Site details&amp;rdquo; buttons are circled under the Abby Road site.">&lt;/p>
&lt;p>If you zoom in on a specific research site (I zoomed in on the Smithsonian Environmental Research Center), the map will show you specific plots and locations of towers.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/ecodat_image23.png" alt="Map showing locations of specific research plots at the Smithsonian Environmental Research Center. There is a table below with corresponding information about each plot, including what measurement was taken there, the elevation, the land cover, the plot size, and slope.">&lt;/p>
&lt;p>If you&amp;rsquo;re curious about a specific research site, you can also navigate to the site&amp;rsquo;s information page, which gives a lot of great background about the history of the site, some native fauna and flora, the geology, climate, etc. The image below shows part of the Toolik Field Station NEON page. The right-hand side of the page shows a lot of basic information about the site, like the coordinates, elevation, mean annual temperature, etc. Note that many NEON sites are also LTER sites (e.g., Toolik, Konza Prairie, Jornada).&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/ecodat_image21.png" alt="Image of Toolik Lake Research Natural Area site information page. It shows a paragraph of text giving background about Toolik. The right side of the page has a side panel listing different types of information about the site, like the dominant land cover classes, the dominant wind direction, mean canopy height, mean annual temperature and precipitation, etc.">&lt;/p>
&lt;p>The other way to search for data is to simply go to NEON&amp;rsquo;s &amp;ldquo;Explore Data Products&amp;rdquo; page. You can filter your data search by date, research site, state, domain, and research theme (e.g., atmosphere, biogeochemistry, land cover, organisms/populations/communities). The data sets are grouped by measurement and not by research site. So, for example, you can download a wind speed data set that includes wind speeds from all the research sites that collect that data.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/ecodat_image24.png" alt="Image of the &amp;ldquo;Explore data products&amp;rdquo; page. The left panel allows you to filter your search by several different data set attributes. The first data set listed is called &amp;ldquo;2D wind speed and direction&amp;rdquo;.">&lt;/p>
&lt;p>When you decide on a data set that you want to look at, you can click on the data set name. This will take you to the page for the specific data set, which has loads of information.&lt;/p>
&lt;p>The first part of the page shows information on the data set, including a description of the data, an abstract / reasoning for the data collection, and a citation for when you use the data.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/ecodat_image25.png" alt="Image showing the 2D wind speed and direction data page. The left side is a navigation pane and the right side shows information like a description of the data, an abstract, additional information, and a citation.">&lt;/p>
&lt;p>If you scroll down, you can see information about how the data was collected and processed. NEON provides a brief description about the sampling scheme and instrumentation, as well as detailed documentation about the methods and QA/QC process. They also provide an issue log to address problems that arose during data collection or processing, and they let you know at what sites those issues occurred.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/ecodat_image26.png" alt="Image of Collection and Processing section. This includes a study description, the sampling design, the instruments used, and other documentation related to quality assurance and quality control.">&lt;/p>
&lt;p>The next section shows the spatial and temporal availability of the data. In the table below, each row represents a research site and each column represents a month. A cell is colored in if data are available at that research site during that month, and grey if no data are available. You can click the blue &amp;ldquo;Download Data&amp;rdquo; button to begin the data downloading process.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/ecodat_image27.png" alt="Image showing the Availability and Download section. There is a large button that says &amp;ldquo;Download Data&amp;rdquo;. Below that, there is a table where each row represents a research site and each column represents a month. The cells are colored blue if the data is available at that research site during that month. The cells are grey if there is no data.">&lt;/p>
&lt;p>When you&amp;rsquo;re ready to begin downloading data, you can choose the research sites and time periods you want to download data for. Note the estimated file size in the top right corner, as some data sets are very large and can take a while to download. The page provides instructions for how to select sites and your date range. After you make your selection, you will be able to choose whether or not you want to download any associated documentation (i.e., the sampling scheme and protocol documents listed in the &amp;ldquo;Collection and Processing&amp;rdquo; section). You can then choose between a basic data package and an expanded package, which adds QA/QC metrics. After you agree to NEON&amp;rsquo;s Usage and Citation policies, you can download your data set!&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/ecodat_image29.png" alt="Image showing the data download window. First, you choose the research sites and dates that you want data for. Then you choose whether you want to download associated documentation. Then you choose whether to download a basic data package or expanded data package, with quality assurance and quality control metrics. You need to agree to NEON terms, and then you can download your data!">&lt;/p>
&lt;p>When you unzip the data download, you&amp;rsquo;ll see a bunch of folders. Each folder represents a site-month combination and contains several .csv files. I recommend reading the .txt file in each folder, as it describes what each .csv file contains and helps you put the pieces together to understand the data.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/ecodat_image30.png" alt="Image of the unzipped data download. There are several folders. Within each folder, there are several CSV files and one TXT file. There is an arrow pointing to the TXT file that says &amp;ldquo;read this&amp;rdquo;.">&lt;/p>
&lt;p>NEON also provides a helpful visualization tool on the data set information page. The tool will graph the data for you, so you can get an idea of what it looks like before you download it. You can manipulate pretty much any aspect of the graph. You can add sites to the plot to see how they compare to one another, and you can choose which sensor&amp;rsquo;s data you want to display (each site usually has multiple sensors at different locations). You can also adjust the date range that is displayed and the specific variable that is plotted (e.g., minimum, maximum, or mean values). The scroll bar below the X axis allows you to zoom in on a specific time range. The axis ranges, scales, and breaks can also be adjusted. Lastly, you can download the plot as a PNG.&lt;/p>
&lt;p>I encourage you to play around with this - it&amp;rsquo;s such a neat tool! Unfortunately, the visualization tool isn&amp;rsquo;t available for every data set, but it&amp;rsquo;s often available for measurements that are taken by automatic sensors or towers (e.g., air temperature, wind speed, barometric pressure).&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/ecodat_image28.png" alt="Image of the &amp;ldquo;Visualizations&amp;rdquo; section on the data set information page. There is a graph that shows the wind speed in meters per second on the y axis versus time on the x axis. Data are plotted for the sites Abby Road and Dead Lake for the month of February 2022. There are arrows pointing out a scroll bar on the X axis and the download button.">&lt;/p>
&lt;h3 id="takeaways-and-application-2">Takeaways and application&lt;/h3>
&lt;p>NEON has its own R package, &lt;a href="https://www.neonscience.org/resources/learning-hub/tutorials/neondatastackr" target="_blank" rel="noopener">called &lt;code>neonUtilities&lt;/code>&lt;/a>, which provides functions to help you import and work with NEON data. NEON also provides a great set of &lt;a href="https://www.neonscience.org/resources/learning-hub/tutorials" target="_blank" rel="noopener">R tutorials&lt;/a> for working with NEON data and for general ecological analysis. For example, &lt;a href="https://www.neonscience.org/resources/learning-hub/tutorials/download-explore-neon-data" target="_blank" rel="noopener">here&amp;rsquo;s a tutorial&lt;/a> on how to download and explore NEON data. And &lt;a href="https://www.neonscience.org/resources/learning-hub/tutorials/da-viz-coop-precip-data-r" target="_blank" rel="noopener">here&amp;rsquo;s a guided practice lesson&lt;/a> where you can learn how to search for and visualize precipitation data. Here are &lt;a href="https://www.neonscience.org/resources/learning-hub/tutorials/get-started-neon-data-series-data-tutorials" target="_blank" rel="noopener">NEON&amp;rsquo;s recommendations&lt;/a> for people who are just getting started with NEON data and/or R.&lt;/p>
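&lt;p>Here is a sketch of what that looks like. The function below uses &lt;code>loadByProduct()&lt;/code> from &lt;code>neonUtilities&lt;/code> to pull the 2D wind speed and direction product straight into R. It targets Abby Road (&lt;code>ABBY&lt;/code>) and Dead Lake (&lt;code>DELA&lt;/code>) in February 2022, the sites and month shown in the visualization above. It assumes the package is installed and that &lt;code>DP1.00001.001&lt;/code> is the product ID listed on the data product page. Double-check the ID and site codes there before running it:&lt;/p>

```r
# Hypothetical wrapper; nothing is downloaded until the function is called.
get_neon_wind = function(sites = c("ABBY", "DELA"),
                         start = "2022-02", end = "2022-02") {
  neonUtilities::loadByProduct(
    dpID       = "DP1.00001.001",  # 2D wind speed and direction
    site       = sites,            # four-letter NEON site codes
    startdate  = start,            # "YYYY-MM"
    enddate    = end,
    package    = "basic",          # or "expanded" for the extra QA/QC fields
    check.size = FALSE             # skip the interactive file-size prompt
  )
}

# wind = get_neon_wind()
# names(wind)   # a named list: the data tables plus metadata tables
```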
&lt;p>In short, NEON data are useful for illuminating spatiotemporal trends. NEON is great for comparing several types of data (phenological, biogeochemical, climatological, etc.) across different terrestrial and aquatic environments in the United States. There are also several sites within each ecoclimatic domain, so you can examine trends across ecological gradients (e.g., elevation).&lt;/p>
&lt;h2 id="5-species-and-biodiversity-data">5) Species and biodiversity data&lt;/h2>
&lt;h3 id="the-global-biodiversity-information-facilityhttpswwwgbiforg">&lt;a href="https://www.gbif.org/" target="_blank" rel="noopener">The Global Biodiversity Information Facility&lt;/a>&lt;/h3>
&lt;h3 id="introduction-and-how-to-3">Introduction and how-to&lt;/h3>
&lt;p>Collecting species occurrence and biodiversity data can be really useful for modeling species distributions and understanding how they might change (e.g., studying impacts of climate change or predicting the spread of invasive species).&lt;/p>
&lt;p>The Global Biodiversity Information Facility (GBIF) is an international data repository that is commonly used to obtain species occurrence data. Let&amp;rsquo;s check it out.&lt;/p>
&lt;p>The main ways to search for data are to search for occurrences, to search for species, or to browse data sets.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/ecodat_image20.png" alt="Image showing the drop down menu called &amp;ldquo;Get data&amp;rdquo;, with arrows pointing to the occurrences, species, and datasets options.">&lt;/p>
&lt;p>When you search for data by &lt;strong>occurrences&lt;/strong>, the easiest method is probably to search for your species of interest. When you type your species name into the search bar, a drop-down menu will appear that shows the different names or subspecies that your species of interest might be known by. If you want to download all occurrences for your species, you should include all possible names in your search. In the image below, I searched for &lt;em>Callinectes sapidus&lt;/em>, commonly known as the Atlantic blue crab.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/ecodat_image12.png" alt="Image showing the initial occurrence search screen. A panel on the left lists attributes that you can filter or search by. The panel on the right lists the whole database of species observations.">&lt;/p>
&lt;p>Once you complete your search, you can view occurrences in a table, as a map, or through a photo gallery (usually photos from iNaturalist, an app used for sharing biodiversity/wildlife observations).&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/ecodat_image13.png" alt="Image showing the table and map views for the occurrence data, as well as the photo gallery. An arrow also indicates where the download button is located.">&lt;/p>
&lt;p>There&amp;rsquo;s also a tab that you can click on to download occurrence data, which will look something like this once it&amp;rsquo;s downloaded. Each row of data is one observation of the species, and there are columns that will give you information on taxonomy, the country where the species was observed, the coordinates, and the date, among other data.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/ecodat_image14.png" alt="Image of occurrence data open in Microsoft Excel. The columns that are visible describe taxonomic information and the countries, provinces, and coordinates where blue crabs were observed. There&amp;rsquo;s also a column that indicates whether the species was recorded as present or absent.">&lt;/p>
&lt;p>The &lt;strong>species&lt;/strong> search is slightly different from the &lt;strong>occurrence&lt;/strong> search. As the name suggests, the &lt;strong>species&lt;/strong> search focuses more on information about the species itself than on individual occurrence records. The page has a pane on the left that describes the species&amp;rsquo; taxonomy. The pane on the right shows an overview of the species, including the photo gallery, a map of its distribution, its common names, and places where the species is classified as &amp;ldquo;introduced&amp;rdquo; rather than native. This is helpful for broadly learning about your species of interest before you dive into the data.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/ecodat_image15.png" alt="Image of the &amp;ldquo;Species&amp;rdquo; information page. The pane on the left shows taxonomic info at each level of taxonomy. The right side says &amp;ldquo;Callinectes sapidus Rathbun, 1896&amp;rdquo;. You can see that there are 3982 occurrences with images associated with them, and there are 55671 georeferenced occurrences shown on a map.">&lt;/p>
&lt;p>Lastly, you can browse GBIF-associated data sets, which are not organized by species but by network / event / project.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/ecodat_image16.png" alt="The data set search page in GBIF. The first few data sets listed are &amp;ldquo;EOD - eBird Observation Dataset&amp;rdquo;, &amp;ldquo;Artportalen (Swedish Species Observation System)&amp;rdquo;, &amp;ldquo;Observation.org, Nature data from around the world&amp;rdquo;, and &amp;ldquo;iNaturalist Research-grade Observations&amp;rdquo;. You can filter your search by specific data set attributes in the left pane.">&lt;/p>
&lt;p>For example, if I click on the &amp;ldquo;iNaturalist Research-grade Observations&amp;rdquo; data set, I&amp;rsquo;m taken to a page where I can download the full set of research-grade iNaturalist observations, see the geographic distribution of occurrences, and see the taxonomic breakdown of species listed in the data set.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/ecodat_image17.png" alt="The iNaturalist data set page, titled &amp;ldquo;iNaturalist Research-grade Observations&amp;rdquo;. The page shows the number of occurrences recorded in the data set and a map of locations where species have been observed.">&lt;/p>
&lt;h3 id="takeaways-and-application-3">Takeaways and application&lt;/h3>
&lt;p>GBIF also has a &amp;ldquo;Resources&amp;rdquo; section that can provide inspiration for projects and show you several helpful tools. For example, the &amp;ldquo;Data Use&amp;rdquo; tab lists different publications and projects that use GBIF data, showing you how GBIF data can be used to drive research.&lt;/p>
&lt;p>You can also explore biodiversity and species distribution-related tools in the &amp;ldquo;Tools&amp;rdquo; tab and search for GBIF-related literature in the &amp;ldquo;Literature&amp;rdquo; tab. GBIF also has a &lt;a href="https://data-blog.gbif.org/" target="_blank" rel="noopener">data blog&lt;/a>, where they discuss tips and tricks for how to use GBIF. Very useful!&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/ecodat_image11.png" alt="Image showing the Data use tab in the Resources section of GBIF. The articles shown on the page are titled &amp;ldquo;Global decline in wild bee diversity,&amp;rdquo; &amp;ldquo;Climate change: buzzkill for North American tomato pollinators,&amp;rdquo; and &amp;ldquo;Bryophyte dispersal rates too slow to keep up with changing climates&amp;rdquo;.">&lt;/p>
&lt;p>One last note about GBIF: there is a dedicated &lt;a href="https://www.gbif.org/tool/81747/rgbif" target="_blank" rel="noopener">R package for it, called &lt;code>rgbif&lt;/code>&lt;/a>, which lets you search for and read GBIF data directly into R. For more on this, check out &lt;a href="https://www.r-bloggers.com/2021/03/downloading-and-cleaning-gbif-data-with-r/" target="_blank" rel="noopener">this blog post&lt;/a> from R-bloggers, which provides a commented script that walks you through how to import, clean, and map the data. Because GBIF is so widely used, there are also &lt;a href="https://docs.ropensci.org/rgbif/" target="_blank" rel="noopener">several tutorials&lt;/a> out there on how to work with its data.&lt;/p>
&lt;h3 id="the-ocean-biodiversity-information-systemhttpsobisorg">&lt;a href="https://obis.org/" target="_blank" rel="noopener">The Ocean Biodiversity Information System&lt;/a>&lt;/h3>
&lt;p>There&amp;rsquo;s also the Ocean Biodiversity Information System (OBIS), which is like GBIF but for marine species (OBIS actually contributes its marine data to GBIF). I won&amp;rsquo;t dive too deep into this resource, but OBIS also has its own R package, called &lt;code>robis&lt;/code>. One nice feature is that OBIS &lt;a href="https://manual.obis.org/dataviz.html" target="_blank" rel="noopener">provides a few examples&lt;/a> of analyses that can be done with OBIS data and the &lt;code>robis&lt;/code> package. The image below shows one of these, an R notebook that OBIS created to showcase its data, which can be a great learning tool to follow along with!&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/ecodat_image18.png" alt="Image of an example analysis done with OBIS data. The document is titled &amp;ldquo;Diversity of fish and vulnerable species in Marine World Heritage Sites based on OBIS data."">&lt;/p>
&lt;p>OBIS also has a great visualization tool, called &lt;a href="https://mapper.obis.org/" target="_blank" rel="noopener">&amp;ldquo;mapper&amp;rdquo;&lt;/a>, that allows you to map species distributions on top of one another. Mapper is also the primary way to search for species records in OBIS. In the image below, I mapped &lt;em>Callinectes sapidus&lt;/em> (blue crab) distributions on top of &lt;em>Zostera marina&lt;/em> (eelgrass) distributions. The green drop-down menu beside each species occurrence layer also lets you view or download that species&amp;rsquo; occurrence data and change how it appears on the map.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/ecodat_image19.png" alt="Image of map showing blue crab distributions in green and eelgrass distributions in blue. The drop down menu shows options to toggle the point appearances, edit the layer, view or download the data, or delete the layer.">&lt;/p>
&lt;h2 id="looking-for-more">Looking for more?&lt;/h2>
&lt;p>The &lt;a href="https://search.dataone.org/data" target="_blank" rel="noopener">DataOne portal&lt;/a> is a huge archive of environmental data that aggregates data sets from several different repositories and organizations, including many of the resources we listed above (e.g., KNB, EDI, NEON). This is a good portal to look to if you want a very comprehensive search, or if you don&amp;rsquo;t know exactly what you&amp;rsquo;re looking for. The other repositories might be more helpful if you already know exactly what kind of data you want to retrieve.&lt;/p>
&lt;p>I also want to highlight the Central Michigan University Library website, which has a &lt;em>great&lt;/em> &lt;a href="https://libguides.cmich.edu/lifesciencedata/ecological" target="_blank" rel="noopener">list of resources&lt;/a> that you can consult to find data relating to the life sciences (including ecological data!). The website lists a few of the sources we described above, and more. It also provides some good sources of environmental data (e.g., habitat/spatial data and climate data), which could be helpful for modeling. I would definitely check it out, especially if you&amp;rsquo;re searching for public data to use for your own research.&lt;/p>
&lt;p>If you&amp;rsquo;re just looking for practice data, the resources listed above should provide plenty of data sets for you to use! I encourage you to explore each of these repositories; they&amp;rsquo;re rich with tools and exciting data beyond what I covered in this blog post.&lt;/p>
&lt;p>Do you have any favorite sources of ecological data? Let us know in the comments below! We made a top 5 list so we could dive deep into the details of each one, but it never hurts to learn about more resources. ;)&lt;/p>
&lt;p>I hope this tutorial was helpful. As always, happy coding!&lt;/p>
&lt;br>
&lt;hr>
&lt;p>&lt;strong>Quick note from Luka:&lt;/strong> If you are just starting with R, then I encourage you to check out my full course on The Basics of R (for ecologists). I designed the course to take away the stress of learning R by leading you through a self-paced curriculum that makes R easy and painless. I&amp;rsquo;m confident this course will give you all the essentials you need to feel comfortable working with your own data in just a few weeks. Just click below 👇 to start the course and see what you think!&lt;/p>
&lt;a href="https://coaching.rforecology.com/the-basics-of-r-for-ecologists-enroll?utm_source=blog&amp;utm_medium=bottom_button&amp;utm_campaign=rforecology_blog/" target="_blank">
&lt;img src="https://www.rforecology.com/basics_of_r_thumb.jpg" alt="A landscape made of numbers on the left, and to the right is the R for Ecology logo with 'the basics of R (for ecologists)' written below." style="width:400px; box-shadow: 2px 2px 15px #252a2a; border-radius: 10px;"/>
&lt;/a>
&lt;br>
&lt;p>Or, if you already feel solid with the basics, take your data visualization to the next level with my Introduction to Data Visualization with R (for ecologists) where I teach you everything you need to create professional and publication-quality figures in R. 👇&lt;/p>
&lt;a href="https://coaching.rforecology.com/intro-to-dataviz-in-r-for-ecologists-enroll?utm_source=blog&amp;utm_medium=bottom_button&amp;utm_campaign=rforecology_blog/" target="_blank">
&lt;img src="https://www.rforecology.com/dataviz_thumb.png" alt="A few example data visualizations on the left and to the right there is a landscape made of numbers, with the text 'Intro to Data Visualization with R' written on top. Below that is the R for Ecology logo" style="width:400px; box-shadow: 2px 2px 15px #252a2a; border-radius: 10px;"/>
&lt;/a>
&lt;hr>
&lt;p>Also be sure to check out &lt;strong>&lt;a href="https://www.r-bloggers.com/" target="_blank" rel="noopener">R-bloggers&lt;/a>&lt;/strong> for other great tutorials on learning R&lt;/p>
&lt;/center>
&lt;h3 id="citations">Citations&lt;/h3>
&lt;p>Stephanie Haas-Desmarais, Gabriel Benjamen, and Christopher Lortie. 2021. The effect of shrubs and exclosures on animal abundance, Carrizo National Monument. Knowledge Network for Biocomplexity. doi:10.5063/F1FN14M4.&lt;/p>
&lt;script defer src="https://cdn.commento.io/js/commento.js">&lt;/script>
&lt;div id="commento">&lt;/div></description></item><item><title>How to make a scatterplot in R</title><link>https://www.rforecology.com/post/scatterplots-in-r/</link><pubDate>Mon, 14 Nov 2022 09:30:50 -0400</pubDate><guid>https://www.rforecology.com/post/scatterplots-in-r/</guid><description>&lt;p>Now that you&amp;rsquo;ve learned the very basics of plotting from our earlier tutorial on &lt;a href="https://www.rforecology.com/post/making-your-first-plot-in-r/" target="_blank" rel="noopener">making your very first plot in R&lt;/a>, this blog post will teach you how to customize your scatterplots to make them look better. If you want to take this even a step further, check out my &lt;a href="https://www.rforecology.com/post/how-to-make-a-quality-scatterplot-in-r/" target="_blank" rel="noopener">step-by-step tutorial introduction to publication-quality scatterplots.&lt;/a>&lt;/p>
&lt;img src="https://www.rforecology.com/scatterplots_image0.png" alt="Image of scatterplot with different customizations highlighted such as axis labels, tick marks, and limits, as well as point shape, color, and size." style="width:500px;"/>
&lt;p>You can also watch this blog post as a video by clicking on the image below. &lt;a href="https://www.youtube.com/watch?v=XhLhp_0NcIM" target="_blank" rel="noopener">&lt;img src="https://www.rforecology.com/scatterplots_image1.png" alt="Video thumbnail for how to make a scatterplot">&lt;/a>&lt;/p>
&lt;p>Scatterplots are one of the most common types of plots in ecology: they show the relationship (or lack thereof) between two continuous variables.&lt;/p>
&lt;p>We&amp;rsquo;re going to recreate the scatterplot from that earlier lesson, starting by loading the built-in data set &lt;code>PlantGrowth&lt;/code>.&lt;/p>
&lt;p>This data set has 30 rows of data and two columns. The first column, &amp;ldquo;weight&amp;rdquo;, represents the dry biomass of each plant in grams. The second column, &amp;ldquo;group&amp;rdquo;, lists the experimental treatment that each plant was given. We&amp;rsquo;re going to add another column to this data set called &amp;ldquo;water&amp;rdquo;, which will describe the amount of water that each plant has received throughout its life, in liters. If you&amp;rsquo;re following along in RStudio (which you should be! 😄), then you can just copy and paste the code below to add the new column.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Load data&lt;/span>
&lt;span style="color:#268bd2">data&lt;/span>(PlantGrowth)
&lt;span style="color:#586e75"># Add a new column&lt;/span>
PlantGrowth&lt;span style="color:#719e07">$&lt;/span>water &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">3.063&lt;/span>, &lt;span style="color:#2aa198">3.558&lt;/span>, &lt;span style="color:#2aa198">2.233&lt;/span>, &lt;span style="color:#2aa198">3.147&lt;/span>, &lt;span style="color:#2aa198">2.379&lt;/span>, &lt;span style="color:#2aa198">2.106&lt;/span>, &lt;span style="color:#2aa198">2.384&lt;/span>, &lt;span style="color:#2aa198">2.444&lt;/span>, &lt;span style="color:#2aa198">2.492&lt;/span>, &lt;span style="color:#2aa198">3.292&lt;/span>, &lt;span style="color:#2aa198">2.732&lt;/span>, &lt;span style="color:#2aa198">2.153&lt;/span>, &lt;span style="color:#2aa198">2.660&lt;/span>, &lt;span style="color:#2aa198">1.938&lt;/span>, &lt;span style="color:#2aa198">3.583&lt;/span>, &lt;span style="color:#2aa198">1.817&lt;/span>, &lt;span style="color:#2aa198">3.494&lt;/span>, &lt;span style="color:#2aa198">2.559&lt;/span>, &lt;span style="color:#2aa198">1.530&lt;/span>, &lt;span style="color:#2aa198">2.372&lt;/span>, &lt;span style="color:#2aa198">3.176&lt;/span>, &lt;span style="color:#2aa198">2.611&lt;/span>, &lt;span style="color:#2aa198">3.262&lt;/span>, &lt;span style="color:#2aa198">2.947&lt;/span>, &lt;span style="color:#2aa198">2.523&lt;/span>, &lt;span style="color:#2aa198">2.152&lt;/span>, &lt;span style="color:#2aa198">2.771&lt;/span>, &lt;span style="color:#2aa198">2.878&lt;/span>, &lt;span style="color:#2aa198">2.263&lt;/span>, &lt;span style="color:#2aa198">2.518&lt;/span>)
&lt;span style="color:#586e75"># View first few rows of data&lt;/span>
&lt;span style="color:#268bd2">head&lt;/span>(PlantGrowth)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## weight group water
## 1 4.17 ctrl 3.063
## 2 5.58 ctrl 3.558
## 3 5.18 ctrl 2.233
## 4 6.11 ctrl 3.147
## 5 4.50 ctrl 2.379
## 6 4.61 ctrl 2.106
&lt;/code>&lt;/pre>&lt;p>Awesome. Now, using the &lt;code>plot()&lt;/code> function, let&amp;rsquo;s create a plot of plant weight versus the amount of water that the plant received.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Plot plant weight versus water received&lt;/span>
&lt;span style="color:#268bd2">plot&lt;/span>(weight &lt;span style="color:#719e07">~&lt;/span> water, data &lt;span style="color:#719e07">=&lt;/span> PlantGrowth)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;img src="https://www.rforecology.com/post/scatterplots-in-r/index_files/figure-html/unnamed-chunk-2-1.png" width="672" />&lt;/p>
&lt;p>Now we have a basic scatterplot, but it doesn&amp;rsquo;t look all that great aesthetically. To help with that, I&amp;rsquo;m going to show you some different customizations that allow you to modify several of the plot elements.&lt;/p>
&lt;p>Let&amp;rsquo;s start with the axis labels. We can modify the &lt;code>xlab&lt;/code> and &lt;code>ylab&lt;/code> arguments within the &lt;code>plot()&lt;/code> function. &lt;code>xlab&lt;/code> refers to the label on the X axis, while &lt;code>ylab&lt;/code> refers to the label on the Y axis. Notice that I also pressed the &amp;ldquo;Enter&amp;rdquo; or &amp;ldquo;Return&amp;rdquo; key after each comma in the &lt;code>plot()&lt;/code> function. This just keeps the code cleaner and more readable, but you could have also written it all in one long line.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Edit the axis labels of the plot&lt;/span>
&lt;span style="color:#268bd2">plot&lt;/span>(weight &lt;span style="color:#719e07">~&lt;/span> water,
data &lt;span style="color:#719e07">=&lt;/span> PlantGrowth,
xlab &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;Total Water (L)&amp;#34;&lt;/span>,
ylab &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;Dried Biomass Weight (g)&amp;#34;&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;img src="https://www.rforecology.com/post/scatterplots-in-r/index_files/figure-html/unnamed-chunk-3-1.png" width="672" />&lt;/p>
&lt;p>Great! Our axis labels look good. We can also make the graph a little more spacious by editing the limits of the axes. We can do this using the &lt;code>xlim&lt;/code> and &lt;code>ylim&lt;/code> arguments. These arguments accept vectors of the form &lt;code>c(lower_limit, upper_limit)&lt;/code>. So if we wanted the X axis to go from 1 to 5, we would say &lt;code>xlim = c(1, 5)&lt;/code>.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Edit the axis limits of the plot&lt;/span>
&lt;span style="color:#268bd2">plot&lt;/span>(weight &lt;span style="color:#719e07">~&lt;/span> water,
data &lt;span style="color:#719e07">=&lt;/span> PlantGrowth,
xlab &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;Total Water (L)&amp;#34;&lt;/span>,
ylab &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;Dried Biomass Weight (g)&amp;#34;&lt;/span>,
xlim &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">1.25&lt;/span>, &lt;span style="color:#2aa198">3.75&lt;/span>),
ylim &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">3.25&lt;/span>, &lt;span style="color:#2aa198">6.75&lt;/span>))
&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;img src="https://www.rforecology.com/post/scatterplots-in-r/index_files/figure-html/unnamed-chunk-4-1.png" width="672" />&lt;/p>
&lt;p>Nice, our plot looks a little less crowded. The last aspect of the axes that you might want to change is the tick marks. We can do this using the &lt;code>xaxp&lt;/code> and &lt;code>yaxp&lt;/code> arguments. These arguments accept vectors in the form &lt;code>c(lower_limit, upper_limit, number_of_intervals)&lt;/code>, where the first two values are the positions of the outermost tick marks. So if we want the X axis tick marks to go from 1.25 to 3.75 with 5 intervals in between, we would write &lt;code>xaxp = c(1.25, 3.75, 5)&lt;/code>.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Edit the axis tick marks of the plot&lt;/span>
&lt;span style="color:#268bd2">plot&lt;/span>(weight &lt;span style="color:#719e07">~&lt;/span> water,
data &lt;span style="color:#719e07">=&lt;/span> PlantGrowth,
xlab &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;Total Water (L)&amp;#34;&lt;/span>,
ylab &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;Dried Biomass Weight (g)&amp;#34;&lt;/span>,
xlim &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">1.25&lt;/span>, &lt;span style="color:#2aa198">3.75&lt;/span>),
ylim &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">3.25&lt;/span>, &lt;span style="color:#2aa198">6.75&lt;/span>),
xaxp &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">1.25&lt;/span>, &lt;span style="color:#2aa198">3.75&lt;/span>, &lt;span style="color:#2aa198">5&lt;/span>),
yaxp &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">3.5&lt;/span>, &lt;span style="color:#2aa198">6.5&lt;/span>, &lt;span style="color:#2aa198">3&lt;/span>))
&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;img src="https://www.rforecology.com/post/scatterplots-in-r/index_files/figure-html/unnamed-chunk-5-1.png" width="672" />&lt;/p>
&lt;p>Now let&amp;rsquo;s change the appearance of the points in the plot. The open circles that we currently have can be nice, especially if many of the points overlap. However, normally we would probably want to have simple, filled-in circles.&lt;/p>
&lt;p>We can change the shape of the points using the &lt;code>pch&lt;/code> argument. A value of 16 gives filled-in circles, but you can play around with other numbers (the standard symbols are numbered 0 to 25) to see the types of symbols that are available.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Edit the point shape&lt;/span>
&lt;span style="color:#268bd2">plot&lt;/span>(weight &lt;span style="color:#719e07">~&lt;/span> water,
data &lt;span style="color:#719e07">=&lt;/span> PlantGrowth,
xlab &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;Total Water (L)&amp;#34;&lt;/span>,
ylab &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;Dried Biomass Weight (g)&amp;#34;&lt;/span>,
xlim &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">1.25&lt;/span>, &lt;span style="color:#2aa198">3.75&lt;/span>),
ylim &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">3.25&lt;/span>, &lt;span style="color:#2aa198">6.75&lt;/span>),
xaxp &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">1.25&lt;/span>, &lt;span style="color:#2aa198">3.75&lt;/span>, &lt;span style="color:#2aa198">5&lt;/span>),
yaxp &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">3.5&lt;/span>, &lt;span style="color:#2aa198">6.5&lt;/span>, &lt;span style="color:#2aa198">3&lt;/span>),
pch &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">16&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;img src="https://www.rforecology.com/post/scatterplots-in-r/index_files/figure-html/unnamed-chunk-6-1.png" width="672" />&lt;/p>
&lt;p>You can also change the color of the points using the &lt;code>col&lt;/code> argument, where you can just type the name of a color in quotes.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Edit the point shape&lt;/span>
&lt;span style="color:#268bd2">plot&lt;/span>(weight &lt;span style="color:#719e07">~&lt;/span> water,
data &lt;span style="color:#719e07">=&lt;/span> PlantGrowth,
xlab &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;Total Water (L)&amp;#34;&lt;/span>,
ylab &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;Dried Biomass Weight (g)&amp;#34;&lt;/span>,
xlim &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">1.25&lt;/span>, &lt;span style="color:#2aa198">3.75&lt;/span>),
ylim &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">3.25&lt;/span>, &lt;span style="color:#2aa198">6.75&lt;/span>),
xaxp &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">1.25&lt;/span>, &lt;span style="color:#2aa198">3.75&lt;/span>, &lt;span style="color:#2aa198">5&lt;/span>),
yaxp &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">3.5&lt;/span>, &lt;span style="color:#2aa198">6.5&lt;/span>, &lt;span style="color:#2aa198">3&lt;/span>),
pch &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">16&lt;/span>,
col &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;blue&amp;#34;&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;img src="https://www.rforecology.com/post/scatterplots-in-r/index_files/figure-html/unnamed-chunk-7-1.png" width="672" />&lt;/p>
&lt;p>It can be fun to use different colors, but the best practice is to keep your figures in grayscale unless the colors in your figure specifically signify something. In the case of our figure, there isn&amp;rsquo;t really a reason to change the color of the points except for the purposes of demonstration. So let&amp;rsquo;s change the color back to black.&lt;/p>
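&lt;p>That said, when color &lt;em>does&lt;/em> carry meaning, for example to tell the three treatment groups in &lt;code>PlantGrowth&lt;/code> apart, you can map each group to a color and add a legend so readers can decode it. Here is a minimal sketch; the specific colors are arbitrary choices, and the &lt;code>water&lt;/code> column is re-created so the block runs on its own.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Load data and re-create the water column from above
data(PlantGrowth)
PlantGrowth$water = c(3.063, 3.558, 2.233, 3.147, 2.379, 2.106, 2.384, 2.444, 2.492, 3.292,
                      2.732, 2.153, 2.660, 1.938, 3.583, 1.817, 3.494, 2.559, 1.530, 2.372,
                      3.176, 2.611, 3.262, 2.947, 2.523, 2.152, 2.771, 2.878, 2.263, 2.518)

# One color per treatment group (the colors themselves are arbitrary)
group_cols = c(ctrl = "black", trt1 = "darkorange", trt2 = "steelblue")

# Look up each plant's color by the name of its group
point_cols = group_cols[as.character(PlantGrowth$group)]

plot(weight ~ water,
     data = PlantGrowth,
     xlab = "Total Water (L)",
     ylab = "Dried Biomass Weight (g)",
     pch = 16,
     col = point_cols)

# Without a legend, readers can't tell what the colors mean
legend("topleft",
       legend = names(group_cols),
       col = group_cols,
       pch = 16,
       title = "Treatment")
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>If you want to stay in grayscale, you can map the groups to different &lt;code>pch&lt;/code> values in exactly the same way instead.&lt;/p>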
&lt;p>You can also change point size using the argument &lt;code>cex&lt;/code>. The default for &lt;code>cex&lt;/code> is 1, which represents 100%. So if we change the &lt;code>cex&lt;/code> argument to 1.5, the points will be 50% larger.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Edit the point shape&lt;/span>
&lt;span style="color:#268bd2">plot&lt;/span>(weight &lt;span style="color:#719e07">~&lt;/span> water,
data &lt;span style="color:#719e07">=&lt;/span> PlantGrowth,
xlab &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;Total Water (L)&amp;#34;&lt;/span>,
ylab &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;Dried Biomass Weight (g)&amp;#34;&lt;/span>,
xlim &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">1.25&lt;/span>, &lt;span style="color:#2aa198">3.75&lt;/span>),
ylim &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">3.25&lt;/span>, &lt;span style="color:#2aa198">6.75&lt;/span>),
xaxp &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">1.25&lt;/span>, &lt;span style="color:#2aa198">3.75&lt;/span>, &lt;span style="color:#2aa198">5&lt;/span>),
yaxp &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">3.5&lt;/span>, &lt;span style="color:#2aa198">6.5&lt;/span>, &lt;span style="color:#2aa198">3&lt;/span>),
pch &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">16&lt;/span>,
col &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;black&amp;#34;&lt;/span>,
cex &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">1.5&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;img src="https://www.rforecology.com/post/scatterplots-in-r/index_files/figure-html/unnamed-chunk-8-1.png" width="672" />&lt;/p>
&lt;p>And now we have a nicer-looking scatterplot. The axis labels are clearer, the points have been filled in, and our plot looks less crowded. Now you know how to customize the axis labels, the axis tick marks and limits, and the point shape, color, and size within your scatterplot.&lt;/p>
&lt;p>There is of course a lot more that you can do, but this tutorial is aimed at giving you the most important attributes that you can modify in the base &lt;code>plot()&lt;/code> function. I used only these for the longest time without needing to branch out to ggplot or other more advanced techniques. But be sure to check out my other tutorial that takes this just a bit further to show you &lt;a href="https://www.rforecology.com/post/how-to-make-a-quality-scatterplot-in-r/" target="_blank" rel="noopener">how to make publication-quality scatterplots&lt;/a>. Happy visualizing!&lt;/p>
&lt;br>
&lt;hr>
&lt;center>Found this tutorial helpful? Check out my full course Introduction to Data Visualization with R (for ecologists) here:
&lt;br>
&lt;br>
&lt;ul class="cta-group">
&lt;li>
&lt;a href="https://coaching.rforecology.com/intro-to-dataviz-in-r-for-ecologists-enroll?utm_source=blog&amp;amp;utm_medium=bottom_button&amp;amp;utm_campaign=rforecology_blog/" target="_blank" rel="noopener" class="btn btn-primary px-3 py-3">&lt;strong>Start Intro to Data Viz with R now&lt;/strong>&lt;/a>
&lt;/li>
&lt;/ul>
&lt;p>Also be sure to check out &lt;strong>&lt;a href="https://www.r-bloggers.com/" target="_blank" rel="noopener">R-bloggers&lt;/a>&lt;/strong> for other great tutorials on learning R&lt;/p>
&lt;/center>
&lt;script defer src="https://cdn.commento.io/js/commento.js">&lt;/script>
&lt;div id="commento">&lt;/div></description></item><item><title>Free workshop on how to learn R</title><link>https://www.rforecology.com/post/free-workshop-on-how-to-learn-r/</link><pubDate>Thu, 12 May 2022 09:30:50 -0400</pubDate><guid>https://www.rforecology.com/post/free-workshop-on-how-to-learn-r/</guid><description>&lt;p>Hello everyone! I am psyched to announce the launch of my free workshop about how to learn R. It&amp;rsquo;s been a long time in the making, but it is finally here. The workshop is called &lt;strong>&lt;a href="https://coaching.rforecology.com/free-workshop" target="_blank" rel="noopener">The Myth of the R Learning Curve (or how not to go crazy when learning R)&lt;/a>&lt;/strong>.&lt;/p>
&lt;p>In the workshop, I go over my own personal story and how I came to love and learn R. I also talk about why R is such a powerful tool that brings you to the cutting edge of science. Some of the other key topics I cover in the workshop include:&lt;/p>
&lt;ul>
&lt;li>The Myth of the R learning curve: what it is, why it&amp;rsquo;s there, and how to quickly get beyond it&lt;/li>
&lt;li>Why the order in which you learn R is critical for making it easy to learn&lt;/li>
&lt;li>How to apply Pareto&amp;rsquo;s principle (the 80:20 rule) when learning R&lt;/li>
&lt;li>The counter-intuitive secret about statistics and R&lt;/li>
&lt;li>How you can keep practicing R even if you don&amp;rsquo;t have any data yet (and have fun in the process!)&lt;/li>
&lt;/ul>
&lt;p>I feel strongly about the fact that it doesn’t need to take years and expensive university courses to feel comfortable working with R. I have taught hundreds of students and I am excited to share what I&amp;rsquo;ve learned along the way. Don’t let the R learning curve stand in the way of doing good science. Learning R can be faster, more fun, and easier than you thought.&lt;/p>
&lt;p>If you watch it to the very end, I&amp;rsquo;ll be sharing a &lt;strong>cool bonus surprise so that you never have another reason to say you don&amp;rsquo;t know R.&lt;/strong>&lt;/p>
&lt;p>&lt;strong>To watch my free workshop, just &lt;a href="https://coaching.rforecology.com/free-workshop" target="_blank" rel="noopener">click here&lt;/a> to enter your email and access the workshop.&lt;/strong> 👇&lt;/p>
&lt;a href = "https://coaching.rforecology.com/free-workshop">
&lt;img src="https://www.rforecology.com/free_workshop_thumb.png/" alt="FREE Workshop on how to learn R, written at the top over a screenshot of R Studio and me speaking on a microphone in the front. Also, there is the R for Ecology logo of R with a plant growing through it." style="width:400px;"/>
&lt;/a>
&lt;p>I look forward to seeing you there!&lt;/p>
&lt;p>~ Luka&lt;/p></description></item><item><title>The basics of prototyping and exporting your plots in R</title><link>https://www.rforecology.com/post/exporting-plots-in-r/</link><pubDate>Thu, 05 May 2022 09:30:50 -0400</pubDate><guid>https://www.rforecology.com/post/exporting-plots-in-r/</guid><description>&lt;p>It&amp;rsquo;s super rewarding when you finally figure out how to plot and visualize your data. But to show off your plot to the rest of the world, you first need to save and export it from your RStudio workspace.&lt;/p>
&lt;p>In this tutorial, I&amp;rsquo;m going to show you how to prototype, save, and export your plots from R. (Note, I use the terms &amp;lsquo;plot&amp;rsquo; and &amp;lsquo;figure&amp;rsquo; interchangeably to mean the same thing: a data visualization!)&lt;/p>
&lt;img src="https://www.rforecology.com/savingplots_image0.png" alt="Image showing a figure in R turning into a PDF figure" style="width:500px;"/>
&lt;p>For starters, if you need a tutorial on how to make plots in R, you can check out &lt;a href="https://www.youtube.com/watch?v=EL05E_T5ajs" target="_blank" rel="noopener">this video&lt;/a> on how to make your first plot. You can also enroll in our full online course on data visualization, titled &lt;a href="https://courses.rforecology.com/p/intro-to-dataviz-for-ecologists-prereg" target="_blank" rel="noopener">&amp;ldquo;Intro to data visualization in R (for ecologists)&amp;rdquo;.&lt;/a>&lt;/p>
&lt;p>You can also watch this blog post as a video if you want to follow along with one of the lessons from my full course: &lt;a href="https://www.youtube.com/watch?v=udmNwR7Eokg" target="_blank" rel="noopener">&lt;img src="https://www.rforecology.com/savingplots_image1.png" alt="Video thumbnail for a tutorial on saving and exporting plots from R.">&lt;/a>&lt;/p>
&lt;p>Let&amp;rsquo;s start out by loading up some data:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Load up our data&lt;/span>
&lt;span style="color:#268bd2">data&lt;/span>(PlantGrowth)
&lt;span style="color:#586e75"># Look at our data&lt;/span>
&lt;span style="color:#268bd2">head&lt;/span>(PlantGrowth)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## weight group
## 1 4.17 ctrl
## 2 5.58 ctrl
## 3 5.18 ctrl
## 4 6.11 ctrl
## 5 4.50 ctrl
## 6 4.61 ctrl
&lt;/code>&lt;/pre>&lt;p>The PlantGrowth data are pre-built into R, so you can load the dataset with just the &lt;code>data()&lt;/code> function as I&amp;rsquo;ve done above. These data describe the weights of plants that were placed under different experimental treatments. Let&amp;rsquo;s look at those treatments:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Look at the treatment levels&lt;/span>
&lt;span style="color:#268bd2">levels&lt;/span>(PlantGrowth&lt;span style="color:#719e07">$&lt;/span>group)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] &amp;quot;ctrl&amp;quot; &amp;quot;trt1&amp;quot; &amp;quot;trt2&amp;quot;
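&lt;/code>&lt;/pre>&lt;p>Hmmm&amp;hellip; &amp;ldquo;ctrl&amp;rdquo;, &amp;ldquo;trt1&amp;rdquo;, and &amp;ldquo;trt2&amp;rdquo; are not very good descriptions of the treatment levels. We need to describe them better if we want to put these treatment levels in a useful plot, so let&amp;rsquo;s rename them. The data set itself doesn&amp;rsquo;t say what the treatments were, so for this example we&amp;rsquo;ll pretend that they reflect different light levels. Note that assigning new names to &lt;code>levels()&lt;/code> replaces the old names by position, so the new names must be listed in the same order as the old ones.&lt;/p>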
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Change the names of the treatment levels &lt;/span>
&lt;span style="color:#268bd2">levels&lt;/span>(PlantGrowth&lt;span style="color:#719e07">$&lt;/span>group) &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;Control&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;High-Light&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;Low-Light&amp;#34;&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>Now the data look a little better.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># View the levels again&lt;/span>
&lt;span style="color:#268bd2">levels&lt;/span>(PlantGrowth&lt;span style="color:#719e07">$&lt;/span>group)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] &amp;quot;Control&amp;quot; &amp;quot;High-Light&amp;quot; &amp;quot;Low-Light&amp;quot;
&lt;/code>&lt;/pre>&lt;p>Let&amp;rsquo;s run this code to create a boxplot (see how I came up with this code in &lt;a href="https://www.youtube.com/watch?v=q03cJVMNpsU&amp;amp;t=1s" target="_blank" rel="noopener">this video&lt;/a>).&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Create the plot&lt;/span>
&lt;span style="color:#268bd2">plot&lt;/span>(weight &lt;span style="color:#719e07">~&lt;/span> group, data &lt;span style="color:#719e07">=&lt;/span> PlantGrowth,
xlab &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;Sunlight Treatment&amp;#34;&lt;/span>,
ylab &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;Dried Biomass Weight (g)&amp;#34;&lt;/span>,
col &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">4&lt;/span>,
boxlty &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">0&lt;/span>,
whisklty &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">1&lt;/span>,
whisklwd &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">1.5&lt;/span>,
staplelwd &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">1.5&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;img src="https://www.rforecology.com/post/exporting-plots-in-r/index_files/figure-html/unnamed-chunk-5-1.png" width="672" />&lt;/p>
&lt;p>When we just create a plot like this in RStudio, its size and proportions aren&amp;rsquo;t fixed. Instead, the figure is drawn in, and resized to fit, the &lt;code>Plots&lt;/code> pane in RStudio. If you drag the edges of that pane, the plot takes on whatever proportions you give it. As a result, it can be hard to produce figures with correct and consistent sizing and proportions, &lt;em>especially&lt;/em> if you&amp;rsquo;re making several figures that need to match.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/Moving%20R%20window%20around.gif" alt="Gif showing someone moving the R plot viewing window around to change its proportions.">&lt;/p>
&lt;p>So my general workflow for creating figures is to first make something that looks more or less good in the plot pane. Then I start prototyping the size and aspect ratio of the figure by writing the width and height directly into the code until I find something I like.&lt;/p>
&lt;p>You can do this using the &lt;code>quartz()&lt;/code> function on a Mac. If you run &lt;code>quartz()&lt;/code>, it will open up a blank graphics device window like this one:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Open graphics device window&lt;/span>
&lt;span style="color:#268bd2">quartz&lt;/span>()
&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;img src="https://www.rforecology.com/savingplots_image2.png" alt="Image showing a blank window opened on top of RStudio.">&lt;/p>
&lt;p>And then if we run our plot code after running &lt;code>quartz()&lt;/code>, the plot will show up in the pre-sized window:&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/savingplots_image3.png" alt="Image showing the plot in the graphics device window">&lt;/p>
&lt;p>&lt;strong>(Note that for Windows, you can use the &lt;code>windows()&lt;/code> function, and for Linux, you can use the &lt;code>x11()&lt;/code> function. I&amp;rsquo;m going to show how to do it on Mac since that&amp;rsquo;s the computer I have, but it should be similar on Windows or Linux computers as long as you change the function.)&lt;/strong>&lt;/p>
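&lt;p>If you share scripts with people who use different operating systems, one option is base R&amp;rsquo;s &lt;code>dev.new()&lt;/code>. It opens a new window using the default graphics device for that computer (normally Quartz on a Mac, &lt;code>windows()&lt;/code> on Windows, and X11 on Linux) and passes &lt;code>width&lt;/code> and &lt;code>height&lt;/code>, in inches, on to it. RStudio&amp;rsquo;s own plot device doesn&amp;rsquo;t accept those sizes, so the sketch below sets &lt;code>noRStudioGD = TRUE&lt;/code> to ask for a separate window instead. Which device you actually get depends on your setup, so treat this as a starting point rather than a guaranteed recipe:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Open a blank 4 x 4 inch window with this computer's default graphics device
# (noRStudioGD = TRUE skips RStudio's built-in device, which ignores width/height)
dev.new(width = 4, height = 4, noRStudioGD = TRUE)

# Draw the plot in the new window
plot(weight ~ group, data = PlantGrowth)
&lt;/code>&lt;/pre>&lt;/div>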
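&lt;p>We can set a standard height and width for the new window, where h is height and w is width (R matches these abbreviations to the full &lt;code>height&lt;/code> and &lt;code>width&lt;/code> argument names, so you can write either):&lt;/p>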
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Set a standard plot size&lt;/span>
&lt;span style="color:#268bd2">quartz&lt;/span>(h &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">4&lt;/span>, w &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">4&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;img src="https://www.rforecology.com/savingplots_image4.png" alt="Image showing two quartz windows open of different sizes">&lt;/p>
&lt;p>If you don&amp;rsquo;t specify a height or width, the default size for &lt;code>quartz()&lt;/code> is height = 7 and width = 7, measured in inches; as you can see in the image above, our h = 4 and w = 4 Quartz window is much smaller than the default behind it.&lt;/p>
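&lt;p>But notice that the font sizes and other graphical elements such as line widths and point sizes stay the same size! They are set in absolute units (points) rather than as a fraction of the window, so in a smaller window they take up a larger share of the figure, and in a bigger window they look comparatively tiny. This is why it&amp;rsquo;s important to prototype the sizing. For example, the smaller 4x4 figure looks noticeably better balanced, aesthetically speaking, than the 7x7.&lt;/p>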
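&lt;p>A quick back-of-the-envelope calculation shows why the smaller window looks better. Text size is measured in points (72 points = 1 inch), and the Quartz, Windows, and X11 devices all use 12-point text by default, so text is roughly 12/72, or about 0.17, inches tall no matter how big the window is. Here is that arithmetic in R, using the two window heights from the image above:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Default 12-point text, converted to inches (72 points = 1 inch)
text_height_in &amp;lt;- 12 / 72

# The two window heights being compared, in inches
window_height_in &amp;lt;- c(small = 4, default = 7)

# Percentage of the window height taken up by that text
round(100 * text_height_in / window_height_in, 1)
# about 4.2% in the 4-inch window and 2.4% in the 7-inch window
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>The same text takes up 7/4 = 1.75 times as large a share of the 4-inch window, which is why the labels look better balanced against the plot there.&lt;/p>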
&lt;p>My biggest pet peeve is the common tendency to save figures at a size that is way too big relative to the font and point sizes. At best, this creates a very unappealing visual; at worst, it produces a figure that is hard to read or interpret in the first place.&lt;/p>
&lt;p>Ok. I&amp;rsquo;ll stop that rant&amp;hellip; 😆&lt;/p>
&lt;p>When you&amp;rsquo;re assigning values to height and width, you&amp;rsquo;ll generally want values somewhere between 1 and 10 (inches). But watch out if you make the figure too small, because you might get an error saying that the figure margins are too large: the space R reserves around the plot for axes and labels no longer fits in the window.&lt;/p>
&lt;p>For example, if I set the window to be 1 inch by 1 inch and then try to run the plot code, the console says &lt;code>Error in plot.new() : figure margins too large&lt;/code>&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/savingplots_image5.png" alt="Image showing a tiny blank quartz window and an error that says &amp;ldquo;figure margins too large&amp;rdquo;">&lt;/p>
&lt;p>You can keep playing around with the window size until you find something that works for you. Since 1 x 1 inch was too small, let&amp;rsquo;s set our plot size to height = 4.5 and width = 4.5.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Set plot size&lt;/span>
&lt;span style="color:#268bd2">quartz&lt;/span>(h &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">4.5&lt;/span>, w &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">4.5&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>After re-running the plot code in this new window, we have a plot that we&amp;rsquo;re happy with! To save the figure from the Quartz window, go to the &amp;ldquo;RStudio&amp;rdquo; menu tab and click &amp;ldquo;Save&amp;rdquo;. On a Windows computer, you might go to &amp;ldquo;File&amp;rdquo; or a similar menu tab.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/savingplots_image6.png" alt="Image showing an appropriately-sized figure with the cursor over the RStudio menu tab and the Save button">&lt;/p>
&lt;p>This will prompt you to save your figure as a .pdf file. PDF is actually one of the best file formats for figures because it stores the plot as vector graphics (lines, shapes, and text rather than pixels), so it has a virtually infinite resolution (try zooming in on a figure you saved as a PDF and you&amp;rsquo;ll see what I mean!). The file size depends on how much is drawn: a simple plot like this one makes a small PDF, but a plot with many thousands of points can produce a large, slow-to-render file. In that case, or whenever you need an image file, you can convert it to a .png or .jpg.&lt;/p>
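&lt;p>If you need a raster file (for example, for a slide or a website), you can also write one directly from code instead of converting the PDF afterwards. In the sketch below, base R&amp;rsquo;s &lt;code>png()&lt;/code> device takes the width and height in inches plus a resolution in pixels per inch, so a 4.5 x 4.5 inch figure at &lt;code>res = 300&lt;/code> comes out as 4.5 x 300 = 1350 pixels on each side. The file name &lt;code>plant_growth_boxplot.png&lt;/code> is just a placeholder, and the resolution you need depends on where the figure is going:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Open a PNG device: size in inches, resolution in pixels per inch
png("plant_growth_boxplot.png", width = 4.5, height = 4.5,
    units = "in", res = 300)

# Draw the plot into the file (nothing appears on screen)
plot(weight ~ group, data = PlantGrowth,
     xlab = "Sunlight Treatment",
     ylab = "Dried Biomass Weight (g)",
     col = 4)

# Close the device to finish writing the file
dev.off()
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>Because the size is given in inches and the text is still 12 points, the labels keep the same proportions you settled on in the 4.5 x 4.5 Quartz window; a higher &lt;code>res&lt;/code> just adds more pixels.&lt;/p>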
&lt;p>You also have the option to export your figure from the RStudio Plots pane, either as an image or as a PDF. When you select either option, a window will pop up that lets you choose your figure height and width. The disadvantages of doing this instead of using the Quartz window are that you can&amp;rsquo;t really see what your sizing will look like as you work, and if you want to share reproducible code with someone, they won&amp;rsquo;t know what size to save the figure as.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/savingplots_image7.png" alt="Image with green circle around the cursor, hovering over &amp;ldquo;Export&amp;rdquo; and &amp;ldquo;Save as PDF&amp;rdquo;">&lt;/p>
&lt;p>Now if we go to our files and click on the plot that we saved, we can see it in PDF form.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/savingplots_image8.png" alt="Image showing the same R plot but in a PDF viewer">&lt;/p>
&lt;p>So those are the basics of prototyping and saving your plots in R. These are the tools I&amp;rsquo;ve used for the majority of my visualization work in R. That&amp;rsquo;s not to say that there aren&amp;rsquo;t other (even fancier) ways to save plots directly from R code; with the Quartz window, you still have to save the plot manually through the menu. However, I find that this simple system using Quartz windows is the perfect intermediary between hard-coding everything and total point-and-click exporting. It also lets you easily prototype your figures as you create them.&lt;/p>
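&lt;p>For a fully scripted version of the same workflow, base R&amp;rsquo;s &lt;code>pdf()&lt;/code> function opens a PDF file as the graphics device, using the width and height (in inches) that you settled on while prototyping. Everything you plot goes into that file until you call &lt;code>dev.off()&lt;/code>. Here is a minimal sketch with a placeholder file name:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Open a 4.5 x 4.5 inch PDF file as the graphics device
pdf("plant_growth_boxplot.pdf", width = 4.5, height = 4.5)

# Re-run the same plot code; it is drawn into the file, not on screen
plot(weight ~ group, data = PlantGrowth,
     xlab = "Sunlight Treatment",
     ylab = "Dried Biomass Weight (g)",
     col = 4,
     boxlty = 0,
     whisklty = 1,
     whisklwd = 1.5,
     staplelwd = 1.5)

# Close the device so the file is finished and written to disk
dev.off()
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>Because the size is written into the code, anyone who runs your script gets a figure with the same proportions, which avoids the reproducibility problem of point-and-click exporting.&lt;/p>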
&lt;p>And that&amp;rsquo;s it! Remember that if you want to learn more about visualization, be sure to check out my complete course &lt;a href="https://coaching.rforecology.com/intro-to-dataviz-in-r-for-ecologists-enroll" target="_blank" rel="noopener">&amp;ldquo;Intro to data visualization in R (for ecologists)&amp;rdquo;.&lt;/a>&lt;/p>
&lt;p>If you liked this and want to learn even more, you can check out the full course on the &lt;strong>complete basics of R for ecology &lt;a href="https://coaching.rforecology.com/the-basics-of-r-for-ecologists-enroll" target="_blank" rel="noopener">right here&lt;/a>&lt;/strong>.&lt;/p>
&lt;br>
&lt;hr>
&lt;center>Check out my full course on data visualization in R (for ecologists) here:
&lt;br>
&lt;br>
&lt;ul class="cta-group">
&lt;li>
&lt;a href="https://coaching.rforecology.com/intro-to-dataviz-in-r-for-ecologists-enroll?utm_source=blog&amp;amp;utm_medium=bottom_button&amp;amp;utm_campaign=rforecology_blog/" target="_blank" rel="noopener" class="btn btn-primary px-3 py-3">&lt;strong>Start visualizing your data now&lt;/strong>&lt;/a>
&lt;/li>
&lt;/ul>
&lt;p>Also be sure to check out &lt;strong>&lt;a href="https://www.r-bloggers.com/" target="_blank" rel="noopener">R-bloggers&lt;/a>&lt;/strong> for other great tutorials on learning R.&lt;/p>
&lt;/center>
&lt;script defer src="https://cdn.commento.io/js/commento.js">&lt;/script>
&lt;div id="commento">&lt;/div></description></item><item><title>How to reshape your data in R for analysis</title><link>https://www.rforecology.com/post/reshaping-data-in-r/</link><pubDate>Thu, 28 Apr 2022 09:30:50 -0400</pubDate><guid>https://www.rforecology.com/post/reshaping-data-in-r/</guid><description>&lt;script src="https://rforecology.activehosted.com/f/embed.php?id=21" type="text/javascript" charset="utf-8">&lt;/script>
&lt;p>One of the toughest parts of data analysis is preparing your data to be analyzed. We often have to deal with problems like &lt;code>NAs&lt;/code>, typos, and data that are formatted incorrectly. In this blog post, I&amp;rsquo;m specifically going to help you with that last one — I&amp;rsquo;m going to show you how to reshape data so that it&amp;rsquo;s in the correct form for data analysis in R.&lt;/p>
&lt;img src="https://www.rforecology.com/reshape_image1.png" alt="Image saying 'How to reshape data' showing a table in wide format turning into a long format table" style="width:400px;"/>
&lt;h2 id="wide-vs-long-format-data">Wide vs. Long format data&lt;/h2>
&lt;p>Data often comes in two formats: wide or long.&lt;/p>
&lt;p>&lt;strong>Wide&lt;/strong> format data looks something like the table below, which describes the average diameter at breast height (DBH) in centimeters for three different tree species (red maple, white oak, and loblolly pine) at four sites labeled A, B, C, and D:&lt;/p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://www.rforecology.com/reshape_image2.png" alt="A table with three tree species — red maple, white oak, and loblolly pine -- and their diameters at breast height at four different sites called A, B, C, and D. The table is formatted so that each site is one column and each tree species is one row." loading="lazy" data-zoomable width="80% !important" />&lt;/div>
&lt;/div>&lt;/figure>
&lt;p>This is a very common way to format data, where the first column (tree species) contains unique values. Each tree species appears only once, and its DBH measurements are spread across the table in one column per site. So in wide format data, each row represents a tree species that we&amp;rsquo;re observing. This format is called &amp;ldquo;wide format&amp;rdquo; because the table grows wider as you add data: every new site means another column.&lt;/p>
&lt;div class="alert alert-note">
&lt;div>
Note that it is common to start with your data in wide format since this format often makes it easy to enter data in the field. So when you transcribe your data sheets to your computer, it&amp;rsquo;s usually easiest to follow the same format — especially if you have hundreds of pages to upload.
&lt;/div>
&lt;/div>
&lt;p>&lt;strong>Long&lt;/strong> format data, in contrast, looks like this:&lt;/p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://www.rforecology.com/reshape_image3.png" alt="A table with the same tree species, measurements, and sites, but now formatted so that each measurement for DBH represents unique rows." loading="lazy" data-zoomable width="60% !important" />&lt;/div>
&lt;/div>&lt;/figure>
&lt;p>Here, each tree species is repeated several times in the first column, and the site names all become values in a new column, called &amp;ldquo;Site&amp;rdquo;. Now, each row contains a tree species, the site it was found at, and its DBH measurement. In long format data, each row should represent one observation (each DBH measurement = one observation). This type of data is called &amp;ldquo;long format&amp;rdquo; because the table grows longer when you add more data.&lt;/p>
&lt;h3 id="which-format-is-more-useful">Which format is more useful?&lt;/h3>
&lt;p>The advantage of wide format data is that it clearly and concisely summarizes DBH measurements for us. Wide format data is what we usually use to display data as tables for presentations or papers. Long format data, by comparison, is easier to use for data analysis or visualization in R.&lt;/p>
&lt;p>Long format data is closely related to what Hadley Wickham, the lead developer of the tidyverse packages, calls &amp;ldquo;tidy data&amp;rdquo;. He &lt;a href="https://vita.had.co.nz/papers/tidy-data.pdf" target="_blank" rel="noopener">describes &amp;ldquo;tidy data&amp;rdquo;&lt;/a> as having the following attributes:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Each column is its own variable&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Each row is one observation&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Each cell is one value&lt;/p>
&lt;/li>
&lt;/ol>
&lt;p>Our long format data fulfills all of those requirements. Each column in our long format data represents a variable (tree species, site, and DBH). Each row represents one observation: a single DBH measurement of one species at one site. And each cell contains one value.&lt;/p>
&lt;p>Let me provide a concrete example to show why tidy data is useful. What if we go to other sites and not all the species are present there? Let&amp;rsquo;s say &lt;em>Quercus alba&lt;/em> is present at Site E but not Site F, while &lt;em>Acer rubrum&lt;/em> and &lt;em>Pinus taeda&lt;/em> are present at Site F but not Site E. If we add this data to our wide format table, it would look like this:&lt;/p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://www.rforecology.com/reshape_image4.png" alt="The same wide format table, but now two columns have been added for Sites E and F. There are NAs for the sites where certain tree species were not present." loading="lazy" data-zoomable width="80% !important" />&lt;/div>
&lt;/div>&lt;/figure>
&lt;p>There are &lt;code>NAs&lt;/code> in the &amp;ldquo;Site E&amp;rdquo; column for &lt;em>Acer rubrum&lt;/em> and &lt;em>Pinus taeda&lt;/em> because they weren&amp;rsquo;t present at that site, so it wouldn&amp;rsquo;t make sense for them to have a DBH measurement there. Likewise, there is an &lt;code>NA&lt;/code> in the &amp;ldquo;Site F&amp;rdquo; column for &lt;em>Quercus alba&lt;/em> because it wasn&amp;rsquo;t present at that site. By adding data to our wide format table, we also added in missing values that we&amp;rsquo;ll have to deal with.&lt;/p>
&lt;p>If we add this data to our long format table instead, we don&amp;rsquo;t have any &lt;code>NAs&lt;/code> because each row in our table represents a DBH measurement. The places where there were &lt;code>NAs&lt;/code> in the wide format table just don&amp;rsquo;t have a row in our long format table. Though this is an advantage in some cases, the fact that the missing observations are not there at all can also make it easy to overlook missing data in long format. The nice thing about converting from wide to long format in R, though, is that those rows with NA values can be preserved if you need them.&lt;/p>
&lt;p>Long format data also clearly shows the categories that we might want to analyze the data by — we can see that we have columns for tree species and for site. This makes it easy for us to average DBH for a specific species, or summarize DBH for a specific site. Organizing data in this way makes it much easier for us to add and analyze data.&lt;/p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://www.rforecology.com/reshape_image5.png" alt="The same long format table, but this time with rows added for Sites E and F. This table has no NA values." loading="lazy" data-zoomable width="60% !important" />&lt;/div>
&lt;/div>&lt;/figure>
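&lt;p>To make this concrete, here is a small sketch with &lt;strong>made-up&lt;/strong> DBH values (not the numbers in the tables above) for the three species at Sites A, B, E, and F. It uses &lt;code>pivot_longer()&lt;/code> (covered in detail below) to go from wide to long, averages DBH per species with &lt;code>group_by()&lt;/code> and &lt;code>summarise()&lt;/code>, and then uses tidyr&amp;rsquo;s &lt;code>complete()&lt;/code> to bring back the species-by-site combinations that were dropped, so the missing observations show up again as &lt;code>NA&lt;/code> rows:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">library(dplyr)
library(tidyr)

# Hypothetical wide table: DBH (cm) by species (rows) and site (columns).
# The values are invented for illustration only.
dbh_wide &amp;lt;- tibble(
  species = c("Acer rubrum", "Quercus alba", "Pinus taeda"),
  A = c(20, 30, 25),
  B = c(22, 34, 27),
  E = c(NA, 40, NA),
  F = c(18, NA, 29)
)

# Wide to long: one row per species-site measurement, dropping the NA cells
dbh_long &amp;lt;- pivot_longer(dbh_wide, cols = -species,
                         names_to = "site", values_to = "dbh_cm",
                         values_drop_na = TRUE)

# Average DBH for each species across the sites where it was measured
dbh_means &amp;lt;- dbh_long %&amp;gt;%
  group_by(species) %&amp;gt;%
  summarise(mean_dbh_cm = mean(dbh_cm))
dbh_means

# Make the implicit missing observations explicit again
dbh_complete &amp;lt;- complete(dbh_long, species, site)
filter(dbh_complete, is.na(dbh_cm))
&lt;/code>&lt;/pre>&lt;/div>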
&lt;p>Unfortunately, data often starts in wide format and it can be tedious to manually change it to long format data. Good thing there are functions in R to help us out! Let&amp;rsquo;s see how they work.&lt;/p>
&lt;h2 id="how-to-reshape-your-data-using-tidyr">How to reshape your data using tidyr&lt;/h2>
&lt;p>First, let&amp;rsquo;s load the &lt;code>tidyverse&lt;/code> package, which includes the &lt;code>tidyr&lt;/code> package. You might need to run &lt;code>install.packages(&amp;quot;tidyverse&amp;quot;)&lt;/code> first if you don&amp;rsquo;t have it installed yet.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#268bd2">library&lt;/span>(tidyverse)
&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="preparing-our-data">Preparing our data&lt;/h3>
&lt;p>Let&amp;rsquo;s also load a data set. I downloaded data describing forest area as a percent of total land area for each country of the world, from the &lt;a href="https://data.worldbank.org/" target="_blank" rel="noopener">World Bank&amp;rsquo;s Open Data catalog&lt;/a>.
You can find the same &lt;a href="https://www.rforecology.com/data/pct_forest_world.csv/">data set here to follow along&lt;/a>.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Load data&lt;/span>
forest_dat &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">read.csv&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;pct_forest_world.csv&amp;#34;&lt;/span>)
&lt;span style="color:#586e75"># Subset data and rename columns for easier visualization for this blog post&lt;/span>
forest_dat &lt;span style="color:#719e07">&amp;lt;-&lt;/span> forest_dat[114&lt;span style="color:#719e07">:&lt;/span>&lt;span style="color:#2aa198">122&lt;/span>, &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">1&lt;/span>&lt;span style="color:#719e07">:&lt;/span>&lt;span style="color:#2aa198">2&lt;/span>, &lt;span style="color:#2aa198">33&lt;/span>&lt;span style="color:#719e07">:&lt;/span>&lt;span style="color:#2aa198">40&lt;/span>)] &lt;span style="color:#719e07">%&amp;gt;%&lt;/span>
&lt;span style="color:#268bd2">rename&lt;/span>(name &lt;span style="color:#719e07">=&lt;/span> Country.Name, code &lt;span style="color:#719e07">=&lt;/span> Country.Code)
&lt;span style="color:#586e75"># View data&lt;/span>
&lt;span style="color:#268bd2">head&lt;/span>(forest_dat)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code class="language-{.numberLines" data-lang="{.numberLines">## name code X1988 X1989 X1990 X1991
## 114 Iraq IRQ NA NA 1.8382605 1.8414615
## 115 Iceland ISL NA NA 0.1702743 0.1830025
## 116 Israel ISR NA NA 6.0998152 6.1968577
## 117 Italy ITA NA NA 25.8058210 26.0708578
## 118 Jamaica JAM NA NA 48.1329640 48.1303786
## 119 Jordan JOR NA NA 1.1049411 1.1049411
## X1992 X1993 X1994 X1995
## 114 1.8446624 1.8478634 1.851064 1.8542653
## 115 0.1957307 0.2084589 0.221187 0.2339152
## 116 6.2939002 6.3909427 6.487985 6.5850277
## 117 26.3358947 26.6009316 26.865969 27.1310054
## 118 48.1277932 48.1252078 48.122622 48.1200369
## 119 1.1049411 1.1049411 1.104941 1.1049411
&lt;/code>&lt;/pre>&lt;p>If we check out the data, we can see that the first two columns describe country name and country code. After that, each column represents one year, ranging from 1988 to 1995. There are a bunch of &lt;code>NAs&lt;/code> in the data before 1990, which is likely when the data set starts.&lt;/p>
&lt;p>This data is currently in wide format. Each row represents one country, and the observations (% forest area each year) are spread out across a lot of columns. As time goes on, more columns will be added for each new year. This is very common for time-series data, where each column represents a new time point.&lt;/p>
&lt;p>Before we begin reshaping our data, let&amp;rsquo;s get rid of the &amp;ldquo;X&amp;rdquo; that&amp;rsquo;s in front of every single year. We can do this using the &lt;code>sub()&lt;/code> function. Here we ask R to replace the &amp;ldquo;X&amp;rdquo; at the start of each &lt;code>forest_dat&lt;/code> column name with &amp;quot;&amp;quot; (nothing). The &lt;code>^&lt;/code> in the pattern &lt;code>&amp;quot;^X&amp;quot;&lt;/code> anchors the match to the beginning of the name, so an &amp;ldquo;X&amp;rdquo; anywhere else in a column name would be left alone.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Remove the leading &amp;#34;X&amp;#34; from the column names&lt;/span>
&lt;span style="color:#268bd2">colnames&lt;/span>(forest_dat) &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">sub&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;^X&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;&amp;#34;&lt;/span>, &lt;span style="color:#268bd2">colnames&lt;/span>(forest_dat))
&lt;span style="color:#586e75"># View data&lt;/span>
&lt;span style="color:#268bd2">head&lt;/span>(forest_dat)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code class="language-{style="max-height:" data-lang="{style="max-height:">## name code 1988 1989 1990 1991 1992
## 114 Iraq IRQ NA NA 1.8382605 1.8414615 1.8446624
## 115 Iceland ISL NA NA 0.1702743 0.1830025 0.1957307
## 116 Israel ISR NA NA 6.0998152 6.1968577 6.2939002
## 117 Italy ITA NA NA 25.8058210 26.0708578 26.3358947
## 118 Jamaica JAM NA NA 48.1329640 48.1303786 48.1277932
## 119 Jordan JOR NA NA 1.1049411 1.1049411 1.1049411
## 1993 1994 1995
## 114 1.8478634 1.851064 1.8542653
## 115 0.2084589 0.221187 0.2339152
## 116 6.3909427 6.487985 6.5850277
## 117 26.6009316 26.865969 27.1310054
## 118 48.1252078 48.122622 48.1200369
## 119 1.1049411 1.104941 1.1049411
&lt;/code>&lt;/pre>&lt;p>Great, now we&amp;rsquo;re ready to reshape our data.&lt;/p>
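&lt;p>As an aside, the &amp;ldquo;X&amp;rdquo; is added by &lt;code>read.csv()&lt;/code> itself. By default (&lt;code>check.names = TRUE&lt;/code>), it turns column names into syntactically valid R names, and a valid name can&amp;rsquo;t start with a digit. The sketch below shows the difference on a tiny CSV written inline (the values are the rounded Iceland numbers from the table above). With the real file, you could call &lt;code>read.csv(&amp;quot;pct_forest_world.csv&amp;quot;, check.names = FALSE)&lt;/code> instead; just keep in mind that names starting with a number then have to be wrapped in backticks in your code, as in &lt;code>forest_dat$`1990`&lt;/code>, and the rest of this post assumes the default behaviour:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># A tiny CSV written inline; two of its column names are years
csv_text &amp;lt;- "name,1990,1991\nIceland,0.17,0.18"

# Default (check.names = TRUE): an "X" is added so the names are valid R names
names(read.csv(text = csv_text))
# gives "name", "X1990", "X1991"

# check.names = FALSE keeps the names exactly as they are in the file
names(read.csv(text = csv_text, check.names = FALSE))
# gives "name", "1990", "1991"
&lt;/code>&lt;/pre>&lt;/div>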
&lt;h3 id="how-to-use-pivot_longer">How to use &lt;code>pivot_longer()&lt;/code>&lt;/h3>
&lt;p>The &lt;code>tidyr&lt;/code> package provides us with some useful functions to help us reshape our data. One of these functions is &lt;code>pivot_longer()&lt;/code>, which — you guessed it — changes your data from wide to long format.&lt;/p>
&lt;p>The important arguments to know in this function are as follows:
&lt;code>pivot_longer(data = data.frame, cols = columns.to.pivot, names_to = &amp;quot;New Column Name&amp;quot;, values_to = &amp;quot;New Column Name&amp;quot;)&lt;/code>&lt;/p>
&lt;p>&lt;code>data&lt;/code> is just the data frame you want to reshape. &lt;code>cols&lt;/code> lists the columns that you want to pivot. &lt;code>names_to&lt;/code> is the name of the column that will be created from the variables that are in the column names. &lt;code>values_to&lt;/code> is the name of the column that will be created from the values that are in the cells of the table.&lt;/p>
&lt;p>This image shows how the function would pivot a simplified version of our data. You can see that each cell in the wide table on the right becomes its own row in the long table on the left:&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/reshape_image8.png" alt="Image showing data table going from wide format to long format. It shows arrows going from cells in the wide format table to rows in the long format table">&lt;/p>
&lt;p>Let&amp;rsquo;s look at a concrete example to see how the function works. In the code below, I ask the function to pivot our &lt;code>forest_dat&lt;/code> data set, using the columns labeled &amp;ldquo;1988&amp;rdquo; through &amp;ldquo;1995&amp;rdquo;. The function creates a new column called &amp;ldquo;year&amp;rdquo; to store the years that currently act as column names, and another new column called &amp;ldquo;pct_forest_area&amp;rdquo; to store the % forest area values. I also set the &lt;code>values_drop_na&lt;/code> argument to &lt;code>TRUE&lt;/code>, which tells the function to drop any new row whose % forest area value is missing (&lt;code>NA&lt;/code>).&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Create a long format table&lt;/span>
forest_dat_long &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">pivot_longer&lt;/span>(forest_dat, cols &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;1988&amp;#34;&lt;/span>&lt;span style="color:#719e07">:&lt;/span>&lt;span style="color:#2aa198">&amp;#34;1995&amp;#34;&lt;/span>, names_to &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;year&amp;#34;&lt;/span>,
values_to &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;pct_forest_area&amp;#34;&lt;/span>, values_drop_na &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#cb4b16">TRUE&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>And now if you look at the data, you can see that our table is much longer than it was before (going from 9 to 54 rows). The country names are repeated several times in the first column, and now we have a column that contains the year and another column that contains the observation (% forest area). Each row of the data table now describes one year&amp;rsquo;s measurement of % forest area in a certain country. You&amp;rsquo;ll also notice that there are no rows for 1988 or 1989: every country had missing values (&lt;code>NA&lt;/code>s) for those years, so &lt;code>values_drop_na = TRUE&lt;/code> removed those rows. If we hadn&amp;rsquo;t added the &lt;code>values_drop_na&lt;/code> argument, the table would still contain rows for 1988 and 1989 (72 rows in total), with &lt;code>NA&lt;/code> in the &amp;ldquo;pct_forest_area&amp;rdquo; column.&lt;/p>
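&lt;p>To see what &lt;code>values_drop_na&lt;/code> does in isolation, here is a minimal sketch on a small, made-up table shaped like &lt;code>forest_dat&lt;/code>. The names &lt;code>toy_wide&lt;/code>, &lt;code>long_keep&lt;/code>, and &lt;code>long_drop&lt;/code> and all values are hypothetical, the example assumes the tidyr package is installed, and it selects the year columns with &lt;code>-c(name, code)&lt;/code>, i.e. every column except the identifiers.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Load tidyr for pivot_longer()
library(tidyr)
# Hypothetical toy data: two countries, one all-NA year column (values are made up)
toy_wide &amp;lt;- data.frame(
  name = c("A", "B"),
  code = c("AAA", "BBB"),
  "1989" = c(NA_real_, NA_real_),
  "1990" = c(10.5, 3.2),
  "1991" = c(10.7, 3.1),
  check.names = FALSE  # keep "1989" etc. as column names instead of "X1989"
)
# Pivot every column except the identifiers, keeping NA values
long_keep &amp;lt;- pivot_longer(toy_wide, cols = -c(name, code), names_to = "year",
                          values_to = "pct_forest_area")
# Same pivot, but drop rows whose value is NA
long_drop &amp;lt;- pivot_longer(toy_wide, cols = -c(name, code), names_to = "year",
                          values_to = "pct_forest_area", values_drop_na = TRUE)
nrow(long_keep)  # 6 rows: 2 countries x 3 years, two of them NA (1989)
nrow(long_drop)  # 4 rows: the 1989 rows are gone
&lt;/code>&lt;/pre>&lt;/div>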
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># View data&lt;/span>
&lt;span style="color:#268bd2">print&lt;/span>(forest_dat_long, n &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">54&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code class="language-{style="max-height:" data-lang="{style="max-height:">## # A tibble: 54 × 4
## name code year pct_forest_area
## &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;dbl&amp;gt;
## 1 Iraq IRQ 1990 1.84
## 2 Iraq IRQ 1991 1.84
## 3 Iraq IRQ 1992 1.84
## 4 Iraq IRQ 1993 1.85
## 5 Iraq IRQ 1994 1.85
## 6 Iraq IRQ 1995 1.85
## 7 Iceland ISL 1990 0.170
## 8 Iceland ISL 1991 0.183
## 9 Iceland ISL 1992 0.196
## 10 Iceland ISL 1993 0.208
## 11 Iceland ISL 1994 0.221
## 12 Iceland ISL 1995 0.234
## 13 Israel ISR 1990 6.10
## 14 Israel ISR 1991 6.20
## 15 Israel ISR 1992 6.29
## 16 Israel ISR 1993 6.39
## 17 Israel ISR 1994 6.49
## 18 Israel ISR 1995 6.59
## 19 Italy ITA 1990 25.8
## 20 Italy ITA 1991 26.1
## 21 Italy ITA 1992 26.3
## 22 Italy ITA 1993 26.6
## 23 Italy ITA 1994 26.9
## 24 Italy ITA 1995 27.1
## 25 Jamaica JAM 1990 48.1
## 26 Jamaica JAM 1991 48.1
## 27 Jamaica JAM 1992 48.1
## 28 Jamaica JAM 1993 48.1
## 29 Jamaica JAM 1994 48.1
## 30 Jamaica JAM 1995 48.1
## 31 Jordan JOR 1990 1.10
## 32 Jordan JOR 1991 1.10
## 33 Jordan JOR 1992 1.10
## 34 Jordan JOR 1993 1.10
## 35 Jordan JOR 1994 1.10
## 36 Jordan JOR 1995 1.10
## 37 Japan JPN 1990 68.4
## 38 Japan JPN 1991 68.4
## 39 Japan JPN 1992 68.4
## 40 Japan JPN 1993 68.4
## 41 Japan JPN 1994 68.3
## 42 Japan JPN 1995 68.3
## 43 Kazakhstan KAZ 1990 1.27
## 44 Kazakhstan KAZ 1991 1.27
## 45 Kazakhstan KAZ 1992 1.17
## 46 Kazakhstan KAZ 1993 1.17
## 47 Kazakhstan KAZ 1994 1.17
## 48 Kazakhstan KAZ 1995 1.17
## 49 Kenya KEN 1990 6.78
## 50 Kenya KEN 1991 6.80
## 51 Kenya KEN 1992 6.82
## 52 Kenya KEN 1993 6.83
## 53 Kenya KEN 1994 6.85
## 54 Kenya KEN 1995 6.87
&lt;/code>&lt;/pre>&lt;p>This format also makes it easy to plot our data. For example, let&amp;rsquo;s look at % forest cover over the years for Japan.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Filter out all rows for Japan&lt;/span>
japan &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">filter&lt;/span>(forest_dat_long, name &lt;span style="color:#719e07">==&lt;/span> &lt;span style="color:#2aa198">&amp;#34;Japan&amp;#34;&lt;/span>)
&lt;span style="color:#586e75"># Plot % forest cover over time in Japan&lt;/span>
&lt;span style="color:#268bd2">plot&lt;/span>(data &lt;span style="color:#719e07">=&lt;/span> japan, pct_forest_area &lt;span style="color:#719e07">~&lt;/span> year)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;img src="https://www.rforecology.com/post/reshaping-data-in-r/index_files/figure-html/unnamed-chunk-7-1.png" width="80%" style="display: block; margin: auto;" />&lt;/p>
&lt;p>With our data in long format, we can easily see how we might want to group our data (maybe by country or by year) and then analyze it. Now let&amp;rsquo;s see how to turn our data back into a wide format table.&lt;/p>
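&lt;p>As one illustration of that kind of grouping, the sketch below uses dplyr&amp;rsquo;s &lt;code>group_by()&lt;/code> and &lt;code>summarise()&lt;/code> to compute a per-country mean and change over time. It runs on a small hypothetical long table (&lt;code>toy_long&lt;/code>, made-up values) with the same columns as &lt;code>forest_dat_long&lt;/code>, and assumes dplyr is installed.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">library(dplyr)
# Hypothetical long-format table (values are made up), sorted by year within each country
toy_long &amp;lt;- data.frame(
  name = rep(c("A", "B"), each = 3),
  year = rep(c("1990", "1991", "1992"), times = 2),
  pct_forest_area = c(10, 11, 12, 3, 3, 6)
)
# One summary row per country
by_country &amp;lt;- group_by(toy_long, name)
toy_summary &amp;lt;- summarise(by_country,
                         mean_pct = mean(pct_forest_area),
                         # first()/last() rely on the rows being in year order
                         change = last(pct_forest_area) - first(pct_forest_area))
toy_summary
&lt;/code>&lt;/pre>&lt;/div>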
&lt;h3 id="how-to-use-pivot_wider">How to use &lt;code>pivot_wider()&lt;/code>&lt;/h3>
&lt;p>The &lt;code>pivot_wider()&lt;/code> function does the opposite of &lt;code>pivot_longer()&lt;/code>: it turns a long table into a wide one.&lt;/p>
&lt;p>Now, we want to widen our data and spread it out instead of gathering it into a longer form. The function works like this:&lt;/p>
&lt;p>&lt;code>pivot_wider(data = data.frame, id_cols = identifying_columns, names_from = &amp;quot;Col with Names&amp;quot;, values_from = &amp;quot;Col with Values&amp;quot;)&lt;/code>&lt;/p>
&lt;p>&lt;code>data&lt;/code> is the data frame you want to reshape. &lt;code>id_cols&lt;/code> lists the columns that identify each observation. &lt;code>names_from&lt;/code> is the column whose values will become the new column names. &lt;code>values_from&lt;/code> is the column that the new cell values will come from.&lt;/p>
&lt;p>In the code below, I asked &lt;code>pivot_wider()&lt;/code> to keep the columns &amp;ldquo;name&amp;rdquo; and &amp;ldquo;code&amp;rdquo; as identifying columns. I told the function to take all the new column names from the &amp;ldquo;year&amp;rdquo; column, and to take all the new values to fill in the table from the &amp;ldquo;pct_forest_area&amp;rdquo; column.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Create a wide format table&lt;/span>
forest_dat_wide &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">pivot_wider&lt;/span>(data &lt;span style="color:#719e07">=&lt;/span> forest_dat_long, id_cols &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;name&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;code&amp;#34;&lt;/span>),
names_from &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;year&amp;#34;&lt;/span>, values_from &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;pct_forest_area&amp;#34;&lt;/span>)
&lt;span style="color:#586e75"># View table&lt;/span>
&lt;span style="color:#268bd2">print&lt;/span>(forest_dat_wide, n &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">9&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code class="language-{style="max-height:" data-lang="{style="max-height:">## # A tibble: 9 × 8
## name code `1990` `1991` `1992` `1993` `1994` `1995`
## &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt;
## 1 Iraq IRQ 1.84 1.84 1.84 1.85 1.85 1.85
## 2 Iceland ISL 0.170 0.183 0.196 0.208 0.221 0.234
## 3 Israel ISR 6.10 6.20 6.29 6.39 6.49 6.59
## 4 Italy ITA 25.8 26.1 26.3 26.6 26.9 27.1
## 5 Jamaica JAM 48.1 48.1 48.1 48.1 48.1 48.1
## 6 Jordan JOR 1.10 1.10 1.10 1.10 1.10 1.10
## 7 Japan JPN 68.4 68.4 68.4 68.4 68.3 68.3
## 8 Kazakhstan KAZ 1.27 1.27 1.17 1.17 1.17 1.17
## 9 Kenya KEN 6.78 6.80 6.82 6.83 6.85 6.87
&lt;/code>&lt;/pre>&lt;p>You can see that the table looks much as it did when we first downloaded it. We have our main identifying columns for country name and code, followed by one column per year. The 1988 and 1989 columns didn&amp;rsquo;t come back, because their rows (which held only &lt;code>NA&lt;/code> values) were dropped when we pivoted to long format with &lt;code>values_drop_na = TRUE&lt;/code>.&lt;/p>
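&lt;p>Here is a minimal round-trip sketch on hypothetical toy data (it assumes tidyr is installed; all names and values are made up). It pivots longer with &lt;code>values_drop_na = TRUE&lt;/code> and then back wider, and you can see that the all-&lt;code>NA&lt;/code> year column does not return:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">library(tidyr)
# Hypothetical toy data (values are made up); 1989 is missing for every country
toy_wide &amp;lt;- data.frame(
  name = c("A", "B"),
  code = c("AAA", "BBB"),
  "1989" = c(NA_real_, NA_real_),
  "1990" = c(10.5, 3.2),
  "1991" = c(10.7, 3.1),
  check.names = FALSE
)
# Wide to long, dropping rows whose value is NA
toy_long &amp;lt;- pivot_longer(toy_wide, cols = -c(name, code), names_to = "year",
                         values_to = "pct_forest_area", values_drop_na = TRUE)
# Long back to wide
toy_back &amp;lt;- pivot_wider(toy_long, id_cols = c("name", "code"),
                        names_from = "year", values_from = "pct_forest_area")
names(toy_back)  # "name" "code" "1990" "1991": no "1989" column
&lt;/code>&lt;/pre>&lt;/div>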
&lt;p>And now you know how to reshape your data from wide to long format and then back again.
I hope this tutorial was helpful! Happy coding!&lt;/p>
&lt;p>If you liked this and want to learn more, you can check out the full course on the complete basics of R for ecology &lt;a href="https://www.rforecology.com" target="_blank" rel="noopener">right here&lt;/a> or by clicking the link below.&lt;/p>
&lt;br>
&lt;hr>
&lt;center>Check out Luka's full course, the Basics of R (for ecologists), here:
&lt;br>
&lt;br>
&lt;ul class="cta-group">
&lt;li>
&lt;a href="https://www.rforecology.com?utm_source=blog&amp;amp;utm_medium=bottom_button&amp;amp;utm_campaign=rforecology_blog/" target="_blank" rel="noopener" class="btn btn-primary px-3 py-3">&lt;strong>Start learning R now&lt;/strong>&lt;/a>
&lt;/li>
&lt;/ul>
&lt;p>Also be sure to check out &lt;strong>&lt;a href="https://www.r-bloggers.com/" target="_blank" rel="noopener">R-bloggers&lt;/a>&lt;/strong> for other great tutorials on learning R&lt;/p>
&lt;/center>
&lt;script defer src="https://cdn.commento.io/js/commento.js">&lt;/script>
&lt;div id="commento">&lt;/div></description></item><item><title>How to create your own functions in R</title><link>https://www.rforecology.com/post/how-to-create-your-own-function-in-r/</link><pubDate>Thu, 21 Apr 2022 11:46:53 -0400</pubDate><guid>https://www.rforecology.com/post/how-to-create-your-own-function-in-r/</guid><description>&lt;p>We&amp;rsquo;ve talked a lot about how to use different pre-made functions in R, but sometimes you just need to make your own function to tackle your data. In this blog post, I&amp;rsquo;m going to talk about how to create your own function and give a few examples.&lt;/p>
&lt;img src="https://www.rforecology.com/functions_image0_new.png" alt="Image saying 'how to put together custom functions', with emphasis on 'FUN'. Also shows three children building a structure with blocks." style="width:400px;"/>
&lt;h2 id="components-of-a-function">Components of a function&lt;/h2>
&lt;p>Remember that a function is essentially a &amp;ldquo;black box&amp;rdquo; into which you add some inputs and then receive some outputs. Building a function is about building that &amp;ldquo;black box&amp;rdquo;, and there are several components that go into it.&lt;/p>
&lt;p>Let&amp;rsquo;s first discuss those components. I&amp;rsquo;ve created an example function below, called &amp;ldquo;add_three&amp;rdquo;. It adds three to the value that is passed by the user.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">add_three &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">function&lt;/span>(x){
y &lt;span style="color:#719e07">&amp;lt;-&lt;/span> x &lt;span style="color:#719e07">+&lt;/span> &lt;span style="color:#2aa198">3&lt;/span>
&lt;span style="color:#268bd2">return&lt;/span>(y)
}
&lt;span style="color:#268bd2">add_three&lt;/span>(&lt;span style="color:#2aa198">5&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] 8
&lt;/code>&lt;/pre>&lt;p>Take note of a few important elements:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Function name&lt;/strong> (&lt;code>add_three&lt;/code>): this is just the name that you want to call your function. It should be something pretty short and easy to remember, like so many of the common functions we use (e.g., &lt;code>mean&lt;/code>, &lt;code>plot&lt;/code>, &lt;code>select&lt;/code>). I chose the name &amp;ldquo;add_three&amp;rdquo;. As when we create any variables or objects in R, we use the arrow &lt;code>&amp;lt;-&lt;/code> to assign this name to our function.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>&amp;ldquo;function&amp;rdquo; and arguments&lt;/strong> (&lt;code>function(x)&lt;/code>): we tell R that we want to create a function using &lt;code>function()&lt;/code>. Within the parentheses, we list the arguments that we want our function to have. It doesn&amp;rsquo;t matter what we name our arguments within the parentheses (I named mine &lt;code>x&lt;/code>), as long as we use the same names in the body of the function. If you want to have multiple arguments, it would look something like this: &lt;code>function(arg1, arg2, arg3)&lt;/code>. Later, when you put your function to use, you&amp;rsquo;ll have to specify values for the arguments, like I did with the &lt;code>5&lt;/code> in &lt;code>add_three(5)&lt;/code>.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Curly brackets&lt;/strong>: &lt;code>{&lt;/code> and &lt;code>}&lt;/code> come after &lt;code>function(argument)&lt;/code> and need to bracket the actual function code that you&amp;rsquo;re writing.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Body of the function&lt;/strong>: this is the code in the function between the curly brackets that executes the task that you want. Here, I&amp;rsquo;ve created a new variable, &lt;code>y&lt;/code>, to store the &lt;code>x + 3&lt;/code> value.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>The return value&lt;/strong> (&lt;code>return(y)&lt;/code>): also inside the curly brackets, but usually at the end, this is the value the function hands back when it&amp;rsquo;s done running. If you call the function at the console, R prints that value; you can also store it, e.g. &lt;code>z &amp;lt;- add_three(5)&lt;/code>. I asked the function to &lt;em>return&lt;/em> the value of &lt;code>y&lt;/code> (aka, &lt;code>x + 3&lt;/code>) to me.&lt;/p>
&lt;/li>
&lt;/ul>
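&lt;p>A short sketch pulling these pieces together: a hypothetical two-argument function, &lt;code>add_values()&lt;/code>, plus &lt;code>add_three()&lt;/code>&amp;rsquo;s return value stored in a variable. Because the body uses ordinary arithmetic, which R applies element by element, the same function also works on a whole vector:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># add_three() from above, repeated so this example runs on its own
add_three &amp;lt;- function(x){
  y &amp;lt;- x + 3
  return(y)
}
# Hypothetical function with two arguments
add_values &amp;lt;- function(x, y){
  total &amp;lt;- x + y
  return(total)
}
add_values(5, 10)      # 15
# Store the return value instead of just printing it
result &amp;lt;- add_three(5)
result * 2             # 16
# x + 3 works element by element, so the function also accepts a vector
add_three(c(1, 2, 3))  # 4 5 6
&lt;/code>&lt;/pre>&lt;/div>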
&lt;p>And that&amp;rsquo;s all there is to creating your own function! Now I have a great function called &lt;code>add_three&lt;/code> that I can use over and over again. You&amp;rsquo;ll notice that when you create a function, R adds this function to your environment. Just like you have to load packages to use them in a script, you&amp;rsquo;ll have to run your function code again to add it to your environment each time you start a new R session.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/functions_image1.png" alt="Image of the add three function in the environment window">&lt;/p>
&lt;p>The example I just gave was very simple, but learning how to create your own function unlocks a whole new realm of coding that can be as simple or complex as you want.&lt;/p>
&lt;h2 id="a-few-examples">A few examples&lt;/h2>
&lt;h3 id="mathematical-formulas">Mathematical formulas&lt;/h3>
&lt;p>It&amp;rsquo;s not that hard to add three to a value. In fact, we probably didn&amp;rsquo;t need to create a function for that. But what if we want to create a function that performs something more complex, like solving quadratic equations? Let&amp;rsquo;s create a quadratic formula function.&lt;/p>
&lt;img src="https://www.rforecology.com/functions_image2.png" alt="Image of the quadratic formula." style="width:400px;"/>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">quadratic &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">function&lt;/span>(a, b, c){
root1 &lt;span style="color:#719e07">&amp;lt;-&lt;/span> (&lt;span style="color:#719e07">-&lt;/span>b &lt;span style="color:#719e07">+&lt;/span> &lt;span style="color:#268bd2">sqrt&lt;/span>(b^2 &lt;span style="color:#719e07">-&lt;/span> &lt;span style="color:#2aa198">4&lt;/span> &lt;span style="color:#719e07">*&lt;/span> a &lt;span style="color:#719e07">*&lt;/span> c)) &lt;span style="color:#719e07">/&lt;/span> (&lt;span style="color:#2aa198">2&lt;/span> &lt;span style="color:#719e07">*&lt;/span> a)
root2 &lt;span style="color:#719e07">&amp;lt;-&lt;/span> (&lt;span style="color:#719e07">-&lt;/span>b &lt;span style="color:#719e07">-&lt;/span> &lt;span style="color:#268bd2">sqrt&lt;/span>(b^2 &lt;span style="color:#719e07">-&lt;/span> &lt;span style="color:#2aa198">4&lt;/span> &lt;span style="color:#719e07">*&lt;/span> a &lt;span style="color:#719e07">*&lt;/span> c)) &lt;span style="color:#719e07">/&lt;/span> (&lt;span style="color:#2aa198">2&lt;/span> &lt;span style="color:#719e07">*&lt;/span> a)
root1 &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">paste&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;x =&amp;#34;&lt;/span>, root1)
root2 &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">paste&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;x =&amp;#34;&lt;/span>, root2)
&lt;span style="color:#268bd2">ifelse&lt;/span>(root1 &lt;span style="color:#719e07">==&lt;/span> root2, &lt;span style="color:#268bd2">return&lt;/span>(root1), &lt;span style="color:#268bd2">return&lt;/span>(&lt;span style="color:#268bd2">c&lt;/span>(root1, root2)))
}
&lt;/code>&lt;/pre>&lt;/div>&lt;p>This function accepts the coefficient &lt;code>\(a\)&lt;/code> of the quadratic term, the coefficient &lt;code>\(b\)&lt;/code> of the linear term, and the constant &lt;code>\(c\)&lt;/code> as arguments. I created two values to hold the two possible roots of the equation. I also wanted the function to print &amp;ldquo;x = answer&amp;rdquo;, so I created values that pasted the &amp;ldquo;x =&amp;rdquo; string onto the answer. The &lt;code>ifelse&lt;/code> statement at the end just says that if the two roots are equivalent, print only one of them. Otherwise, print both roots.&lt;/p>
&lt;p>Now let&amp;rsquo;s see if the function works. Let&amp;rsquo;s test an equation with only one root, &lt;code>\(x^2 + 6x + 9 = 0\)&lt;/code>, and an equation with two roots: &lt;code>\(x^2 - 8x + 15 = 0\)&lt;/code>&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#268bd2">quadratic&lt;/span>(&lt;span style="color:#2aa198">1&lt;/span>, &lt;span style="color:#2aa198">6&lt;/span>, &lt;span style="color:#2aa198">9&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] &amp;quot;x = -3&amp;quot;
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#268bd2">quadratic&lt;/span>(&lt;span style="color:#2aa198">1&lt;/span>, &lt;span style="color:#2aa198">-8&lt;/span>, &lt;span style="color:#2aa198">15&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] &amp;quot;x = 5&amp;quot; &amp;quot;x = 3&amp;quot;
&lt;/code>&lt;/pre>&lt;p>It works! And now we have a function to help with our math homework :)&lt;/p>
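&lt;p>If you also want an answer for equations like &lt;code>\(x^2 + 2x + 5 = 0\)&lt;/code>, whose discriminant &lt;code>\(b^2 - 4ac\)&lt;/code> is negative, one option is to do the arithmetic with complex numbers. The sketch below is a hypothetical variant, &lt;code>quadratic_roots()&lt;/code>, that returns the roots as numbers rather than &amp;ldquo;x = &amp;rdquo; strings; it uses only base R:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Hypothetical variant of quadratic() that also returns complex roots
quadratic_roots &amp;lt;- function(a, b, c){
  # as.complex() lets sqrt() take a negative discriminant instead of returning NaN
  disc &amp;lt;- as.complex(b^2 - 4 * a * c)
  roots &amp;lt;- c((-b + sqrt(disc)) / (2 * a), (-b - sqrt(disc)) / (2 * a))
  # Drop the imaginary parts when every root is real
  if (all(Im(roots) == 0)) {
    roots &amp;lt;- Re(roots)
  }
  # A repeated root is returned only once
  unique(roots)
}
quadratic_roots(1, -8, 15)  # 5 3
quadratic_roots(1, 6, 9)    # -3
quadratic_roots(1, 2, 5)    # -1+2i -1-2i
&lt;/code>&lt;/pre>&lt;/div>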
&lt;h3 id="manipulating-strings">Manipulating strings&lt;/h3>
&lt;p>Now that we&amp;rsquo;ve created a mathematical function, let&amp;rsquo;s try creating a function that manipulates strings. Let&amp;rsquo;s say we want a function that accepts a species name as an argument and returns an abbreviated version: the first letter of the genus + the rest of the species name. For example, the blue crab, &lt;em>Callinectes sapidus&lt;/em>, would be shortened to &lt;em>C. sapidus&lt;/em>.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">shorten &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">function&lt;/span>(name){
name_split &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">strsplit&lt;/span>(name, split &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34; &amp;#34;&lt;/span>)
genus &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">substr&lt;/span>(name, &lt;span style="color:#2aa198">1&lt;/span>, &lt;span style="color:#2aa198">1&lt;/span>)
species &lt;span style="color:#719e07">&amp;lt;-&lt;/span> name_split[[1]][2]
new_name &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">paste&lt;/span>(genus, &lt;span style="color:#2aa198">&amp;#34;. &amp;#34;&lt;/span>, species, sep &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;&amp;#34;&lt;/span>)
&lt;span style="color:#268bd2">print&lt;/span>(new_name)
}
&lt;/code>&lt;/pre>&lt;/div>&lt;p>I first used &lt;code>strsplit()&lt;/code> to split up the full species name into genus and species, by specifying that I wanted the split to occur at the space between the words. The function &lt;code>substr()&lt;/code> allows you to pick out specific characters in a string. I asked &lt;code>substr()&lt;/code> to just take the first letter of the name. Then I created the new string by pasting together the first letter of the genus, a period and space, and the species.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#268bd2">shorten&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;Homo sapiens&amp;#34;&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] &amp;quot;H. sapiens&amp;quot;
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#268bd2">shorten&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;Leiostomus xanthurus&amp;#34;&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] &amp;quot;L. xanthurus&amp;quot;
&lt;/code>&lt;/pre>&lt;p>Neat! This could be really useful for shortening the names in a list of species, although as written, &lt;code>shorten()&lt;/code> handles one name at a time: given a vector, it reuses the first name&amp;rsquo;s species for every entry.&lt;/p>
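&lt;p>One way to shorten a whole vector of names, sketched here with a hypothetical &lt;code>shorten_all()&lt;/code> built from the same base-R pieces, is to split every name and then abbreviate each one with &lt;code>vapply()&lt;/code>:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Hypothetical vectorized version of shorten() that handles many names at once
shorten_all &amp;lt;- function(species_names){
  # strsplit() returns a list: one c(genus, species) vector per name
  name_parts &amp;lt;- strsplit(species_names, split = " ")
  # Abbreviate each name; vapply() guarantees a character vector comes back
  vapply(name_parts,
         function(parts) paste0(substr(parts[1], 1, 1), ". ", parts[2]),
         character(1))
}
shorten_all(c("Callinectes sapidus", "Homo sapiens", "Leiostomus xanthurus"))
&lt;/code>&lt;/pre>&lt;/div>&lt;p>This returns &lt;code>"C. sapidus"&lt;/code>, &lt;code>"H. sapiens"&lt;/code>, and &lt;code>"L. xanthurus"&lt;/code> as a single character vector, which you could store as a new column in a species data frame.&lt;/p>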
&lt;h3 id="functions-without-arguments">Functions without arguments&lt;/h3>
&lt;p>You can also create functions that don&amp;rsquo;t require arguments at all. For example, I could create a function that generates random coordinates for me, prints them, and plots them on a world map.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Load necessary packages&lt;/span>
&lt;span style="color:#268bd2">library&lt;/span>(tidyverse)
&lt;span style="color:#268bd2">library&lt;/span>(maps)
&lt;span style="color:#586e75"># Create the function&lt;/span>
coords &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">function&lt;/span>(){
&lt;span style="color:#586e75"># Randomly sample to get a random lat and long&lt;/span>
latitude &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">runif&lt;/span>(n &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">1&lt;/span>, min &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">-90&lt;/span>, max &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">90&lt;/span>)
longitude &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">runif&lt;/span>(n &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">1&lt;/span>, min &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">-180&lt;/span>, max &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">180&lt;/span>)
&lt;span style="color:#268bd2">print&lt;/span>(&lt;span style="color:#268bd2">paste&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;Latitude: &amp;#34;&lt;/span>, latitude, &lt;span style="color:#2aa198">&amp;#34; Longitude: &amp;#34;&lt;/span>, longitude, sep &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;&amp;#34;&lt;/span>))
&lt;span style="color:#586e75"># get data to plot a world map &lt;/span>
world &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">map_data&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;world&amp;#34;&lt;/span>)
&lt;span style="color:#586e75"># Plot the world map&lt;/span>
&lt;span style="color:#268bd2">ggplot&lt;/span>() &lt;span style="color:#719e07">+&lt;/span>
&lt;span style="color:#268bd2">geom_map&lt;/span>(data &lt;span style="color:#719e07">=&lt;/span> world, map &lt;span style="color:#719e07">=&lt;/span> world,
&lt;span style="color:#268bd2">aes&lt;/span>(long, lat, map_id &lt;span style="color:#719e07">=&lt;/span> region),
color &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;black&amp;#34;&lt;/span>, fill &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;lightgray&amp;#34;&lt;/span>, size &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">0.1&lt;/span>) &lt;span style="color:#719e07">+&lt;/span>
&lt;span style="color:#586e75"># Plot our random point on top of the world map&lt;/span>
&lt;span style="color:#268bd2">geom_point&lt;/span>(&lt;span style="color:#268bd2">aes&lt;/span>(longitude, latitude), color &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;red&amp;#34;&lt;/span>)
}
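&lt;/code>&lt;/pre>&lt;/div>&lt;p>I used the &lt;code>runif()&lt;/code> function to draw one random value each from the ranges of valid latitudes (-90 to 90) and longitudes (-180 to 180). I used &lt;code>print()&lt;/code> and &lt;code>paste()&lt;/code> to display a message telling you the latitude and longitude values. Then I plotted the world map and our random point on top. Because the &lt;code>ggplot()&lt;/code> call is the last expression in the function, the plot is what &lt;code>coords()&lt;/code> returns, and R draws it when you call &lt;code>coords()&lt;/code> at the console.&lt;/p>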
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># What coordinates will I get this time?&lt;/span>
&lt;span style="color:#268bd2">coords&lt;/span>()
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] &amp;quot;Latitude: 61.1949375551194 Longitude: 90.1810506172478&amp;quot;
&lt;/code>&lt;/pre>&lt;p>&lt;img src="https://www.rforecology.com/post/how-to-create-your-own-function-in-r/index_files/figure-html/unnamed-chunk-7-1.png" width="672" />&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># What about this time?&lt;/span>
&lt;span style="color:#268bd2">coords&lt;/span>()
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] &amp;quot;Latitude: -62.7056198241189 Longitude: 24.0213383920491&amp;quot;
&lt;/code>&lt;/pre>&lt;p>&lt;img src="https://www.rforecology.com/post/how-to-create-your-own-function-in-r/index_files/figure-html/unnamed-chunk-7-2.png" width="672" />&lt;/p>
&lt;p>These examples were just three of many, many possibilities. Whatever task or operation you can think of in R, you can code it in a function. Get creative and have fun! Happy coding!&lt;/p>
&lt;p>If you liked this and want to learn more, you can check out the full course on the complete basics of R for ecology &lt;a href="https://www.rforecology.com" target="_blank" rel="noopener">right here&lt;/a> or by clicking the link below.&lt;/p>
&lt;br>
&lt;hr>
&lt;center>Check out Luka's full course, the Basics of R (for ecologists), here:
&lt;br>
&lt;br>
&lt;ul class="cta-group">
&lt;li>
&lt;a href="https://coaching.rforecology.com?utm_source=blog&amp;amp;utm_medium=bottom_button&amp;amp;utm_campaign=rforecology_blog/" target="_blank" rel="noopener" class="btn btn-primary px-3 py-3">&lt;strong>Start learning R now&lt;/strong>&lt;/a>
&lt;/li>
&lt;/ul>
&lt;p>Also be sure to check out &lt;strong>&lt;a href="https://www.r-bloggers.com/" target="_blank" rel="noopener">R-bloggers&lt;/a>&lt;/strong> for other great tutorials on learning R&lt;/p>
&lt;/center>
&lt;script defer src="https://cdn.commento.io/js/commento.js">&lt;/script>
&lt;div id="commento">&lt;/div></description></item><item><title>A *simple* introduction to ggplot2 (for plotting your data!)</title><link>https://www.rforecology.com/post/a-simple-introduction-to-ggplot2/</link><pubDate>Thu, 14 Apr 2022 09:20:07 -0400</pubDate><guid>https://www.rforecology.com/post/a-simple-introduction-to-ggplot2/</guid><description>&lt;p>If you&amp;rsquo;ve ever been totally confused by ggplot2 and what it is or how it works, my intention is that this short tutorial simplifies it down to a conceptual level from which you can build up later. Hope you enjoy!&lt;/p>
&lt;img src="https://www.rforecology.com/ggplot2_simplified.png" alt="Image showing the ggplot function and the graphical elements that go with it, and arrows indicating how they are connected to a dataset." style="width:600px;"/>
&lt;p>Data visualization is a powerful tool for scientists and their audiences to easily grasp relationships and trends in data. Some of you may already know how to generate plots using base R. In this blog post, we&amp;rsquo;re going to introduce a package called &amp;ldquo;ggplot2&amp;rdquo; that makes it more intuitive to create consistently nice-looking figures in R.&lt;/p>
&lt;p>You can also watch this blog post as a video if you want to follow along while reading:&lt;/p>
&lt;a href="https://www.youtube.com/watch?v=FdVy57oGJuc" target="_blank">
&lt;img src="https://www.rforecology.com/ggplot2_image1.png" alt="Video thumbnail introducing the package ggplot2" style="width:500px;"/>
&lt;/a>
&lt;p>The &amp;ldquo;gg&amp;rdquo; part of &amp;ldquo;ggplot2&amp;rdquo; stands for the &lt;em>&lt;strong>g&lt;/strong>&lt;/em>rammar of &lt;em>&lt;strong>g&lt;/strong>&lt;/em>raphics. Just like sentences are composed of various parts of speech (e.g., nouns, verbs, adjectives) that are arranged using a grammatical structure, ggplot2 allows us to create figures using a standardized syntax.&lt;/p>
&lt;p>The first element in data visualization is your &lt;span style = "color: #bd8802;">&lt;strong>data&lt;/strong>&lt;/span>, of course! Let&amp;rsquo;s load up a data set that comes built into R, called ChickWeight, and take a quick look at it. The data describe the weights (in grams) and ages (in days) of chicks fed one of four different diets.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Load data&lt;/span>
&lt;span style="color:#268bd2">data&lt;/span>(ChickWeight)
&lt;span style="color:#268bd2">head&lt;/span>(ChickWeight)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## weight Time Chick Diet
## 1 42 0 1 1
## 2 51 2 1 1
## 3 59 4 1 1
## 4 64 6 1 1
## 5 76 8 1 1
## 6 93 10 1 1
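&lt;p>Before mapping variables to aesthetics, it can help to check what kind of variable each column is, since a factor such as &lt;code>Diet&lt;/code> will be treated as a set of categories (for example, one color per diet). A quick base-R check, not part of the original walkthrough:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">data(ChickWeight)
# Class of each column (Chick is an ordered factor, so its first class is "ordered")
sapply(ChickWeight, function(col) class(col)[1])
# Diet is a factor with one level per diet: "1" "2" "3" "4"
levels(ChickWeight$Diet)
# Number of weight measurements
nrow(ChickWeight)
&lt;/code>&lt;/pre>&lt;/div>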
&lt;/code>&lt;/pre>&lt;p>The next element is the &lt;span style = "color: #34A39A;">&lt;strong>aesthetics&lt;/strong>&lt;/span>. This includes things like which variable goes on the X axis, which variable goes on the Y axis, and what size, shape, or color you want your points/lines/bars/etc. to be. You might have already noticed, but in this blog post, I&amp;rsquo;m going to assign different colors to the different graphical elements so that you can quickly pick them out in the syntax.&lt;/p>
&lt;p>Let&amp;rsquo;s say we want to create a scatterplot showing &lt;span style = "color: #bd8802;">&lt;strong>weight&lt;/strong>&lt;/span> versus &lt;span style = "color: #bd8802;">&lt;strong>time&lt;/strong>&lt;/span> for these chicks. We&amp;rsquo;re going to assign &lt;span style = "color: #bd8802;">&lt;strong>time&lt;/strong>&lt;/span> to the &lt;span style = "color: #34A39A;">&lt;strong>X axis&lt;/strong>&lt;/span>, &lt;span style = "color: #bd8802;">&lt;strong>weight&lt;/strong>&lt;/span> to the &lt;span style = "color: #34A39A;">&lt;strong>Y axis&lt;/strong>&lt;/span>, and we want the &lt;span style = "color: #bd8802;">&lt;strong>different diets&lt;/strong>&lt;/span> to show up as different &lt;span style = "color: #34A39A;">&lt;strong>point colors&lt;/strong>&lt;/span>. When we assign variables to the different aesthetic elements, this is called &amp;ldquo;mapping&amp;rdquo; the variables to the elements.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/ggplot2_image2.png" alt="Image showing that we want to map time to the X axis, weight to the Y axis, and diet to point color.">&lt;/p>
&lt;p>Once you figure out how you want to map your data to aesthetic elements, then you present your data using a &lt;span style = "color: #86c440;">&lt;strong>geometric object&lt;/strong>&lt;/span>, like a scatterplot, boxplot, line plot, etc.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/ggplot2_image3.png" alt="Image showing the data, how we mapped the data to the aesthetics, and how we moved on to the geometry of the plot.">&lt;/p>
&lt;p>So now we&amp;rsquo;ve talked about the essential graphical elements: &lt;span style = "color: #bd8802;">&lt;strong>data&lt;/strong>&lt;/span>, &lt;span style = "color: #34A39A;">&lt;strong>aesthetics&lt;/strong>&lt;/span>, and &lt;span style = "color: #86c440;">&lt;strong>geometry&lt;/strong>&lt;/span>.&lt;/p>
&lt;p>There are a couple more elements in ggplot such as the &lt;span style = "color: #9543a8;">&lt;strong>coordinates&lt;/strong>&lt;/span>, which allow you to choose what part of the plot you&amp;rsquo;re showing, and the &lt;span style = "color: #9543a8;">&lt;strong>theme&lt;/strong>&lt;/span>, which allows you to decide how the graph looks in terms of things like font color, font family, and font size. If you don&amp;rsquo;t specify them, ggplot will just use the default settings for your plot.&lt;/p>
&lt;p>Now let&amp;rsquo;s see how we actually code this in R! The basic method of constructing a figure in ggplot begins with the function:&lt;/p>
&lt;h4 id="ggplot">&lt;strong>ggplot()&lt;/strong>&lt;/h4>
&lt;p>Notice that this doesn&amp;rsquo;t say ggplot&lt;strong>2&lt;/strong>(), though that&amp;rsquo;s the name of the package.&lt;/p>
&lt;p>The first argument to the function is the data:&lt;/p>
&lt;h4 id="ggplotspan-style--color-bd8802dataspan">&lt;strong>ggplot(&lt;span style = "color: #bd8802;">data&lt;/span>)&lt;/strong>&lt;/h4>
&lt;p>Then, we add the aesthetics:&lt;/p>
&lt;h4 id="ggplotspan-style--color-bd8802dataspan-span-style--color-34a39aaesx-yspan">&lt;strong>ggplot(&lt;span style = "color: #bd8802;">data&lt;/span>, &lt;span style = "color: #34A39A;">aes(x, y)&lt;/span>)&lt;/strong>&lt;/h4>
&lt;p>What would happen if we tried to plot this right now using the data? Remember when we loaded up ChickWeight way back at the start of this blog post?&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#268bd2">ggplot&lt;/span>(ChickWeight, &lt;span style="color:#268bd2">aes&lt;/span>(x &lt;span style="color:#719e07">=&lt;/span> Time, y &lt;span style="color:#719e07">=&lt;/span> weight))
&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;img src="https://www.rforecology.com/post/a-simple-introduction-to-ggplot2/index_files/figure-html/unnamed-chunk-2-1.png" width="80%" />&lt;/p>
&lt;p>We do see &lt;span style = "color: #bd8802;">&lt;strong>time&lt;/strong>&lt;/span> on the &lt;span style = "color: #34A39A;">&lt;strong>X axis&lt;/strong>&lt;/span> and &lt;span style = "color: #bd8802;">&lt;strong>weight&lt;/strong>&lt;/span> on the &lt;span style = "color: #34A39A;">&lt;strong>Y axis&lt;/strong>&lt;/span>, but nothing has shown up in the actual bounds of our plot because we&amp;rsquo;re missing our &lt;span style = "color: #86c440;">&lt;strong>geometry&lt;/strong>&lt;/span>.&lt;/p>
&lt;p>To add a &lt;span style = "color: #86c440;">&lt;strong>geometry&lt;/strong>&lt;/span> object to the ggplot() function, we just put a &lt;strong>&amp;quot;+&amp;quot;&lt;/strong> sign at the end of the line, start a new line, and add the geometry.&lt;/p>
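&lt;p>Because each &lt;strong>+&lt;/strong> adds a layer to a ggplot object, you can also store the base plot in a variable and add the geometry afterwards. The short sketch below (the names &lt;code>base_plot&lt;/code> and &lt;code>scatter&lt;/code> are just illustrative) also shows why the &lt;strong>+&lt;/strong> has to sit at the end of a line: if a line ends without it, R treats that line as a finished command.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">library(ggplot2)

# Data and aesthetics only: printing this draws empty axes
base_plot = ggplot(ChickWeight, aes(x = Time, y = weight))

# Adding a geometry with + creates a new plot object with one layer
scatter = base_plot +
  geom_point()

scatter

# Putting + at the START of the next line does not work:
# R runs the first line on its own, and the dangling "+ geom_point()"
# then fails with an error.
&lt;/code>&lt;/pre>&lt;/div>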
&lt;p>The function for a scatterplot is &lt;span style = "color: #86c440;">&lt;strong>geom_point()&lt;/strong>&lt;/span>. The function changes depending on what kind of plot you want (for example, geom_boxplot() or geom_line()), but they all begin with &lt;span style = "color: #86c440;">&lt;strong>geom_&lt;/strong>&lt;/span>. Within geom_point(), we can also specify aesthetics such as the color or fill of the points, or any other aesthetic property that might be connected to the data. So now we have:&lt;/p>
&lt;h4 id="ggplotspan-style--color-bd8802dataspan-span-style--color-34a39aaesx-yspan-">&lt;strong>ggplot(&lt;span style = "color: #bd8802;">data&lt;/span>, &lt;span style = "color: #34A39A;">aes(x, y)&lt;/span>) +&lt;/strong>&lt;/h4>
&lt;h4 id="span-style--color-86c440geom_pointspanspan-style--color-34a39aaescolorspanspan-style--color-86c440span">&lt;span style = "color: #86c440;">&lt;strong>geom_point(&lt;/span>&lt;span style = "color: #34A39A;">aes(color)&lt;/span>&lt;span style = "color: #86c440;">)&lt;/span>&lt;/strong>&lt;/h4>
&lt;p>Now to actually put the data in! To map data to aesthetics, we just set the aesthetics equal to whatever the variable name is in our dataframe. Using the current data, the code should look like this:&lt;/p>
&lt;h4 id="ggplotspan-style--color-bd8802chickweightspan-span-style--color-34a39aaesx--spanspan-style--color-bd8802timespan-span-style--color-34a39ay-span-span-style--color-bd8802weightspanspan-style--color-34a39aspan-">&lt;strong>ggplot(&lt;span style = "color: #bd8802;">ChickWeight&lt;/span>, &lt;span style = "color: #34A39A;">aes(x = &lt;/span>&lt;span style = "color: #bd8802;">Time&lt;/span>, &lt;span style = "color: #34A39A;">y =&lt;/span> &lt;span style = "color: #bd8802;">weight&lt;/span>&lt;span style = "color: #34A39A;">)&lt;/span>) +&lt;/strong>&lt;/h4>
&lt;h4 id="span-style--color-86c440geom_pointspanspan-style--color-34a39aaescolor--span-style--color-bd8802dietspanspan-style--color-34a39aspanspan-style--color-86c440span">&lt;span style = "color: #86c440;">&lt;strong>geom_point(&lt;/span>&lt;span style = "color: #34A39A;">aes(color = &lt;span style = "color: #bd8802;">Diet&lt;/span>&lt;span style = "color: #34A39A;">)&lt;/span>&lt;span style = "color: #86c440;">)&lt;/span>&lt;/strong>&lt;/h4>
&lt;p>And now if we plot it&amp;hellip;&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#268bd2">ggplot&lt;/span>(ChickWeight, &lt;span style="color:#268bd2">aes&lt;/span>(x &lt;span style="color:#719e07">=&lt;/span> Time, y &lt;span style="color:#719e07">=&lt;/span> weight)) &lt;span style="color:#719e07">+&lt;/span>
&lt;span style="color:#268bd2">geom_point&lt;/span>(&lt;span style="color:#268bd2">aes&lt;/span>(color &lt;span style="color:#719e07">=&lt;/span> Diet))
&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;img src="https://www.rforecology.com/post/a-simple-introduction-to-ggplot2/index_files/figure-html/unnamed-chunk-3-1.png" width="80%" />&lt;/p>
&lt;p>Ta-da! We have a graph showing chick weight versus time, and we are able to represent different chick diets with different colors in the figure. Notice that ggplot automatically adds in a legend for you.&lt;/p>
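&lt;p>One distinction that often trips people up is &lt;em>mapping&lt;/em> an aesthetic versus &lt;em>setting&lt;/em> it. Inside &lt;code>aes()&lt;/code>, &lt;code>color = Diet&lt;/code> maps a variable to color, so ggplot picks one color per diet and builds the legend. Outside &lt;code>aes()&lt;/code>, &lt;code>color = "grey40"&lt;/code> simply gives every point that fixed color, and no legend is made. A quick sketch to compare the two (the color and size values are arbitrary choices):&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">library(ggplot2)

# Mapped: color depends on the Diet column, and a legend is added
mapped = ggplot(ChickWeight, aes(x = Time, y = weight)) +
  geom_point(aes(color = Diet))

# Set: every point gets the same fixed color and size, with no legend
fixed = ggplot(ChickWeight, aes(x = Time, y = weight)) +
  geom_point(color = "grey40", size = 1)

mapped
fixed
&lt;/code>&lt;/pre>&lt;/div>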
&lt;p>If we want, we can also add the other elements, the &lt;span style = "color: #9543a8;">&lt;strong>coordinates&lt;/strong>&lt;/span> and &lt;span style = "color: #9543a8;">&lt;strong>theme&lt;/strong>&lt;/span>, like so (the XXX&amp;rsquo;s are placeholders for specific functions, such as &amp;ldquo;coord_cartesian()&amp;rdquo; or &amp;ldquo;theme_classic()&amp;rdquo;):&lt;/p>
&lt;h4 id="ggplotspan-style--color-bd8802chickweightspan-span-style--color-34a39aaesx-spanspan-style--color-bd8802timespan-span-style--color-34a39ay-span-span-style--color-bd8802weightspanspan-style--color-34a39aspan-">&lt;strong>ggplot(&lt;span style = "color: #bd8802;">ChickWeight&lt;/span>, &lt;span style = "color: #34A39A;">aes(x =&lt;/span>&lt;span style = "color: #bd8802;">Time&lt;/span>, &lt;span style = "color: #34A39A;">y =&lt;/span> &lt;span style = "color: #bd8802;">weight&lt;/span>&lt;span style = "color: #34A39A;">)&lt;/span>) +&lt;/strong>&lt;/h4>
&lt;h4 id="span-style--color-86c440geom_pointspanspan-style--color-34a39aaescolor--span-style--color-bd8802dietspanspan-style--color-34a39aspanspan-style--color-86c440span-">&lt;span style = "color: #86c440;">&lt;strong>geom_point(&lt;/span>&lt;span style = "color: #34A39A;">aes(color = &lt;span style = "color: #bd8802;">Diet&lt;/span>&lt;span style = "color: #34A39A;">)&lt;/span>&lt;span style = "color: #86c440;">)&lt;/span> +&lt;/strong>&lt;/h4>
&lt;h4 id="span-style--color-9543a8coord_xxx-span">&lt;span style = "color: #9543a8;">&lt;strong>coord_XXX() +&lt;/strong>&lt;/span>&lt;/h4>
&lt;h4 id="span-style--color-9543a8theme_xxxspan">&lt;span style = "color: #9543a8;">&lt;strong>theme_XXX()&lt;/strong>&lt;/span>&lt;/h4>
&lt;p>If we plot this out, it might look something like this:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#268bd2">ggplot&lt;/span>(ChickWeight, &lt;span style="color:#268bd2">aes&lt;/span>(x &lt;span style="color:#719e07">=&lt;/span> Time, y &lt;span style="color:#719e07">=&lt;/span> weight)) &lt;span style="color:#719e07">+&lt;/span>
&lt;span style="color:#268bd2">geom_point&lt;/span>(&lt;span style="color:#268bd2">aes&lt;/span>(color &lt;span style="color:#719e07">=&lt;/span> Diet)) &lt;span style="color:#719e07">+&lt;/span>
&lt;span style="color:#268bd2">coord_cartesian&lt;/span>() &lt;span style="color:#719e07">+&lt;/span>
&lt;span style="color:#268bd2">theme_classic&lt;/span>()
&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;img src="https://www.rforecology.com/post/a-simple-introduction-to-ggplot2/index_files/figure-html/unnamed-chunk-4-1.png" width="80%" />&lt;/p>
&lt;p>And don&amp;rsquo;t worry if the elements beyond the basic three get a bit confusing or hard to remember. Luckily, there&amp;rsquo;s a &lt;strong>&lt;a href = "https://www.rstudio.com/resources/cheatsheets/" target = "_blank">cheatsheet&lt;/a>&lt;/strong> online to help you remember everything that you can do with ggplot2.&lt;/p>
&lt;p>Hope you enjoyed this brief introduction to ggplot2! It took me a long time to get comfortable with ggplot, but once I did, it really changed how I make data visualizations. If you want to learn even more about how to create different types of figures with ggplot2, check out my full online course in data visualization, titled “Introduction to data visualization in R (for ecologists)” &lt;a href="https://coaching.rforecology.com/intro-to-dataviz-in-r-for-ecologists-enroll" target="_blank" rel="noopener">here.&lt;/a> There I go over the five key types of plots in R for ecology and much more! Here&amp;rsquo;s a sample of what you&amp;rsquo;ll learn to create in that course:&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/intro_dataviz_examples.png" alt="Image showing the data, how we mapped the data to the aesthetics, and how we moved on to the geometry of the plot.">&lt;/p>
&lt;br>
&lt;hr>
&lt;center>If you liked this and want to learn even more, check out my full course on data visualization with R (for ecologists) here:
&lt;br>
&lt;br>
&lt;ul class="cta-group">
&lt;li>
&lt;a href="https://coaching.rforecology.com/intro-to-dataviz-in-r-for-ecologists-enroll?utm_source=blog&amp;amp;utm_medium=bottom_button&amp;amp;utm_campaign=rforecology_blog/" target="_blank" rel="noopener" class="btn btn-primary px-3 py-3">&lt;strong>Start visualizing your data now&lt;/strong>&lt;/a>
&lt;/li>
&lt;/ul>
&lt;p>Also be sure to check out &lt;strong>&lt;a href="https://www.r-bloggers.com/" target="_blank" rel="noopener">R-bloggers&lt;/a>&lt;/strong> for other great tutorials on learning R.&lt;/p>
&lt;/center>
&lt;script defer src="https://cdn.commento.io/js/commento.js">&lt;/script>
&lt;div id="commento">&lt;/div></description></item><item><title>How to make a boxplot in R</title><link>https://www.rforecology.com/post/2022-04-06-how-to-make-a-boxplot-in-r/</link><pubDate>Wed, 06 Apr 2022 11:46:53 -0400</pubDate><guid>https://www.rforecology.com/post/2022-04-06-how-to-make-a-boxplot-in-r/</guid><description>&lt;script src="https://rforecology.activehosted.com/f/embed.php?id=21" type="text/javascript" charset="utf-8">&lt;/script>
&lt;p>In this tutorial, I&amp;rsquo;m going to show you how to plot and customize boxplots (also known as box and whisker plots). Boxplots are a common type of graph that allow you to look at the relationships between a continuous variable and various categorical groups. They are super common in ecology because we often need to compare values between different categories.&lt;/p>
&lt;img src="https://www.rforecology.com/boxplots_image0.png/" alt="An example boxplot showing all the different elements including color, axis labels, line type and weight, and boxplot orientation." style="width:400px;"/>
&lt;p>BTW, you can also follow along with a video tutorial of this blog post if you click on the image below:&lt;/p>
&lt;a href="https://www.youtube.com/watch?v=q03cJVMNpsU/" target="_blank">
&lt;img src="https://www.rforecology.com/boxplots_image1.png" alt="Video thumbnail for how to make a boxplot" style="width:400px;">
&lt;/a>
&lt;p>For this tutorial, we&amp;rsquo;re going to use the built-in R dataset &lt;code>PlantGrowth&lt;/code>, which might seem familiar to you because we used it in a few other data visualization tutorials.&lt;/p>
&lt;p>To refresh your memory, &lt;code>PlantGrowth&lt;/code> has 30 rows and two columns. The &amp;ldquo;weight&amp;rdquo; column represents the dry biomass of each plant in grams, while the &amp;ldquo;group&amp;rdquo; column describes the experimental treatment that each plant was given.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Load the data&lt;/span>
&lt;span style="color:#268bd2">data&lt;/span>(PlantGrowth)
&lt;span style="color:#586e75"># View the data&lt;/span>
&lt;span style="color:#268bd2">head&lt;/span>(PlantGrowth)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## weight group
## 1 4.17 ctrl
## 2 5.58 ctrl
## 3 5.18 ctrl
## 4 6.11 ctrl
## 5 4.50 ctrl
## 6 4.61 ctrl
&lt;/code>&lt;/pre>&lt;p>Let&amp;rsquo;s say we want to compare the weight of plants among the different treatments. A boxplot is perfect for this type of visualization.&lt;/p>
&lt;p>We&amp;rsquo;ve already learned about the &lt;code>plot()&lt;/code> function in our earlier scatterplot tutorial (see our previous &lt;a href="https://www.rforecology.com/post/making-your-first-plot-in-r/" target="_blank" rel="noopener">blog post&lt;/a>). Something neat about &lt;code>plot()&lt;/code> is that if the variable on the X axis is a factor (R&amp;rsquo;s data type for categorical variables), the function will recognize that and automatically draw a boxplot for you instead of a scatterplot.&lt;/p>
&lt;p>If we look at the levels in the &amp;ldquo;group&amp;rdquo; column, we can see that &amp;ldquo;group&amp;rdquo; is indeed a categorical variable, with three different levels:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Look at the levels of the &amp;#34;group&amp;#34; column&lt;/span>
&lt;span style="color:#268bd2">levels&lt;/span>(PlantGrowth&lt;span style="color:#719e07">$&lt;/span>group)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] &amp;quot;ctrl&amp;quot; &amp;quot;trt1&amp;quot; &amp;quot;trt2&amp;quot;
&lt;/code>&lt;/pre>&lt;p>So if we plot weight as a function of group (y as a function of x), we should get a boxplot.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Make a boxplot of weight as a function of treatment group&lt;/span>
&lt;span style="color:#268bd2">plot&lt;/span>(weight &lt;span style="color:#719e07">~&lt;/span> group, data &lt;span style="color:#719e07">=&lt;/span> PlantGrowth)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;img src="https://www.rforecology.com/post/2022-04-06-how-to-make-a-boxplot-in-r/index_files/figure-html/unnamed-chunk-3-1.png" width="672" />&lt;/p>
&lt;p>Awesome! We can see plant weight across the three different treatment groups, allowing us to easily compare groups.&lt;/p>
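&lt;p>Behind the scenes, &lt;code>plot()&lt;/code> hands a formula with a factor on the right-hand side over to R&amp;rsquo;s &lt;code>boxplot()&lt;/code> function, so you can also call &lt;code>boxplot()&lt;/code> directly. A small bonus is that &lt;code>boxplot()&lt;/code> invisibly returns the numbers it drew, which you can store and inspect (the name &lt;code>bp&lt;/code> below is arbitrary):&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Load a fresh copy of the data
data(PlantGrowth)

# Draws the same figure as plot(weight ~ group, data = PlantGrowth)
bp = boxplot(weight ~ group, data = PlantGrowth)

# One column per group; the rows are the lower whisker, lower hinge,
# median, upper hinge, and upper whisker
bp$stats

# Group names, taken from the factor levels
bp$names
&lt;/code>&lt;/pre>&lt;/div>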
&lt;h3 id="boxplot-components">Boxplot components&lt;/h3>
&lt;p>Now, let&amp;rsquo;s quickly go over the components of a box plot.&lt;/p>
&lt;ul>
&lt;li>The solid black line in the middle of each box represents the &lt;strong>median&lt;/strong> of the data.&lt;/li>
&lt;li>The grey box represents the &lt;strong>&amp;ldquo;interquartile range&amp;rdquo; (IQR)&lt;/strong> of your data, or the range between the 1st and 3rd quartiles. Values below the &lt;strong>1st quartile&lt;/strong> represent the lowest 25% of your data points, while values above the &lt;strong>3rd quartile&lt;/strong> represent the highest 25% of your data. The interquartile range contains the middle 50% of your data points.&lt;/li>
&lt;li>The &amp;ldquo;whiskers&amp;rdquo; of a box and whisker plot are the dashed lines outside of the grey box. These end at the &lt;strong>minimum&lt;/strong> and &lt;strong>maximum&lt;/strong> values of your data set, excluding outliers.&lt;/li>
&lt;li>Sometimes, you will have &lt;strong>outliers&lt;/strong> in your data that are shown as points in the plot. Outliers are points that are more than (1.5 * IQR) below the 1st quartile or above the 3rd quartile.&lt;/li>
&lt;/ul>
&lt;p>&lt;img src="https://www.rforecology.com/boxplots_image2.png" alt="Image showing the different components of a boxplot.">&lt;/p>
&lt;div class="alert alert-note">
&lt;div>
&lt;strong>Quick note about the Min and Max whiskers:&lt;/strong>
The whisker ends (the staples, or &amp;ldquo;T&amp;rdquo;s) mark the true maximum or minimum of the data only when there are no outliers on that side: when the 3rd quartile + 1.5 x IQR is at or above the maximum value (for the upper whisker), or when the 1st quartile - 1.5 x IQR is at or below the minimum value (for the lower whisker). In other words, the whiskers &lt;strong>exclude outliers&lt;/strong>, which are all points more than 1.5 x IQR above the 3rd quartile or below the 1st quartile.
&lt;/div>
&lt;/div>
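&lt;p>If you want to check these rules yourself, &lt;code>boxplot.stats()&lt;/code> returns the values that a single box is drawn from. The sketch below uses the control group as an example (the object names are just illustrative). One subtlety: R draws the box edges at Tukey&amp;rsquo;s &amp;ldquo;hinges&amp;rdquo; from &lt;code>fivenum()&lt;/code>, which can differ slightly from the quartiles that &lt;code>quantile()&lt;/code> reports with its default method.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">data(PlantGrowth)

# Weights of the control plants only
ctrl = PlantGrowth$weight[PlantGrowth$group == "ctrl"]

bs = boxplot.stats(ctrl)
bs$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
bs$out    # values drawn as separate outlier points (empty if there are none)

# The box and median come from Tukey's five-number summary...
fivenum(ctrl)
# ...which need not match quantile()'s default quartiles exactly
quantile(ctrl, c(0.25, 0.5, 0.75))

# Fences sit 1.5 box-heights beyond each hinge (1.5 is the default 'coef')
box_height = bs$stats[4] - bs$stats[2]
lower_fence = bs$stats[2] - 1.5 * box_height
upper_fence = bs$stats[4] + 1.5 * box_height

# Whiskers end at the most extreme values that lie inside the fences
inside = ctrl[ctrl >= lower_fence]
inside = inside[upper_fence >= inside]
range(inside)
&lt;/code>&lt;/pre>&lt;/div>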
&lt;h3 id="modifying-the-axes">Modifying the axes&lt;/h3>
&lt;p>Now that we understand all the parts of a boxplot, let&amp;rsquo;s play around with the different components of the plot, starting with the axes. Customizing the axes is the same as for scatterplots, where we&amp;rsquo;ll use the arguments &lt;code>xlab&lt;/code> and &lt;code>ylab&lt;/code> to change the axis labels.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Adding axis labels&lt;/span>
&lt;span style="color:#268bd2">plot&lt;/span>(weight &lt;span style="color:#719e07">~&lt;/span> group,
data &lt;span style="color:#719e07">=&lt;/span> PlantGrowth,
xlab &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;Treatment Group&amp;#34;&lt;/span>,
ylab &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;Dried Biomass Weight (g)&amp;#34;&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;img src="https://www.rforecology.com/post/2022-04-06-how-to-make-a-boxplot-in-r/index_files/figure-html/unnamed-chunk-4-1.png" width="672" />&lt;/p>
&lt;p>Great, now we have axis labels! But the individual treatment group labels on our X axis are still pretty vague. To change them, let&amp;rsquo;s go back to our data and rename &amp;ldquo;ctrl&amp;rdquo; to &amp;ldquo;Control&amp;rdquo;, &amp;ldquo;trt1&amp;rdquo; to &amp;ldquo;High light&amp;rdquo;, and &amp;ldquo;trt2&amp;rdquo; to &amp;ldquo;Low light&amp;rdquo;. (The R documentation for &lt;code>PlantGrowth&lt;/code> doesn&amp;rsquo;t say what the two treatments actually were, so these names are just for illustration.)&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Look at the levels of the group column&lt;/span>
&lt;span style="color:#268bd2">levels&lt;/span>(PlantGrowth&lt;span style="color:#719e07">$&lt;/span>group)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] &amp;quot;ctrl&amp;quot; &amp;quot;trt1&amp;quot; &amp;quot;trt2&amp;quot;
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Change the names of the treatments in the data set itself&lt;/span>
&lt;span style="color:#268bd2">levels&lt;/span>(PlantGrowth&lt;span style="color:#719e07">$&lt;/span>group) &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;Control&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;High light&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;Low light&amp;#34;&lt;/span>)
&lt;span style="color:#586e75"># View the group column again&lt;/span>
PlantGrowth&lt;span style="color:#719e07">$&lt;/span>group
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] Control Control Control Control Control Control
## [7] Control Control Control Control High light High light
## [13] High light High light High light High light High light High light
## [19] High light High light Low light Low light Low light Low light
## [25] Low light Low light Low light Low light Low light Low light
## Levels: Control High light Low light
&lt;/code>&lt;/pre>&lt;p>Now that we&amp;rsquo;ve changed the names of our treatments, let&amp;rsquo;s run the plot again.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#268bd2">plot&lt;/span>(weight &lt;span style="color:#719e07">~&lt;/span> group,
data &lt;span style="color:#719e07">=&lt;/span> PlantGrowth,
xlab &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;Treatment Group&amp;#34;&lt;/span>,
ylab &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;Dried Biomass Weight (g)&amp;#34;&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;img src="https://www.rforecology.com/post/2022-04-06-how-to-make-a-boxplot-in-r/index_files/figure-html/unnamed-chunk-6-1.png" width="672" />&lt;/p>
&lt;h3 id="modifying-the-boxes-and-whiskers">Modifying the boxes and whiskers&lt;/h3>
&lt;p>Our plot is looking pretty good so far. Now let&amp;rsquo;s see how we can change the appearance of the boxes and whiskers. We can fill the boxes with color using the &lt;code>col&lt;/code> argument, which accepts any color name or hex code in quotes. You can also set &lt;code>col&lt;/code> to a number, which picks that position from R&amp;rsquo;s current color palette (run &lt;code>palette()&lt;/code> to see it).&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#268bd2">plot&lt;/span>(weight &lt;span style="color:#719e07">~&lt;/span> group,
data &lt;span style="color:#719e07">=&lt;/span> PlantGrowth,
xlab &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;Treatment Group&amp;#34;&lt;/span>,
ylab &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;Dried Biomass Weight (g)&amp;#34;&lt;/span>,
col &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">4&lt;/span>) &lt;span style="color:#586e75"># or something like &amp;#34;blue&amp;#34; or a hex code like &amp;#34;#f234f9&amp;#34;&lt;/span>
&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;img src="https://www.rforecology.com/post/2022-04-06-how-to-make-a-boxplot-in-r/index_files/figure-html/unnamed-chunk-7-1.png" width="672" />&lt;/p>
&lt;p>It can be fun to use colors, but it&amp;rsquo;s best practice in data visualization to keep your figures black and white (or grey-scale) unless you need color to signify something in particular. In our figure, there isn&amp;rsquo;t really a reason to color the boxes; we&amp;rsquo;re only doing it here for demonstration.&lt;/p>
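&lt;p>When shading does carry meaning in your figure, &lt;code>col&lt;/code> also accepts a vector with one color per box, applied in the order of the factor levels. A sketch using greys, in line with the grey-scale advice (the particular shades are arbitrary choices):&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">data(PlantGrowth)

# One fill color per box, in the same order as levels(PlantGrowth$group)
plot(weight ~ group,
     data = PlantGrowth,
     xlab = "Treatment Group",
     ylab = "Dried Biomass Weight (g)",
     col = c("white", "grey80", "grey55"))
&lt;/code>&lt;/pre>&lt;/div>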
&lt;p>We can also change the appearance of the box borders using &lt;code>boxlty&lt;/code>, which stands for &amp;ldquo;box line type&amp;rdquo;. This argument accepts integers that represent different line types: 1 is a solid line, 2 a dashed line, and 0 no line at all, while 3 through 6 give dotted, dot-dash, long-dash, and two-dash lines. For now, let&amp;rsquo;s get rid of the box borders.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#268bd2">plot&lt;/span>(weight &lt;span style="color:#719e07">~&lt;/span> group,
data &lt;span style="color:#719e07">=&lt;/span> PlantGrowth,
xlab &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;Treatment Group&amp;#34;&lt;/span>,
ylab &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;Dried Biomass Weight (g)&amp;#34;&lt;/span>,
col &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">4&lt;/span>,
boxlty &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">0&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;img src="https://www.rforecology.com/post/2022-04-06-how-to-make-a-boxplot-in-r/index_files/figure-html/unnamed-chunk-8-1.png" width="672" />&lt;/p>
&lt;p>To change the whisker line type, you can use the argument &lt;code>whisklty&lt;/code>, which works the same way as &lt;code>boxlty&lt;/code>. You can also change whisker line thickness using &lt;code>whisklwd&lt;/code>.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#268bd2">plot&lt;/span>(weight &lt;span style="color:#719e07">~&lt;/span> group,
data &lt;span style="color:#719e07">=&lt;/span> PlantGrowth,
xlab &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;Treatment Group&amp;#34;&lt;/span>,
ylab &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;Dried Biomass Weight (g)&amp;#34;&lt;/span>,
col &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">4&lt;/span>,
boxlty &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">0&lt;/span>,
whisklty &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">3&lt;/span>,
whisklwd &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">1.5&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;img src="https://www.rforecology.com/post/2022-04-06-how-to-make-a-boxplot-in-r/index_files/figure-html/unnamed-chunk-9-1.png" width="672" />&lt;/p>
&lt;p>Lastly, you can change the line thickness of the ends of the whiskers (these are called staples) using the &lt;code>staplelwd&lt;/code> argument.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#268bd2">plot&lt;/span>(weight &lt;span style="color:#719e07">~&lt;/span> group,
data &lt;span style="color:#719e07">=&lt;/span> PlantGrowth,
xlab &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;Treatment Group&amp;#34;&lt;/span>,
ylab &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;Dried Biomass Weight (g)&amp;#34;&lt;/span>,
col &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">4&lt;/span>,
boxlty &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">0&lt;/span>,
whisklty &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">3&lt;/span>,
whisklwd &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">1.5&lt;/span>,
staplelwd &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">1.5&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;img src="https://www.rforecology.com/post/2022-04-06-how-to-make-a-boxplot-in-r/index_files/figure-html/unnamed-chunk-10-1.png" width="672" />&lt;/p>
&lt;p>You&amp;rsquo;ll notice that the arguments &lt;code>boxlty&lt;/code> and &lt;code>whisklty&lt;/code> seem similar, and that &lt;code>whisklwd&lt;/code> and &lt;code>staplelwd&lt;/code> also seem similar. You might have already figured out that to change the different plot components and their attributes, you can just mix and match &lt;code>box&lt;/code>, &lt;code>whisk&lt;/code>, and &lt;code>staple&lt;/code> with &lt;code>lty&lt;/code>, &lt;code>lwd&lt;/code>, and &lt;code>col&lt;/code> (which changes the color).&lt;/p>
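&lt;p>The same naming pattern also works for the median line (&lt;code>med&lt;/code>) and the outlier points (&lt;code>out&lt;/code>), and &lt;code>boxcol&lt;/code>, &lt;code>whiskcol&lt;/code>, and &lt;code>staplecol&lt;/code> set the line color of each part, while plain &lt;code>col&lt;/code> sets the box fill. The full list of these arguments is on the help page &lt;code>?bxp&lt;/code>. A sketch combining a few of them (the specific values are arbitrary):&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">data(PlantGrowth)

plot(weight ~ group,
     data = PlantGrowth,
     xlab = "Treatment Group",
     ylab = "Dried Biomass Weight (g)",
     col = "grey90",        # box fill
     boxcol = "grey30",     # box border color
     medlwd = 3,            # thicker median line
     whiskcol = "grey30",   # whisker color
     staplecol = "grey30",  # staple color
     outpch = 16)           # solid circles for any outliers
&lt;/code>&lt;/pre>&lt;/div>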
&lt;h3 id="changing-the-boxplot-orientation">Changing the boxplot orientation&lt;/h3>
&lt;p>The last thing you can modify is the orientation of the boxplot. Right now, the boxes and whiskers are oriented vertically. If you want them to be horizontal, you can just add the argument &lt;code>horizontal = TRUE&lt;/code>. This can be especially helpful if you have a lot of groups that you want to compare.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#268bd2">plot&lt;/span>(weight &lt;span style="color:#719e07">~&lt;/span> group,
data &lt;span style="color:#719e07">=&lt;/span> PlantGrowth,
xlab &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;Treatment Group&amp;#34;&lt;/span>,
ylab &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;Dried Biomass Weight (g)&amp;#34;&lt;/span>,
col &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">4&lt;/span>,
boxlty &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">0&lt;/span>,
whisklty &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">3&lt;/span>,
whisklwd &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">1.5&lt;/span>,
staplelwd &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">1.5&lt;/span>,
horizontal &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#cb4b16">TRUE&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;img src="https://www.rforecology.com/post/2022-04-06-how-to-make-a-boxplot-in-r/index_files/figure-html/unnamed-chunk-11-1.png" width="672" />&lt;/p>
&lt;p>And that&amp;rsquo;s it! Now we have a good-looking boxplot. In this tutorial I went over what the different parts of a boxplot mean, as well as how to modify the axes, the boxes and whiskers, and the orientation of the plot.&lt;/p>
&lt;p>I hope you enjoyed this post! If you liked this and want to learn more, you can check out my full course on the complete basics of R for ecology &lt;a href="https://www.rforecology.com" target="_blank" rel="noopener">right here&lt;/a> or my course on data visualization with R (for ecologists) by clicking the link below.&lt;/p>
&lt;br>
&lt;hr>
&lt;center>Check out my full course on Data Visualization with R (for ecologists) here:
&lt;br>
&lt;br>
&lt;ul class="cta-group">
&lt;li>
&lt;a href="https://coaching.rforecology.com/intro-to-dataviz-in-r-for-ecologists-enroll?utm_source=blog&amp;amp;utm_medium=bottom_button&amp;amp;utm_campaign=rforecology_blog/" target="_blank" rel="noopener" class="btn btn-primary px-3 py-3">&lt;strong>Start Visualizing Data with R now&lt;/strong>&lt;/a>
&lt;/li>
&lt;/ul>
&lt;p>Also be sure to check out &lt;strong>&lt;a href="https://www.r-bloggers.com/" target="_blank" rel="noopener">R-bloggers&lt;/a>&lt;/strong> for other great tutorials on learning R.&lt;/p>
&lt;/center>
&lt;script defer src="https://cdn.commento.io/js/commento.js">&lt;/script>
&lt;div id="commento">&lt;/div></description></item><item><title>Search through your ecological data with the 'grep()' function</title><link>https://www.rforecology.com/post/how-to-use-grepl/</link><pubDate>Tue, 29 Mar 2022 09:09:55 -0400</pubDate><guid>https://www.rforecology.com/post/how-to-use-grepl/</guid><description>&lt;p>We often want to search for a certain character pattern in our data. We do this all the time when we press &amp;ldquo;ctrl + F&amp;rdquo; (or &amp;ldquo;cmd + F&amp;rdquo; for a mac) on a webpage. For example, maybe you have a list of species names and want to find all of the individuals within a certain genus. Or maybe you have several columns of climate data and only want to select the ones related to precipitation.&lt;/p>
&lt;p>Here, I&amp;rsquo;m going to talk about the functions called &lt;code>grep()&lt;/code> and &lt;code>grepl()&lt;/code> that allow you to find strings in your data that match the pattern you&amp;rsquo;re looking for. I&amp;rsquo;m also going to discuss a function called &lt;code>sub()&lt;/code>, which allows you to find and replace strings.&lt;/p>
&lt;p>First, let&amp;rsquo;s load the &lt;code>dplyr&lt;/code> package, which I&amp;rsquo;ll be using once or twice during the tutorial to demonstrate common uses for &lt;code>grep()&lt;/code> and &lt;code>grepl()&lt;/code>. Note that &lt;code>grep()&lt;/code>, &lt;code>grepl()&lt;/code>, and &lt;code>sub()&lt;/code> come with base R, so there&amp;rsquo;s no need to load packages to use those functions.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#268bd2">library&lt;/span>(dplyr)
&lt;/code>&lt;/pre>&lt;/div>&lt;img src="https://www.rforecology.com/grep_image1.png" alt="How to use grep, with example text showing the grep function searching for the word forest within a vector of habitat types" style="width:400px;"/>
&lt;h2 id="find-matches-using-grep-and-grepl">Find matches using &lt;code>grep()&lt;/code> and &lt;code>grepl()&lt;/code>&lt;/h2>
&lt;p>To demonstrate how to use these functions, I&amp;rsquo;ve downloaded a data set from the Environmental Data Initiative (EDI) &lt;a href="https://portal.edirepository.org/nis/" target="_blank" rel="noopener">data portal&lt;/a>. The EDI archives troves of environmental data that are publicly available and great for demonstration purposes or for supporting your own research. The data I downloaded describe the vegetation on barrier islands within the Virginia Coast Reserve Long-Term Ecological Research project. To follow along, &lt;a href="https://www.rforecology.com/data/VCR_data.csv">&lt;strong>you can download the data here.&lt;/strong>&lt;/a>&lt;/p>
&lt;p>Let&amp;rsquo;s import the data into R and keep only the columns we need for this tutorial. I used the &lt;code>select()&lt;/code> function from &lt;code>dplyr&lt;/code>, which takes the data frame as its first argument, followed by the names of the columns to keep.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Import data (assumes VCR_data.csv is in your working directory)&lt;/span>
veg_dat &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">read.csv&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;VCR_data.csv&amp;#34;&lt;/span>)
&lt;span style="color:#586e75"># Select specific columns&lt;/span>
veg_dat &lt;span style="color:#719e07">&amp;lt;-&lt;/span> dplyr&lt;span style="color:#719e07">::&lt;/span>&lt;span style="color:#268bd2">select&lt;/span>(veg_dat, genus, species, island, habitat, relabund)
&lt;span style="color:#586e75"># View first few rows&lt;/span>
&lt;span style="color:#268bd2">head&lt;/span>(veg_dat)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## genus species island habitat relabund
## 1 Acer rubrum Smith Pine-Hardwood_forest_stands 6
## 2 Acer rubrum Parramore Hardwood_forest_stands 6
## 3 Achillea millefolium Wreck Foredune_grassland 3
## 4 Achillea millefolium Smith Hardwood_forest_stands 4
## 5 Achillea millefolium Smith Dense_grasslands 4
## 6 Achillea millefolium Smith Foredune_grassland 4
&lt;/code>&lt;/pre>&lt;p>This data set lists observations of species presence on different islands and in different habitats on those islands. We have a column for genus, species, island, habitat type, and the relative abundance of the species.&lt;/p>
&lt;p>Let&amp;rsquo;s say that we&amp;rsquo;re interested in looking at all species that are found in forested habitats. How many habitat types do we have that are forested? We can use the &lt;code>unique()&lt;/code> function to view all the unique entries for the &lt;code>habitat&lt;/code> column.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># View all unique values in the habitat column&lt;/span>
&lt;span style="color:#268bd2">unique&lt;/span>(veg_dat&lt;span style="color:#719e07">$&lt;/span>habitat)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] &amp;quot;Pine-Hardwood_forest_stands&amp;quot;
## [2] &amp;quot;Hardwood_forest_stands&amp;quot;
## [3] &amp;quot;Foredune_grassland&amp;quot;
## [4] &amp;quot;Dense_grasslands&amp;quot;
## [5] &amp;quot;Low_thickets&amp;quot;
## [6] &amp;quot;Tall_thickets&amp;quot;
## [7] &amp;quot;Open_dunes-thicket_complex&amp;quot;
## [8] &amp;quot;Beachgrass_dunes--Dense_grassland_dunes&amp;quot;
## [9] &amp;quot;Foredune-sparse_grassland_complex&amp;quot;
## [10] &amp;quot;Salt_flat&amp;quot;
## [11] &amp;quot;Sparse_grassland&amp;quot;
## [12] &amp;quot;&amp;quot;
## [13] &amp;quot;Drift--Wrack&amp;quot;
## [14] &amp;quot;Beach&amp;quot;
## [15] &amp;quot;Pine_forest_stands&amp;quot;
## [16] &amp;quot;Fresh_marsh&amp;quot;
## [17] &amp;quot;Upper_salt_marsh&amp;quot;
## [18] &amp;quot;Overwash_flats&amp;quot;
## [19] &amp;quot;Mudflats&amp;quot;
## [20] &amp;quot;Brackish_marsh&amp;quot;
## [21] &amp;quot;Lower_salt_marsh&amp;quot;
## [22] &amp;quot;Pine-hardwood_forest_stands&amp;quot;
## [23] &amp;quot;Juniper_thickets&amp;quot;
## [24] &amp;quot;code_error&amp;quot;
## [25] &amp;quot;Open_water&amp;quot;
&lt;/code>&lt;/pre>&lt;p>It looks like we have several types of forest: &amp;ldquo;Pine-Hardwood_forest_stands&amp;rdquo;, &amp;ldquo;Hardwood_forest_stands&amp;rdquo;, and &amp;ldquo;Pine_forest_stands&amp;rdquo;. We also have &amp;ldquo;Pine-hardwood_forest_stands&amp;rdquo;, which is the same as the first one, but identified as a separate entry because &amp;ldquo;hardwood&amp;rdquo; is not capitalized. The easiest way to pick all of these habitats out of the data set would be if we could &amp;ldquo;ctrl-F&amp;rdquo; the word &amp;ldquo;forest&amp;rdquo; in the &lt;code>habitat&lt;/code> column. Luckily, we have the &lt;code>grep()&lt;/code> function to help us with that!&lt;/p>
&lt;p>The basic usage is &lt;code>grep(pattern_text, vector)&lt;/code>, where the pattern is treated as a regular expression by default. You can also set &lt;code>ignore.case = TRUE&lt;/code> to make the search case-insensitive (if you leave &lt;code>ignore.case&lt;/code> out, the search &lt;em>is&lt;/em> case-sensitive). Let&amp;rsquo;s try it out.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Let&amp;#39;s see how grep works&lt;/span>
&lt;span style="color:#268bd2">grep&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;forest&amp;#34;&lt;/span>, veg_dat&lt;span style="color:#719e07">$&lt;/span>habitat, ignore.case &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#cb4b16">TRUE&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] 1 2 4 7 34 35 54 91 96 97 106 112 113 114 146
## [16] 147 152 207 209 218 257 258 259 260 262 263 349 383 384 385
## [31] 388 389 390 397 398 414 424 442 443 444 474 484 485 488 545
## [46] 555 558 581 585 595 615 619 750 752 753 754 759 760 762 764
## [61] 768 771 812 823 828 834 835 837 855 911 915 916 932 933 934
## [76] 943 944 949 964 965 998 1015 1016 1028 1032 1033 1109 1122 1124 1125
## [91] 1138 1141 1142 1144 1145 1146 1173 1174 1175 1179 1180 1211 1215 1223 1224
## [106] 1227 1228 1229 1237 1238 1246 1247 1248 1250 1251 1252 1253 1256 1259 1264
## [121] 1265 1269 1270 1271 1272 1288 1289 1300 1408 1410 1458 1459 1462 1463 1464
## [136] 1538 1554 1555 1560 1561 1565 1567 1568 1569 1577 1578 1579 1585 1658 1659
## [151] 1745 1746 1877 1897 1899 1900 1903 1904 1908 1909 1910 1912 1915 1916 1917
## [166] 1918 1922 1923 1930 1931 1951 1952
&lt;/code>&lt;/pre>&lt;p>You can see that &lt;code>grep()&lt;/code> returns the position of every element of &lt;code>veg_dat$habitat&lt;/code> that contains the string &amp;ldquo;forest&amp;rdquo;, which here is the same as the row number in the data frame. If we also set &lt;code>value = TRUE&lt;/code>, &lt;code>grep()&lt;/code> returns the matching values themselves (in this case, the habitat names) instead of their positions.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Store the matching values in a variable so we don&amp;#39;t print them all at once (it&amp;#39;s a long vector)&lt;/span>
hab &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">grep&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;forest&amp;#34;&lt;/span>, veg_dat&lt;span style="color:#719e07">$&lt;/span>habitat, ignore.case &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#cb4b16">TRUE&lt;/span>, value &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#cb4b16">TRUE&lt;/span>)
&lt;span style="color:#586e75"># View some of the values&lt;/span>
&lt;span style="color:#268bd2">head&lt;/span>(hab)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] &amp;quot;Pine-Hardwood_forest_stands&amp;quot; &amp;quot;Hardwood_forest_stands&amp;quot;
## [3] &amp;quot;Hardwood_forest_stands&amp;quot; &amp;quot;Hardwood_forest_stands&amp;quot;
## [5] &amp;quot;Pine-Hardwood_forest_stands&amp;quot; &amp;quot;Hardwood_forest_stands&amp;quot;
&lt;/code>&lt;/pre>&lt;p>Great! &lt;code>grep()&lt;/code> seems pretty useful. So then how does &lt;code>grepl()&lt;/code> differ from &lt;code>grep()&lt;/code>? The input arguments are the same, but the function gives us a different output. Let&amp;rsquo;s see:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Assign the list of values to a variable so we don&amp;#39;t have to see all of them (it&amp;#39;s a long list)&lt;/span>
hab_log &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">grepl&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;forest&amp;#34;&lt;/span>, veg_dat&lt;span style="color:#719e07">$&lt;/span>habitat, ignore.case &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#cb4b16">TRUE&lt;/span>)
&lt;span style="color:#586e75"># View the grepl output&lt;/span>
&lt;span style="color:#268bd2">head&lt;/span>(hab_log, &lt;span style="color:#2aa198">60&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] TRUE TRUE FALSE TRUE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE FALSE
## [37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [49] FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
&lt;/code>&lt;/pre>&lt;p>The extra &amp;ldquo;l&amp;rdquo; in &lt;code>grepl()&lt;/code> stands for &amp;ldquo;logical&amp;rdquo;, which is the data type that it returns. &lt;code>grepl()&lt;/code> returns one value for every element of the &lt;code>habitat&lt;/code> column: &lt;code>TRUE&lt;/code> when there is a match and &lt;code>FALSE&lt;/code> when there isn&amp;rsquo;t, so the result has as many values as the data frame has rows.&lt;/p>
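&lt;p>Because &lt;code>grepl()&lt;/code> returns one &lt;code>TRUE&lt;/code> or &lt;code>FALSE&lt;/code> per element, you can get the same results as &lt;code>grep()&lt;/code> from it. Here is a minimal sketch on a small made-up vector (&lt;code>habs&lt;/code> is a hypothetical example, not part of the data set): wrapping the &lt;code>grepl()&lt;/code> output in &lt;code>which()&lt;/code> gives the same positions as &lt;code>grep()&lt;/code>, and using it to index the vector gives the same values as &lt;code>grep(..., value = TRUE)&lt;/code>.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># A small, made-up vector of habitat names (hypothetical example)
habs &amp;lt;- c("Pine_forest_stands", "Salt_flat", "Hardwood_forest_stands", "Beach")

# grepl() returns one logical value per element: TRUE FALSE TRUE FALSE
matches &amp;lt;- grepl("forest", habs, ignore.case = TRUE)

# which() converts TRUE/FALSE into positions (1 and 3), the same as grep()
which(matches)
grep("forest", habs, ignore.case = TRUE)

# Logical indexing returns the matching values, the same as grep(value = TRUE)
habs[matches]
grep("forest", habs, ignore.case = TRUE, value = TRUE)
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>In practice, &lt;code>grep()&lt;/code> is handy when you want positions or values, and &lt;code>grepl()&lt;/code> when you need a logical test, for example inside &lt;code>filter()&lt;/code> or combined with other conditions.&lt;/p>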
&lt;p>Now that we know what &lt;code>grep()&lt;/code> and &lt;code>grepl()&lt;/code> return, we can subset our data frame using their outputs.&lt;/p>
&lt;p>Here, I used &lt;code>grep()&lt;/code> to subset the data to return all rows where a species is found in forested habitat.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Use the grep output to subset the data frame&lt;/span>
forest_species &lt;span style="color:#719e07">&amp;lt;-&lt;/span> veg_dat&lt;span style="color:#268bd2">[grep&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;forest&amp;#34;&lt;/span>, veg_dat&lt;span style="color:#719e07">$&lt;/span>habitat, ignore.case &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#cb4b16">TRUE&lt;/span>), ]
&lt;span style="color:#586e75"># View the new data set&lt;/span>
&lt;span style="color:#268bd2">head&lt;/span>(forest_species)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## genus species island habitat relabund
## 1 Acer rubrum Smith Pine-Hardwood_forest_stands 6
## 2 Acer rubrum Parramore Hardwood_forest_stands 6
## 4 Achillea millefolium Smith Hardwood_forest_stands 4
## 7 Achillea millefolium Parramore Hardwood_forest_stands 5
## 34 Amelanchier obovalis Smith Pine-Hardwood_forest_stands 5
## 35 Amelanchier obovalis Smith Hardwood_forest_stands 5
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># How many rows?&lt;/span>
&lt;span style="color:#268bd2">nrow&lt;/span>(forest_species)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] 172
&lt;/code>&lt;/pre>&lt;p>Now I&amp;rsquo;ll demonstrate the same thing, but this time I&amp;rsquo;ll use &lt;code>grepl()&lt;/code> to subset the data, combined with the &lt;code>filter()&lt;/code> function from the &lt;code>dplyr&lt;/code> package. The &lt;code>filter()&lt;/code> function takes the data frame you want to analyze, followed by a logical test, and returns the rows for which the test is &lt;code>TRUE&lt;/code>.&lt;/p>
&lt;p>Both methods return the same rows. One difference you may notice in the two printed outputs is the row names: subsetting with square brackets keeps the original row numbers (1, 2, 4, 7, and so on), while &lt;code>filter()&lt;/code> numbers the rows from 1. Some people prefer the &lt;code>filter()&lt;/code> method because it follows the &lt;code>dplyr&lt;/code> approach to writing tidy scripts and is easy to combine with other functions such as &lt;code>select()&lt;/code> and &lt;code>mutate()&lt;/code> (&lt;a href="https://www.rforecology.com/post/how-to-use-pipes/" target="_blank" rel="noopener">see our post on the pipe operator to learn more&lt;/a>).&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Use the grepl output to subset the data frame&lt;/span>
forest_species2 &lt;span style="color:#719e07">&amp;lt;-&lt;/span> dplyr&lt;span style="color:#719e07">::&lt;/span>&lt;span style="color:#268bd2">filter&lt;/span>(veg_dat, &lt;span style="color:#268bd2">grepl&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;forest&amp;#34;&lt;/span>, habitat, ignore.case &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#cb4b16">TRUE&lt;/span>))
&lt;span style="color:#586e75"># View the new data set&lt;/span>
&lt;span style="color:#268bd2">head&lt;/span>(forest_species2)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## genus species island habitat relabund
## 1 Acer rubrum Smith Pine-Hardwood_forest_stands 6
## 2 Acer rubrum Parramore Hardwood_forest_stands 6
## 3 Achillea millefolium Smith Hardwood_forest_stands 4
## 4 Achillea millefolium Parramore Hardwood_forest_stands 5
## 5 Amelanchier obovalis Smith Pine-Hardwood_forest_stands 5
## 6 Amelanchier obovalis Smith Hardwood_forest_stands 5
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># How many rows?&lt;/span>
&lt;span style="color:#268bd2">nrow&lt;/span>(forest_species2)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] 172
&lt;/code>&lt;/pre>&lt;p>Now that I have a data frame containing only species found in forested habitat, I can do whatever type of data manipulation I want. For example, I could group by &lt;code>island&lt;/code> and &lt;code>habitat&lt;/code> and use the &lt;code>summarize()&lt;/code> function to see how many species are found within each habitat type on each island (to learn more about the &lt;code>group_by()&lt;/code> and &lt;code>summarize()&lt;/code> functions, check out our tutorial &lt;a href="https://www.rforecology.com/post/how-to-use-the-group-by-function/" target="_blank" rel="noopener">here&lt;/a>).&lt;/p>
&lt;p>Inside &lt;code>summarize()&lt;/code>, I used the function &lt;code>n()&lt;/code>, which counts the number of rows in each group.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># A dplyr workflow, starting by filtering with grepl(), grouping the data, then summarizing it&lt;/span>
forest_summary &lt;span style="color:#719e07">&amp;lt;-&lt;/span> dplyr&lt;span style="color:#719e07">::&lt;/span>&lt;span style="color:#268bd2">filter&lt;/span>(veg_dat, &lt;span style="color:#268bd2">grepl&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;forest&amp;#34;&lt;/span>, habitat, ignore.case &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#cb4b16">TRUE&lt;/span>)) &lt;span style="color:#719e07">%&amp;gt;%&lt;/span>
&lt;span style="color:#268bd2">group_by&lt;/span>(island, habitat) &lt;span style="color:#719e07">%&amp;gt;%&lt;/span>
&lt;span style="color:#268bd2">summarize&lt;/span>(obs &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">n&lt;/span>())
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## `summarise()` has grouped output by 'island'. You can override using the `.groups` argument.
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># View summary table&lt;/span>
&lt;span style="color:#268bd2">print&lt;/span>(forest_summary)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## # A tibble: 8 × 3
## # Groups: island [3]
## island habitat obs
## &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;int&amp;gt;
## 1 Parramore Hardwood_forest_stands 21
## 2 Parramore Pine_forest_stands 27
## 3 Parramore Pine-Hardwood_forest_stands 31
## 4 Revel Pine-Hardwood_forest_stands 7
## 5 Smith Hardwood_forest_stands 44
## 6 Smith Pine_forest_stands 1
## 7 Smith Pine-hardwood_forest_stands 1
## 8 Smith Pine-Hardwood_forest_stands 40
&lt;/code>&lt;/pre>&lt;p>Cool! This is useful information to know, and it&amp;rsquo;s all thanks to &lt;code>grepl()&lt;/code> that we were able to perform this operation so easily.&lt;/p>
&lt;h2 id="find-and-replace-using-sub">Find and replace using &lt;code>sub()&lt;/code>&lt;/h2>
&lt;p>You may have noticed that the last two rows of the table show that Smith Island has 1 species in &amp;ldquo;Pine-&lt;strong>h&lt;/strong>ardwood_forest_stands&amp;rdquo;, and 40 species in &amp;ldquo;Pine-&lt;strong>H&lt;/strong>ardwood_forest_stands&amp;rdquo;. This is a typo that we need to fix — those two habitat types should be the same.&lt;/p>
&lt;p>No worries, we can use the &lt;code>sub()&lt;/code> function to change &amp;ldquo;Pine-&lt;strong>h&lt;/strong>ardwood_forest_stands&amp;rdquo; to &amp;ldquo;Pine-&lt;strong>H&lt;/strong>ardwood_forest_stands&amp;rdquo;. The function works like this: &lt;code>sub(pattern_text, replacement_text, vector)&lt;/code>. We&amp;rsquo;ll also set &lt;code>ignore.case = FALSE&lt;/code>; this is already the default, but spelling it out makes it clear that we care about the lowercase versus uppercase &amp;ldquo;H&amp;rdquo;.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Replace lowercase &amp;#34;hardwood&amp;#34; with &amp;#34;Hardwood&amp;#34; in each habitat name&lt;/span>
veg_dat&lt;span style="color:#719e07">$&lt;/span>habitat &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">sub&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;hardwood&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;Hardwood&amp;#34;&lt;/span>, veg_dat&lt;span style="color:#719e07">$&lt;/span>habitat, ignore.case &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#cb4b16">FALSE&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>Now if we summarize the data like we did above, all the pine-hardwood forests should be aggregated under the type &amp;ldquo;Pine-&lt;strong>H&lt;/strong>ardwood_forest_stands&amp;rdquo;.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># A dplyr workflow, starting by filtering with grepl(), grouping the data, then summarizing it&lt;/span>
forest_summary &lt;span style="color:#719e07">&amp;lt;-&lt;/span> dplyr&lt;span style="color:#719e07">::&lt;/span>&lt;span style="color:#268bd2">filter&lt;/span>(veg_dat, &lt;span style="color:#268bd2">grepl&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;forest&amp;#34;&lt;/span>, habitat, ignore.case &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#cb4b16">TRUE&lt;/span>)) &lt;span style="color:#719e07">%&amp;gt;%&lt;/span>
&lt;span style="color:#268bd2">group_by&lt;/span>(island, habitat) &lt;span style="color:#719e07">%&amp;gt;%&lt;/span>
&lt;span style="color:#268bd2">summarize&lt;/span>(obs &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">n&lt;/span>())
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## `summarise()` has grouped output by 'island'. You can override using the `.groups` argument.
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># View summary table&lt;/span>
forest_summary
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## # A tibble: 7 × 3
## # Groups: island [3]
## island habitat obs
## &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;int&amp;gt;
## 1 Parramore Hardwood_forest_stands 21
## 2 Parramore Pine_forest_stands 27
## 3 Parramore Pine-Hardwood_forest_stands 31
## 4 Revel Pine-Hardwood_forest_stands 7
## 5 Smith Hardwood_forest_stands 44
## 6 Smith Pine_forest_stands 1
## 7 Smith Pine-Hardwood_forest_stands 41
&lt;/code>&lt;/pre>&lt;p>Great! It looks like that fixed the issue.&lt;/p>
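&lt;p>One detail about &lt;code>sub()&lt;/code>: it replaces only the &lt;em>first&lt;/em> match within each element of the vector, while the related function &lt;code>gsub()&lt;/code> replaces &lt;em>every&lt;/em> match. That didn&amp;rsquo;t matter for our fix, because &amp;ldquo;hardwood&amp;rdquo; appears at most once in each habitat name, but it does matter when a pattern can repeat. Here is a small sketch using two habitat names from the data set, where we try to swap underscores for spaces:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Two habitat names from the data set, each with two underscores
habs &amp;lt;- c("Pine_forest_stands", "Upper_salt_marsh")

# sub() replaces only the first underscore in each element:
# "Pine forest_stands" "Upper salt_marsh"
sub("_", " ", habs)

# gsub() replaces every underscore in each element:
# "Pine forest stands" "Upper salt marsh"
gsub("_", " ", habs)
&lt;/code>&lt;/pre>&lt;/div>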
&lt;p>This was just one example of all the things you could do with &lt;code>grep()&lt;/code> and related functions. They&amp;rsquo;re extremely useful for organizing data and searching for the data you want.&lt;/p>
&lt;p>For further reading on strings and how to make your search queries with &lt;code>grep()&lt;/code> more specific, learn more about regex (regular expressions) here:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/regex" target="_blank" rel="noopener">https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/regex&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://r4ds.had.co.nz/strings.html" target="_blank" rel="noopener">https://r4ds.had.co.nz/strings.html&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://cran.r-project.org/web/packages/stringr/vignettes/regular-expressions.html" target="_blank" rel="noopener">https://cran.r-project.org/web/packages/stringr/vignettes/regular-expressions.html&lt;/a>&lt;/li>
&lt;/ul>
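&lt;p>As a small taste of what regular expressions add, here is a sketch using a few habitat names copied from the &lt;code>unique()&lt;/code> output above. The anchor &lt;code>^&lt;/code> matches the start of a string and &lt;code>$&lt;/code> matches the end, so &lt;code>"^Pine"&lt;/code> finds habitats that &lt;em>begin&lt;/em> with &amp;ldquo;Pine&amp;rdquo;. Some characters, such as &lt;code>.&lt;/code> (which matches any single character), have special meanings in a regular expression, so set &lt;code>fixed = TRUE&lt;/code> when you want to match text literally.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># A few habitat names copied from the unique() output above
habs &amp;lt;- c("Pine-Hardwood_forest_stands", "Hardwood_forest_stands",
          "Pine_forest_stands", "Open_dunes-thicket_complex")

# "^" anchors the match to the start of the string:
# "Pine-Hardwood_forest_stands" "Pine_forest_stands"
grep("^Pine", habs, value = TRUE)

# "$" anchors the match to the end of the string:
# "Open_dunes-thicket_complex"
grep("complex$", habs, value = TRUE)

# In a regular expression, "." matches any single character,
# so every non-empty string matches: TRUE TRUE TRUE TRUE
grepl(".", habs)

# With fixed = TRUE, "." only matches a literal period: FALSE FALSE FALSE FALSE
grepl(".", habs, fixed = TRUE)
&lt;/code>&lt;/pre>&lt;/div>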
&lt;p>I hope you found this tutorial helpful! Happy coding!&lt;/p>
&lt;h3 id="data-set-citation">Data set citation:&lt;/h3>
&lt;p>McCaffrey, C. and R. Dueser. 2018. Vegetation Survey on the Virginia Barrier Islands - Species by habitat, 1974 ver 3. Environmental Data Initiative. &lt;a href="https://doi.org/10.6073/pasta/9c276fb0ce844030c4afae81ff2cadfb" target="_blank" rel="noopener">https://doi.org/10.6073/pasta/9c276fb0ce844030c4afae81ff2cadfb&lt;/a> (Accessed 2022-02-25).&lt;/p>
&lt;br>
&lt;hr>
&lt;center>If you enjoyed this tutorial and want to learn more about searching and filtering your data, you can check out Luka Negoita's full course on the complete basics of R for ecology here:
&lt;br>
&lt;br>
&lt;ul class="cta-group">
&lt;li>
&lt;a href="https://coaching.rforecology.com?utm_source=blog&amp;amp;utm_medium=bottom_button&amp;amp;utm_campaign=rforecology_blog" target="_blank" rel="noopener" class="btn btn-primary px-3 py-3">&lt;strong>Start learning R now&lt;/strong>&lt;/a>
&lt;/li>
&lt;/ul>
&lt;p>Also be sure to check out &lt;strong>&lt;a href="https://www.r-bloggers.com/" target="_blank" rel="noopener">R-bloggers&lt;/a>&lt;/strong> for other great tutorials on learning R.&lt;/p>
&lt;/center>
&lt;script defer src="https://cdn.commento.io/js/commento.js">&lt;/script>
&lt;div id="commento">&lt;/div></description></item><item><title>Learning about data structures in R</title><link>https://www.rforecology.com/post/data-structures-in-r/</link><pubDate>Wed, 23 Mar 2022 09:09:55 -0400</pubDate><guid>https://www.rforecology.com/post/data-structures-in-r/</guid><description>&lt;p>Last week, we posted a tutorial on the different types of data in R (&lt;a href="https://www.rforecology.com/post/data-types-in-r/" target="_blank" rel="noopener">check it out here&lt;/a>). In this tutorial, we&amp;rsquo;re going to talk about the different structures that R provides to help you organize your data.&lt;/p>
&lt;p>Data structures go hand-in-hand with data types, as both of these form the foundation for the work we do in R. You may have already worked with many of the structures that we describe in this blog post, but I wanted to take the time to describe them in depth and show you how they relate to or are different from one another.&lt;/p>
&lt;p>Let&amp;rsquo;s jump in!&lt;/p>
&lt;img src="https://www.rforecology.com/datastr_image1.png" alt="Image of file cabinets with text 'Data Structures in R'" style="width:400px;"/>
&lt;h2 id="the-different-data-structures">The different data structures&lt;/h2>
&lt;p>R provides several data structures that we commonly use as ecologists:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Vectors&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Lists&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Matrices&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Data frames&lt;/p>
&lt;p>4a. Tibbles&lt;/p>
&lt;/li>
&lt;/ol>
&lt;h2 id="vectors">Vectors&lt;/h2>
&lt;p>Vectors are one of the most common data structures. You can create a vector using the function &lt;code>c()&lt;/code>. &lt;code>c()&lt;/code> combines all of its arguments into a vector like so:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Create a vector&lt;/span>
vec &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;this&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;is&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;a&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;vector&amp;#34;&lt;/span>)
&lt;span style="color:#586e75"># View the vector&lt;/span>
vec
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] &amp;quot;this&amp;quot; &amp;quot;is&amp;quot; &amp;quot;a&amp;quot; &amp;quot;vector&amp;quot;
&lt;/code>&lt;/pre>&lt;p>You can create a vector using any data type (numeric, character, logical, etc.). However, all elements of a vector must be the same type, so if you combine data types, R will coerce (convert) every element to the most &amp;ldquo;flexible&amp;rdquo; of those types. From least to most flexible, the data types are: logical, integer, numeric, and character. For example, in the vector below, I combined numbers and characters.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Create a vector&lt;/span>
ex &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">1&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;species&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">10&lt;/span>)
&lt;span style="color:#586e75"># View vector&lt;/span>
&lt;span style="color:#268bd2">class&lt;/span>(ex)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] &amp;quot;character&amp;quot;
&lt;/code>&lt;/pre>&lt;p>The data type of the vector is character: R can change 1 and 10 into &amp;ldquo;1&amp;rdquo; and &amp;ldquo;10&amp;rdquo;, but it can&amp;rsquo;t change &amp;ldquo;species&amp;rdquo; into a number (what number would &amp;ldquo;species&amp;rdquo; represent?). So here, R has chosen the more flexible of the two data types: character.&lt;/p>
&lt;p>You can also examine other properties of a vector, such as its number of elements (&lt;code>length()&lt;/code>) or, for a character vector, the number of characters in each element (&lt;code>nchar()&lt;/code>).&lt;/p>
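&lt;p>You can see the whole ordering (logical, integer, numeric, character) by combining pairs of types and checking the class of the result. This is a short sketch using made-up values; the &lt;code>L&lt;/code> suffix (as in &lt;code>2L&lt;/code>) tells R to store a number as an integer rather than a double.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Logical + integer: TRUE becomes 1L, so the class is "integer"
class(c(TRUE, 2L))

# Integer + numeric (double): 2L becomes 2, so the class is "numeric"
class(c(2L, 3.5))

# Numeric + character: 3.5 becomes "3.5", so the class is "character"
class(c(3.5, "frog"))

# Logical + character: TRUE becomes "TRUE" (not "1"), giving "TRUE" "frog"
c(TRUE, "frog")
&lt;/code>&lt;/pre>&lt;/div>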
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># View vector&lt;/span>
ex
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] &amp;quot;1&amp;quot; &amp;quot;species&amp;quot; &amp;quot;10&amp;quot;
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Length of vector&lt;/span>
&lt;span style="color:#268bd2">length&lt;/span>(ex)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] 3
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Number of characters&lt;/span>
&lt;span style="color:#268bd2">nchar&lt;/span>(ex)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] 1 7 2
&lt;/code>&lt;/pre>&lt;p>Vector elements can also be given names. You do this by assigning a character vector to &lt;code>names(my.vector)&lt;/code>.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Create a vector&lt;/span>
crabs &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">10&lt;/span>, &lt;span style="color:#2aa198">15&lt;/span>, &lt;span style="color:#2aa198">26&lt;/span>)
&lt;span style="color:#586e75"># Give the vector names&lt;/span>
&lt;span style="color:#268bd2">names&lt;/span>(crabs) &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;Blue crab&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;Mud crab&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;Ghost crab&amp;#34;&lt;/span>)
&lt;span style="color:#586e75"># View named vector&lt;/span>
crabs
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## Blue crab Mud crab Ghost crab
## 10 15 26
&lt;/code>&lt;/pre>&lt;p>You can subset a vector by specifying the element number in square brackets. You could also subset a vector using the element name.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Choose element number 3&lt;/span>
crabs[3]
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## Ghost crab
## 26
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Choose element named &amp;#34;Ghost crab&amp;#34;&lt;/span>
crabs[&lt;span style="color:#2aa198">&amp;#34;Ghost crab&amp;#34;&lt;/span>]
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## Ghost crab
## 26
&lt;/code>&lt;/pre>&lt;p>Lastly, you can view the structure of a vector using the &lt;code>str()&lt;/code> function. This tells us that &lt;code>crabs&lt;/code> is a named numeric vector with 3 elements: 10, 15, and 26. The second line of the output shows that the vector&amp;rsquo;s &lt;code>names&lt;/code> attribute is a character vector with the elements &amp;ldquo;Blue crab&amp;rdquo;, &amp;ldquo;Mud crab&amp;rdquo;, and &amp;ldquo;Ghost crab&amp;rdquo;.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#268bd2">str&lt;/span>(crabs)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## Named num [1:3] 10 15 26
## - attr(*, &amp;quot;names&amp;quot;)= chr [1:3] &amp;quot;Blue crab&amp;quot; &amp;quot;Mud crab&amp;quot; &amp;quot;Ghost crab&amp;quot;
&lt;/code>&lt;/pre>&lt;h2 id="lists">Lists&lt;/h2>
&lt;p>Lists are similar to vectors, but their elements do not all have to be the same type or length, and an element can itself be another list. In other words, lists let you nest data structures inside one another.&lt;/p>
&lt;p>To create a list, you use &lt;code>list()&lt;/code> instead of &lt;code>c()&lt;/code>.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Create a list&lt;/span>
animals &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">list&lt;/span>(&lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;Eastern elliptio&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;Diamondback terrapin&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;Spring peeper&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;American eel&amp;#34;&lt;/span>),
&lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">25&lt;/span>, &lt;span style="color:#2aa198">3&lt;/span>, &lt;span style="color:#2aa198">0&lt;/span>, &lt;span style="color:#2aa198">10&lt;/span>),
&lt;span style="color:#2aa198">&amp;#34;Maryland&amp;#34;&lt;/span>,
&lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#268bd2">T&lt;/span>, &lt;span style="color:#268bd2">T&lt;/span>, &lt;span style="color:#268bd2">F&lt;/span>, &lt;span style="color:#268bd2">T&lt;/span>))
&lt;span style="color:#586e75"># View the structure of the list&lt;/span>
&lt;span style="color:#268bd2">str&lt;/span>(animals)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## List of 4
## $ : chr [1:4] &amp;quot;Eastern elliptio&amp;quot; &amp;quot;Diamondback terrapin&amp;quot; &amp;quot;Spring peeper&amp;quot; &amp;quot;American eel&amp;quot;
## $ : num [1:4] 25 3 0 10
## $ : chr &amp;quot;Maryland&amp;quot;
## $ : logi [1:4] TRUE TRUE FALSE TRUE
&lt;/code>&lt;/pre>&lt;p>Here, my list contains a vector of animal names (character), a vector of counts (numeric), the U.S. state where these animals can be found (character), and a logical vector. The elements don&amp;rsquo;t all need to be the same length: the third element has only one value, &amp;ldquo;Maryland&amp;rdquo;, while all the other elements have a length of 4.&lt;/p>
&lt;p>When you print the list, you&amp;rsquo;ll notice that each element is labeled with double square brackets, [[like this]].&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># View list&lt;/span>
animals
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [[1]]
## [1] &amp;quot;Eastern elliptio&amp;quot; &amp;quot;Diamondback terrapin&amp;quot; &amp;quot;Spring peeper&amp;quot;
## [4] &amp;quot;American eel&amp;quot;
##
## [[2]]
## [1] 25 3 0 10
##
## [[3]]
## [1] &amp;quot;Maryland&amp;quot;
##
## [[4]]
## [1] TRUE TRUE FALSE TRUE
&lt;/code>&lt;/pre>&lt;p>You can subset elements of a list using double square brackets, and further subset that list element using single square brackets.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># View animal names (element 1 in the list)&lt;/span>
animals[[1]]
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] &amp;quot;Eastern elliptio&amp;quot; &amp;quot;Diamondback terrapin&amp;quot; &amp;quot;Spring peeper&amp;quot;
## [4] &amp;quot;American eel&amp;quot;
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># View the second animal name (element 2 of element 1 in the list)&lt;/span>
animals[[1]][2]
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] &amp;quot;Diamondback terrapin&amp;quot;
&lt;/code>&lt;/pre>&lt;p>As with vectors, you can give list elements names. Let&amp;rsquo;s create the same list that we did above, but give it some more descriptive names by writing &lt;code>name.of.element = element&lt;/code> within the &lt;code>list()&lt;/code> function. In the code below, I named the list elements &amp;ldquo;common.name&amp;rdquo;, &amp;ldquo;abundance&amp;rdquo;, &amp;ldquo;state&amp;rdquo;, and &amp;ldquo;presence&amp;rdquo;.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Create a list&lt;/span>
animals &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">list&lt;/span>(common.name &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;Eastern elliptio&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;Diamondback terrapin&amp;#34;&lt;/span>,
&lt;span style="color:#2aa198">&amp;#34;Spring peeper&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;American eel&amp;#34;&lt;/span>),
abundance &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">25&lt;/span>, &lt;span style="color:#2aa198">3&lt;/span>, &lt;span style="color:#2aa198">0&lt;/span>, &lt;span style="color:#2aa198">10&lt;/span>),
state &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;Maryland&amp;#34;&lt;/span>,
presence &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#268bd2">T&lt;/span>, &lt;span style="color:#268bd2">T&lt;/span>, &lt;span style="color:#268bd2">F&lt;/span>, &lt;span style="color:#268bd2">T&lt;/span>))
&lt;span style="color:#586e75"># View list&lt;/span>
animals
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## $common.name
## [1] &amp;quot;Eastern elliptio&amp;quot; &amp;quot;Diamondback terrapin&amp;quot; &amp;quot;Spring peeper&amp;quot;
## [4] &amp;quot;American eel&amp;quot;
##
## $abundance
## [1] 25 3 0 10
##
## $state
## [1] &amp;quot;Maryland&amp;quot;
##
## $presence
## [1] TRUE TRUE FALSE TRUE
&lt;/code>&lt;/pre>&lt;p>Now, instead of numbers inside double square brackets, each element is identified by &lt;code>$name&lt;/code>. You can still subset the list using the element number in double square brackets, like this: &lt;code>[[1]]&lt;/code>, but you can also subset the list using dollar sign notation:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># View whether the animals were present in our survey&lt;/span>
animals&lt;span style="color:#719e07">$&lt;/span>presence
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] TRUE TRUE FALSE TRUE
&lt;/code>&lt;/pre>&lt;p>Lists are really useful for storing lots of data, but it can get confusing if you have several lists nested in other lists. Naming your elements can help you keep things straight when subsetting your data.&lt;/p>
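&lt;p>The lists above only contain vectors, but as mentioned at the start of this section, a list element can itself be a list. Here is a minimal sketch of a nested, named list; the &lt;code>survey&lt;/code> object and all of its element names and values are hypothetical, made up for illustration. It also shows the difference between the two kinds of brackets: single square brackets return a smaller list, while double square brackets (or &lt;code>$&lt;/code>) return the element itself.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># A hypothetical survey record: a list that contains another list
survey &amp;lt;- list(site = "Site A",
               visit = list(date = "2022-06-01",
                            counts = c(blue.crab = 10, mud.crab = 15)))
# Chain $ to move down through the levels of nesting
survey$visit$counts
# The same element, using names inside double square brackets
survey[["visit"]][["counts"]]
# Single square brackets return a smaller list...
class(survey["site"])    # "list"
# ...while double square brackets return the element itself
class(survey[["site"]])  # "character"
&lt;/code>&lt;/pre>&lt;/div>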
&lt;h2 id="matrices">Matrices&lt;/h2>
&lt;p>The next data structure I want to introduce is the matrix. Matrices are two-dimensional, rectangular objects whose elements must all be the same type, just like a vector. They are most useful for mathematical operations, but they are also common for species-by-site abundance data, where the columns are species and the rows are sites (or vice versa). Each cell holds the abundance of one species at one site, a format used in many multivariate analyses.&lt;/p>
&lt;p>You can create matrices using &lt;code>matrix(data = your.data, nrow = num.rows, ncol = num.cols, byrow = T/F, dimnames = your.names)&lt;/code>.&lt;/p>
&lt;p>&lt;code>data&lt;/code> accepts a vector of the data you want to use. &lt;code>nrow&lt;/code> is the number of rows you want in your matrix, while &lt;code>ncol&lt;/code> is the number of columns. The &lt;code>byrow&lt;/code> argument controls whether the matrix is filled by rows (&lt;code>TRUE&lt;/code>) or by columns (&lt;code>FALSE&lt;/code>, the default). &lt;code>dimnames&lt;/code> accepts a list of 2 elements that specifies names for the rows and columns of your matrix.&lt;/p>
&lt;p>The &lt;code>byrow&lt;/code> argument is best understood through demonstration:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Create a matrix that is filled by rows&lt;/span>
m1 &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">matrix&lt;/span>(data &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">1&lt;/span>&lt;span style="color:#719e07">:&lt;/span>&lt;span style="color:#2aa198">12&lt;/span>, nrow &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">4&lt;/span>, ncol &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">3&lt;/span>, byrow &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">T&lt;/span>)
m1
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 4 5 6
## [3,] 7 8 9
## [4,] 10 11 12
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Create a matrix that is filled by columns&lt;/span>
m2 &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">matrix&lt;/span>(data &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">1&lt;/span>&lt;span style="color:#719e07">:&lt;/span>&lt;span style="color:#2aa198">12&lt;/span>, nrow &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">4&lt;/span>, ncol &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">3&lt;/span>, byrow &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">F&lt;/span>)
m2
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [,1] [,2] [,3]
## [1,] 1 5 9
## [2,] 2 6 10
## [3,] 3 7 11
## [4,] 4 8 12
&lt;/code>&lt;/pre>&lt;p>You can see that the first chunk of code fills in the table row by row: it fills each row from left to right, then moves down. The second chunk fills in the table column by column: it fills each column from top to bottom, then moves to the right.&lt;/p>
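&lt;p>To tie this back to the species-by-site format mentioned earlier, here is a minimal sketch that uses the &lt;code>dimnames&lt;/code> argument to label the rows as sites and the columns as species. The site names, species, and counts are hypothetical values made up for illustration. Because every cell is a number, base R functions such as &lt;code>rowSums()&lt;/code> and &lt;code>colSums()&lt;/code> can summarize the whole matrix at once.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Hypothetical counts for 3 species at 2 sites, entered row by row
abund &amp;lt;- matrix(data = c(10, 0, 4,
                         2, 7, 0),
                nrow = 2, ncol = 3, byrow = TRUE,
                dimnames = list(c("Site A", "Site B"),
                                c("Blue crab", "Mud crab", "Ghost crab")))
abund
# Total number of individuals counted at each site (row sums)
rowSums(abund)
# Total abundance of each species across both sites (column sums)
colSums(abund)
# Once dimnames are set, rows and columns can also be selected by name
abund["Site B", "Mud crab"]
&lt;/code>&lt;/pre>&lt;/div>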
&lt;p>You can access matrix elements using single square brackets, where the first number represents the row and the second represents the column. So &lt;code>m1[2,3]&lt;/code> would access the element in the 2nd row and 3rd column. You could also type &lt;code>m1[2, ]&lt;/code>, leaving the column space blank. This will return the entire 2nd row of the matrix. Conversely, you could type &lt;code>m1[ , 3]&lt;/code>, which leaves the row space blank and returns the entire 3rd column of the matrix. Let&amp;rsquo;s see these in action.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Return element in 2nd row, 3rd column&lt;/span>
m1[2,&lt;span style="color:#2aa198">3&lt;/span>]
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] 6
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Return 2nd row&lt;/span>
m1[2, ]
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] 4 5 6
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Return 3rd column&lt;/span>
m1[ , &lt;span style="color:#2aa198">3&lt;/span>]
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] 3 6 9 12
&lt;/code>&lt;/pre>&lt;p>We can also look at the number of rows and columns of a matrix by using &lt;code>nrow()&lt;/code> and &lt;code>ncol()&lt;/code>; these functions are analogous to the &lt;code>length()&lt;/code> function that we used for vectors. Alternatively, we can use &lt;code>dim()&lt;/code>, which will tell us both the number of rows and columns.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Number of rows&lt;/span>
&lt;span style="color:#268bd2">nrow&lt;/span>(m1)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] 4
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Number of columns&lt;/span>
&lt;span style="color:#268bd2">ncol&lt;/span>(m1)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] 3
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># View matrix dimensions&lt;/span>
&lt;span style="color:#268bd2">dim&lt;/span>(m1)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] 4 3
&lt;/code>&lt;/pre>&lt;h2 id="data-frames">Data frames&lt;/h2>
&lt;p>Data frames are the most common way to store and display tabular data in R, and they are the standard input format for most analyses. Like matrices, these are two-dimensional objects with rows and columns. But data frames are also like lists, in that you can have elements of several types within them. In fact, a data frame is a &lt;em>type&lt;/em> of list where each list element has the same length (this is what makes them rectangular / tabular).&lt;/p>
&lt;p>You have likely encountered data frames before, for example when importing data into R using functions such as &lt;code>read.csv()&lt;/code>.&lt;/p>
&lt;p>You can create a data frame using the function &lt;code>data.frame(col1 = vector1, col2 = vector2, etc.)&lt;/code>, where each vector should be the same length. You can also use a vector of length 1, or one whose length divides evenly into the length of the other columns; this shorter vector will then be recycled until it reaches the length of the other columns.&lt;/p>
&lt;p>In the code below, I created a data frame of species, whether or not they were present, and their abundance. Each column holds a different data type: the 1st column is a character vector, the 2nd is logical, and the 3rd is numeric. A matrix can&amp;rsquo;t do this, because all of its elements must be the same type.&lt;/p>
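&lt;p>Here is a quick sketch of that recycling rule, using a hypothetical layout of 4 quadrats (the column names and values are made up for illustration). A length-1 column and a length-2 column are both recycled to fill 4 rows, while a length-3 column does not divide evenly into 4 rows, so &lt;code>data.frame()&lt;/code> stops with an error instead. The &lt;code>tryCatch()&lt;/code> call just captures that error so the code keeps running.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># 4 hypothetical quadrats at one site, with 2 alternating treatments
quadrats &amp;lt;- data.frame(site = "Site A",                    # length 1: recycled
                       quadrat = 1:4,                      # length 4
                       treatment = c("control", "burned")) # length 2: recycled
quadrats
# A length-3 column does not divide evenly into 4 rows, so this call fails
bad &amp;lt;- tryCatch(data.frame(quadrat = 1:4, depth = c(1, 2, 3)),
                error = function(e) e)
conditionMessage(bad)
&lt;/code>&lt;/pre>&lt;/div>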
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Create a data frame&lt;/span>
species_dat &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">data.frame&lt;/span>(species &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;Callinectes sapidus&amp;#34;&lt;/span>,
&lt;span style="color:#2aa198">&amp;#34;Sciaenops ocellatus&amp;#34;&lt;/span>,
&lt;span style="color:#2aa198">&amp;#34;Anchoa mitchilli&amp;#34;&lt;/span>,
&lt;span style="color:#2aa198">&amp;#34;Micropognias undulatus&amp;#34;&lt;/span>,
&lt;span style="color:#2aa198">&amp;#34;Menidia menidia&amp;#34;&lt;/span>),
presence &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#268bd2">T&lt;/span>, &lt;span style="color:#268bd2">F&lt;/span>, &lt;span style="color:#268bd2">T&lt;/span>, &lt;span style="color:#268bd2">F&lt;/span>, &lt;span style="color:#268bd2">T&lt;/span>),
abundance &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">2&lt;/span>, &lt;span style="color:#2aa198">0&lt;/span>, &lt;span style="color:#2aa198">10&lt;/span>, &lt;span style="color:#2aa198">0&lt;/span>, &lt;span style="color:#2aa198">9&lt;/span>))
&lt;span style="color:#586e75"># View data frame&lt;/span>
species_dat
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## species presence abundance
## 1 Callinectes sapidus TRUE 2
## 2 Sciaenops ocellatus FALSE 0
## 3 Anchoa mitchilli TRUE 10
## 4 Micropognias undulatus FALSE 0
## 5 Menidia menidia TRUE 9
&lt;/code>&lt;/pre>&lt;p>You also have the option to add an argument &lt;code>row.names = c(&amp;quot;vector&amp;quot;, &amp;quot;of&amp;quot;, &amp;quot;names&amp;quot;, &amp;quot;for&amp;quot;, &amp;quot;rows&amp;quot;)&lt;/code>, though adding row.names is less common for data frames.&lt;/p>
&lt;p>As with matrices, you can view number of rows and columns using &lt;code>nrow(my.dataframe)&lt;/code> or &lt;code>ncol(my.dataframe)&lt;/code>, or use &lt;code>dim(my.dataframe)&lt;/code> to view the full dimensions.&lt;/p>
&lt;p>And like matrices, you can subset your data frame into its rows or columns using single square brackets: &lt;code>my.dataframe[row.num, col.num]&lt;/code>.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># View the third item in the first column&lt;/span>
species_dat[3, &lt;span style="color:#2aa198">1&lt;/span>]
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] &amp;quot;Anchoa mitchilli&amp;quot;
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># View the first column&lt;/span>
species_dat[ , &lt;span style="color:#2aa198">1&lt;/span>]
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] &amp;quot;Callinectes sapidus&amp;quot; &amp;quot;Sciaenops ocellatus&amp;quot; &amp;quot;Anchoa mitchilli&amp;quot;
## [4] &amp;quot;Micropognias undulatus&amp;quot; &amp;quot;Menidia menidia&amp;quot;
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># View the third row&lt;/span>
species_dat[3, ]
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## species presence abundance
## 3 Anchoa mitchilli TRUE 10
&lt;/code>&lt;/pre>&lt;p>Alternatively, you can subset your data frame in the same way as lists, by using the dollar sign symbol or double square brackets. Each column is essentially a list element, so you can easily choose a data frame column using &lt;code>my.dataframe$col.name&lt;/code>.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># View the abundance column in three different ways&lt;/span>
species_dat&lt;span style="color:#719e07">$&lt;/span>abundance
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] 2 0 10 0 9
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">species_dat[[3]]
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] 2 0 10 0 9
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">species_dat[[&lt;span style="color:#2aa198">&amp;#34;abundance&amp;#34;&lt;/span>]]
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] 2 0 10 0 9
&lt;/code>&lt;/pre>&lt;p>The function &lt;code>str()&lt;/code> is also useful. It shows you the structure of your data frame. This will tell you the number of rows and columns in your data frame and will tell you the data types of each column.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># View structure&lt;/span>
&lt;span style="color:#268bd2">str&lt;/span>(species_dat)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## 'data.frame': 5 obs. of 3 variables:
## $ species : chr &amp;quot;Callinectes sapidus&amp;quot; &amp;quot;Sciaenops ocellatus&amp;quot; &amp;quot;Anchoa mitchilli&amp;quot; &amp;quot;Micropognias undulatus&amp;quot; ...
## $ presence : logi TRUE FALSE TRUE FALSE TRUE
## $ abundance: num 2 0 10 0 9
&lt;/code>&lt;/pre>&lt;div class="alert alert-note">
&lt;div>
&lt;p>These are a few functions that are very useful for getting to know your data frames.&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;code>head()&lt;/code> or &lt;code>tail()&lt;/code> to view the first 6 or last 6 rows of your data frame&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>dim()&lt;/code>, &lt;code>nrow()&lt;/code>, or &lt;code>ncol()&lt;/code> to view the number of rows or columns (or both!) of your data frame&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>rownames()&lt;/code> or &lt;code>colnames()&lt;/code> to view or set the row or column names of your data frame. Note that just &lt;code>names()&lt;/code> will also give you the column names of a data frame.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>str()&lt;/code> to view the structure of your data frame&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/div>
&lt;/div>
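&lt;p>As a quick sketch, here is how a few of these functions look when applied to the &lt;code>species_dat&lt;/code> data frame from above (recreated here so the code runs on its own). The &lt;code>n&lt;/code> argument of &lt;code>head()&lt;/code> and &lt;code>tail()&lt;/code> changes how many rows are returned. The renamed copy, &lt;code>renamed&lt;/code>, and its &lt;code>count&lt;/code> column are hypothetical names used only for illustration, so the original &lt;code>species_dat&lt;/code> stays unchanged.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Recreate the species data frame from above
species_dat &amp;lt;- data.frame(species = c("Callinectes sapidus", "Sciaenops ocellatus",
                                      "Anchoa mitchilli", "Micropogonias undulatus",
                                      "Menidia menidia"),
                          presence = c(TRUE, FALSE, TRUE, FALSE, TRUE),
                          abundance = c(2, 0, 10, 0, 9))
# First 2 rows (head() returns 6 rows by default)
head(species_dat, n = 2)
# Last row only
tail(species_dat, n = 1)
# Number of rows and columns together
dim(species_dat)
# Column names, two equivalent ways
colnames(species_dat)
names(species_dat)
# Rename the third column of a copy, leaving species_dat untouched
renamed &amp;lt;- species_dat
colnames(renamed)[3] &amp;lt;- "count"
names(renamed)
&lt;/code>&lt;/pre>&lt;/div>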
&lt;p>As you can see, data frames are very useful for organizing complex, multi-attribute data sets that contain data of different types. No wonder we use them so often!&lt;/p>
&lt;h3 id="tibbles">Tibbles&lt;/h3>
&lt;p>I&amp;rsquo;ve included tibbles as a bonus data structure. Even though they aren&amp;rsquo;t one of base R&amp;rsquo;s built-in data structures, they come up often if you use the &lt;code>tidyverse&lt;/code> set of packages. Tibbles come from the &lt;code>tibble&lt;/code> package (which is part of the tidyverse) and are basically data frames with a few added benefits!&lt;/p>
&lt;p>In day-to-day use, tibbles behave much like data frames. They can do almost everything that data frames can do, but they have slightly different properties that make them more convenient. (A tibble is also called a &lt;code>tbl_df&lt;/code>, after the class it carries in R.) Let&amp;rsquo;s find out what makes tibbles different.&lt;/p>
&lt;p>First, let&amp;rsquo;s load up the &lt;code>tidyverse&lt;/code> set of packages.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#268bd2">library&lt;/span>(tidyverse)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>To create a tibble, all you have to do is use the function &lt;code>tibble()&lt;/code>, which works the same way as &lt;code>data.frame()&lt;/code>. When you&amp;rsquo;re creating a tibble, you can only use vectors that are either all the same length or have a length of 1. A vector with a length of 1 will be recycled until it fills all of the rows in its column. Tibbles also don&amp;rsquo;t use row names, which keeps things simpler.&lt;/p>
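&lt;p>To see the difference in recycling rules, here is a minimal sketch that builds the same small table with both functions. It assumes the &lt;code>tibble&lt;/code> package is installed (&lt;code>library(tidyverse)&lt;/code> loads it too, but loading it on its own is enough here), and the quadrat and treatment values are hypothetical. &lt;code>data.frame()&lt;/code> recycles the length-2 column, while &lt;code>tibble()&lt;/code> refuses it; &lt;code>tryCatch()&lt;/code> captures that error so the rest of the code still runs.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">library(tibble)
# data.frame() recycles the length-2 treatment column to fill 4 rows
quad_df &amp;lt;- data.frame(quadrat = 1:4, treatment = c("control", "burned"))
nrow(quad_df)
# tibble() only recycles length-1 vectors, so the same call fails
quad_tb &amp;lt;- tryCatch(tibble(quadrat = 1:4, treatment = c("control", "burned")),
                    error = function(e) e)
inherits(quad_tb, "error")
# A length-1 column is still recycled by tibble()
tibble(quadrat = 1:4, site = "Site A")
&lt;/code>&lt;/pre>&lt;/div>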
&lt;p>Let&amp;rsquo;s create the same species table that we did earlier, but this time as a tibble.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Create a tibble&lt;/span>
species_dat &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">tibble&lt;/span>(species &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;Callinectes sapidus&amp;#34;&lt;/span>,
&lt;span style="color:#2aa198">&amp;#34;Sciaenops ocellatus&amp;#34;&lt;/span>,
&lt;span style="color:#2aa198">&amp;#34;Anchoa mitchilli&amp;#34;&lt;/span>,
&lt;span style="color:#2aa198">&amp;#34;Micropognias undulatus&amp;#34;&lt;/span>,
&lt;span style="color:#2aa198">&amp;#34;Menidia menidia&amp;#34;&lt;/span>),
presence &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#268bd2">T&lt;/span>, &lt;span style="color:#268bd2">F&lt;/span>, &lt;span style="color:#268bd2">T&lt;/span>, &lt;span style="color:#268bd2">F&lt;/span>, &lt;span style="color:#268bd2">T&lt;/span>),
abundance &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">2&lt;/span>, &lt;span style="color:#2aa198">0&lt;/span>, &lt;span style="color:#2aa198">10&lt;/span>, &lt;span style="color:#2aa198">0&lt;/span>, &lt;span style="color:#2aa198">9&lt;/span>))
&lt;span style="color:#586e75"># View the tibble and the class&lt;/span>
species_dat
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## # A tibble: 5 × 3
## species presence abundance
## &amp;lt;chr&amp;gt; &amp;lt;lgl&amp;gt; &amp;lt;dbl&amp;gt;
## 1 Callinectes sapidus TRUE 2
## 2 Sciaenops ocellatus FALSE 0
## 3 Anchoa mitchilli TRUE 10
## 4 Micropognias undulatus FALSE 0
## 5 Menidia menidia TRUE 9
&lt;/code>&lt;/pre>&lt;p>When we print the tibble, it clearly tells us that it&amp;rsquo;s a tibble. It also reports the table dimensions, the column names, and each column&amp;rsquo;s data type.&lt;/p>
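&lt;p>The printout is not the only clue. Checking the class shows that a tibble is still a data frame underneath: &lt;code>class()&lt;/code> lists &lt;code>"tbl_df"&lt;/code> and &lt;code>"tbl"&lt;/code> first, followed by &lt;code>"data.frame"&lt;/code>, which is why functions written for data frames generally work on tibbles too. This sketch uses a small hypothetical tibble so it runs on its own, and assumes the &lt;code>tibble&lt;/code> package is installed.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">library(tibble)
# A small hypothetical tibble
tb &amp;lt;- tibble(species = c("Anchoa mitchilli", "Menidia menidia"),
             abundance = c(10, 9))
# Class vector: "tbl_df" "tbl" "data.frame"
class(tb)
# Because a tibble inherits from data.frame, this returns TRUE
is.data.frame(tb)
# Converting back to a plain data frame drops the tibble classes
class(as.data.frame(tb))
&lt;/code>&lt;/pre>&lt;/div>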
&lt;p>You might be thinking: okay&amp;hellip;and? The tibble doesn&amp;rsquo;t look that different from the data frame we originally created.&lt;/p>
&lt;p>Let&amp;rsquo;s try another example.&lt;/p>
&lt;p>This time, let&amp;rsquo;s load up an example data set that comes with the &lt;code>ggplot2&lt;/code> package. This data set is called &lt;code>msleep&lt;/code>, and describes the sleep times and brain weights of several different types of mammals. This data set already comes as a tibble, so let&amp;rsquo;s turn it into a data frame for the purposes of demonstration, using the &lt;code>as.data.frame()&lt;/code> function.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Load data&lt;/span>
&lt;span style="color:#268bd2">data&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;msleep&amp;#34;&lt;/span>)
&lt;span style="color:#586e75"># Turn data into class data frame&lt;/span>
msleep &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">as.data.frame&lt;/span>(msleep)
&lt;span style="color:#586e75"># View data&lt;/span>
msleep
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code class="language-{style="max-height:" data-lang="{style="max-height:">## name genus vore
## 1 Cheetah Acinonyx carni
## 2 Owl monkey Aotus omni
## 3 Mountain beaver Aplodontia herbi
## 4 Greater short-tailed shrew Blarina omni
## 5 Cow Bos herbi
## 6 Three-toed sloth Bradypus herbi
## 7 Northern fur seal Callorhinus carni
## 8 Vesper mouse Calomys &amp;lt;NA&amp;gt;
## 9 Dog Canis carni
## 10 Roe deer Capreolus herbi
## 11 Goat Capri herbi
## 12 Guinea pig Cavis herbi
## 13 Grivet Cercopithecus omni
## 14 Chinchilla Chinchilla herbi
## 15 Star-nosed mole Condylura omni
## 16 African giant pouched rat Cricetomys omni
## 17 Lesser short-tailed shrew Cryptotis omni
## 18 Long-nosed armadillo Dasypus carni
## 19 Tree hyrax Dendrohyrax herbi
## 20 North American Opossum Didelphis omni
## 21 Asian elephant Elephas herbi
## 22 Big brown bat Eptesicus insecti
## 23 Horse Equus herbi
## 24 Donkey Equus herbi
## 25 European hedgehog Erinaceus omni
## 26 Patas monkey Erythrocebus omni
## 27 Western american chipmunk Eutamias herbi
## 28 Domestic cat Felis carni
## 29 Galago Galago omni
## 30 Giraffe Giraffa herbi
## 31 Pilot whale Globicephalus carni
## 32 Gray seal Haliochoerus carni
## 33 Gray hyrax Heterohyrax herbi
## 34 Human Homo omni
## 35 Mongoose lemur Lemur herbi
## 36 African elephant Loxodonta herbi
## 37 Thick-tailed opposum Lutreolina carni
## 38 Macaque Macaca omni
## 39 Mongolian gerbil Meriones herbi
## 40 Golden hamster Mesocricetus herbi
## 41 Vole Microtus herbi
## 42 House mouse Mus herbi
## 43 Little brown bat Myotis insecti
## 44 Round-tailed muskrat Neofiber herbi
## 45 Slow loris Nyctibeus carni
## 46 Degu Octodon herbi
## 47 Northern grasshopper mouse Onychomys carni
## 48 Rabbit Oryctolagus herbi
## 49 Sheep Ovis herbi
## 50 Chimpanzee Pan omni
## 51 Tiger Panthera carni
## 52 Jaguar Panthera carni
## 53 Lion Panthera carni
## 54 Baboon Papio omni
## 55 Desert hedgehog Paraechinus &amp;lt;NA&amp;gt;
## 56 Potto Perodicticus omni
## 57 Deer mouse Peromyscus &amp;lt;NA&amp;gt;
## 58 Phalanger Phalanger &amp;lt;NA&amp;gt;
## 59 Caspian seal Phoca carni
## 60 Common porpoise Phocoena carni
## 61 Potoroo Potorous herbi
## 62 Giant armadillo Priodontes insecti
## 63 Rock hyrax Procavia &amp;lt;NA&amp;gt;
## 64 Laboratory rat Rattus herbi
## 65 African striped mouse Rhabdomys omni
## 66 Squirrel monkey Saimiri omni
## 67 Eastern american mole Scalopus insecti
## 68 Cotton rat Sigmodon herbi
## 69 Mole rat Spalax &amp;lt;NA&amp;gt;
## 70 Arctic ground squirrel Spermophilus herbi
## 71 Thirteen-lined ground squirrel Spermophilus herbi
## 72 Golden-mantled ground squirrel Spermophilus herbi
## 73 Musk shrew Suncus &amp;lt;NA&amp;gt;
## 74 Pig Sus omni
## 75 Short-nosed echidna Tachyglossus insecti
## 76 Eastern american chipmunk Tamias herbi
## 77 Brazilian tapir Tapirus herbi
## 78 Tenrec Tenrec omni
## 79 Tree shrew Tupaia omni
## 80 Bottle-nosed dolphin Tursiops carni
## 81 Genet Genetta carni
## 82 Arctic fox Vulpes carni
## 83 Red fox Vulpes carni
## order conservation sleep_total sleep_rem
## 1 Carnivora lc 12.1 NA
## 2 Primates &amp;lt;NA&amp;gt; 17.0 1.8
## 3 Rodentia nt 14.4 2.4
## 4 Soricomorpha lc 14.9 2.3
## 5 Artiodactyla domesticated 4.0 0.7
## 6 Pilosa &amp;lt;NA&amp;gt; 14.4 2.2
## 7 Carnivora vu 8.7 1.4
## 8 Rodentia &amp;lt;NA&amp;gt; 7.0 NA
## 9 Carnivora domesticated 10.1 2.9
## 10 Artiodactyla lc 3.0 NA
## 11 Artiodactyla lc 5.3 0.6
## 12 Rodentia domesticated 9.4 0.8
## 13 Primates lc 10.0 0.7
## 14 Rodentia domesticated 12.5 1.5
## 15 Soricomorpha lc 10.3 2.2
## 16 Rodentia &amp;lt;NA&amp;gt; 8.3 2.0
## 17 Soricomorpha lc 9.1 1.4
## 18 Cingulata lc 17.4 3.1
## 19 Hyracoidea lc 5.3 0.5
## 20 Didelphimorphia lc 18.0 4.9
## 21 Proboscidea en 3.9 NA
## 22 Chiroptera lc 19.7 3.9
## 23 Perissodactyla domesticated 2.9 0.6
## 24 Perissodactyla domesticated 3.1 0.4
## 25 Erinaceomorpha lc 10.1 3.5
## 26 Primates lc 10.9 1.1
## 27 Rodentia &amp;lt;NA&amp;gt; 14.9 NA
## 28 Carnivora domesticated 12.5 3.2
## 29 Primates &amp;lt;NA&amp;gt; 9.8 1.1
## 30 Artiodactyla cd 1.9 0.4
## 31 Cetacea cd 2.7 0.1
## 32 Carnivora lc 6.2 1.5
## 33 Hyracoidea lc 6.3 0.6
## 34 Primates &amp;lt;NA&amp;gt; 8.0 1.9
## 35 Primates vu 9.5 0.9
## 36 Proboscidea vu 3.3 NA
## 37 Didelphimorphia lc 19.4 6.6
## 38 Primates &amp;lt;NA&amp;gt; 10.1 1.2
## 39 Rodentia lc 14.2 1.9
## 40 Rodentia en 14.3 3.1
## 41 Rodentia &amp;lt;NA&amp;gt; 12.8 NA
## 42 Rodentia nt 12.5 1.4
## 43 Chiroptera &amp;lt;NA&amp;gt; 19.9 2.0
## 44 Rodentia nt 14.6 NA
## 45 Primates &amp;lt;NA&amp;gt; 11.0 NA
## 46 Rodentia lc 7.7 0.9
## 47 Rodentia lc 14.5 NA
## 48 Lagomorpha domesticated 8.4 0.9
## 49 Artiodactyla domesticated 3.8 0.6
## 50 Primates &amp;lt;NA&amp;gt; 9.7 1.4
## 51 Carnivora en 15.8 NA
## 52 Carnivora nt 10.4 NA
## 53 Carnivora vu 13.5 NA
## 54 Primates &amp;lt;NA&amp;gt; 9.4 1.0
## 55 Erinaceomorpha lc 10.3 2.7
## 56 Primates lc 11.0 NA
## 57 Rodentia &amp;lt;NA&amp;gt; 11.5 NA
## 58 Diprotodontia &amp;lt;NA&amp;gt; 13.7 1.8
## 59 Carnivora vu 3.5 0.4
## 60 Cetacea vu 5.6 NA
## 61 Diprotodontia &amp;lt;NA&amp;gt; 11.1 1.5
## 62 Cingulata en 18.1 6.1
## 63 Hyracoidea lc 5.4 0.5
## 64 Rodentia lc 13.0 2.4
## 65 Rodentia &amp;lt;NA&amp;gt; 8.7 NA
## 66 Primates &amp;lt;NA&amp;gt; 9.6 1.4
## 67 Soricomorpha lc 8.4 2.1
## 68 Rodentia &amp;lt;NA&amp;gt; 11.3 1.1
## 69 Rodentia &amp;lt;NA&amp;gt; 10.6 2.4
## 70 Rodentia lc 16.6 NA
## 71 Rodentia lc 13.8 3.4
## 72 Rodentia lc 15.9 3.0
## 73 Soricomorpha &amp;lt;NA&amp;gt; 12.8 2.0
## 74 Artiodactyla domesticated 9.1 2.4
## 75 Monotremata &amp;lt;NA&amp;gt; 8.6 NA
## 76 Rodentia &amp;lt;NA&amp;gt; 15.8 NA
## 77 Perissodactyla vu 4.4 1.0
## 78 Afrosoricida &amp;lt;NA&amp;gt; 15.6 2.3
## 79 Scandentia &amp;lt;NA&amp;gt; 8.9 2.6
## 80 Cetacea &amp;lt;NA&amp;gt; 5.2 NA
## 81 Carnivora &amp;lt;NA&amp;gt; 6.3 1.3
## 82 Carnivora &amp;lt;NA&amp;gt; 12.5 NA
## 83 Carnivora &amp;lt;NA&amp;gt; 9.8 2.4
## sleep_cycle awake brainwt bodywt
## 1 NA 11.90 NA 50.000
## 2 NA 7.00 0.01550 0.480
## 3 NA 9.60 NA 1.350
## 4 0.1333333 9.10 0.00029 0.019
## 5 0.6666667 20.00 0.42300 600.000
## 6 0.7666667 9.60 NA 3.850
## 7 0.3833333 15.30 NA 20.490
## 8 NA 17.00 NA 0.045
## 9 0.3333333 13.90 0.07000 14.000
## 10 NA 21.00 0.09820 14.800
## 11 NA 18.70 0.11500 33.500
## 12 0.2166667 14.60 0.00550 0.728
## 13 NA 14.00 NA 4.750
## 14 0.1166667 11.50 0.00640 0.420
## 15 NA 13.70 0.00100 0.060
## 16 NA 15.70 0.00660 1.000
## 17 0.1500000 14.90 0.00014 0.005
## 18 0.3833333 6.60 0.01080 3.500
## 19 NA 18.70 0.01230 2.950
## 20 0.3333333 6.00 0.00630 1.700
## 21 NA 20.10 4.60300 2547.000
## 22 0.1166667 4.30 0.00030 0.023
## 23 1.0000000 21.10 0.65500 521.000
## 24 NA 20.90 0.41900 187.000
## 25 0.2833333 13.90 0.00350 0.770
## 26 NA 13.10 0.11500 10.000
## 27 NA 9.10 NA 0.071
## 28 0.4166667 11.50 0.02560 3.300
## 29 0.5500000 14.20 0.00500 0.200
## 30 NA 22.10 NA 899.995
## 31 NA 21.35 NA 800.000
## 32 NA 17.80 0.32500 85.000
## 33 NA 17.70 0.01227 2.625
## 34 1.5000000 16.00 1.32000 62.000
## 35 NA 14.50 NA 1.670
## 36 NA 20.70 5.71200 6654.000
## 37 NA 4.60 NA 0.370
## 38 0.7500000 13.90 0.17900 6.800
## 39 NA 9.80 NA 0.053
## 40 0.2000000 9.70 0.00100 0.120
## 41 NA 11.20 NA 0.035
## 42 0.1833333 11.50 0.00040 0.022
## 43 0.2000000 4.10 0.00025 0.010
## 44 NA 9.40 NA 0.266
## 45 NA 13.00 0.01250 1.400
## 46 NA 16.30 NA 0.210
## 47 NA 9.50 NA 0.028
## 48 0.4166667 15.60 0.01210 2.500
## 49 NA 20.20 0.17500 55.500
## 50 1.4166667 14.30 0.44000 52.200
## 51 NA 8.20 NA 162.564
## 52 NA 13.60 0.15700 100.000
## 53 NA 10.50 NA 161.499
## 54 0.6666667 14.60 0.18000 25.235
## 55 NA 13.70 0.00240 0.550
## 56 NA 13.00 NA 1.100
## 57 NA 12.50 NA 0.021
## 58 NA 10.30 0.01140 1.620
## 59 NA 20.50 NA 86.000
## 60 NA 18.45 NA 53.180
## 61 NA 12.90 NA 1.100
## 62 NA 5.90 0.08100 60.000
## 63 NA 18.60 0.02100 3.600
## 64 0.1833333 11.00 0.00190 0.320
## 65 NA 15.30 NA 0.044
## 66 NA 14.40 0.02000 0.743
## 67 0.1666667 15.60 0.00120 0.075
## 68 0.1500000 12.70 0.00118 0.148
## 69 NA 13.40 0.00300 0.122
## 70 NA 7.40 0.00570 0.920
## 71 0.2166667 10.20 0.00400 0.101
## 72 NA 8.10 NA 0.205
## 73 0.1833333 11.20 0.00033 0.048
## 74 0.5000000 14.90 0.18000 86.250
## 75 NA 15.40 0.02500 4.500
## 76 NA 8.20 NA 0.112
## 77 0.9000000 19.60 0.16900 207.501
## 78 NA 8.40 0.00260 0.900
## 79 0.2333333 15.10 0.00250 0.104
## 80 NA 18.80 NA 173.330
## 81 NA 17.70 0.01750 2.000
## 82 NA 11.50 0.04450 3.380
## 83 0.3500000 14.20 0.05040 4.230
&lt;/code>&lt;/pre>&lt;p>Okay, wow. Printing the data frame is pretty overwhelming: R shows every row and every column. And because the columns don&amp;rsquo;t all fit across the console, they wrap into separate blocks that repeat the row numbers, which makes the output even longer. This is a messy and confusing way to view our data.&lt;/p>
&lt;p>Let&amp;rsquo;s turn the data back into a tibble using the &lt;code>as_tibble()&lt;/code> function, and let&amp;rsquo;s see what that looks like.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Turn data into a tibble&lt;/span>
msleep &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">as_tibble&lt;/span>(msleep)
&lt;span style="color:#586e75"># View data&lt;/span>
msleep
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## # A tibble: 83 × 11
## name genus vore order conservation sleep_total
## &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;dbl&amp;gt;
## 1 Cheetah Acinon… carni Carni… lc 12.1
## 2 Owl monkey Aotus omni Prima… &amp;lt;NA&amp;gt; 17
## 3 Mountain b… Aplodo… herbi Roden… nt 14.4
## 4 Greater sh… Blarina omni Soric… lc 14.9
## 5 Cow Bos herbi Artio… domesticated 4
## 6 Three-toed… Bradyp… herbi Pilosa &amp;lt;NA&amp;gt; 14.4
## 7 Northern f… Callor… carni Carni… vu 8.7
## 8 Vesper mou… Calomys &amp;lt;NA&amp;gt; Roden… &amp;lt;NA&amp;gt; 7
## 9 Dog Canis carni Carni… domesticated 10.1
## 10 Roe deer Capreo… herbi Artio… lc 3
## # … with 73 more rows, and 5 more variables:
## # sleep_rem &amp;lt;dbl&amp;gt;, sleep_cycle &amp;lt;dbl&amp;gt;, awake &amp;lt;dbl&amp;gt;,
## # brainwt &amp;lt;dbl&amp;gt;, bodywt &amp;lt;dbl&amp;gt;
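&lt;/code>&lt;/pre>&lt;p>The printed tibble is much neater than the printed data frame! Although there are ways to print data frames more neatly, tibbles are formatted automatically: you only see the first ten rows instead of every single row, long values are truncated so the table fits the width of your console, each column&amp;rsquo;s type is shown under its name (e.g. &lt;code>&amp;lt;chr&amp;gt;&lt;/code> or &lt;code>&amp;lt;dbl&amp;gt;&lt;/code>), and columns that don&amp;rsquo;t fit are listed by name at the bottom. This makes it much more convenient to view your data sets.&lt;/p>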
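&lt;p>If you want to stay with a data frame, base R has a couple of tools for a quicker look, and tibbles let you override the ten-row default. This is a small sketch that assumes the &lt;code>ggplot2&lt;/code> package (where &lt;code>msleep&lt;/code> comes from) and the &lt;code>tibble&lt;/code> package are installed; &lt;code>sleep_df&lt;/code> is just a name chosen for this example.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Assumes ggplot2 (source of msleep) and tibble are installed
library(tibble)
sleep_df &amp;lt;- as.data.frame(ggplot2::msleep)

# Base R: print only the first five rows of the data frame
head(sleep_df, n = 5)

# Base R: a compact summary with one line per column and its type
str(sleep_df)

# Tibble: print 15 rows instead of the default 10
print(as_tibble(sleep_df), n = 15)
&lt;/code>&lt;/pre>&lt;/div>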
&lt;p>Tibbles also help prevent errors when subsetting your data. For example, when you subset with single square brackets &lt;code>[ ]&lt;/code>, a tibble always returns another tibble. In contrast, a data frame returns a vector instead of a data frame when you select a single column this way (e.g. &lt;code>df[, &amp;quot;col&amp;quot;]&lt;/code>).&lt;/p>
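&lt;p>Here is a small sketch of that difference. It assumes the &lt;code>tibble&lt;/code> package is installed and uses a three-row data frame with values copied from &lt;code>msleep&lt;/code>; the object names &lt;code>df&lt;/code> and &lt;code>tb&lt;/code> are just for illustration.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Assumes the tibble package is installed
library(tibble)

# Three rows copied from msleep
df &amp;lt;- data.frame(name = c("Cheetah", "Cow", "Dog"),
                 sleep_total = c(12.1, 4.0, 10.1))
tb &amp;lt;- as_tibble(df)

# Data frame: selecting one column with [ , ] silently drops to a vector
class(df[, "sleep_total"])                 # "numeric"

# Tibble: the same subset is still a tibble
class(tb[, "sleep_total"])                 # "tbl_df" "tbl" "data.frame"

# Data frame: add drop = FALSE to keep a one-column data frame
class(df[, "sleep_total", drop = FALSE])   # "data.frame"
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>The difference only appears when you select a single column; selecting two or more columns with &lt;code>[ ]&lt;/code> returns a data frame either way.&lt;/p>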
&lt;p>And if you use &lt;code>$&lt;/code> to access a column that does not exist in a tibble, you&amp;rsquo;ll receive a warning telling you the column is unknown. In contrast, doing the same with a data frame silently returns &lt;code>NULL&lt;/code>, with no explanation of why.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># See if msleep (the tibble) has a column called &amp;#34;abc&amp;#34;&lt;/span>
msleep&lt;span style="color:#719e07">$&lt;/span>abc
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## Warning: Unknown or uninitialised column: `abc`.
&lt;/code>&lt;/pre>&lt;pre>&lt;code>## NULL
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Turn msleep into a data frame&lt;/span>
msleep &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">as.data.frame&lt;/span>(msleep)
&lt;span style="color:#586e75"># See if msleep (the data frame) has a column called &amp;#34;abc&amp;#34;&lt;/span>
msleep&lt;span style="color:#719e07">$&lt;/span>abc
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## NULL
&lt;/code>&lt;/pre>&lt;p>One other advantage of tibbles is that they keep column names that contain spaces. You wouldn&amp;rsquo;t normally add spaces to your column names on purpose, since it&amp;rsquo;s better practice to use underscores &amp;ldquo;_&amp;rdquo; instead. However, data you import into R will sometimes contain spaces in the column names. By default, regular data frames replace those spaces with periods &amp;ldquo;.&amp;rdquo;, whereas tibbles keep the original names and display them surrounded by backticks (the grave accent, &amp;lsquo;`&amp;rsquo;, which on most US keyboards sits on the key to the left of the 1, together with the tilde &amp;lsquo;~&amp;rsquo;). To import a CSV directly as a tibble and keep its column names exactly as they are, use &lt;code>read_csv()&lt;/code> from the readr package (note the underscore between &amp;lsquo;read&amp;rsquo; and &amp;lsquo;csv&amp;rsquo;) rather than the base R function &lt;code>read.csv()&lt;/code>, which reads your data in as a data frame.&lt;/p>
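&lt;p>A minimal sketch of the name handling, assuming the &lt;code>tibble&lt;/code> package is installed. The column name &lt;code>body weight&lt;/code> is made up for the example.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Assumes the tibble package is installed
library(tibble)

# data.frame() runs make.names() on column names by default,
# so the space is replaced by a period
df &amp;lt;- data.frame(`body weight` = c(50, 0.48))
names(df)    # "body.weight"

# tibble() keeps the name exactly as written
tb &amp;lt;- tibble(`body weight` = c(50, 0.48))
names(tb)    # "body weight"

# Wrap the name in backticks to refer to it in code
tb$`body weight`

# The base R renaming can be switched off with check.names = FALSE
df2 &amp;lt;- data.frame(`body weight` = c(50, 0.48), check.names = FALSE)
names(df2)   # "body weight"
&lt;/code>&lt;/pre>&lt;/div>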
&lt;p>In short, tibbles make a number of changes to normal data frames that can help reduce errors in your data analysis. These improvements in printing and subsetting are small, but useful!&lt;/p>
&lt;p>And that&amp;rsquo;s it for our blog post on data structures in R! I hope this post taught you a few useful tips and tricks for working with your data. Happy coding!&lt;/p>
&lt;br>
&lt;hr>
&lt;center>If you enjoyed this tutorial and want to learn more about data frames and tibbles, and how to use them, you can check out Luka Negoita's full course on the complete basics of R for ecology here:
&lt;br>
&lt;br>
&lt;ul class="cta-group">
&lt;li>
&lt;a href="https://coaching.rforecology.com?utm_source=blog&amp;amp;utm_medium=bottom_button&amp;amp;utm_campaign=rforecology_blog" target="_blank" rel="noopener" class="btn btn-primary px-3 py-3">&lt;strong>Start learning R now&lt;/strong>&lt;/a>
&lt;/li>
&lt;/ul>
&lt;p>Also be sure to check out &lt;strong>&lt;a href="https://www.r-bloggers.com/" target="_blank" rel="noopener">R-bloggers&lt;/a>&lt;/strong> for other great tutorials on learning R.&lt;/p>
&lt;/center>
&lt;script defer src="https://cdn.commento.io/js/commento.js">&lt;/script>
&lt;div id="commento">&lt;/div></description></item><item><title>R Data types 101, or What kind of data do I have?</title><link>https://www.rforecology.com/post/data-types-in-r/</link><pubDate>Wed, 16 Mar 2022 09:45:39 -0400</pubDate><guid>https://www.rforecology.com/post/data-types-in-r/</guid><description>&lt;script src="https://rforecology.activehosted.com/f/embed.php?id=21" type="text/javascript" charset="utf-8">&lt;/script>
&lt;p>Most of us are pretty familiar with data types in our daily lives: we can easily tell that things like 1, 2, 3, and 4 are numbers (in this case, integers). 15.7 is still a number, but has a decimal. We know that every single word I&amp;rsquo;m typing in this sentence is composed of characters, and we know that in logic, &amp;ldquo;true&amp;rdquo; and &amp;ldquo;false&amp;rdquo; are the possible answers to logical statements.&lt;/p>
&lt;p>Just as we do in our heads, R also categorizes our data into different classes. These categories are similar to the real-life ones I described above, but can be a little different in terms of syntax and things to watch out for in your code.&lt;/p>
&lt;p>To work in R and perform data analyses, you&amp;rsquo;ll need to have a solid understanding of data types. In this tutorial, I&amp;rsquo;m going to introduce several different types of data, explain how to use and manipulate each of them, and show you how to check what type of data you have. Let&amp;rsquo;s dive in.&lt;/p>
&lt;img src="https://www.rforecology.com/datatypes_image1.png" alt="Image of a data table with individuals, their heights, and their sex listed, with arrows pointing to their respective data types which are character, numeric, and factor. The title says 'Data types in R...what does it all mean?'" style="width:400px;"/>
&lt;h2 id="types-of-data">Types of data&lt;/h2>
&lt;p>There are five main types of data in R that you&amp;rsquo;d come across as an ecologist. I&amp;rsquo;ll discuss all of them below except complex numbers, which are rarely used for data analysis in R.&lt;/p>
&lt;ol>
&lt;li>
&lt;p>&lt;strong>Numeric&lt;/strong> (&lt;code>1.2, 5, 7, 3.14159&lt;/code>)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Integer&lt;/strong> (&lt;code>1, 2, 3, 4, 5&lt;/code>)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Complex&lt;/strong> (&lt;code>4 + 2i&lt;/code>)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Logical&lt;/strong> (&lt;code>TRUE / FALSE&lt;/code>)&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Character&lt;/strong> (&lt;code>&amp;quot;a&amp;quot;, &amp;quot;apple&amp;quot;&lt;/code>)&lt;/p>
&lt;/li>
&lt;/ol>
&lt;p>I&amp;rsquo;m also going to discuss a sixth, related category that helps you work with categorical variables:&lt;/p>
&lt;ol start="6">
&lt;li>&lt;strong>Factor&lt;/strong>&lt;/li>
&lt;/ol>
&lt;h3 id="numeric">Numeric&lt;/h3>
&lt;p>Numeric data types are pretty straightforward. These are just numbers, written as either integers or decimals. We can check if our vector is numeric by using the function &lt;code>is.numeric()&lt;/code>.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Create a numeric vector&lt;/span>
x &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">3&lt;/span>, &lt;span style="color:#2aa198">5&lt;/span>, &lt;span style="color:#2aa198">6&lt;/span>, &lt;span style="color:#2aa198">10.7&lt;/span>)
&lt;span style="color:#586e75"># Is our vector numeric? Yes!&lt;/span>
&lt;span style="color:#268bd2">is.numeric&lt;/span>(x)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] TRUE
&lt;/code>&lt;/pre>&lt;p>We can check our data type by using the functions &lt;code>class()&lt;/code> and &lt;code>typeof()&lt;/code>. &lt;code>class()&lt;/code> tells us that we&amp;rsquo;re working with numeric values, while &lt;code>typeof()&lt;/code> is more specific and tells us how R stores them: as doubles (double-precision floating-point numbers, which can hold decimals).&lt;/p>
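&lt;p>Note that &amp;ldquo;double&amp;rdquo; describes how a number is stored, not whether it actually has decimals: R stores numbers as doubles by default, even whole numbers. A quick sketch (the &lt;code>L&lt;/code> suffix used here is covered in the Integer section below):&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Whole numbers typed without an L suffix are still stored as doubles
typeof(c(3, 5, 6))          # "double"
class(c(3, 5, 6))           # "numeric"

# An integer vector (L suffix) has its own type and class...
typeof(c(3L, 5L, 6L))       # "integer"
class(c(3L, 5L, 6L))        # "integer"

# ...but is.numeric() returns TRUE for both doubles and integers
is.numeric(c(3L, 5L, 6L))   # TRUE
&lt;/code>&lt;/pre>&lt;/div>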
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Check the type of data class we have&lt;/span>
&lt;span style="color:#268bd2">class&lt;/span>(x)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] &amp;quot;numeric&amp;quot;
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Check the specific type of data that you have&lt;/span>
&lt;span style="color:#268bd2">typeof&lt;/span>(x)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] &amp;quot;double&amp;quot;
&lt;/code>&lt;/pre>&lt;p>You can, of course, perform mathematical operations with numeric values.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Add 4 to all the values in the vector&lt;/span>
x &lt;span style="color:#719e07">+&lt;/span> &lt;span style="color:#2aa198">4&lt;/span>
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] 7.0 9.0 10.0 14.7
&lt;/code>&lt;/pre>&lt;h3 id="integer">Integer&lt;/h3>
&lt;p>You can also do math with integers, which represent whole numbers without decimal places. These are typically used when you&amp;rsquo;re counting something. For example, you can observe 7 butterflies in a plot, but you can&amp;rsquo;t observe 7.2 butterflies (or at least I hope not!).&lt;/p>
&lt;p>However, if you create a vector manually and none of the values have decimals, R will still store it as a double, so its class is &amp;ldquo;numeric&amp;rdquo; rather than &amp;ldquo;integer&amp;rdquo;.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Create a vector with only integers&lt;/span>
x &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">1&lt;/span>, &lt;span style="color:#2aa198">4&lt;/span>, &lt;span style="color:#2aa198">2&lt;/span>, &lt;span style="color:#2aa198">7&lt;/span>, &lt;span style="color:#2aa198">8&lt;/span>)
&lt;span style="color:#586e75"># Look at the class&lt;/span>
&lt;span style="color:#268bd2">class&lt;/span>(x)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] &amp;quot;numeric&amp;quot;
&lt;/code>&lt;/pre>&lt;p>You can change this vector to be an integer by using the function &lt;code>as.integer()&lt;/code>.&lt;/p>
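&lt;p>Keep in mind that &lt;code>as.integer()&lt;/code> does not round: it drops the decimal part, truncating toward zero. If you want the nearest whole number, round first. A short sketch with made-up values:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Made-up values with decimals
vals &amp;lt;- c(7.2, 7.9, -7.9)

# as.integer() truncates toward zero
as.integer(vals)          # 7  7 -7

# Use round() first if you want the nearest whole number
as.integer(round(vals))   # 7  8 -8
&lt;/code>&lt;/pre>&lt;/div>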
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Change the vector class&lt;/span>
x &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">as.integer&lt;/span>(x)
&lt;span style="color:#586e75"># Look at the class&lt;/span>
&lt;span style="color:#268bd2">class&lt;/span>(x)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] &amp;quot;integer&amp;quot;
&lt;/code>&lt;/pre>&lt;p>Alternatively, you can generate an integer vector like this. The &amp;ldquo;L&amp;rdquo; after each number tells R that you want it to be an integer.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Create an integer vector&lt;/span>
x &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">1L&lt;/span>, &lt;span style="color:#2aa198">2L&lt;/span>, &lt;span style="color:#2aa198">5L&lt;/span>, &lt;span style="color:#2aa198">3L&lt;/span>, &lt;span style="color:#2aa198">10L&lt;/span>)
&lt;span style="color:#586e75"># View vector&lt;/span>
x
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] 1 2 5 3 10
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># View class&lt;/span>
&lt;span style="color:#268bd2">class&lt;/span>(x)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] &amp;quot;integer&amp;quot;
&lt;/code>&lt;/pre>&lt;p>You could also create an integer vector like this. The colon (&lt;code>:&lt;/code>) tells R to generate a sequence of integers from 1 to 10, going up by 1 each time.&lt;/p>
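&lt;p>The colon operator can also count down, and it only returns an integer vector when the starting value is a whole number. &lt;code>seq_len()&lt;/code> is another base R way to get the integers from 1 to n. A short sketch:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Counting down works too
10:1                 # 10 9 8 7 6 5 4 3 2 1
typeof(10:1)         # "integer"

# A non-whole starting value gives a double vector instead
2.5:5                # 2.5 3.5 4.5
typeof(2.5:5)        # "double"

# seq_len(n) returns the integers 1 to n
seq_len(5)           # 1 2 3 4 5
typeof(seq_len(5))   # "integer"
&lt;/code>&lt;/pre>&lt;/div>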
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Create a sequence of integers&lt;/span>
x &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">1&lt;/span>&lt;span style="color:#719e07">:&lt;/span>&lt;span style="color:#2aa198">10&lt;/span>)
&lt;span style="color:#586e75"># View vector&lt;/span>
x
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] 1 2 3 4 5 6 7 8 9 10
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># View data class&lt;/span>
&lt;span style="color:#268bd2">class&lt;/span>(x)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] &amp;quot;integer&amp;quot;
&lt;/code>&lt;/pre>&lt;p>Some functions will also generate integer vectors, like the function &lt;code>sample()&lt;/code>. This function draws a random sample of values from a vector you supply; because I supplied the integer sequence &lt;code>1:10&lt;/code>, the result is an integer vector. I asked &lt;code>sample()&lt;/code> to choose ten values from 1 to 10, and since it samples without replacement by default, this simply shuffles them.&lt;/p>
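&lt;p>The type of the result follows the type of the vector you draw from. This sketch draws from a made-up double vector, and also uses &lt;code>replace = TRUE&lt;/code> so values can repeat, here to simulate ten rolls of a six-sided die.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">set.seed(123)

# Drawing from a double vector gives a double result
typeof(sample(c(0.5, 1.5, 2.5), 2))   # "double"

# Drawing from an integer vector gives an integer result
typeof(sample(1:6, 3))                # "integer"

# replace = TRUE lets values repeat, e.g. ten rolls of a die
rolls &amp;lt;- sample(1:6, 10, replace = TRUE)
rolls
&lt;/code>&lt;/pre>&lt;/div>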
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Create a random sequence of integers from 1 to 10:&lt;/span>
&lt;span style="color:#268bd2">set.seed&lt;/span>(&lt;span style="color:#2aa198">123&lt;/span>) &lt;span style="color:#586e75"># use set.seed to get the same random values as me&lt;/span>
x &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">sample&lt;/span>(&lt;span style="color:#2aa198">1&lt;/span>&lt;span style="color:#719e07">:&lt;/span>&lt;span style="color:#2aa198">10&lt;/span>, &lt;span style="color:#2aa198">10&lt;/span>)
&lt;span style="color:#586e75"># View vector&lt;/span>
x
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] 3 10 2 8 6 9 1 7 5 4
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># View data class&lt;/span>
&lt;span style="color:#268bd2">class&lt;/span>(x)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] &amp;quot;integer&amp;quot;
&lt;/code>&lt;/pre>&lt;h3 id="complex">Complex&lt;/h3>
&lt;p>I&amp;rsquo;m not going to discuss this one in detail because, although complex numbers exist in R, they&amp;rsquo;re rarely used for data analysis. They are numbers with a real and an imaginary component, where &lt;em>i&lt;/em> is the square root of -1; in R you write the imaginary part with an &lt;code>i&lt;/code> suffix, as in &lt;code>4 + 2i&lt;/code>.&lt;/p>
&lt;h3 id="character">Character&lt;/h3>
&lt;p>Characters are another common data type. These are used to store text in R (also called &amp;ldquo;strings&amp;rdquo;). To indicate something is a character, we put quotation marks around it &lt;code>&amp;quot;&amp;quot;&lt;/code>.&lt;/p>
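&lt;p>R accepts either double or single quotes for strings, and a few base functions are handy for working with text. A small sketch; the genus and species names are just examples.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Double and single quotes both create character values
genus &amp;lt;- c("Myotis", 'Vulpes')
is.character(genus)         # TRUE

# Count the characters in each string
nchar(genus)                # 6 6

# Join strings together, separated by a space
paste("Vulpes", "vulpes")   # "Vulpes vulpes"
&lt;/code>&lt;/pre>&lt;/div>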
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Create a vector of characters&lt;/span>
x &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;These&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;are&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;characters&amp;#34;&lt;/span>)
&lt;span style="color:#586e75"># View class&lt;/span>
&lt;span style="color:#268bd2">class&lt;/span>(x)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] &amp;quot;character&amp;quot;
&lt;/code>&lt;/pre>&lt;p>Putting quotation marks around numbers will also turn them into characters, which can get confusing.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Create a vector of characters&lt;/span>
x &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;1&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;4&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;5&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;7&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;8&amp;#34;&lt;/span>)
&lt;span style="color:#586e75"># View vector&lt;/span>
x
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] &amp;quot;1&amp;quot; &amp;quot;4&amp;quot; &amp;quot;5&amp;quot; &amp;quot;7&amp;quot; &amp;quot;8&amp;quot;
&lt;/code>&lt;/pre>&lt;p>You can&amp;rsquo;t do math with a vector of numbers that are classed as characters.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Try to do math&lt;/span>
&lt;span style="color:#268bd2">mean&lt;/span>(x)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## Warning in mean.default(x): argument is not numeric or logical: returning NA
&lt;/code>&lt;/pre>&lt;pre>&lt;code>## [1] NA
&lt;/code>&lt;/pre>&lt;p>Why? Because R views them as text!&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># View class&lt;/span>
&lt;span style="color:#268bd2">class&lt;/span>(x)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] &amp;quot;character&amp;quot;
&lt;/code>&lt;/pre>&lt;p>You can turn this character vector of numbers into a numeric vector using the &lt;code>as.numeric()&lt;/code> function.&lt;/p>
&lt;div class="alert alert-note">
&lt;div>
Note: a common way this happens is when a column that is supposed to be numeric accidentally contains a character value (i.e. a letter or symbol). When you read in the data, R then treats the whole column as character. A stray space in a number, or in an otherwise empty cell, might have the same effect. This can happen accidentally (and so easily!) during data entry, and using &lt;code>as.numeric()&lt;/code> is one way to resolve it: any values that can&amp;rsquo;t be read as numbers will be converted to &lt;code>NA&lt;/code>s. In that scenario you&amp;rsquo;ll probably want to go back and fix your raw CSV file, but at least the NAs will help you find where the problem was.
&lt;/div>
&lt;/div>
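&lt;p>Here is a sketch of that scenario, using a made-up column of counts where one entry was typed as a word. &lt;code>as.numeric()&lt;/code> turns it into &lt;code>NA&lt;/code> (with a warning), and &lt;code>which()&lt;/code> then points you to the offending entry.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Made-up raw data: one count was entered as text
raw_counts &amp;lt;- c("1", "4", "five", "7")

# Values that cannot be read as numbers become NA
# (R also warns: "NAs introduced by coercion")
counts &amp;lt;- as.numeric(raw_counts)
counts                      # 1 4 NA 7

# Find which entries failed to convert, and what they were
which(is.na(counts))        # 3
raw_counts[is.na(counts)]   # "five"
&lt;/code>&lt;/pre>&lt;/div>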
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Turn it into a numeric vector&lt;/span>
x &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">as.numeric&lt;/span>(x)
&lt;span style="color:#586e75"># View vector&lt;/span>
x
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] 1 4 5 7 8
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># View class&lt;/span>
&lt;span style="color:#268bd2">class&lt;/span>(x)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] &amp;quot;numeric&amp;quot;
&lt;/code>&lt;/pre>&lt;p>And then you can turn it back into a character using &lt;code>as.character()&lt;/code>.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Turn it back into a character&lt;/span>
x &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">as.character&lt;/span>(x)
&lt;span style="color:#586e75"># View vector&lt;/span>
x
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] &amp;quot;1&amp;quot; &amp;quot;4&amp;quot; &amp;quot;5&amp;quot; &amp;quot;7&amp;quot; &amp;quot;8&amp;quot;
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># View class&lt;/span>
&lt;span style="color:#268bd2">class&lt;/span>(x)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] &amp;quot;character&amp;quot;
&lt;/code>&lt;/pre>&lt;h3 id="logical">Logical&lt;/h3>
&lt;p>The logical class has only two possible values, plus &lt;code>NA&lt;/code> for missing values: &lt;code>TRUE&lt;/code> or &lt;code>FALSE&lt;/code> (which can also be written &lt;code>T&lt;/code> / &lt;code>F&lt;/code>, but never &lt;code>true&lt;/code> / &lt;code>false&lt;/code> or &lt;code>t&lt;/code> / &lt;code>f&lt;/code>).&lt;/p>
&lt;p>These values are the result of logical comparisons. For example, in the code below I asked R whether each element of my vector was greater than 5. This returns a logical vector where each element is either &lt;code>TRUE&lt;/code> or &lt;code>FALSE&lt;/code>.&lt;/p>
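&lt;p>There is one catch with the &lt;code>T&lt;/code> / &lt;code>F&lt;/code> shortcuts: &lt;code>TRUE&lt;/code> and &lt;code>FALSE&lt;/code> are reserved words, but &lt;code>T&lt;/code> and &lt;code>F&lt;/code> are ordinary variables that can be overwritten, so spelling out &lt;code>TRUE&lt;/code> / &lt;code>FALSE&lt;/code> is the safer habit. A short sketch:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># T is an ordinary variable, so it can be reassigned
T &amp;lt;- 0
isTRUE(T)   # FALSE

# Remove your copy to get the built-in T (which equals TRUE) back
rm(T)
isTRUE(T)   # TRUE

# TRUE is reserved: running  TRUE &amp;lt;- 0  gives an error
&lt;/code>&lt;/pre>&lt;/div>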
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Create a vector&lt;/span>
x &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">1&lt;/span>, &lt;span style="color:#2aa198">5&lt;/span>, &lt;span style="color:#2aa198">6&lt;/span>, &lt;span style="color:#2aa198">7&lt;/span>, &lt;span style="color:#2aa198">2&lt;/span>, &lt;span style="color:#2aa198">8&lt;/span>)
&lt;span style="color:#586e75"># Are the elements of vector x greater than 5? Store results in vector y&lt;/span>
y &lt;span style="color:#719e07">&amp;lt;-&lt;/span> x &lt;span style="color:#719e07">&amp;gt;&lt;/span> &lt;span style="color:#2aa198">5&lt;/span>
&lt;span style="color:#586e75"># View y&lt;/span>
y
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] FALSE FALSE TRUE TRUE FALSE TRUE
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># View class&lt;/span>
&lt;span style="color:#268bd2">class&lt;/span>(y)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] &amp;quot;logical&amp;quot;
&lt;/code>&lt;/pre>&lt;p>You can also create a logical vector directly by typing the values.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Create logical vector&lt;/span>
x &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#268bd2">T&lt;/span>, &lt;span style="color:#268bd2">F&lt;/span>, &lt;span style="color:#268bd2">T&lt;/span>, &lt;span style="color:#268bd2">F&lt;/span>, &lt;span style="color:#268bd2">F&lt;/span>, &lt;span style="color:#268bd2">T&lt;/span>)
&lt;span style="color:#586e75"># View vector&lt;/span>
x
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] TRUE FALSE TRUE FALSE FALSE TRUE
&lt;/code>&lt;/pre>&lt;p>And you can convert logical values to numeric values, and back. &lt;code>FALSE&lt;/code> is the same as &lt;code>0&lt;/code>, while &lt;code>TRUE&lt;/code> is the same as &lt;code>1&lt;/code>.&lt;/p>
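&lt;p>When converting numbers to logical values, the rule is a bit broader than 0 and 1: zero becomes &lt;code>FALSE&lt;/code> and every other number becomes &lt;code>TRUE&lt;/code>. Character strings convert only if they spell out a logical value. A sketch with made-up values:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># 0 is FALSE; any non-zero number is TRUE
as.logical(c(0, 1, 2, -1))                   # FALSE TRUE TRUE TRUE

# Strings such as "TRUE", "T", and "false" convert; "yes" becomes NA
as.logical(c("TRUE", "T", "false", "yes"))   # TRUE TRUE FALSE NA
&lt;/code>&lt;/pre>&lt;/div>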
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Convert to numeric vector&lt;/span>
x &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">as.numeric&lt;/span>(x)
&lt;span style="color:#586e75"># View vector&lt;/span>
x
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] 1 0 1 0 0 1
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Convert back to logical vector&lt;/span>
x &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">as.logical&lt;/span>(x)
&lt;span style="color:#586e75"># View vector again&lt;/span>
x
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] TRUE FALSE TRUE FALSE FALSE TRUE
&lt;/code>&lt;/pre>&lt;p>This also means that you can do math with logical values, which is useful if, for example, you want to count how many &lt;code>TRUE&lt;/code> values are in your vector. Arithmetic operators and math functions such as &lt;code>sum()&lt;/code> automatically convert logical values to 1s and 0s.&lt;/p>
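&lt;p>A related trick: because &lt;code>TRUE&lt;/code> counts as 1 and &lt;code>FALSE&lt;/code> as 0, the mean of a logical vector is the proportion of &lt;code>TRUE&lt;/code> values. Comparisons such as &lt;code>heights &amp;gt; 15&lt;/code> also return logical vectors, so you can count or average the values that meet a condition. In this short sketch, &lt;code>heights&lt;/code> holds made-up example values:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Logical vector with 3 TRUE values out of 6
x &amp;lt;- c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE)
# Proportion of TRUE values: 3 / 6 = 0.5
prop_true &amp;lt;- mean(x)
prop_true
# Hypothetical plant heights
heights &amp;lt;- c(12, 18, 9, 21, 15)
# heights &amp;gt; 15 is FALSE TRUE FALSE TRUE FALSE, so sum() counts 2 tall plants
n_tall &amp;lt;- sum(heights &amp;gt; 15)
n_tall
&lt;/code>&lt;/pre>&lt;/div>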
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># View vector&lt;/span>
x
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] TRUE FALSE TRUE FALSE FALSE TRUE
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Count how many &amp;#34;TRUE&amp;#34; values there are. There are 3!&lt;/span>
&lt;span style="color:#268bd2">sum&lt;/span>(x)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] 3
&lt;/code>&lt;/pre>&lt;h3 id="factor">Factor&lt;/h3>
&lt;p>Factors are a special data type that is primarily used to represent repeating categories (i.e., categorical variables). When you specify an object as a factor, you&amp;rsquo;re telling R to think of it as a categorical variable, with different levels. This can be helpful when analyzing your data, as categorical variables and continuous variables are often handled differently in statistical analyses.&lt;/p>
&lt;p>In the code below, I created a data frame showing the height and sex of five individuals.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Create an example data frame&lt;/span>
example &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">data.frame&lt;/span>(indiv &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;A&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;B&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;C&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;D&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;E&amp;#34;&lt;/span>),
height &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">15&lt;/span>, &lt;span style="color:#2aa198">10&lt;/span>, &lt;span style="color:#2aa198">12&lt;/span>, &lt;span style="color:#2aa198">9&lt;/span>, &lt;span style="color:#2aa198">17&lt;/span>),
sex &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;female&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;female&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;female&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;male&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;female&amp;#34;&lt;/span>))
&lt;span style="color:#586e75"># View structure of data frame&lt;/span>
&lt;span style="color:#268bd2">str&lt;/span>(example)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## 'data.frame': 5 obs. of 3 variables:
## $ indiv : chr &amp;quot;A&amp;quot; &amp;quot;B&amp;quot; &amp;quot;C&amp;quot; &amp;quot;D&amp;quot; ...
## $ height: num 15 10 12 9 17
## $ sex : chr &amp;quot;female&amp;quot; &amp;quot;female&amp;quot; &amp;quot;female&amp;quot; &amp;quot;male&amp;quot; ...
&lt;/code>&lt;/pre>&lt;p>Right now, the &lt;code>sex&lt;/code> column is a character vector because I entered the data in quotation marks. But really what I want to do is tell R that &lt;code>sex&lt;/code> is a categorical variable, with &amp;ldquo;female&amp;rdquo; and &amp;ldquo;male&amp;rdquo; as levels. To do that, all I have to do is use the &lt;code>as.factor()&lt;/code> function.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Change the sex column to be a factor&lt;/span>
example&lt;span style="color:#719e07">$&lt;/span>sex &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">as.factor&lt;/span>(example&lt;span style="color:#719e07">$&lt;/span>sex)
&lt;span style="color:#586e75"># View the factor&lt;/span>
example&lt;span style="color:#719e07">$&lt;/span>sex
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] female female female male female
## Levels: female male
&lt;/code>&lt;/pre>&lt;p>R prints the vector and, beneath it, the levels it found on its own: &amp;ldquo;female&amp;rdquo; and &amp;ldquo;male&amp;rdquo;. By default, R sorts the levels alphabetically, which is why they appear as &lt;code>female male&lt;/code> instead of &lt;code>male female&lt;/code>.&lt;/p>
&lt;p>You may want to change the order of your factor levels, for example to control the order in which categories appear when you plot your data.&lt;/p>
&lt;p>For example, you might have a vector like this:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Create vector&lt;/span>
places &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">factor&lt;/span>(&lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;first&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;first&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;second&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;third&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;fifth&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;fourth&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;second&amp;#34;&lt;/span>))
&lt;span style="color:#586e75"># View factor&lt;/span>
places
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] first first second third fifth fourth second
## Levels: fifth first fourth second third
&lt;/code>&lt;/pre>&lt;p>This level order doesn&amp;rsquo;t make sense: R sorted the levels alphabetically, but we want them to run from first through fifth. So let&amp;rsquo;s set the order explicitly using &lt;code>factor(vector, levels = c(&amp;quot;first&amp;quot;, &amp;quot;second&amp;quot;, &amp;quot;third&amp;quot;, etc.))&lt;/code>.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Change level order&lt;/span>
places &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">factor&lt;/span>(places, levels &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;first&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;second&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;third&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;fourth&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;fifth&amp;#34;&lt;/span>))
&lt;span style="color:#586e75"># View factor&lt;/span>
places
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] first first second third fifth fourth second
## Levels: first second third fourth fifth
&lt;/code>&lt;/pre>&lt;p>Much better!&lt;/p>
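&lt;p>You can check the current level order with &lt;code>levels()&lt;/code>, and functions that summarize factors, such as &lt;code>table()&lt;/code>, list their results in that order. One caution when setting &lt;code>levels&lt;/code> by hand: any value that does not exactly match one of the levels you list (for example, because of a typo) silently becomes &lt;code>NA&lt;/code>. The sketch below recreates &lt;code>places&lt;/code> so it runs on its own; &lt;code>places_typo&lt;/code> is a deliberately broken, hypothetical example:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Recreate the factor with the level order set explicitly
places &amp;lt;- factor(c("first", "first", "second", "third", "fifth", "fourth", "second"),
                 levels = c("first", "second", "third", "fourth", "fifth"))
# Check the current level order
levels(places)
# Counts per level follow the level order: first, second, third, fourth, fifth
place_counts &amp;lt;- table(places)
place_counts
# Deliberate typo in the levels ("forth"): the "fourth" value no longer matches any level
places_typo &amp;lt;- factor(c("first", "first", "second", "third", "fifth", "fourth", "second"),
                      levels = c("first", "second", "third", "forth", "fifth"))
# Count how many values were turned into NA (here, 1)
sum(is.na(places_typo))
&lt;/code>&lt;/pre>&lt;/div>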
&lt;p>Factors don&amp;rsquo;t have to be created from text; you can also create them from numbers. For example, in the code below I create a data frame describing the width and order of several stream sites. Stream order is &lt;em>not&lt;/em> a continuous variable, even though it&amp;rsquo;s represented by numbers: it ranks streams into discrete categories, so it makes sense to convert it to a factor.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Create data frame &lt;/span>
example2 &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">data.frame&lt;/span>(stream &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;Patuxent&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;Patapsco&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;Deer Creek&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;Town Creek&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;Browns Branch&amp;#34;&lt;/span>),
width &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">37&lt;/span>, &lt;span style="color:#2aa198">42&lt;/span>, &lt;span style="color:#2aa198">25&lt;/span>, &lt;span style="color:#2aa198">32&lt;/span>, &lt;span style="color:#2aa198">22&lt;/span>),
order &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">6&lt;/span>, &lt;span style="color:#2aa198">6&lt;/span>, &lt;span style="color:#2aa198">4&lt;/span>, &lt;span style="color:#2aa198">5&lt;/span>, &lt;span style="color:#2aa198">3&lt;/span>))
&lt;span style="color:#586e75"># View data frame structure&lt;/span>
&lt;span style="color:#268bd2">str&lt;/span>(example2)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## 'data.frame': 5 obs. of 3 variables:
## $ stream: chr &amp;quot;Patuxent&amp;quot; &amp;quot;Patapsco&amp;quot; &amp;quot;Deer Creek&amp;quot; &amp;quot;Town Creek&amp;quot; ...
## $ width : num 37 42 25 32 22
## $ order : num 6 6 4 5 3
&lt;/code>&lt;/pre>&lt;p>R sees stream order as being numeric, which makes sense. But let&amp;rsquo;s tell R that stream order is a factor.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Change stream order to a factor&lt;/span>
example2&lt;span style="color:#719e07">$&lt;/span>order &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">as.factor&lt;/span>(example2&lt;span style="color:#719e07">$&lt;/span>order)
&lt;span style="color:#586e75"># View stream order&lt;/span>
example2&lt;span style="color:#719e07">$&lt;/span>order
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] 6 6 4 5 3
## Levels: 3 4 5 6
&lt;/code>&lt;/pre>&lt;p>Looks good. Since these are numbers, R just orders the levels in ascending order.&lt;/p>
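&lt;p>One common pitfall with number-like factors: converting them straight back with &lt;code>as.numeric()&lt;/code> returns the internal level codes (1, 2, 3, and so on), not the original values. To recover the original numbers, convert to character first. The sketch below recreates the stream order values on their own as &lt;code>stream_order&lt;/code>:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Recreate the stream order factor (its levels are 3, 4, 5, 6)
stream_order &amp;lt;- as.factor(c(6, 6, 4, 5, 3))
# Wrong: returns the position of each value's level, i.e. 4 4 2 3 1
codes &amp;lt;- as.numeric(stream_order)
# Right: convert the labels to text first, then to numbers, i.e. 6 6 4 5 3
values &amp;lt;- as.numeric(as.character(stream_order))
codes
values
&lt;/code>&lt;/pre>&lt;/div>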
&lt;h2 id="how-to-check-and-manipulate-data-types">How to check and manipulate data types&lt;/h2>
&lt;p>As demonstrated throughout this tutorial, it can be useful to check the type of data you&amp;rsquo;re working with and to change it to another type when needed. This comes up especially when you read in data from a .csv file and need to check that your numeric columns were read as numbers rather than as character strings.&lt;/p>
&lt;p>The main way to check your data type is to use the function &lt;code>class()&lt;/code>. If you have a data frame, another easy way to check data types is to use the &lt;code>str()&lt;/code> function. This displays the structure of your data frame and tells you what data type each of your columns is. The example below lists heights over time for five individuals.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Create an example data frame&lt;/span>
example &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">data.frame&lt;/span>(indiv &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;A&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;B&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;C&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;D&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;E&amp;#34;&lt;/span>),
height_0 &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">15&lt;/span>, &lt;span style="color:#2aa198">10&lt;/span>, &lt;span style="color:#2aa198">12&lt;/span>, &lt;span style="color:#2aa198">9&lt;/span>, &lt;span style="color:#2aa198">17&lt;/span>),
height_10 &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">20&lt;/span>, &lt;span style="color:#2aa198">18&lt;/span>, &lt;span style="color:#2aa198">14&lt;/span>, &lt;span style="color:#2aa198">15&lt;/span>, &lt;span style="color:#2aa198">19&lt;/span>),
height_20 &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">23&lt;/span>, &lt;span style="color:#2aa198">24&lt;/span>, &lt;span style="color:#2aa198">18&lt;/span>, &lt;span style="color:#2aa198">17&lt;/span>, &lt;span style="color:#2aa198">26&lt;/span>))
&lt;span style="color:#268bd2">str&lt;/span>(example)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## 'data.frame': 5 obs. of 4 variables:
## $ indiv : chr &amp;quot;A&amp;quot; &amp;quot;B&amp;quot; &amp;quot;C&amp;quot; &amp;quot;D&amp;quot; ...
## $ height_0 : num 15 10 12 9 17
## $ height_10: num 20 18 14 15 19
## $ height_20: num 23 24 18 17 26
&lt;/code>&lt;/pre>&lt;p>You can see that the column &lt;code>indiv&lt;/code> is a character vector (abbreviated &amp;ldquo;chr&amp;rdquo;), while the remaining columns are numeric (abbreviated &amp;ldquo;num&amp;rdquo;).&lt;/p>
&lt;p>You may also have noticed me using functions like &lt;code>is.numeric()&lt;/code> or &lt;code>as.character()&lt;/code>. Each data type has matching &lt;code>is.&lt;/code> and &lt;code>as.&lt;/code> functions. The &lt;code>is.&lt;/code> functions check whether an object is of a specific type, asking &amp;ldquo;is this object of the class XXX?&amp;rdquo;, and return &lt;code>TRUE&lt;/code> or &lt;code>FALSE&lt;/code>. The &lt;code>as.&lt;/code> functions convert objects into a new data type. You may find yourself using these often when you&amp;rsquo;re first formatting your data and preparing it for analysis.&lt;/p>
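&lt;p>To see why this matters when reading a .csv file, here is a small sketch that uses the &lt;code>text&lt;/code> argument of &lt;code>read.csv()&lt;/code> in place of a real file (the survey data are made up). In R 4.0 and later, a single non-numeric entry, here the word &amp;ldquo;missing&amp;rdquo;, is enough for R to read the whole &lt;code>count&lt;/code> column as character (earlier versions read text columns as factors by default). &lt;code>is.numeric()&lt;/code> flags the problem and &lt;code>as.numeric()&lt;/code> fixes it:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Made-up survey data in which one count was recorded as the word "missing"
csv_text &amp;lt;- "site,count\nA,3\nB,missing\nC,5"
surveys &amp;lt;- read.csv(text = csv_text)
# The single non-numeric entry makes R read the whole column as character
class(surveys$count)
is.numeric(surveys$count)
# as.numeric() converts the column; "missing" cannot be converted, so it becomes NA
# (R also warns "NAs introduced by coercion")
surveys$count &amp;lt;- as.numeric(surveys$count)
is.numeric(surveys$count)
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>Alternatively, &lt;code>read.csv(text = csv_text, na.strings = &amp;quot;missing&amp;quot;)&lt;/code> treats that word as a missing value while reading, so the column comes in as numeric from the start.&lt;/p>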
&lt;p>That&amp;rsquo;s it for data types in R! Keep an eye out for our next tutorial, which will go over different data structures in R like vectors, lists, data frames, and tibbles. I hope this tutorial was helpful! Happy coding!&lt;/p>
&lt;br>
&lt;hr>
&lt;center>If you enjoyed this tutorial and want to learn more about data types and how to use them, you can check out Luka Negoita's full course on the complete basics of R for ecology here:
&lt;br>
&lt;br>
&lt;ul class="cta-group">
&lt;li>
&lt;a href="https://www.rforecology.com?utm_source=blog&amp;amp;utm_medium=bottom_button&amp;amp;utm_campaign=rforecology_blog/" target="_blank" rel="noopener" class="btn btn-primary px-3 py-3">&lt;strong>Start learning R now&lt;/strong>&lt;/a>
&lt;/li>
&lt;/ul>
&lt;p>Also be sure to check out &lt;strong>&lt;a href="https://www.r-bloggers.com/" target="_blank" rel="noopener">R-bloggers&lt;/a>&lt;/strong> for other great tutorials on learning R.&lt;/p>
&lt;/center>
</description></item><item><title>Complete tutorial on using 'apply' functions in R</title><link>https://www.rforecology.com/post/how-to-use-apply-functions/</link><pubDate>Tue, 08 Mar 2022 09:45:39 -0500</pubDate><guid>https://www.rforecology.com/post/how-to-use-apply-functions/</guid><description>
&lt;p>Today I&amp;rsquo;m going to talk about a useful family of functions that lets you repeatedly apply a specified function (e.g., &lt;code>sum()&lt;/code>, &lt;code>mean()&lt;/code>) across the elements of a vector, list, matrix, or data frame. For those of you familiar with &amp;lsquo;for&amp;rsquo; loops, the &lt;code>apply()&lt;/code> family often lets you skip writing the loop and instead wrap it into one simple function call.&lt;/p>
&lt;p>I&amp;rsquo;m going to discuss the functions &lt;code>apply()&lt;/code>, &lt;code>lapply()&lt;/code>, &lt;code>sapply()&lt;/code>, and &lt;code>tapply()&lt;/code> in this blog post (as well as using the &lt;code>dplyr&lt;/code> package for similar tasks). These function names all end in &amp;ldquo;apply&amp;rdquo; because you &lt;em>apply&lt;/em> the function you want across all the specified elements.&lt;/p>
&lt;p>Let&amp;rsquo;s see how they work.&lt;/p>
&lt;img src="https://www.rforecology.com/apply_image1.png" alt="Image of code saying apply the mean function across the columns of this data frame. There are arrows pointing from the code to each of the table columns. It also shows the output of the function" style="width:400px;"/>
&lt;h2 id="the-apply-function">The &lt;code>apply()&lt;/code> function&lt;/h2>
&lt;p>Let&amp;rsquo;s start with the &lt;code>apply()&lt;/code> function. First, we&amp;rsquo;ll create an example data set. This data set is in wide format* and describes the heights of five individuals (e.g., plants) in inches at three different time points (0, 10, and 20 days). The first column contains the IDs for each individual, and each successive column describes their heights at time points 0, 10, and 20 in that order.&lt;/p>
&lt;div class="alert alert-note">
&lt;div>
*Note: &lt;em>Wide format&lt;/em> refers to storing repeated measurements of the same variable in several columns (here, &lt;code>height_0&lt;/code>, &lt;code>height_10&lt;/code>, and &lt;code>height_20&lt;/code>). In this example, &lt;em>long format&lt;/em> would instead have one column for time, with the values 0, 10, and 20, and a single column for height.
&lt;/div>
&lt;/div>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Create data frame&lt;/span>
example &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">data.frame&lt;/span>(indiv &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;A&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;B&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;C&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;D&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;E&amp;#34;&lt;/span>),
height_0 &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">15&lt;/span>, &lt;span style="color:#2aa198">10&lt;/span>, &lt;span style="color:#2aa198">12&lt;/span>, &lt;span style="color:#2aa198">9&lt;/span>, &lt;span style="color:#2aa198">17&lt;/span>),
height_10 &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">20&lt;/span>, &lt;span style="color:#2aa198">18&lt;/span>, &lt;span style="color:#2aa198">14&lt;/span>, &lt;span style="color:#2aa198">15&lt;/span>, &lt;span style="color:#2aa198">19&lt;/span>),
height_20 &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">23&lt;/span>, &lt;span style="color:#2aa198">24&lt;/span>, &lt;span style="color:#2aa198">18&lt;/span>, &lt;span style="color:#2aa198">17&lt;/span>, &lt;span style="color:#2aa198">26&lt;/span>))
&lt;span style="color:#586e75"># View the data frame&lt;/span>
&lt;span style="color:#268bd2">head&lt;/span>(example)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## indiv height_0 height_10 height_20
## 1 A 15 20 23
## 2 B 10 18 24
## 3 C 12 14 18
## 4 D 9 15 17
## 5 E 17 19 26
&lt;/code>&lt;/pre>&lt;p>&lt;code>apply()&lt;/code> lets you perform a function across a data frame&amp;rsquo;s rows or columns. You specify the arguments as follows: &lt;code>apply(X = data.frame, MARGIN = 1, FUN = function.you.want)&lt;/code>. First, you enter the data frame you want to analyze; then &lt;code>MARGIN&lt;/code> sets which dimension to work across. &lt;code>MARGIN = 1&lt;/code> applies the function to each row, while &lt;code>MARGIN = 2&lt;/code> applies it to each column. Finally, &lt;code>FUN&lt;/code> takes the name of the function to apply (just the name, without parentheses).&lt;/p>
&lt;p>So let&amp;rsquo;s try finding the mean plant height for each row (i.e., for each individual). We also have to subset our data to only contain height values (columns 2 through 4) because our first column contains the individual identifiers.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Calculating the mean for each row in the data frame&lt;/span>
row.avg &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">apply&lt;/span>(X &lt;span style="color:#719e07">=&lt;/span> example[, &lt;span style="color:#2aa198">2&lt;/span>&lt;span style="color:#719e07">:&lt;/span>&lt;span style="color:#2aa198">4&lt;/span>], MARGIN &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">1&lt;/span>, FUN &lt;span style="color:#719e07">=&lt;/span> mean)
&lt;span style="color:#586e75"># View row.avg&lt;/span>
row.avg
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] 19.33333 17.33333 14.66667 13.66667 20.66667
&lt;/code>&lt;/pre>&lt;p>This returns a vector where each position corresponds to the row number that was averaged. Individual A&amp;rsquo;s average height is in position 1, B&amp;rsquo;s is in position 2, etc.&lt;/p>
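&lt;p>Since the result is unnamed, you can label each average with the individual&amp;rsquo;s ID using &lt;code>names()&lt;/code>. The sketch below also shows why the &lt;code>indiv&lt;/code> column had to be left out: &lt;code>apply()&lt;/code> converts a data frame to a matrix before working on it, and a matrix can hold only one data type, so a character column turns every value into text. The example data are recreated so the block runs on its own:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Recreate the wide-format example data
example &amp;lt;- data.frame(indiv = c("A", "B", "C", "D", "E"),
                      height_0 = c(15, 10, 12, 9, 17),
                      height_10 = c(20, 18, 14, 15, 19),
                      height_20 = c(23, 24, 18, 17, 26))
# Row means of the height columns, labelled with each individual ID
row.avg &amp;lt;- apply(example[, 2:4], 1, mean)
names(row.avg) &amp;lt;- example$indiv
row.avg
# apply() first converts its input to a matrix, which holds a single data type;
# with the indiv column included, every value in that matrix becomes text
class(as.matrix(example)[1, 2])
&lt;/code>&lt;/pre>&lt;/div>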
&lt;p>If we instead find the mean for each column (i.e., each time point), &lt;code>apply()&lt;/code> returns a named vector with one element per column.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Calculating the mean for each column in the data frame&lt;/span>
col.avg &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">apply&lt;/span>(example[, &lt;span style="color:#2aa198">2&lt;/span>&lt;span style="color:#719e07">:&lt;/span>&lt;span style="color:#2aa198">4&lt;/span>], &lt;span style="color:#2aa198">2&lt;/span>, mean)
&lt;span style="color:#586e75"># View col.avg&lt;/span>
col.avg
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## height_0 height_10 height_20
## 12.6 17.2 21.6
&lt;/code>&lt;/pre>&lt;div class="alert alert-note">
&lt;div>
Note: I used finding the mean as an example, but if you were actually trying to find the mean across the rows or columns of a data frame, you should use the &lt;code>rowMeans()&lt;/code> or &lt;code>colMeans()&lt;/code> functions instead of &lt;code>apply()&lt;/code>, as they work more efficiently.
&lt;/div>
&lt;/div>
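&lt;p>As a quick check, &lt;code>rowMeans()&lt;/code> and &lt;code>colMeans()&lt;/code> give the same averages as the two &lt;code>apply()&lt;/code> calls above. A minimal sketch, with the height columns recreated as &lt;code>heights&lt;/code>:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Recreate the height columns of the example data
heights &amp;lt;- data.frame(height_0 = c(15, 10, 12, 9, 17),
                      height_10 = c(20, 18, 14, 15, 19),
                      height_20 = c(23, 24, 18, 17, 26))
# Same values as apply(heights, 1, mean): one average per individual
row.avg2 &amp;lt;- rowMeans(heights)
# Same values as apply(heights, 2, mean): 12.6, 17.2, and 21.6
col.avg2 &amp;lt;- colMeans(heights)
row.avg2
col.avg2
&lt;/code>&lt;/pre>&lt;/div>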
&lt;p>You don&amp;rsquo;t just have to use pre-made functions like &lt;code>sum()&lt;/code> or &lt;code>mean()&lt;/code>. You could also write your own function to use. In the code below, I wrote a function that tells you if the average plant height is above 15 inches.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Create function is_tall&lt;/span>
is_tall &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">function&lt;/span>(x) {
value &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">mean&lt;/span>(x) &lt;span style="color:#719e07">&amp;gt;&lt;/span> &lt;span style="color:#2aa198">15&lt;/span>
&lt;span style="color:#268bd2">return&lt;/span>(value)
}
&lt;span style="color:#586e75"># Apply the function to the columns in the data frame&lt;/span>
&lt;span style="color:#268bd2">apply&lt;/span>(example[, &lt;span style="color:#2aa198">2&lt;/span>&lt;span style="color:#719e07">:&lt;/span>&lt;span style="color:#2aa198">4&lt;/span>], &lt;span style="color:#2aa198">2&lt;/span>, is_tall)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## height_0 height_10 height_20
## FALSE TRUE TRUE
&lt;/code>&lt;/pre>&lt;p>This tells me that at time point 0, the plants are not taller than 15 inches on average, while they are at time points 10 and 20.&lt;/p>
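&lt;p>If your function needs more than one argument, &lt;code>apply()&lt;/code> passes any extra named arguments listed after &lt;code>FUN&lt;/code> on to that function. As a sketch, here is a hypothetical variation of &lt;code>is_tall&lt;/code>, called &lt;code>is_taller_than&lt;/code>, that takes the height threshold as an argument instead of hard-coding 15 inches:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Recreate the height columns of the example data
heights &amp;lt;- data.frame(height_0 = c(15, 10, 12, 9, 17),
                      height_10 = c(20, 18, 14, 15, 19),
                      height_20 = c(23, 24, 18, 17, 26))
# Hypothetical helper: is the average height above a chosen threshold?
is_taller_than &amp;lt;- function(x, threshold) {
  mean(x) &amp;gt; threshold
}
# threshold = 20 is passed through apply() to is_taller_than()
# Column means are 12.6, 17.2, and 21.6, so only height_20 is TRUE
tall_20 &amp;lt;- apply(heights, 2, is_taller_than, threshold = 20)
tall_20
&lt;/code>&lt;/pre>&lt;/div>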
&lt;h2 id="the-lapply-function">The &lt;code>lapply()&lt;/code> function&lt;/h2>
&lt;p>Let&amp;rsquo;s look at another function, called &lt;code>lapply()&lt;/code>. The &amp;ldquo;L&amp;rdquo; in front of &amp;ldquo;apply&amp;rdquo; stands for &amp;ldquo;list&amp;rdquo;: this function is typically used on list objects, and it always returns a list.&lt;/p>
&lt;p>I created a list called &lt;code>plants&lt;/code>, containing three elements that are each vectors with a length of ten. Each element in the list contains a different plant attribute (height, mass, and # of flowers). I used the &lt;code>runif()&lt;/code> function to generate random numbers from a uniform distribution between a minimum and a maximum value, and used the &lt;code>sample()&lt;/code> function to randomly shuffle the integers from one to ten.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Set seed so that the randomly-generated numbers are the same each time&lt;/span>
&lt;span style="color:#268bd2">set.seed&lt;/span>(&lt;span style="color:#2aa198">123&lt;/span>)
&lt;span style="color:#586e75"># Create a list using randomly-generated numbers&lt;/span>
plants &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">list&lt;/span>(height &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">runif&lt;/span>(&lt;span style="color:#2aa198">10&lt;/span>, min &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">10&lt;/span>, max &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">20&lt;/span>),
mass &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">runif&lt;/span>(&lt;span style="color:#2aa198">10&lt;/span>, min &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">5&lt;/span>, max &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">10&lt;/span>),
flowers &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">sample&lt;/span>(&lt;span style="color:#2aa198">1&lt;/span>&lt;span style="color:#719e07">:&lt;/span>&lt;span style="color:#2aa198">10&lt;/span>, &lt;span style="color:#2aa198">10&lt;/span>))
&lt;span style="color:#586e75"># View the list&lt;/span>
plants
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## $height
## [1] 12.87578 17.88305 14.08977 18.83017 19.40467 10.45556 15.28105 18.92419
## [9] 15.51435 14.56615
##
## $mass
## [1] 9.784167 7.266671 8.387853 7.863167 5.514623 9.499125 6.230439 5.210298
## [9] 6.639604 9.772518
##
## $flowers
## [1] 9 10 1 5 3 2 6 7 8 4
&lt;/code>&lt;/pre>&lt;p>If we wanted to calculate the average value for each list element, we could do it individually:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#268bd2">mean&lt;/span>(plants&lt;span style="color:#719e07">$&lt;/span>height)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] 15.78248
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#268bd2">mean&lt;/span>(plants&lt;span style="color:#719e07">$&lt;/span>mass)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] 7.616846
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#268bd2">mean&lt;/span>(plants&lt;span style="color:#719e07">$&lt;/span>flowers)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] 5.5
&lt;/code>&lt;/pre>&lt;p>This method is pretty inefficient and makes us repeat our code. And what if we have more than three list elements? That would be a pain to type out. Let&amp;rsquo;s try another method.&lt;/p>
&lt;p>We could create a for loop and save the results in a vector:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Create an empty vector&lt;/span>
plant_avgs &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>()
&lt;span style="color:#586e75"># Loop the averages for each element and save in our vector&lt;/span>
&lt;span style="color:#268bd2">for&lt;/span>(i in &lt;span style="color:#2aa198">1&lt;/span>&lt;span style="color:#719e07">:&lt;/span>&lt;span style="color:#2aa198">3&lt;/span>){
plant_avgs[i] &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">mean&lt;/span>(plants[[i]])
}
&lt;span style="color:#586e75"># View the vector&lt;/span>
plant_avgs
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] 15.782475 7.616846 5.500000
&lt;/code>&lt;/pre>&lt;p>This method is better because it automates the process, which would be especially useful if our list had many elements. But &lt;code>for&lt;/code> loops take more effort to set up (we have to create the output vector and manage the index &lt;code>i&lt;/code>), and they still take up quite a bit of space in our code.&lt;/p>
&lt;p>Let&amp;rsquo;s try one last method: using &lt;code>lapply()&lt;/code> to wrap this whole process into a neat function. &lt;code>lapply()&lt;/code> doesn&amp;rsquo;t have the &lt;code>MARGIN&lt;/code> argument that &lt;code>apply()&lt;/code> has. Instead, &lt;code>lapply()&lt;/code> already knows that it should apply the specified function across all list elements. You can just type &lt;code>lapply(X = list, FUN = function.you.want)&lt;/code>, like this:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Use lapply to find the mean of each list element&lt;/span>
&lt;span style="color:#268bd2">lapply&lt;/span>(plants, mean)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## $height
## [1] 15.78248
##
## $mass
## [1] 7.616846
##
## $flowers
## [1] 5.5
&lt;/code>&lt;/pre>&lt;p>You&amp;rsquo;ll notice that the output of &lt;code>lapply()&lt;/code> is also a list, where the means of &lt;code>height&lt;/code>, &lt;code>mass&lt;/code>, and &lt;code>flowers&lt;/code> are saved as list elements of the same name. &lt;code>lapply()&lt;/code> does the same thing as the for loop, but with far less code and effort, which makes it the most concise of the three methods I just showed you.&lt;/p>
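&lt;p>Returning a list is especially handy when the results have different lengths and could not fit into a single vector. In the sketch below, which uses a small made-up list called &lt;code>traits&lt;/code>, an anonymous function keeps only the values above each element&amp;rsquo;s mean, so each result has a different length:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Small made-up list of plant measurements
traits &amp;lt;- list(height = c(12, 15, 19, 22),
               flowers = c(1, 9, 2))
# Keep the values that are above the mean of that element
# (height mean is 17, so 19 and 22 remain; flowers mean is 4, so only 9 remains)
above_mean &amp;lt;- lapply(traits, function(v) v[v &amp;gt; mean(v)])
above_mean
&lt;/code>&lt;/pre>&lt;/div>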
&lt;h2 id="the-sapply-function">The &lt;code>sapply()&lt;/code> function&lt;/h2>
&lt;p>In the previous example, our means were returned as elements in a list, but each list element was represented by just one value. There wasn&amp;rsquo;t really any reason for those values to be put in a list format instead of, say, a vector.&lt;/p>
&lt;p>This is where the &lt;code>sapply()&lt;/code> function comes in. It goes hand-in-hand with &lt;code>lapply()&lt;/code> and works the same way: it accepts a list (or a vector) and a function name as the input. But instead of always returning a list, it tries to simplify the result. When every result is a single value, you get a vector; when every result has the same length greater than one, you get a matrix; otherwise, you get a list. In our case, the answers come back as a named vector like below, which usually makes them easier to work with down the line.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Use sapply to find the mean of each list element&lt;/span>
&lt;span style="color:#268bd2">sapply&lt;/span>(plants, mean)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## height mass flowers
## 15.782475 7.616846 5.500000
&lt;/code>&lt;/pre>&lt;h2 id="the-tapply-function">The &lt;code>tapply()&lt;/code> function&lt;/h2>
&lt;p>The &lt;code>tapply()&lt;/code> function works in much the same way as the other functions, but it applies a function to a vector separately within each group defined by another vector. For those of you familiar with the &lt;code>dplyr&lt;/code> package, this does the same job as using the &lt;code>group_by()&lt;/code> function followed by &lt;code>summarize()&lt;/code>.&lt;/p>
&lt;p>Let&amp;rsquo;s return to our example data set from before, where we described the heights of several different individuals over time. This time, we&amp;rsquo;re going to write the data in &lt;em>long format&lt;/em>, so that each row represents one observation. Stay tuned for a tutorial post on reshaping data in R coming soon if you&amp;rsquo;re interested in learning more about wide vs. long format data.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Load library to use the pivot_longer() function&lt;/span>
&lt;span style="color:#268bd2">library&lt;/span>(tidyverse)
&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Pivot the data so that the data are in long format instead of wide format&lt;/span>
example &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">pivot_longer&lt;/span>(example, cols &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">2&lt;/span>&lt;span style="color:#719e07">:&lt;/span>&lt;span style="color:#2aa198">4&lt;/span>, names_to &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;time&amp;#34;&lt;/span>, values_to &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;height&amp;#34;&lt;/span>)
&lt;span style="color:#586e75"># Use sub() to get rid of the string &amp;#34;height_&amp;#34; in front of the time values&lt;/span>
example&lt;span style="color:#719e07">$&lt;/span>time &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">sub&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;height_&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;&amp;#34;&lt;/span>, example&lt;span style="color:#719e07">$&lt;/span>time)
&lt;span style="color:#586e75"># View data&lt;/span>
&lt;span style="color:#268bd2">head&lt;/span>(example)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## # A tibble: 6 × 3
## indiv time height
## &amp;lt;chr&amp;gt; &amp;lt;chr&amp;gt; &amp;lt;dbl&amp;gt;
## 1 A 0 15
## 2 A 10 20
## 3 A 20 23
## 4 B 0 10
## 5 B 10 18
## 6 B 20 24
&lt;/code>&lt;/pre>&lt;p>You can see that now we have a column for time, with values of 0, 10, and 20. Let&amp;rsquo;s use &lt;code>tapply()&lt;/code> to find the individuals&amp;rsquo; average height at each time point. The function accepts a new argument called &lt;code>INDEX&lt;/code>: &lt;code>tapply(X = vector.to.analyze, INDEX = vector.to.group.by, FUN = function.you.want)&lt;/code>. In the code below, I analyzed the height values grouped by time, using the function &lt;code>mean()&lt;/code>.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Use tapply() to find average height by time grouping&lt;/span>
&lt;span style="color:#268bd2">tapply&lt;/span>(X &lt;span style="color:#719e07">=&lt;/span> example&lt;span style="color:#719e07">$&lt;/span>height, INDEX &lt;span style="color:#719e07">=&lt;/span> example&lt;span style="color:#719e07">$&lt;/span>time, mean)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## 0 10 20
## 12.6 17.2 21.6
&lt;/code>&lt;/pre>&lt;p>Looks good! &lt;code>tapply()&lt;/code> returned a named vector (technically a one-dimensional array) with the average height at each time point.&lt;/p>
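&lt;p>&lt;code>INDEX&lt;/code> can also be a list of several grouping vectors, in which case &lt;code>tapply()&lt;/code> returns a matrix with one cell per combination of groups. The sketch below rebuilds the long-format heights from this post as a base R data frame so it runs on its own, and adds a &lt;code>treatment&lt;/code> column that is purely hypothetical (the original data have no treatment groups).&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Rebuild the long-format heights shown above as a base R data frame
heights &amp;lt;- data.frame(
  indiv  = rep(c("A", "B", "C", "D", "E"), each = 3),
  time   = rep(c("0", "10", "20"), times = 5),
  height = c(15, 20, 23, 10, 18, 24, 12, 14, 18, 9, 15, 17, 17, 19, 26)
)
# Hypothetical treatment groups, added only for this illustration
heights$treatment &amp;lt;- ifelse(heights$indiv %in% c("A", "B", "C"), "shade", "sun")
# One grouping vector gives the same averages as above: 12.6, 17.2, 21.6
tapply(X = heights$height, INDEX = heights$time, FUN = mean)
# A list of grouping vectors gives a matrix: rows = treatment, columns = time
tapply(X = heights$height,
       INDEX = list(heights$treatment, heights$time),
       FUN = mean)
&lt;/code>&lt;/pre>&lt;/div>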
&lt;div class="alert alert-note">
&lt;div>
Note: You may have noticed that in all of my examples, I&amp;rsquo;m using the &lt;code>apply()&lt;/code> functions on a list or a data frame. Even though the &lt;code>apply()&lt;/code> family can also be used on a simple vector, there&amp;rsquo;s often no need to do so. Many functions in R are already &amp;ldquo;vectorized&amp;rdquo;, which means they operate on every element of a vector in a single call, with the looping handled internally, so you don&amp;rsquo;t have to step through the elements one at a time yourself. For example, the &lt;code>sqrt()&lt;/code> function is vectorized: &lt;code>sqrt(vector)&lt;/code> and &lt;code>sapply(vector, sqrt)&lt;/code> return the same answer, so the &lt;code>apply()&lt;/code> version is unnecessary. When a vectorized function is available, it is usually faster than either a loop or an &lt;code>apply()&lt;/code> function. And in some cases, a &lt;code>for&lt;/code> loop can even be faster than an &lt;code>apply()&lt;/code> function. Check out &lt;a href="https://lorentzen.ch/index.php/2022/02/19/avoid-loops-in-r-really/">this blog post by Michael Mayer&lt;/a> for a great comparison of different methods.
&lt;/div>
&lt;/div>
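&lt;p>To see the vectorization point from the note above in action, here is a quick check on a small made-up vector of areas (the values are arbitrary). Both approaches give identical results, but the vectorized call is simpler to write and is generally the faster option.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Hypothetical vector of areas
areas &amp;lt;- c(4, 9, 16, 25)
# Vectorized: sqrt() works on the whole vector in one call
sqrt(areas)
# sapply() calls sqrt() once per element and simplifies the results to a vector
sapply(areas, sqrt)
# Both give the same numeric vector, so this returns TRUE
identical(sqrt(areas), sapply(areas, sqrt))
&lt;/code>&lt;/pre>&lt;/div>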
&lt;h2 id="the-apply-functions-vs-dplyr-functions">The &lt;code>apply()&lt;/code> functions vs. &lt;code>dplyr&lt;/code> functions&lt;/h2>
&lt;p>Some of you may be wondering about how useful the &lt;code>apply()&lt;/code> functions can be after you&amp;rsquo;ve learned how to use &lt;code>dplyr&lt;/code> functions.&lt;/p>
&lt;p>I just demonstrated how to use &lt;code>tapply()&lt;/code>, but the same thing could have been accomplished in &lt;code>dplyr&lt;/code>. Below, I grouped the data by the &lt;code>time&lt;/code> column, and created a column called &lt;code>avg_height&lt;/code> that calculates the mean height for each time group. See &lt;a href="https://www.rforecology.com/post/how-to-use-the-group-by-function/" target="_blank" rel="noopener">our tutorial here&lt;/a> for a more in-depth discussion of the &lt;code>group_by()&lt;/code> function.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Show grouping example in dplyr&lt;/span>
example &lt;span style="color:#719e07">%&amp;gt;%&lt;/span>
&lt;span style="color:#268bd2">group_by&lt;/span>(time) &lt;span style="color:#719e07">%&amp;gt;%&lt;/span>
&lt;span style="color:#268bd2">summarize&lt;/span>(avg_height &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">mean&lt;/span>(height)) &lt;span style="color:#719e07">%&amp;gt;%&lt;/span>
&lt;span style="color:#268bd2">ungroup&lt;/span>()
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## # A tibble: 3 × 2
## time avg_height
## &amp;lt;chr&amp;gt; &amp;lt;dbl&amp;gt;
## 1 0 12.6
## 2 10 17.2
## 3 20 21.6
&lt;/code>&lt;/pre>&lt;p>This returns a table of values rather than a vector, but it still contains the same basic information. It shows the average heights of individuals at three different time points. So which method is better, &lt;code>dplyr&lt;/code> functions or &lt;code>tapply()&lt;/code>?&lt;/p>
&lt;p>The answer is that it depends on what you&amp;rsquo;re going to do afterwards! &lt;code>tapply()&lt;/code> might be useful to get a quick answer. It&amp;rsquo;s one easy line of code that tells you the average heights. The &lt;code>dplyr&lt;/code> method is useful if you&amp;rsquo;re going to keep working on the data. The &lt;a href="https://www.rforecology.com/post/how-to-use-pipes/" target="_blank" rel="noopener">pipe operator&lt;/a> (&lt;code>%&amp;gt;%&lt;/code>) allows you to use the output of one function as the input of another, without having to create intermediate variables.&lt;/p>
&lt;p>For example, in the code below, I wanted to not only summarize the average heights at each time point, but also keep only the time points where the average height was greater than 15. I did that easily by adding another pipe to the end of my previous code and typing one more short line.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Show grouping example in dplyr and further manipulation&lt;/span>
example &lt;span style="color:#719e07">%&amp;gt;%&lt;/span>
&lt;span style="color:#268bd2">group_by&lt;/span>(time) &lt;span style="color:#719e07">%&amp;gt;%&lt;/span>
&lt;span style="color:#268bd2">summarize&lt;/span>(avg_height &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">mean&lt;/span>(height)) &lt;span style="color:#719e07">%&amp;gt;%&lt;/span>
&lt;span style="color:#268bd2">ungroup&lt;/span>() &lt;span style="color:#719e07">%&amp;gt;%&lt;/span>
&lt;span style="color:#268bd2">filter&lt;/span>(avg_height &lt;span style="color:#719e07">&amp;gt;&lt;/span> &lt;span style="color:#2aa198">15&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## # A tibble: 2 × 2
## time avg_height
## &amp;lt;chr&amp;gt; &amp;lt;dbl&amp;gt;
## 1 10 17.2
## 2 20 21.6
&lt;/code>&lt;/pre>&lt;p>There are other &lt;code>dplyr&lt;/code> functions that are analogous to the rest of the &lt;code>apply()&lt;/code> family. For example, the &lt;code>across()&lt;/code> function works much like &lt;code>apply()&lt;/code> with &lt;code>MARGIN = 2&lt;/code>: it applies a function to each of the columns you select, and it is used inside functions such as &lt;code>mutate()&lt;/code> and &lt;code>summarize()&lt;/code>. Let&amp;rsquo;s go back to the previous wide format of our &lt;code>example&lt;/code> data frame by using &lt;code>pivot_wider()&lt;/code>.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Turn the data frame back into wide format&lt;/span>
example &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">pivot_wider&lt;/span>(example, id_cols &lt;span style="color:#719e07">=&lt;/span> indiv, names_from &lt;span style="color:#719e07">=&lt;/span> time, values_from &lt;span style="color:#719e07">=&lt;/span> height)
&lt;span style="color:#586e75"># View data frame&lt;/span>
&lt;span style="color:#268bd2">head&lt;/span>(example)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## # A tibble: 5 × 4
## indiv `0` `10` `20`
## &amp;lt;chr&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt;
## 1 A 15 20 23
## 2 B 10 18 24
## 3 C 12 14 18
## 4 D 9 15 17
## 5 E 17 19 26
&lt;/code>&lt;/pre>&lt;p>Let&amp;rsquo;s say we want to convert our height values from inches to centimeters by multiplying by 2.54. We can use the &lt;code>across()&lt;/code> function to do this. In the code below, I wrote a quick function, &lt;code>to_cm()&lt;/code>, that multiplies its input by 2.54 to convert from inches to cm. Then I used &lt;code>mutate()&lt;/code> to modify the data frame, and inside it, &lt;code>across()&lt;/code> to apply &lt;code>to_cm()&lt;/code> to columns 2 through 4.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Write function called to_cm that converts values from inches to cm&lt;/span>
to_cm &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">function&lt;/span>(x){
cm &lt;span style="color:#719e07">&amp;lt;-&lt;/span> x &lt;span style="color:#719e07">*&lt;/span> &lt;span style="color:#2aa198">2.54&lt;/span>
&lt;span style="color:#268bd2">return&lt;/span>(cm)
}
&lt;span style="color:#586e75"># Convert height from inches to centimeters&lt;/span>
example &lt;span style="color:#719e07">%&amp;gt;%&lt;/span>
&lt;span style="color:#268bd2">mutate&lt;/span>(&lt;span style="color:#268bd2">across&lt;/span>(&lt;span style="color:#2aa198">2&lt;/span>&lt;span style="color:#719e07">:&lt;/span>&lt;span style="color:#2aa198">4&lt;/span>, to_cm))
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## # A tibble: 5 × 4
## indiv `0` `10` `20`
## &amp;lt;chr&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt;
## 1 A 38.1 50.8 58.4
## 2 B 25.4 45.7 61.0
## 3 C 30.5 35.6 45.7
## 4 D 22.9 38.1 43.2
## 5 E 43.2 48.3 66.0
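&lt;/code>&lt;/pre>&lt;p>And&amp;hellip; ta-da! Our heights have now been converted from inches to cm. Note that the code above only prints the converted table; to keep it, assign the result to a variable with &lt;code>&amp;lt;-&lt;/code>.&lt;/p>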
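&lt;p>&lt;code>across()&lt;/code> does not require column positions or a separately named function. As a minimal sketch, the code below rebuilds the wide table shown above (as a new tibble, &lt;code>example_wide&lt;/code>, so the block runs on its own), selects the height columns by type with &lt;code>where(is.numeric)&lt;/code>, applies an anonymous function, and saves the result as &lt;code>example_cm&lt;/code>. Selecting by type assumes every numeric column should be converted, which holds here because &lt;code>indiv&lt;/code> is the only non-height column.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">library(dplyr)
# Rebuild the wide-format heights shown above (in inches)
example_wide &amp;lt;- tibble(
  indiv = c("A", "B", "C", "D", "E"),
  `0`   = c(15, 10, 12, 9, 17),
  `10`  = c(20, 18, 14, 15, 19),
  `20`  = c(23, 24, 18, 17, 26)
)
# Convert every numeric column to cm and keep the result
example_cm &amp;lt;- example_wide %&amp;gt;%
  mutate(across(where(is.numeric), function(x) x * 2.54))
example_cm
&lt;/code>&lt;/pre>&lt;/div>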
&lt;p>If we were to perform an operation across rows in &lt;code>dplyr&lt;/code>, we would need to group by rows using the &lt;code>rowwise()&lt;/code> function before performing any other operation (it works the same way as the &lt;code>group_by()&lt;/code> function, just groups by rows).&lt;/p>
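&lt;p>For example, here is a sketch that finds each individual&amp;rsquo;s average height across the three time points, first with &lt;code>rowwise()&lt;/code> and &lt;code>c_across()&lt;/code>, then with &lt;code>apply()&lt;/code> and &lt;code>MARGIN = 1&lt;/code>. It rebuilds the wide table from above as &lt;code>example_wide&lt;/code> so it runs on its own; the column name &lt;code>mean_height&lt;/code> is our own choice.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">library(dplyr)
# Rebuild the wide-format heights shown above
example_wide &amp;lt;- tibble(
  indiv = c("A", "B", "C", "D", "E"),
  `0`   = c(15, 10, 12, 9, 17),
  `10`  = c(20, 18, 14, 15, 19),
  `20`  = c(23, 24, 18, 17, 26)
)
# dplyr: treat each row as its own group, then average the time columns
example_wide %&amp;gt;%
  rowwise() %&amp;gt;%
  mutate(mean_height = mean(c_across(`0`:`20`))) %&amp;gt;%
  ungroup()
# Base R: apply() with MARGIN = 1 works across rows
apply(example_wide[, 2:4], MARGIN = 1, FUN = mean)
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>For a simple row mean like this one, the vectorized &lt;code>rowMeans(example_wide[, 2:4])&lt;/code> would also work.&lt;/p>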
&lt;p>Again, using the &lt;code>dplyr&lt;/code> functions instead of &lt;code>apply()&lt;/code> is up to your own discretion. &lt;code>apply()&lt;/code> is an easy, one-line function that can account for row-wise and column-wise operations. But &lt;code>dplyr&lt;/code> offers a useful grammar (pipes!) that allows you to keep working smoothly without interruption in your code. Different circumstances will call for different methods, and it might take some trial and error before you discover the method that works best for you in each situation.&lt;/p>
&lt;p>That concludes our summary of the &lt;code>apply()&lt;/code> functions! We learned how to use &lt;code>apply()&lt;/code>, &lt;code>lapply()&lt;/code>, &lt;code>sapply()&lt;/code>, and &lt;code>tapply()&lt;/code>, and we discussed equivalent &lt;code>dplyr&lt;/code> functions for &lt;code>apply()&lt;/code> and &lt;code>tapply()&lt;/code>.&lt;/p>
&lt;p>Let us know what you think of &lt;code>apply()&lt;/code> vs &lt;code>dplyr&lt;/code> in the comments! Do you have a preferred method?&lt;/p>
&lt;br>
&lt;hr>
&lt;center>If you enjoyed this tutorial and want to learn more, you can check out Luka Negoita's full course on the complete basics of R for ecology here:
&lt;br>
&lt;br>
&lt;ul class="cta-group">
&lt;li>
&lt;a href="https://www.rforecology.com?utm_source=blog&amp;amp;utm_medium=bottom_button&amp;amp;utm_campaign=rforecology_blog/" target="_blank" rel="noopener" class="btn btn-primary px-3 py-3">&lt;strong>Start learning R now&lt;/strong>&lt;/a>
&lt;/li>
&lt;/ul>
&lt;p>Also be sure to check out &lt;strong>&lt;a href="https://www.r-bloggers.com/" target="_blank" rel="noopener">R-bloggers&lt;/a>&lt;/strong> for other great tutorials on learning R.&lt;/p>
&lt;/center>
&lt;script defer src="https://cdn.commento.io/js/commento.js">&lt;/script>
&lt;div id="commento">&lt;/div></description></item><item><title>How to use pipes to clean up your R code</title><link>https://www.rforecology.com/post/how-to-use-pipes/</link><pubDate>Wed, 02 Mar 2022 02:45:39 -0500</pubDate><guid>https://www.rforecology.com/post/how-to-use-pipes/</guid><description>&lt;p>I&amp;rsquo;ve talked a little bit about pipes (written as &lt;code>%&amp;gt;%&lt;/code>) in a &lt;a href="https://www.rforecology.com/post/how-to-use-the-group-by-function/" target="_blank" rel="noopener">past blog post&lt;/a>, but they&amp;rsquo;re so important in R that I thought they deserved their own post.&lt;/p>
&lt;p>In this tutorial, I&amp;rsquo;m going to give an explanation of what pipes are and when they can be used, and then I&amp;rsquo;m going to demonstrate how useful they can be for writing clean and neat R code.&lt;/p>
&lt;img src="https://www.rforecology.com/pipes_image0.png" alt="Image saying 'Using pipes in R', showing pipes connecting a workflow from data to filtering to mutating to grouping to summarizing." style="width:400px;"/>
&lt;h3 id="what-is-a-pipe">What is a pipe?&lt;/h3>
&lt;p>A pipe is a type of operator in R that comes with the &lt;code>magrittr&lt;/code> package. It takes the output of one function and passes it as the first argument of the next function, allowing us to chain together several steps in R. Pipes help your code flow better, making it cleaner and easier to read.&lt;/p>
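&lt;p>In other words, &lt;code>x %&amp;gt;% f(y)&lt;/code> is another way of writing &lt;code>f(x, y)&lt;/code>. A quick way to convince yourself, using the built-in &lt;code>iris&lt;/code> data and &lt;code>head()&lt;/code>:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">library(dplyr)  # makes the magrittr pipe %&amp;gt;% available
# Without a pipe: iris is the first argument of head()
head(iris, 3)
# With a pipe: iris is passed in as the first argument of head()
iris %&amp;gt;% head(3)
# The two forms give the same result, so this returns TRUE
identical(head(iris, 3), iris %&amp;gt;% head(3))
&lt;/code>&lt;/pre>&lt;/div>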
&lt;p>The pipe shines when used in conjunction with the &lt;code>dplyr&lt;/code> package and its functions such as &lt;code>filter&lt;/code>, &lt;code>mutate&lt;/code>, and &lt;code>summarise&lt;/code>, as we often need to use these one after another to manipulate our data. Luckily, the pipe comes loaded with &lt;code>dplyr&lt;/code>, so there&amp;rsquo;s no need to load the &lt;code>magrittr&lt;/code> package unless you specifically need to use the other &lt;code>magrittr&lt;/code> operators.&lt;/p>
&lt;img src="https://www.rforecology.com/pipe_image1.png" alt="Image of a pipe in R with crossed out text below it reading 'This is not a pipe' in French, in reference to Magritte's painting called 'The Treachery of Images'. There is text on top that says 'This is a pipe.'" style="width:400px;"/>
&lt;h3 id="a-quick-demonstration-on-how-to-use-pipes">A quick demonstration on how to use pipes&lt;/h3>
&lt;p>Let&amp;rsquo;s see pipes in action. First, load the &lt;code>dplyr&lt;/code> package and the classic &lt;code>iris&lt;/code> data set that comes with R. If you don&amp;rsquo;t have &lt;code>dplyr&lt;/code> installed yet, you&amp;rsquo;ll need to run &lt;code>install.packages(&amp;quot;dplyr&amp;quot;)&lt;/code> before loading the package.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Load dplyr&lt;/span>
&lt;span style="color:#268bd2">library&lt;/span>(dplyr)
&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Load data&lt;/span>
&lt;span style="color:#268bd2">data&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;iris&amp;#34;&lt;/span>)
&lt;span style="color:#586e75"># View data&lt;/span>
&lt;span style="color:#268bd2">head&lt;/span>(iris)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## Sepal.Length Sepal.Width Petal.Length Petal.Width Species
## 1 5.1 3.5 1.4 0.2 setosa
## 2 4.9 3.0 1.4 0.2 setosa
## 3 4.7 3.2 1.3 0.2 setosa
## 4 4.6 3.1 1.5 0.2 setosa
## 5 5.0 3.6 1.4 0.2 setosa
## 6 5.4 3.9 1.7 0.4 setosa
&lt;/code>&lt;/pre>&lt;p>These data describe several measurements for three plant species (&lt;em>Iris setosa&lt;/em>, &lt;em>Iris versicolor&lt;/em>, and &lt;em>Iris virginica&lt;/em>). These measurements describe morphological differences among the three species in terms of sepal length and width and petal length and width, all in centimeters.&lt;/p>
&lt;p>I want to keep only the largest plants in the data set, so let&amp;rsquo;s only include plants with Sepal.Length greater than 5 cm, and Petal.Length greater than 3 cm. I also want to create two columns called &amp;ldquo;Sepal.Area&amp;rdquo; and &amp;ldquo;Petal.Area&amp;rdquo;, equivalent to length x width (for an approximation of sepal/petal area). To do this, I&amp;rsquo;ll use the &lt;code>filter()&lt;/code> and &lt;code>mutate()&lt;/code> functions. Notice that I also hit &amp;ldquo;Enter&amp;rdquo; or &amp;ldquo;Return&amp;rdquo; to add a new line after every pipe to keep the code clean and keep each function on a separate line.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Filter and mutate data&lt;/span>
new_iris &lt;span style="color:#719e07">&amp;lt;-&lt;/span> iris &lt;span style="color:#719e07">%&amp;gt;%&lt;/span>
&lt;span style="color:#268bd2">filter&lt;/span>(Sepal.Length &lt;span style="color:#719e07">&amp;gt;&lt;/span> &lt;span style="color:#2aa198">5&lt;/span> &lt;span style="color:#719e07">&amp;amp;&lt;/span> Petal.Length &lt;span style="color:#719e07">&amp;gt;&lt;/span> &lt;span style="color:#2aa198">3&lt;/span>) &lt;span style="color:#719e07">%&amp;gt;%&lt;/span>
&lt;span style="color:#268bd2">mutate&lt;/span>(Sepal.Area &lt;span style="color:#719e07">=&lt;/span> Sepal.Length &lt;span style="color:#719e07">*&lt;/span> Sepal.Width,
Petal.Area &lt;span style="color:#719e07">=&lt;/span> Petal.Length &lt;span style="color:#719e07">*&lt;/span> Petal.Width)
&lt;span style="color:#586e75"># View new data&lt;/span>
&lt;span style="color:#268bd2">head&lt;/span>(new_iris)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## Sepal.Length Sepal.Width Petal.Length Petal.Width Species Sepal.Area
## 1 7.0 3.2 4.7 1.4 versicolor 22.40
## 2 6.4 3.2 4.5 1.5 versicolor 20.48
## 3 6.9 3.1 4.9 1.5 versicolor 21.39
## 4 5.5 2.3 4.0 1.3 versicolor 12.65
## 5 6.5 2.8 4.6 1.5 versicolor 18.20
## 6 5.7 2.8 4.5 1.3 versicolor 15.96
## Petal.Area
## 1 6.58
## 2 6.75
## 3 7.35
## 4 5.20
## 5 6.90
## 6 5.85
&lt;/code>&lt;/pre>&lt;p>Our data set looks good. You&amp;rsquo;ll see that my arguments in the &lt;code>filter()&lt;/code> and &lt;code>mutate()&lt;/code> functions are a bit different from usual. Normally, most of the &lt;code>dplyr&lt;/code> functions are formatted like this: &lt;code>function(data, arguments)&lt;/code>.&lt;/p>
&lt;p>Remember that a pipe takes the output of whatever comes before it and passes it as the first argument of the function that follows. Thus, the &lt;code>filter()&lt;/code> function receives &lt;code>iris&lt;/code> as its first argument (named &lt;code>.data&lt;/code> in &lt;code>dplyr&lt;/code>), and then the &lt;code>mutate()&lt;/code> function receives the result of &lt;code>filter(iris, Sepal.Length &amp;gt; 5 &amp;amp; Petal.Length &amp;gt; 3)&lt;/code> as its first argument.&lt;/p>
&lt;p>With pipes, there was no need for me to write &lt;code>filter(iris, Sepal.Length &amp;gt; 5 &amp;amp; Petal.Length &amp;gt; 3)&lt;/code>, because that would be repetitive. I could just skip straight to the arguments and write &lt;code>filter(Sepal.Length &amp;gt; 5 &amp;amp; Petal.Length &amp;gt; 3)&lt;/code>.&lt;/p>
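&lt;p>Passing the data as the first argument works well with &lt;code>dplyr&lt;/code>, because its data-manipulation functions all take the data first. For functions that expect the data somewhere else, the &lt;code>magrittr&lt;/code> pipe lets you mark the spot with a dot (&lt;code>.&lt;/code>). Here is a minimal sketch with &lt;code>lm()&lt;/code>, whose first argument is a formula rather than data (the choice of model is only for illustration):&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">library(dplyr)
# lm() takes the formula first, so . sends the piped data to its data argument
model &amp;lt;- iris %&amp;gt;%
  filter(Sepal.Length &amp;gt; 5 &amp;amp; Petal.Length &amp;gt; 3) %&amp;gt;%
  lm(Petal.Length ~ Sepal.Length, data = .)
# Intercept and slope of the fitted line
coef(model)
&lt;/code>&lt;/pre>&lt;/div>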
&lt;p>To summarize in plain English (each &lt;strong>then&lt;/strong> in this sentence can be substituted for a pipe):&lt;/p>
&lt;ul>
&lt;li>I wrote code starting with the &lt;code>iris&lt;/code> data set, &lt;strong>then&lt;/strong> filtered it by Sepal.Length and Petal.Length, &lt;strong>then&lt;/strong> used mutate to create two new columns.&lt;/li>
&lt;/ul>
&lt;p>Without pipes, our sentence becomes longer:&lt;/p>
&lt;ul>
&lt;li>I wrote code starting with the &lt;code>iris&lt;/code> data set. I filtered the &lt;code>iris&lt;/code> data set by Sepal.Length and Petal.Length. Using the filtered &lt;code>iris&lt;/code> data, I used mutate to create two new columns.&lt;/li>
&lt;/ul>
&lt;p>And those are the essentials of using pipes!&lt;/p>
&lt;h3 id="cleaning-code-with-pipes">Cleaning code with pipes&lt;/h3>
&lt;p>After that last example, you might be thinking, OK, that&amp;rsquo;s pretty cool. But can it really make that big of a difference for organizing my code? The answer is&amp;hellip;yes! And I&amp;rsquo;ll quickly demonstrate why.&lt;/p>
&lt;h4 id="example-1-creating-new-variables-for-each-step">Example 1: Creating new variables for each step&lt;/h4>
&lt;p>Let&amp;rsquo;s filter and mutate our data like we did above, then group by species and summarize to find the average sepal and petal area within each species. Without pipes, our code might look like this:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">filtered_iris &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">filter&lt;/span>(iris, Sepal.Length &lt;span style="color:#719e07">&amp;gt;&lt;/span> &lt;span style="color:#2aa198">5&lt;/span> &lt;span style="color:#719e07">&amp;amp;&lt;/span> Petal.Length &lt;span style="color:#719e07">&amp;gt;&lt;/span> &lt;span style="color:#2aa198">3&lt;/span>)
mutated_iris &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">mutate&lt;/span>(filtered_iris,
Sepal.Area &lt;span style="color:#719e07">=&lt;/span> Sepal.Length &lt;span style="color:#719e07">*&lt;/span> Sepal.Width,
Petal.Area &lt;span style="color:#719e07">=&lt;/span> Petal.Length &lt;span style="color:#719e07">*&lt;/span> Petal.Width)
grouped_iris &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">group_by&lt;/span>(mutated_iris, Species)
summary_iris &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">summarize&lt;/span>(grouped_iris,
avg.sepal.area &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">mean&lt;/span>(Sepal.Area),
avg.petal.area &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">mean&lt;/span>(Petal.Area))
&lt;span style="color:#586e75"># View result&lt;/span>
summary_iris
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## # A tibble: 2 × 3
## Species avg.sepal.area avg.petal.area
## &amp;lt;fct&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt;
## 1 versicolor 17.0 5.93
## 2 virginica 19.8 11.4
&lt;/code>&lt;/pre>&lt;p>Whew. It can be a little exhausting to have to save each step as a new variable, and now our environment will be cluttered with a bunch of intermediate variables. Aside from the clutter, your code is also much more prone to errors if you change something in the earlier steps but forget to run those lines before the later steps again. So let&amp;rsquo;s not do that then.&lt;/p>
&lt;h4 id="example-2-nesting-functions">Example 2: Nesting functions&lt;/h4>
&lt;p>Let&amp;rsquo;s try another method, where we nest each function inside the previous one.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#268bd2">summarize&lt;/span>(&lt;span style="color:#268bd2">group_by&lt;/span>(&lt;span style="color:#268bd2">mutate&lt;/span>(&lt;span style="color:#268bd2">filter&lt;/span>(iris,
Sepal.Length &lt;span style="color:#719e07">&amp;gt;&lt;/span> &lt;span style="color:#2aa198">5&lt;/span> &lt;span style="color:#719e07">&amp;amp;&lt;/span> Petal.Length &lt;span style="color:#719e07">&amp;gt;&lt;/span> &lt;span style="color:#2aa198">3&lt;/span>),
Sepal.Area &lt;span style="color:#719e07">=&lt;/span> Sepal.Length &lt;span style="color:#719e07">*&lt;/span> Sepal.Width,
Petal.Area &lt;span style="color:#719e07">=&lt;/span> Petal.Length &lt;span style="color:#719e07">*&lt;/span> Petal.Width),
Species),
avg.sepal.area &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">mean&lt;/span>(Sepal.Area),
avg.petal.area &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">mean&lt;/span>(Petal.Area))
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## # A tibble: 2 × 3
## Species avg.sepal.area avg.petal.area
## &amp;lt;fct&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt;
## 1 versicolor 17.0 5.93
## 2 virginica 19.8 11.4
&lt;/code>&lt;/pre>&lt;p>That doesn&amp;rsquo;t really look much better. If all these nested functions are making your head spin, don&amp;rsquo;t worry, it&amp;rsquo;s doing that to me too. Code like this is a great way to spend hours searching for errors&amp;hellip; only to realize you&amp;rsquo;re missing a parenthesis. 😖&lt;/p>
&lt;h4 id="example-3-pipes">Example 3: Pipes!&lt;/h4>
&lt;p>Let&amp;rsquo;s try it with pipes:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">iris &lt;span style="color:#719e07">%&amp;gt;%&lt;/span>
&lt;span style="color:#268bd2">filter&lt;/span>(Sepal.Length &lt;span style="color:#719e07">&amp;gt;&lt;/span> &lt;span style="color:#2aa198">5&lt;/span> &lt;span style="color:#719e07">&amp;amp;&lt;/span> Petal.Length &lt;span style="color:#719e07">&amp;gt;&lt;/span> &lt;span style="color:#2aa198">3&lt;/span>) &lt;span style="color:#719e07">%&amp;gt;%&lt;/span>
&lt;span style="color:#268bd2">mutate&lt;/span>(Sepal.Area &lt;span style="color:#719e07">=&lt;/span> Sepal.Length &lt;span style="color:#719e07">*&lt;/span> Sepal.Width,
Petal.Area &lt;span style="color:#719e07">=&lt;/span> Petal.Length &lt;span style="color:#719e07">*&lt;/span> Petal.Width) &lt;span style="color:#719e07">%&amp;gt;%&lt;/span>
&lt;span style="color:#268bd2">group_by&lt;/span>(Species) &lt;span style="color:#719e07">%&amp;gt;%&lt;/span>
&lt;span style="color:#268bd2">summarize&lt;/span>(avg.sepal.area &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">mean&lt;/span>(Sepal.Area),
avg.petal.area &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">mean&lt;/span>(Petal.Area))
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## # A tibble: 2 × 3
## Species avg.sepal.area avg.petal.area
## &amp;lt;fct&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt;
## 1 versicolor 17.0 5.93
## 2 virginica 19.8 11.4
&lt;/code>&lt;/pre>&lt;p>Now the flow of our code is much cleaner and clearer. Others will be able to follow our code much more easily, and there&amp;rsquo;s no need to create new variables each step of the way. Pipes take us smoothly from beginning to end.&lt;/p>
&lt;p>This way of writing the code also lets us insert comments at each step so we can clearly document our process:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">iris &lt;span style="color:#719e07">%&amp;gt;%&lt;/span>
&lt;span style="color:#586e75"># first filter and keep only plants with sepals longer than 5 cm and petals longer than 3 cm:&lt;/span>
&lt;span style="color:#268bd2">filter&lt;/span>(Sepal.Length &lt;span style="color:#719e07">&amp;gt;&lt;/span> &lt;span style="color:#2aa198">5&lt;/span> &lt;span style="color:#719e07">&amp;amp;&lt;/span> Petal.Length &lt;span style="color:#719e07">&amp;gt;&lt;/span> &lt;span style="color:#2aa198">3&lt;/span>) &lt;span style="color:#719e07">%&amp;gt;%&lt;/span>
&lt;span style="color:#586e75"># then approximate sepal and petal area by multiplying length and width:&lt;/span>
&lt;span style="color:#268bd2">mutate&lt;/span>(Sepal.Area &lt;span style="color:#719e07">=&lt;/span> Sepal.Length &lt;span style="color:#719e07">*&lt;/span> Sepal.Width,
Petal.Area &lt;span style="color:#719e07">=&lt;/span> Petal.Length &lt;span style="color:#719e07">*&lt;/span> Petal.Width) &lt;span style="color:#719e07">%&amp;gt;%&lt;/span>
&lt;span style="color:#586e75"># after that group by species to summarize the mean &lt;/span>
&lt;span style="color:#586e75"># sepal/petal area of each species:&lt;/span>
&lt;span style="color:#268bd2">group_by&lt;/span>(Species) &lt;span style="color:#719e07">%&amp;gt;%&lt;/span>
&lt;span style="color:#268bd2">summarize&lt;/span>(avg.sepal.area &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">mean&lt;/span>(Sepal.Area),
avg.petal.area &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">mean&lt;/span>(Petal.Area))
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## # A tibble: 2 × 3
## Species avg.sepal.area avg.petal.area
## &amp;lt;fct&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt;
## 1 versicolor 17.0 5.93
## 2 virginica 19.8 11.4
&lt;/code>&lt;/pre>&lt;p>All that said, I&amp;rsquo;m not suggesting that your entire R analysis script fit inside one long set of pipes. Find what works best for you and your analyses in terms of splitting up your code into neat organized chunks that make sense.&lt;/p>
&lt;p>We owe a big thank you to &lt;a href="https://stefanbache.dk/" target="_blank" rel="noopener">Stefan Milton Bache&lt;/a> (@&lt;a href="https://twitter.com/stefanbache" target="_blank" rel="noopener">stefanbache&lt;/a> on Twitter), creator of the &lt;code>magrittr&lt;/code> package and the almighty pipe! Hope you found this tutorial helpful. Happy coding!&lt;/p>
&lt;p>P.S. A highly relevant tweet explaining pipes&amp;hellip; (from &lt;a href="https://twitter.com/WeAreRLadies/status/1172576445794803713?s=20" target="_blank" rel="noopener">WeAreRLadies on Twitter&lt;/a>)
&lt;img src="https://www.rforecology.com/pipes_image2.png" alt="Image of text saying I woke up, then showered, then dressed, then glammed up, then showed up to work, with pipes instead of the word then">&lt;/p>
&lt;br>
&lt;hr>
&lt;center>If you enjoyed this tutorial and want to learn more, you can check out Luka Negoita's full course on the complete basics of R for ecology here:
&lt;br>
&lt;br>
&lt;ul class="cta-group">
&lt;li>
&lt;a href="https://coaching.rforecology.com?utm_source=blog&amp;amp;utm_medium=bottom_button&amp;amp;utm_campaign=rforecology_blog/" target="_blank" rel="noopener" class="btn btn-primary px-3 py-3">&lt;strong>Start learning R now&lt;/strong>&lt;/a>
&lt;/li>
&lt;/ul>
&lt;p>Also be sure to check out &lt;strong>&lt;a href="https://www.r-bloggers.com/" target="_blank" rel="noopener">R-bloggers&lt;/a>&lt;/strong> for other great tutorials on learning R.&lt;/p>
&lt;/center>
&lt;script defer src="https://cdn.commento.io/js/commento.js">&lt;/script>
&lt;div id="commento">&lt;/div></description></item><item><title>How to use the group_by function with your ecological data</title><link>https://www.rforecology.com/post/how-to-use-the-group-by-function/</link><pubDate>Wed, 23 Feb 2022 08:45:39 -0500</pubDate><guid>https://www.rforecology.com/post/how-to-use-the-group-by-function/</guid><description>&lt;script src="https://rforecology.activehosted.com/f/embed.php?id=21" type="text/javascript" charset="utf-8">&lt;/script>
&lt;p>In scientific data and experiments, we often have groups of subjects between which we want to compare an observed response. For example, we might want to compare the growth rates of plants under different light treatments. Or maybe we want to compare CO₂ emissions of different countries over time. Each of these scenarios requires you to group your data based on a certain variable before you can compare any kind of statistic, such as the mean, minimum, or maximum.&lt;/p>
&lt;p>In this tutorial, I&amp;rsquo;m going to discuss how to use a handy function called &lt;code>group_by()&lt;/code>, which allows you to do what I just described.&lt;/p>
&lt;img src="https://www.rforecology.com/groupby_image1.png" alt="Image showing pine trees of different ages with an arrow showing that the group by function grouped them by age" style="width:400px;"/>
&lt;p>&lt;code>group_by()&lt;/code> is part of the &lt;code>dplyr&lt;/code> package, so we&amp;rsquo;ll load that up first. Remember that if you haven&amp;rsquo;t used or installed the package before, you need to run &lt;code>install.packages(&amp;quot;dplyr&amp;quot;)&lt;/code> before loading it in your script. Let&amp;rsquo;s also load up a data set that comes with R, called &lt;code>Loblolly&lt;/code>.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Load package&lt;/span>
&lt;span style="color:#268bd2">library&lt;/span>(dplyr)
&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Load data&lt;/span>
&lt;span style="color:#268bd2">data&lt;/span>(Loblolly)
&lt;span style="color:#586e75"># View data&lt;/span>
&lt;span style="color:#268bd2">head&lt;/span>(Loblolly)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## height age Seed
## 1 4.51 3 301
## 15 10.89 5 301
## 29 28.72 10 301
## 43 41.74 15 301
## 57 52.70 20 301
## 71 60.92 25 301
&lt;/code>&lt;/pre>&lt;p>&lt;code>Loblolly&lt;/code> describes the height of loblolly pine trees at different ages. The &lt;code>height&lt;/code> column is given in feet and &lt;code>age&lt;/code> is given in years. &lt;code>Seed&lt;/code> indicates the seed source of each tree, so it works as a unique identifier for each tree.&lt;/p>
&lt;h3 id="how-to-use-group_by-and-summarise">How to use group_by() and summarise()&lt;/h3>
&lt;p>Let&amp;rsquo;s say we want to see the average height of loblolly pine trees within each of the age groups. To do that, we need to group our data by the variable &amp;ldquo;age&amp;rdquo;. We use the &lt;code>group_by()&lt;/code> function like this: &lt;code>group_by(data, column)&lt;/code>.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Group the Loblolly data by tree age&lt;/span>
&lt;span style="color:#268bd2">group_by&lt;/span>(Loblolly, age)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## # A tibble: 84 × 3
## # Groups: age [6]
## height age Seed
## &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;ord&amp;gt;
## 1 4.51 3 301
## 2 10.9 5 301
## 3 28.7 10 301
## 4 41.7 15 301
## 5 52.7 20 301
## 6 60.9 25 301
## 7 4.55 3 303
## 8 10.9 5 303
## 9 29.1 10 303
## 10 42.8 15 303
## # … with 74 more rows
&lt;/code>&lt;/pre>&lt;p>When we do this, the values in our data don&amp;rsquo;t change; R just prints the result as a tibble now. Behind the scenes, however, &lt;code>group_by()&lt;/code> records how we want to group the data and returns a grouped table. The only visible sign is the &lt;code>Groups: age [6]&lt;/code> line at the top of the table, which tells us the data are grouped by age into 6 groups. Once the data are grouped, we can calculate summary statistics within each group using the function &lt;code>summarize()&lt;/code> or &lt;code>summarise()&lt;/code>. The two spellings (American and British English) do exactly the same thing.&lt;/p>
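&lt;p>If you want to confirm that grouping only adds information about the groups, you can inspect the grouped table directly. This is a minimal sketch using two &lt;code>dplyr&lt;/code> helpers. &lt;code>group_vars()&lt;/code> returns the names of the grouping variables, and &lt;code>n_groups()&lt;/code> returns the number of groups. The object name &lt;code>loblolly_by_age&lt;/code> is just for illustration.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">library(dplyr)
data(Loblolly)

# Store the grouped version of the data
loblolly_by_age &amp;lt;- group_by(Loblolly, age)

# Which variable(s) define the groups?
group_vars(loblolly_by_age)

# How many groups are there? (one per age class)
n_groups(loblolly_by_age)

# The height values themselves are unchanged
all.equal(loblolly_by_age$height, Loblolly$height)
&lt;/code>&lt;/pre>&lt;/div>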
&lt;p>&lt;code>summarise()&lt;/code> can be used like so: &lt;code>summarise(data, new_column_name = function(column_to_evaluate))&lt;/code>.&lt;/p>
&lt;p>So if we wanted to summarize mean heights of trees, it would look like &lt;code>summarise(Loblolly, avgheight = mean(height))&lt;/code>.&lt;/p>
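&lt;p>Without &lt;code>group_by()&lt;/code>, that call treats the whole data set as one group, so it returns a single row. Here is a quick sketch that compares the ungrouped result with a grouped one. The object names &lt;code>overall&lt;/code> and &lt;code>by_age&lt;/code> are only for illustration.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">library(dplyr)
data(Loblolly)

# No grouping: one mean across all 84 measurements
overall &amp;lt;- summarise(Loblolly, avgheight = mean(height))
overall

# Grouped by age: one mean per age class
by_age &amp;lt;- summarise(group_by(Loblolly, age), avgheight = mean(height))

nrow(overall)
nrow(by_age)
&lt;/code>&lt;/pre>&lt;/div>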
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Group the Loblolly data by tree age and then summarize the mean, min, and max heights in each group&lt;/span>
&lt;span style="color:#268bd2">group_by&lt;/span>(Loblolly, age) &lt;span style="color:#719e07">%&amp;gt;%&lt;/span>
&lt;span style="color:#268bd2">summarise&lt;/span>(avgheight &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">mean&lt;/span>(height),
minheight &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">min&lt;/span>(height),
maxheight &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">max&lt;/span>(height))
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## # A tibble: 6 × 4
## age avgheight minheight maxheight
## &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt;
## 1 3 4.24 3.46 4.81
## 2 5 10.2 9.03 11.4
## 3 10 27.4 25.4 30.2
## 4 15 40.5 37.8 44.4
## 5 20 51.5 48.3 55.8
## 6 25 60.3 56.4 64.1
&lt;/code>&lt;/pre>&lt;p>In essence, &lt;code>summarise()&lt;/code> produces a new table that contains a column for your grouping variable, followed by new columns of summary statistics that you define. In the code above, I asked &lt;code>summarise()&lt;/code> to create new columns called &amp;ldquo;avgheight&amp;rdquo; for the mean height of trees in each age group, &amp;ldquo;minheight&amp;rdquo; for the minimum, and &amp;ldquo;maxheight&amp;rdquo; for the maximum. Because we grouped by only one variable, the output of &lt;code>summarise()&lt;/code> is no longer grouped. By default, &lt;code>summarise()&lt;/code> drops the last grouping variable, which here is the only one.&lt;/p>
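&lt;p>One detail worth knowing is that &lt;code>summarise()&lt;/code> removes only the &lt;em>last&lt;/em> grouping variable. With a single grouping variable like &lt;code>age&lt;/code>, that leaves the output fully ungrouped. If you group by two variables, however, the result is still grouped by the first one. Here is a minimal sketch, assuming &lt;code>dplyr&lt;/code> 1.0.0 or later (the version that added the &lt;code>.groups&lt;/code> argument). The object names are illustrative.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">library(dplyr)
data(Loblolly)

# Group by tree (Seed) and by age, then summarise
by_tree_age &amp;lt;- summarise(group_by(Loblolly, Seed, age),
                         avgheight = mean(height))

# Still grouped by Seed: only the last grouping variable (age) was dropped
group_vars(by_tree_age)

# Ask for a completely ungrouped result instead
by_tree_age_flat &amp;lt;- summarise(group_by(Loblolly, Seed, age),
                              avgheight = mean(height),
                              .groups = &amp;quot;drop&amp;quot;)
group_vars(by_tree_age_flat)
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>The first call also prints a short message noting that the output is grouped by &lt;code>Seed&lt;/code>. Setting &lt;code>.groups&lt;/code> explicitly silences that message and makes your intent clear.&lt;/p>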
&lt;p>You might be wondering about this guy &lt;code>%&amp;gt;%&lt;/code> in the code above. This operator is called a pipe. It comes from the &lt;code>magrittr&lt;/code> package, and loading &lt;code>dplyr&lt;/code> makes it available, so you don&amp;rsquo;t need to load &lt;code>magrittr&lt;/code> separately. Importantly, this pipe isn&amp;rsquo;t part of base R (base R has had its own native pipe, &lt;code>|&amp;gt;&lt;/code>, since version 4.1.0). For now, all you need to know is that a pipe feeds the output of one statement into the input of the next. In the code above, the grouped table that came out of &lt;code>group_by()&lt;/code> was passed in as the first argument (the data) of &lt;code>summarise()&lt;/code>, so I didn&amp;rsquo;t need to give &lt;code>summarise()&lt;/code> a data set myself. In plain English, I asked the code to &amp;ldquo;group the Loblolly data by tree age, &lt;em>and then&lt;/em> (pipe!) summarize those groups using their mean, min, and max&amp;rdquo;.&lt;/p>
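&lt;p>The pipe only changes how the code reads, not what it computes. This sketch compares the piped call with the equivalent nested call and with base R&amp;rsquo;s native pipe &lt;code>|&amp;gt;&lt;/code>; the native pipe part assumes R 4.1.0 or later. The object names are only for illustration.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">library(dplyr)
data(Loblolly)

# Piped: group the data, and then summarise each group
piped &amp;lt;- group_by(Loblolly, age) %&amp;gt;%
  summarise(avgheight = mean(height))

# Nested: the grouped table goes straight into summarise() as its first argument
nested &amp;lt;- summarise(group_by(Loblolly, age), avgheight = mean(height))

# Native base R pipe (R 4.1.0 or later)
native &amp;lt;- group_by(Loblolly, age) |&amp;gt;
  summarise(avgheight = mean(height))

# All three should give the same table
identical(piped, nested)
identical(piped, native)
&lt;/code>&lt;/pre>&lt;/div>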
&lt;p>Pipes can make your code a lot cleaner, especially if you&amp;rsquo;re performing several operations on one data frame. Don&amp;rsquo;t worry, we have a more comprehensive tutorial post on pipes coming up soon.&lt;/p>
&lt;h3 id="group_by-and-other-dplyr-functions">group_by() and other dplyr functions&lt;/h3>
&lt;p>We just went over the &lt;code>summarise()&lt;/code> function, which is one of the most common dplyr functions to use with &lt;code>group_by()&lt;/code>. But you could also use other dplyr functions such as &lt;code>mutate()&lt;/code> and &lt;code>filter()&lt;/code>.&lt;/p>
&lt;h4 id="mutate">mutate()&lt;/h4>
&lt;p>For example, we could once again group our data by age, and then we could use &lt;code>mutate()&lt;/code> to create a new column for mean height.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Group the Loblolly data by age and create a new column for average height by age group&lt;/span>
&lt;span style="color:#268bd2">group_by&lt;/span>(Loblolly, age) &lt;span style="color:#719e07">%&amp;gt;%&lt;/span>
&lt;span style="color:#268bd2">mutate&lt;/span>(age_avgheight &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">mean&lt;/span>(height))
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## # A tibble: 84 × 4
## # Groups: age [6]
## height age Seed age_avgheight
## &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;ord&amp;gt; &amp;lt;dbl&amp;gt;
## 1 4.51 3 301 4.24
## 2 10.9 5 301 10.2
## 3 28.7 10 301 27.4
## 4 41.7 15 301 40.5
## 5 52.7 20 301 51.5
## 6 60.9 25 301 60.3
## 7 4.55 3 303 4.24
## 8 10.9 5 303 10.2
## 9 29.1 10 303 27.4
## 10 42.8 15 303 40.5
## # … with 74 more rows
&lt;/code>&lt;/pre>&lt;p>This essentially did the same thing as &lt;code>summarise()&lt;/code>, but instead of creating a new table, &lt;code>mutate()&lt;/code> just added this &amp;ldquo;age_avgheight&amp;rdquo; column to the original data set. You can see that for trees of the same age, the &amp;ldquo;age_avgheight&amp;rdquo; value is the same. This makes sense, since we grouped the data by age before taking the mean, and there should only be one mean height for each age group.&lt;/p>
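&lt;p>Base R&amp;rsquo;s &lt;code>ave()&lt;/code> function shows what a grouped &lt;code>mutate()&lt;/code> is doing. By default, it applies &lt;code>mean()&lt;/code> within each group and returns one value for every row. This sketch checks that the two approaches agree and that the row count stays at 84. It is meant as a cross-check, not a replacement for the &lt;code>dplyr&lt;/code> code.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">library(dplyr)
data(Loblolly)

with_means &amp;lt;- group_by(Loblolly, age) %&amp;gt;%
  mutate(age_avgheight = mean(height))

# mutate() keeps every row, unlike summarise()
nrow(with_means)

# ave() returns the mean height of each row's age group
base_means &amp;lt;- ave(Loblolly$height, Loblolly$age)
all.equal(with_means$age_avgheight, base_means)
&lt;/code>&lt;/pre>&lt;/div>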
&lt;p>For functions like &lt;code>mutate()&lt;/code> and &lt;code>filter()&lt;/code> where we might want to keep working on the same data set afterwards, we need to &lt;code>ungroup()&lt;/code> the data after grouping it so that the grouping doesn&amp;rsquo;t affect other functions down the line. I&amp;rsquo;ll demonstrate quickly:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Demonstrating ungrouping data and mutating a new column for average height&lt;/span>
&lt;span style="color:#268bd2">group_by&lt;/span>(Loblolly, age) &lt;span style="color:#719e07">%&amp;gt;%&lt;/span>
&lt;span style="color:#268bd2">mutate&lt;/span>(age_avgheight &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">mean&lt;/span>(height)) &lt;span style="color:#719e07">%&amp;gt;%&lt;/span>
&lt;span style="color:#268bd2">ungroup&lt;/span>() &lt;span style="color:#719e07">%&amp;gt;%&lt;/span>
&lt;span style="color:#268bd2">mutate&lt;/span>(all_avgheight &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">mean&lt;/span>(height))
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## # A tibble: 84 × 5
## height age Seed age_avgheight all_avgheight
## &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;ord&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt;
## 1 4.51 3 301 4.24 32.4
## 2 10.9 5 301 10.2 32.4
## 3 28.7 10 301 27.4 32.4
## 4 41.7 15 301 40.5 32.4
## 5 52.7 20 301 51.5 32.4
## 6 60.9 25 301 60.3 32.4
## 7 4.55 3 303 4.24 32.4
## 8 10.9 5 303 10.2 32.4
## 9 29.1 10 303 27.4 32.4
## 10 42.8 15 303 40.5 32.4
## # … with 74 more rows
&lt;/code>&lt;/pre>&lt;p>After I ungrouped the data, I used &lt;code>mutate()&lt;/code> to create a new column for average height again. But this time, because the data are ungrouped, the &amp;ldquo;all_avgheight&amp;rdquo; column contains the average height of all trees in the data set rather than the average for each age group.&lt;/p>
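&lt;p>To see why &lt;code>ungroup()&lt;/code> matters, here is a sketch of the same pipeline &lt;em>without&lt;/em> it. Because the data are still grouped by age, the second &lt;code>mutate()&lt;/code> quietly computes another per-age mean instead of the overall mean. The object names &lt;code>still_grouped&lt;/code> and &lt;code>fixed&lt;/code> are illustrative.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">library(dplyr)
data(Loblolly)

# Forgetting ungroup(): the second mutate() still works within age groups
still_grouped &amp;lt;- group_by(Loblolly, age) %&amp;gt;%
  mutate(age_avgheight = mean(height)) %&amp;gt;%
  mutate(all_avgheight = mean(height))

group_vars(still_grouped)
all.equal(still_grouped$all_avgheight, still_grouped$age_avgheight)

# Ungrouping first gives one overall mean, repeated on every row
fixed &amp;lt;- group_by(Loblolly, age) %&amp;gt;%
  mutate(age_avgheight = mean(height)) %&amp;gt;%
  ungroup() %&amp;gt;%
  mutate(all_avgheight = mean(height))

group_vars(fixed)
length(unique(fixed$all_avgheight))
&lt;/code>&lt;/pre>&lt;/div>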
&lt;h4 id="filter">filter()&lt;/h4>
&lt;p>For the &lt;code>filter()&lt;/code> example, I&amp;rsquo;m going to remove a few rows of data from the Loblolly data set so that we can more clearly see the effect of the filter. If you want to follow along, you can copy and paste the following code into your script:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Remove some rows at random (sort of)&lt;/span>
Loblolly &lt;span style="color:#719e07">&amp;lt;-&lt;/span> Loblolly[&lt;span style="color:#719e07">-&lt;/span>&lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">1&lt;/span>, &lt;span style="color:#2aa198">2&lt;/span>, &lt;span style="color:#2aa198">3&lt;/span>, &lt;span style="color:#2aa198">4&lt;/span>, &lt;span style="color:#2aa198">9&lt;/span>, &lt;span style="color:#2aa198">10&lt;/span>, &lt;span style="color:#2aa198">11&lt;/span>, &lt;span style="color:#2aa198">17&lt;/span>, &lt;span style="color:#2aa198">18&lt;/span>, &lt;span style="color:#2aa198">22&lt;/span>, &lt;span style="color:#2aa198">29&lt;/span>, &lt;span style="color:#2aa198">30&lt;/span>, &lt;span style="color:#2aa198">34&lt;/span>, &lt;span style="color:#2aa198">35&lt;/span>, &lt;span style="color:#2aa198">47&lt;/span>, &lt;span style="color:#2aa198">55&lt;/span>, &lt;span style="color:#2aa198">56&lt;/span>, &lt;span style="color:#2aa198">70&lt;/span>, &lt;span style="color:#2aa198">82&lt;/span>, &lt;span style="color:#2aa198">83&lt;/span>), ]
&lt;/code>&lt;/pre>&lt;/div>&lt;p>Now let&amp;rsquo;s see how to use &lt;code>filter()&lt;/code> with &lt;code>group_by()&lt;/code>. In the full data set, each tree was measured at 6 ages: 3, 5, 10, 15, 20, and 25 years. But because I removed several rows of data, some trees are now missing measurements for some of those ages (e.g., trees 301 and 303).&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Look at age classes&lt;/span>
&lt;span style="color:#268bd2">sort&lt;/span>(&lt;span style="color:#268bd2">unique&lt;/span>(Loblolly&lt;span style="color:#719e07">$&lt;/span>age))
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] 3 5 10 15 20 25
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># View modified data&lt;/span>
&lt;span style="color:#268bd2">head&lt;/span>(Loblolly, &lt;span style="color:#2aa198">10&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## height age Seed
## 57 52.70 20 301
## 71 60.92 25 301
## 2 4.55 3 303
## 16 10.92 5 303
## 72 63.39 25 303
## 3 4.79 3 305
## 17 11.37 5 305
## 31 30.21 10 305
## 45 44.40 15 305
## 4 3.91 3 307
&lt;/code>&lt;/pre>&lt;p>Let&amp;rsquo;s say our data analysis requires that we have at least 5 age classes for each tree. In that case, we&amp;rsquo;ll have to eliminate all trees with fewer than 5 ages. We can use &lt;code>group_by()&lt;/code> to group by Seed (the individual tree), then use &lt;code>filter()&lt;/code> to keep only the rows that belong to a group with at least 5 rows. The function &lt;code>n()&lt;/code> returns the number of rows in the current group, so &lt;code>n() &amp;gt;= 5&lt;/code> is TRUE for every row of a tree with at least 5 measurements.&lt;/p>
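&lt;p>Before filtering, it can help to check how many rows each tree has. This is a minimal sketch using &lt;code>dplyr&lt;/code>&amp;rsquo;s &lt;code>count()&lt;/code>, which counts the rows for each value of the column you give it and stores the result in a column called &lt;code>n&lt;/code>. It repeats the row removal from above so it runs on its own. The object name &lt;code>tree_counts&lt;/code> is illustrative.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">library(dplyr)
data(Loblolly)

# Same rows removed as in the example above
Loblolly &amp;lt;- Loblolly[-c(1, 2, 3, 4, 9, 10, 11, 17, 18, 22, 29, 30, 34, 35, 47, 55, 56, 70, 82, 83), ]

# Number of remaining age measurements for each tree
tree_counts &amp;lt;- count(Loblolly, Seed)
tree_counts

# Trees that a filter of n() &amp;gt;= 5 would drop
filter(tree_counts, n &amp;lt; 5)
&lt;/code>&lt;/pre>&lt;/div>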
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Filtering to include groups of at least 5&lt;/span>
&lt;span style="color:#268bd2">group_by&lt;/span>(Loblolly, Seed) &lt;span style="color:#719e07">%&amp;gt;%&lt;/span>
&lt;span style="color:#268bd2">filter&lt;/span>(&lt;span style="color:#268bd2">n&lt;/span>() &lt;span style="color:#719e07">&amp;gt;=&lt;/span> &lt;span style="color:#2aa198">5&lt;/span>) &lt;span style="color:#719e07">%&amp;gt;%&lt;/span>
&lt;span style="color:#268bd2">ungroup&lt;/span>()
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## # A tibble: 39 × 3
## height age Seed
## &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;ord&amp;gt;
## 1 3.91 3 307
## 2 9.48 5 307
## 3 25.7 10 307
## 4 50.8 20 307
## 5 59.1 25 307
## 6 4.32 3 315
## 7 10.4 5 315
## 8 27.2 10 315
## 9 40.8 15 315
## 10 51.3 20 315
## # … with 29 more rows
&lt;/code>&lt;/pre>&lt;p>The data set shrinks from 64 rows to 39, and trees like 301 and 303 have been removed because they have fewer than 5 age classes. We can also run the opposite filter and keep only the rows that belong to groups with fewer than 5 rows.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Filtering to include groups of less than 5&lt;/span>
&lt;span style="color:#268bd2">group_by&lt;/span>(Loblolly, Seed) &lt;span style="color:#719e07">%&amp;gt;%&lt;/span>
&lt;span style="color:#268bd2">filter&lt;/span>(&lt;span style="color:#268bd2">n&lt;/span>() &lt;span style="color:#719e07">&amp;lt;&lt;/span> &lt;span style="color:#2aa198">5&lt;/span>) &lt;span style="color:#719e07">%&amp;gt;%&lt;/span>
&lt;span style="color:#268bd2">ungroup&lt;/span>()
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## # A tibble: 25 × 3
## height age Seed
## &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;ord&amp;gt;
## 1 52.7 20 301
## 2 60.9 25 301
## 3 4.55 3 303
## 4 10.9 5 303
## 5 63.4 25 303
## 6 4.79 3 305
## 7 11.4 5 305
## 8 30.2 10 305
## 9 44.4 15 305
## 10 4.81 3 309
## # … with 15 more rows
&lt;/code>&lt;/pre>&lt;p>Great! Now you&amp;rsquo;ve learned how to use the &lt;code>group_by()&lt;/code> function along with several of the main &lt;code>dplyr&lt;/code> functions &lt;code>summarise()&lt;/code>, &lt;code>mutate()&lt;/code>, and &lt;code>filter()&lt;/code>. I covered just a few ways you might use these functions; it&amp;rsquo;s up to you to play around with them and learn even more. And don&amp;rsquo;t forget to use &lt;code>ungroup()&lt;/code>!&lt;/p>
&lt;br>
&lt;hr>
&lt;center>If you want to learn more about data wrangling with dplyr functions, you can check out our full course on the complete basics of R for ecology here:
&lt;br>
&lt;br>
&lt;ul class="cta-group">
&lt;li>
&lt;a href="https://www.rforecology.com?utm_source=blog&amp;amp;utm_medium=bottom_button&amp;amp;utm_campaign=rforecology_blog/" target="_blank" rel="noopener" class="btn btn-primary px-3 py-3">&lt;strong>Start learning R now&lt;/strong>&lt;/a>
&lt;/li>
&lt;/ul>
&lt;p>Also be sure to check out &lt;strong>&lt;a href="https://www.r-bloggers.com/" target="_blank" rel="noopener">R-bloggers&lt;/a>&lt;/strong> for other great tutorials on learning R.&lt;/p>
&lt;/center>
&lt;script defer src="https://cdn.commento.io/js/commento.js">&lt;/script>
&lt;div id="commento">&lt;/div></description></item><item><title>How to use R Markdown (part two): for learning R!</title><link>https://www.rforecology.com/post/how-to-use-rmarkdown-part-two/</link><pubDate>Tue, 15 Feb 2022 10:45:39 +0000</pubDate><guid>https://www.rforecology.com/post/how-to-use-rmarkdown-part-two/</guid><description>&lt;script src="https://rforecology.activehosted.com/f/embed.php?id=21" type="text/javascript" charset="utf-8">&lt;/script>
&lt;p>Welcome to part two of my blog series on R Markdown. &lt;a href="https://www.rforecology.com/post/how-to-use-rmarkdown-part-one/" target="_blank" rel="noopener">In the first part&lt;/a>, I went over how to create a basic R Markdown document and how to use R Markdown syntax. In this post, I&amp;rsquo;m going to talk about how you can use R Markdown to learn R.&lt;/p>
&lt;img src="https://www.rforecology.com/rmd2_featured2.png" alt="Left side shows the R Markdown logo with an arrow to the right pointing to the R logo. Behind this there is a faded image of someone studying from a book that covers their face." style="width:500px;"/>
&lt;p>So why is R Markdown good for learning R? As you saw in the first post, R Markdown lets you write plain and formatted text alongside your R code and its outputs. That makes it ideal for documenting your analyses: you can take notes on specific chunks of code and write down what worked or didn&amp;rsquo;t work. The same process works well for creating tutorials (like this one!) and keeping track of what you learn. The end goal is a series of R Markdown documents that cover all the topics you learn, with both the code and notes explaining what everything does. These documents then also serve as a guide that you can refer back to for troubleshooting or jogging your memory.&lt;/p>
&lt;p>In other words, using R Markdown to learn R allows you to:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>Have a &lt;strong>project-based learning experience&lt;/strong>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Fully document your learning&lt;/strong>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Create a reference&lt;/strong> that you can look back on in the future if you get stuck or can&amp;rsquo;t remember something&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Learn by teaching&lt;/strong> because you&amp;rsquo;re explaining things in your own words and taking notes&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Create a &lt;strong>teaching resource&lt;/strong> for yourself that you can then use to help others as well&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>You can also follow along with this post as a video if you click on the image below. Start the video at 35:35 to cover the material in this post.&lt;/p>
&lt;a href="https://youtu.be/K418swtFnik" target="_blank">
&lt;img src="https://www.rforecology.com/rmd_image1.png" alt="Image of Youtube play button on top of a tutorial for R Markdown created by R for Ecology" style="width:400px;"/>
&lt;/a>
&lt;h2 id="getting-set-up">Getting set up&lt;/h2>
&lt;p>To use R Markdown, you&amp;rsquo;ll need to have R and RStudio already installed. If you need help with downloading R and RStudio, you can check out my &lt;a href="https://www.rforecology.com/post/how-to-install-r-and-rstudio/" target="_blank" rel="noopener">blog post&lt;/a> and lessons &lt;a href="https://youtu.be/YKvkXKeGoa8" target="_blank" rel="noopener">one&lt;/a>, &lt;a href="https://youtu.be/dPLbyWXEG_E" target="_blank" rel="noopener">two&lt;/a>, and &lt;a href="https://youtu.be/dYOs0Qn616s" target="_blank" rel="noopener">three&lt;/a> of my online course.&lt;/p>
&lt;p>You&amp;rsquo;ll also have to install two packages: &lt;code>rmarkdown&lt;/code> and &lt;code>knitr&lt;/code>. To do that you can run &lt;code>install.packages(&amp;quot;rmarkdown&amp;quot;)&lt;/code> and &lt;code>install.packages(&amp;quot;knitr&amp;quot;)&lt;/code>. You&amp;rsquo;ll only need to do this once for your computer (at least until the next time you update R).&lt;/p>
&lt;p>If you are completely new to R &lt;em>and&lt;/em> R Markdown, then I strongly suggest you &lt;a href="https://www.rforecology.com/post/how-to-use-rmarkdown-part-one/" target="_blank" rel="noopener">start with my previous blog post on how to use R Markdown&lt;/a> (after going through the three lessons linked above). It goes through all the most important tools in R Markdown.&lt;/p>
&lt;p>With the basic software and packages installed, the first thing is to create a new RStudio project where you&amp;rsquo;ll be working on your R Markdown documents. RStudio projects are &lt;em>incredibly&lt;/em> helpful for file organization and managing working directories, and they remove the need to use functions like &lt;code>setwd()&lt;/code> and &lt;code>getwd()&lt;/code>. You can read more about RStudio projects in our post on the subject &lt;a href="https://www.rforecology.com/post/organizing-your-r-studio-projects/" target="_blank" rel="noopener">here.&lt;/a>&lt;/p>
&lt;div class="alert alert-note">
&lt;div>
&lt;p>To learn more about RStudio Projects and why you should always use them, check out these three great posts from other blogs:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;a href="https://chrisvoncsefalvay.com/structuring-r-projects/" target="_blank">Structuring R Projects&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="https://kkulma.github.io/2018-03-18-Prime-Hints-for-Running-a-data-project-in-R/" target="_blank">Prime Hints for Running a data project in R&lt;/a>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;a href="https://ntguardian.wordpress.com/2018/08/02/how-should-i-organize-my-r-research-projects/" target="_blank">How should I organize my R research projects&lt;/a>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;/div>
&lt;/div>
&lt;p>Go to &amp;ldquo;File&amp;rdquo; and click on &amp;ldquo;New Project&amp;hellip;&amp;rdquo;.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/rmd2_image2.png" alt="Image showing cursor hovering over File and New Project">&lt;/p>
&lt;p>This will open a new window, where you&amp;rsquo;ll click on &amp;ldquo;New Directory&amp;rdquo; and then &amp;ldquo;New Project&amp;rdquo;. That should take you to the next window, where you can give your project a name, like &amp;ldquo;Learning R&amp;rdquo;, and choose where to save it. Then hit &amp;ldquo;Create Project&amp;rdquo;, and you&amp;rsquo;ll have a new RStudio project.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/rmd2_image3.png" alt="Image of Create New Project window, where you can give your project a name and save it somewhere">&lt;/p>
&lt;p>Now that you&amp;rsquo;ve created your new project, any time you want to work on it, just open the &amp;lsquo;.Rproj&amp;rsquo; file, and RStudio will open up with the scripts you were working on (R Markdown documents in this case).&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/rmd2_image4.png" alt="Image showing cursor hovering over new .Rproj file">&lt;/p>
&lt;p>With this R project open, go to &amp;ldquo;File&amp;rdquo; &amp;raquo; &amp;ldquo;New File&amp;rdquo; and click on &amp;ldquo;R Markdown&amp;hellip;&amp;rdquo;&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/rmd2_image5.png" alt="Image showing cursor hovering over File and R Markdown">&lt;/p>
&lt;p>Give your R Markdown document a title and hit OK. I titled my document &amp;ldquo;The basics&amp;rdquo;.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/rmd2_image6.png" alt="Image showing new R Markdown document titled &amp;ldquo;The basics&amp;rdquo;">&lt;/p>
&lt;h2 id="organizing-your-documents">Organizing your documents&lt;/h2>
&lt;p>One way to organize your notes is to create a new R Markdown document for every topic you cover. You might start off with one document that covers the basics, then the next one might cover how to upload data, and then you&amp;rsquo;ll have another one for data visualization, etc.&lt;/p>
&lt;p>If you still want to create one large R Markdown document, or if you have sub-topics within your larger topics, you can add a table of contents to your document. Do this by going to the gear button at the top of the document and clicking on &amp;ldquo;Output Options&amp;rdquo;.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/rmd2_image7.png" alt="Image showing cursor hovering over the gear button and output options">&lt;/p>
&lt;p>From there, a window will open up and you can check the box &amp;ldquo;Include table of contents&amp;rdquo;.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/rmd2_image8.png" alt="Image with the option &amp;ldquo;include table of contents&amp;rdquo; selected and circled">&lt;/p>
&lt;p>After that, you&amp;rsquo;ll notice that the text &amp;ldquo;toc: yes&amp;rdquo; appears in the header at the top of your R Markdown document. When you knit your document, any headers you&amp;rsquo;ve added will appear in the table of contents at the top (like my header, &amp;ldquo;The basics&amp;rdquo;). The table of contents is clickable, so each entry takes you to that section of your document.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/rmd2_image9.png" alt="Image showing a side by side comparison of the R Markdown code with the knitted R Markdown document. The knitted document has a table of contents at the top with a hyperlink that takes you to the section called &amp;ldquo;The basics&amp;rdquo;">&lt;/p>
&lt;p>You can also number the table of contents and the section headings by checking that option in the &amp;ldquo;Output Options&amp;rdquo; window.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/rmd2_image10.png" alt="Image showing the &amp;ldquo;Number section headings&amp;rdquo; option selected and circled">&lt;/p>
&lt;p>Numbering the sections can help your document become more clearly organized. If you add subsections, the document will take that into account when numbering. For example, my section 1 is called &amp;ldquo;The basics&amp;rdquo;. I made two subsections within &amp;ldquo;The basics&amp;rdquo;, called &amp;ldquo;Defining variables&amp;rdquo; and &amp;ldquo;Vectors&amp;rdquo;. &amp;ldquo;Defining variables&amp;rdquo; is given the number 1.1 and &amp;ldquo;Vectors&amp;rdquo; is given the number 1.2 because they&amp;rsquo;re nested under section 1. You can also see that in the table of contents, the subsections are tabbed in under their umbrella section to show that they&amp;rsquo;re nested.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/rmd2_image11.png" alt="Image of table of contents with sections and subsections. The umbrella section is called &amp;ldquo;The basics&amp;rdquo;, and there are two subsections, called &amp;ldquo;Defining variables&amp;rdquo; and &amp;ldquo;Vectors&amp;rdquo;">&lt;/p>
&lt;h2 id="dealing-with-errors">Dealing with errors&lt;/h2>
&lt;p>I want to show you one more thing that I like to do when using R Markdown for learning. Sometimes we get errors that show up, and we aren&amp;rsquo;t sure how to resolve them. For example, in the code below, I get an error that says my variable can&amp;rsquo;t be found (this is because I haven&amp;rsquo;t created a variable called &amp;ldquo;my_number&amp;rdquo; yet).&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/rmd2_image12.png" alt="Image of code saying &amp;ldquo;is my number greater than 5?&amp;rdquo; and an error message that says &amp;ldquo;Error: object &amp;lsquo;my_number&amp;rsquo; not found&amp;rdquo;.">&lt;/p>
&lt;p>By default, when a code chunk throws an error, R Markdown stops knitting, so you can&amp;rsquo;t knit the document until you&amp;rsquo;ve resolved it. One option is to delete the problematic code, but then you might make the same mistake in the future. Instead, you can copy the error message and paste it into your code chunk as a comment. Then also comment out the code that caused the error, which allows you to knit the document. Your code chunk might then look something like this:&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/rmd2_image13.png" alt="Code that says &amp;ldquo;my code made this error and I can&amp;rsquo;t figure it out yet&amp;rdquo;, with the code and error message included as comments">
Now, if you share your document with someone, they can see the error and help you resolve it. Or maybe you&amp;rsquo;ll come back to the document in the future, see your note, and figure it out on your own after leaving the code alone for a bit.&lt;/p>
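&lt;p>If you&amp;rsquo;d rather keep the failing line running so that the document still knits, there are alternatives to commenting it out. One is to wrap the line in &lt;code>tryCatch()&lt;/code> and print the error message. Another is the knitr chunk option &lt;code>error = TRUE&lt;/code>, which shows the error in the knitted document instead of stopping the knit. This is a minimal sketch of the &lt;code>tryCatch()&lt;/code> approach. &lt;code>my_number&lt;/code> is deliberately never created, and &lt;code>error_note&lt;/code> is a hypothetical name.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># my_number has not been created yet, so this comparison fails.
# tryCatch() catches the error and returns its message instead,
# so the rest of the document can still knit.
error_note &amp;lt;- tryCatch(
  my_number &amp;gt; 5,
  error = function(e) conditionMessage(e)
)
error_note
&lt;/code>&lt;/pre>&lt;/div>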
&lt;h2 id="learning-workflow">Learning workflow&lt;/h2>
&lt;p>To summarize, here is a workflow that you can follow for learning R with R Markdown:&lt;/p>
&lt;ol>
&lt;li>
&lt;p>Start by creating an empty R project and your first R Markdown document (making sure to clear out the example content of the new R Markdown document). Also add a table of contents if you plan on keeping everything in one long document.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Then, as you follow through any tutorials (&lt;a href="https://www.rforecology.com" target="_blank" rel="noopener">or online courses!&lt;/a> 😄), start new section headings (using &amp;lsquo;#&amp;rsquo;s) and begin explaining the steps you take to complete the tutorial or lesson.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>After each block of text or description, add the associated R code chunk.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Knit your document often to see the changes you are making as a stand-alone document and to make sure there are no errors in your code.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Follow the steps above for dealing with errors as they come up.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Finally, refer back to your knitted HTML document as often as you need, or even print it out as a physical reference if that helps.&lt;/p>
&lt;/li>
&lt;/ol>
&lt;p>All of the tips that I included in this blog post are intended to help you document your learning process. Taking notes and analyzing your own code can help ensure that everything you&amp;rsquo;re learning is sticking in your head.&lt;/p>
&lt;p>And that&amp;rsquo;s it for my R Markdown tutorial series! I hope you enjoyed these posts. Remember to keep adding to your documents as you learn—it will help you grasp new topics and can even turn the R learning curve into a fun project!&lt;/p>
&lt;p>Have any cool R Markdown documents you&amp;rsquo;ve created? Share links in the comments below! 👇&lt;/p>
&lt;br>
&lt;hr>
&lt;center>If you liked this post and want to learn more, then check out my online course on the complete basics of R for ecology. The course is &lt;em>perfectly&lt;/em> suited for creating your own R Markdown document as you follow along!
&lt;br>
&lt;br>
&lt;ul class="cta-group">
&lt;li>
&lt;a href="https://www.rforecology.com?utm_source=blog&amp;amp;utm_medium=bottom_button&amp;amp;utm_campaign=rforecology_blog/" target="_blank" rel="noopener" class="btn btn-primary px-3 py-3">&lt;strong>Start learning R now&lt;/strong>&lt;/a>
&lt;/li>
&lt;/ul>
&lt;p>Also be sure to check out &lt;strong>&lt;a href="https://www.r-bloggers.com/" target="_blank" rel="noopener">R-bloggers&lt;/a>&lt;/strong> for other great tutorials on learning R.&lt;/p>
&lt;/center>
&lt;script defer src="https://cdn.commento.io/js/commento.js">&lt;/script>
&lt;div id="commento">&lt;/div></description></item><item><title>How to use R Markdown (part one)</title><link>https://www.rforecology.com/post/how-to-use-rmarkdown-part-one/</link><pubDate>Wed, 09 Feb 2022 10:45:39 -0500</pubDate><guid>https://www.rforecology.com/post/how-to-use-rmarkdown-part-one/</guid><description>&lt;script src="https://rforecology.activehosted.com/f/embed.php?id=21" type="text/javascript" charset="utf-8">&lt;/script>
&lt;p>Today I&amp;rsquo;m excited to share a blog post on how to use R Markdown. R Markdown is a dynamic file format that allows you to make documents containing normal text alongside chunks of embedded R code. In fact, all of my blog posts are written using R Markdown, which is how I&amp;rsquo;m able to write text like this, write &lt;code>code&lt;/code>, and even insert a chunk of code:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">like_this &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;isn&amp;#39;t&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;this&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;neat?&amp;#34;&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;img src="https://www.rforecology.com/rmd_image0c.png" alt="Image saying 'How to use R Markdown Part 1', with an arrow pointing from the R Markdown logo to the HTML logo. The background is a person knitting, and below the arrow it says 'KNIT!'" style="width:400px;"/>
&lt;h3 id="r-markdown-is-useful-for-several-reasons">R Markdown is useful for several reasons:&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>It&amp;rsquo;s great for &lt;strong>reproducibility&lt;/strong>, because you can explain your analyses alongside your code and output so that someone else can follow along and replicate your work&lt;/p>
&lt;/li>
&lt;li>
&lt;p>It helps with &lt;strong>accountability&lt;/strong>, because all your code and the &lt;em>exact corresponding outputs&lt;/em> are &lt;em>knit&lt;/em> together into the final document&lt;/p>
&lt;/li>
&lt;li>
&lt;p>It allows you to &lt;strong>make tutorials&lt;/strong> like this one&lt;/p>
&lt;/li>
&lt;li>
&lt;p>Finally, you can use it &lt;strong>for learning R&lt;/strong> by helping you keep track of your notes and thinking process all while creating a custom reference document (more on this in part two!)&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>This tutorial is the first post of a two-part series on R Markdown. Here, you&amp;rsquo;ll learn how to create R Markdown documents with different types of content, and in part two I&amp;rsquo;ll go into how you can use it for learning R.&lt;/p>
&lt;p>You can also follow along with this blog post in video format if you click on the image below. This post covers material in the video up to 35:35. I&amp;rsquo;ll cover the rest of the video in part two next week.&lt;/p>
&lt;a href="https://youtu.be/K418swtFnik" target="_blank">
&lt;img src="https://www.rforecology.com/rmd_image1.png" alt="Image of Youtube play button on top of a tutorial for R Markdown created by R for Ecology" style="width:400px;"/>
&lt;/a>
&lt;h2 id="getting-set-up-with-r-markdown">Getting set up with R Markdown&lt;/h2>
&lt;p>To use R Markdown, you&amp;rsquo;ll need to have R and RStudio already installed. If you need help with that, you can check out my &lt;a href="https://www.rforecology.com/post/how-to-install-r-and-rstudio/" target="_blank" rel="noopener">blog post&lt;/a> and lessons &lt;a href="https://youtu.be/YKvkXKeGoa8" target="_blank" rel="noopener">one&lt;/a>, &lt;a href="https://youtu.be/dPLbyWXEG_E" target="_blank" rel="noopener">two&lt;/a>, and &lt;a href="https://youtu.be/dYOs0Qn616s" target="_blank" rel="noopener">three&lt;/a> of my online course. These resources show you how to get started with R and RStudio.&lt;/p>
&lt;p>You&amp;rsquo;ll also have to install two packages: &lt;code>rmarkdown&lt;/code> and &lt;code>knitr&lt;/code>. To do that you can run &lt;code>install.packages(&amp;quot;rmarkdown&amp;quot;)&lt;/code> and &lt;code>install.packages(&amp;quot;knitr&amp;quot;)&lt;/code>. You&amp;rsquo;ll only need to do this once for your computer (at least until the next time you update R).&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#268bd2">install.packages&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;rmarkdown&amp;#34;&lt;/span>)
&lt;span style="color:#268bd2">install.packages&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;knitr&amp;#34;&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>Now that you have the basic software and packages installed, you can get started with using R Markdown!&lt;/p>
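&lt;p>If you&amp;rsquo;re not sure whether these packages are already installed, you can check before reinstalling them. Here&amp;rsquo;s a minimal sketch in base R: &lt;code>find_missing_packages()&lt;/code> is a hypothetical helper name, and &lt;code>requireNamespace()&lt;/code> simply reports whether a package can be loaded.&lt;/p>

```r
# Hypothetical helper: return the packages in `pkgs` that can't be loaded
# (usually because they aren't installed yet)
find_missing_packages = function(pkgs) {
  installed = vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

needed = find_missing_packages(c("rmarkdown", "knitr"))
needed   # character(0) means both packages are already installed

# To install only what's missing, you could then run:
# if (length(needed) > 0) install.packages(needed)
```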
&lt;p>The first thing you&amp;rsquo;ll do after opening RStudio is go to File &amp;raquo; New File &amp;raquo; R Markdown.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/rmd_image2.png" alt="Image showing cursor hovering over File, New File, and R Markdown">&lt;/p>
&lt;p>Then a new window will pop up where you can fill out the title of your new document, the author (um, your name? 😉), and the output format. You can choose between HTML, PDF, and Word. We&amp;rsquo;re going to choose HTML for now, since that&amp;rsquo;s the simplest option and all you really need. Then hit OK.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/rmd_image3.png" alt="Image showing window where the title is &amp;ldquo;R Markdown Tutorial&amp;rdquo;, the author is &amp;ldquo;Luka Negoita&amp;rdquo;, and HTML is selected as the output">&lt;/p>
&lt;p>You&amp;rsquo;ll now have a new document that is filled in with a bunch of example content. At the top, in between the boundary lines (&lt;code>---&lt;/code>), you&amp;rsquo;ll see a list of document parameters that should reflect what you entered in the previous window (title, author, output). You don&amp;rsquo;t need to change anything there. We&amp;rsquo;ll come back to this header section later.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/rmd_image4.png" alt="Image showing new R Markdown document that was created, with a bunch of example text">&lt;/p>
&lt;p>The next section shows a code chunk that says &amp;ldquo;r setup&amp;rdquo;. This sets a bunch of code chunk parameters for the rest of the document. There&amp;rsquo;s no real need to include this, so we&amp;rsquo;ll delete it for now.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/rmd_image5.png" alt="Image showing code chunk in R with global parameters">&lt;/p>
&lt;p>What you see now is the raw R Markdown content, which contains chunks of R code between chunks of regular text with Markdown formatting. But the true power of R Markdown is when you transform that text and code into a stand-alone document.&lt;/p>
&lt;p>So how do we get from an R Markdown document in RStudio to the HTML document? You click on the &amp;ldquo;&lt;strong>Knit&lt;/strong>&amp;rdquo; button (no need to click on the dropdown arrow). The lingo here is that the R Markdown document &amp;ldquo;knits&amp;rdquo; itself into an HTML document.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/rmd_image6.png" alt="Highlighting the Knit button in the RStudio toolbar at the top of the screen">&lt;/p>
&lt;p>If you haven&amp;rsquo;t saved your document yet, you&amp;rsquo;ll be prompted to save it when you click &amp;ldquo;Knit&amp;rdquo;. You&amp;rsquo;ll notice that R Markdown files are saved as a .Rmd file instead of a .R file. Now that you&amp;rsquo;ve saved your document somewhere, it will automatically save itself every time you press &amp;ldquo;Knit&amp;rdquo; from now on.&lt;/p>
&lt;p>The final HTML file will then be displayed automatically, either in the &amp;lsquo;Viewer&amp;rsquo; panel (usually on the right) or in a separate window, depending on your RStudio settings.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/rmd_image7.png" alt="Side by side comparison of R Markdown code with the HTML output">&lt;/p>
&lt;p>If you explore the document a little, you can see that R Markdown really lets you do a lot. You can include different types of text formats, &lt;a href="https://www.rforecology.com/" target="_blank" rel="noopener">links&lt;/a>, code chunks, and even plots.&lt;/p>
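&lt;p>Behind the scenes, the &amp;ldquo;Knit&amp;rdquo; button essentially calls &lt;code>rmarkdown::render()&lt;/code>, so you can also knit from the console. Here&amp;rsquo;s a minimal sketch; the file name &lt;code>knit_demo.Rmd&lt;/code> is made up for this example, and the file is written to a temporary folder so nothing clutters your project.&lt;/p>

```r
# Hypothetical demo: write a tiny .Rmd file to a temporary folder
rmd_file = file.path(tempdir(), "knit_demo.Rmd")
writeLines(c(
  "---",
  'title: "Knit demo"',
  "output: html_document",
  "---",
  "",
  "Hello from R Markdown!"
), rmd_file)

# Knitting needs the rmarkdown package plus pandoc (which ships with RStudio)
if (requireNamespace("rmarkdown", quietly = TRUE)) {
  if (rmarkdown::pandoc_available()) {
    html_file = rmarkdown::render(rmd_file, quiet = TRUE)
    print(html_file)   # path to the newly created .html file
  }
}
```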
&lt;h2 id="working-on-your-own-r-markdown-document">Working on your own R Markdown document&lt;/h2>
&lt;p>Cool. Now let&amp;rsquo;s get started on creating our own R Markdown document. First let&amp;rsquo;s start with a blank slate. Go ahead and delete everything in the sample document so that all you have left is the parameter header. It&amp;rsquo;s important that you leave that in!&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/rmd_image8.png" alt="Blank R Markdown document except for the header">&lt;/p>
&lt;p>Try to type some text, whatever you want. If you press &amp;ldquo;Knit&amp;rdquo;, it should then show up in a knitted document, and your .Rmd file should be automatically saved. Anything you type will show up in the knitted document! Neat.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/rmd_image9.png" alt="Typing some text in the R Markdown document that says Hi everyone!">&lt;/p>
&lt;h3 id="learning-r-markdown-text-formats">Learning R Markdown text formats&lt;/h3>
&lt;p>Now let&amp;rsquo;s explore different types of text formatting in R Markdown. To organize different sections of your report, you&amp;rsquo;ll want to add section headings or titles. You can write headings of different sizes by writing different numbers of pound signs (#) + a space + your text, like this:&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/rmd_image10.png" alt="Typing example headings of several sizes">&lt;/p>
&lt;p>As you can see, the more pound signs you add, the smaller the headings get.&lt;/p>
&lt;p>You can also add bold and italicized text by surrounding text with asterisks (*). Using one asterisk gives you italicized text, using two gives you bold text, and using three gives you bold italicized text.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/rmd_image11.png" alt="Typing example italicized, bold, and bold italicized text">&lt;/p>
&lt;p>You can create numbered lists just by typing 1. 2. 3. (&amp;hellip;) in front of your text.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/rmd_image12.png" alt="An example list in R Markdown that says this is a list!">&lt;/p>
&lt;p>And you can create bulleted lists using either hyphens (-) or asterisks (*) before your text.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/rmd_image13.png" alt="Example bullet points using hyphens and asterisks. They look the same in the knitted document.">&lt;/p>
&lt;p>You can add links by putting square brackets [like these] around the word or phrase that you want to hyperlink, immediately followed by the link (including the https://) in parentheses, like this:&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/rmd_image14.png" alt="Example of putting a hyperlink in your document. The text says here is a link to my website, with a link attached to the word here">&lt;/p>
&lt;p>Lastly, if you&amp;rsquo;re already an HTML whiz, you can also add HTML code directly to your document, since the final document is HTML anyway. I&amp;rsquo;m going to keep this tutorial simple, though, and let you experiment with HTML on your own.&lt;/p>
&lt;p>RStudio has an excellent &lt;a href="https://www.rstudio.com/wp-content/uploads/2015/02/rmarkdown-cheatsheet.pdf" target="_blank" rel="noopener">cheat sheet&lt;/a> that you can check out if you&amp;rsquo;re interested in learning more about what you can do with R Markdown. I just wanted to cover the essential features here, which is all you really need to know for creating most reports.&lt;/p>
&lt;h3 id="learning-how-to-embed-code-in-r-markdown">Learning how to embed code in R Markdown&lt;/h3>
&lt;p>Now that we&amp;rsquo;ve talked about how to format the text, let&amp;rsquo;s move on to embedding &lt;code>code&lt;/code>!&lt;/p>
&lt;p>You can add a code chunk by clicking on this button in the toolbar at the top of your screen:&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/rmd_image15.png" alt="Image showing circled button with a C on it at the top of the screen">&lt;/p>
&lt;p>That will add a code chunk, which looks like this:&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/rmd_image16.png" alt="Image showing a code chunk">&lt;/p>
&lt;p>You could also type out the code chunk boundaries yourself instead of pressing the button at the top, if you want. Those marks aren&amp;rsquo;t normal single quotes; they&amp;rsquo;re backticks ( ` ), which are located under the escape key on a standard U.S. keyboard, on the same key as the tilde (~).&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/rmd_image17.jpeg" alt="Image showing the quote symbol above the left tab and before the one key">&lt;/p>
&lt;p>You can also use single backticks to embed code in your text by wrapping them around a word or phrase like quotation marks, &lt;code>like this&lt;/code>. This turns the text into a little code snippet in the middle of your sentence.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/rmd_image20.png" alt="Image of the words like this surrounded by specialized single quotes">&lt;/p>
&lt;p>Back to the code chunk. If you type within the code chunk, whatever you type will appear as if you are typing it in a normal R script. Then your output will look like a normal output in your console or plot viewer. I wrote a comment in my code chunk in the image below, but the cool thing about R Markdown is that you can put most of the commentary in your R Markdown text, so there&amp;rsquo;s no need to clutter the actual code with long explanation comments.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/rmd_image18.png" alt="Example code with a comment saying &amp;ldquo;You can add comments by using the pound sign in R, which is good for commentary or text but not code">&lt;/p>
&lt;p>I&amp;rsquo;m going to embed some code now in the current R Markdown document (the one that I&amp;rsquo;m writing this blog post with). I created a variable called &amp;ldquo;answer&amp;rdquo; and loaded a data set called &amp;ldquo;cars&amp;rdquo; that comes with R. R actually comes with a whole bunch of premade data sets that you can look at if you type &lt;code>data()&lt;/code> into the console.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># I&amp;#39;m writing some code here:&lt;/span>
answer &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#2aa198">2&lt;/span> &lt;span style="color:#719e07">+&lt;/span> &lt;span style="color:#2aa198">4&lt;/span>
&lt;span style="color:#586e75"># View the answer&lt;/span>
answer
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] 6
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Let&amp;#39;s load some data&lt;/span>
my_data &lt;span style="color:#719e07">&amp;lt;-&lt;/span> cars
&lt;span style="color:#586e75"># View the first few rows of my data&lt;/span>
&lt;span style="color:#268bd2">head&lt;/span>(my_data, &lt;span style="color:#2aa198">3&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>##   speed dist
## 1     4    2
## 2     4   10
## 3     7    4
&lt;/code>&lt;/pre>&lt;p>You&amp;rsquo;ll notice that R Markdown has split up the code chunk into separate boxes each time a piece of code prints output. In a knitted HTML document with the default settings, the code appears in light grey boxes and the output in white boxes. This helps keep things organized so you can see which output goes with which code.&lt;/p>
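&lt;p>Let&amp;rsquo;s briefly explore another, related element of R Markdown: displaying plots. We&amp;rsquo;ll plot car speed as a function of stopping distance. In R&amp;rsquo;s formula notation, &lt;code>y ~ x&lt;/code> means &amp;ldquo;y as a function of x&amp;rdquo;, so speed goes on the y-axis and distance on the x-axis.&lt;/p>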
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Plotting speed vs. distance from the cars data set&lt;/span>
&lt;span style="color:#268bd2">plot&lt;/span>(my_data&lt;span style="color:#719e07">$&lt;/span>speed &lt;span style="color:#719e07">~&lt;/span> my_data&lt;span style="color:#719e07">$&lt;/span>dist)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;img src="https://www.rforecology.com/post/how-to-use-rmarkdown-part-one/index_files/figure-html/unnamed-chunk-4-1.png" width="672" />&lt;/p>
&lt;p>Awesome! We can see our code and our plot output.&lt;/p>
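&lt;p>An equivalent, slightly tidier way to write the same plot is to pass the data frame through the &lt;code>data&lt;/code> argument, so you can use bare column names instead of &lt;code>my_data$...&lt;/code>. The axis labels below are my own additions, based on the units documented in &lt;code>?cars&lt;/code>.&lt;/p>

```r
my_data = cars

# Formula notation: the variable left of ~ goes on the y-axis,
# the variable right of ~ goes on the x-axis.
# Supplying data = lets the formula use bare column names.
plot(speed ~ dist, data = my_data,
     xlab = "Stopping distance (ft)", ylab = "Speed (mph)")
```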
&lt;h3 id="running-code-in-r-markdown">Running code in R Markdown&lt;/h3>
&lt;p>I&amp;rsquo;ve been pressing the &amp;ldquo;Knit&amp;rdquo; button to see the output of my code, but you can also run your code in RStudio as if you&amp;rsquo;re working in a normal R script. Just put your cursor on the line you want to run (or highlight several lines) and press Command + Return on a Mac, or Ctrl + Enter on Windows.&lt;/p>
&lt;p>As you run the code in R Markdown, the output will appear below your code chunk:&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/rmd_image19.png" alt="Code output appearing below the code chunk in RStudio">&lt;/p>
&lt;p>Running your code within the code chunk first (instead of knitting it) is especially useful if you want to work through any errors, since the error messages will be easier to understand.&lt;/p>
&lt;div class="alert alert-note">
&lt;div>
&lt;p>Note: if you&amp;rsquo;re running code directly within code chunks, then, just as in a normal R script, you have to run all of the code in the correct order. This ensures that your variables and packages are loaded before later code needs them.&lt;/p>
&lt;p>For example, we refer to a variable called &lt;code>my_data&lt;/code> in our plot code. If we&amp;rsquo;re running our code manually and try to run the plot code without creating the &lt;code>my_data&lt;/code> variable first, we&amp;rsquo;re going to get an error. We have to run &lt;code>my_data &amp;lt;- cars&lt;/code> before we run &lt;code>plot(my_data$speed ~ my_data$dist)&lt;/code> for the code to work. &lt;strong>Luckily, you don&amp;rsquo;t have to worry about this when you&amp;rsquo;re knitting your document because knitting runs all of the code in order for you.&lt;/strong>&lt;/p>
&lt;/div>
&lt;/div>
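&lt;p>Here&amp;rsquo;s a small sketch that reproduces this problem on purpose, using &lt;code>tryCatch()&lt;/code> to capture the error instead of stopping. Note that it removes &lt;code>my_data&lt;/code> from your workspace first, so only run it in a practice session.&lt;/p>

```r
# Simulate running the plot chunk BEFORE the chunk that creates my_data
if (exists("my_data")) rm(my_data)
out_of_order = tryCatch(
  plot(my_data$speed ~ my_data$dist),
  error = function(e) conditionMessage(e)
)
print(out_of_order)   # the "object 'my_data' not found" error message

# Running the chunks in order fixes the problem
my_data = cars
plot(my_data$speed ~ my_data$dist)
```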
&lt;p>There&amp;rsquo;s also a neat trick to make sure you&amp;rsquo;ve run all the necessary code and to prevent these errors. Pressing the button in the image below runs all of the chunks above the one you&amp;rsquo;re in, so you don&amp;rsquo;t have to go through them manually line by line or chunk by chunk.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/rmd_image21.png" alt="Image showing down arrow button that will run all chunks above the current one">&lt;/p>
&lt;h3 id="changing-your-r-markdown-theme">Changing your R Markdown theme&lt;/h3>
&lt;p>One last thing you can do to make your document look nice is to change the theme. You can click on the gear icon in the toolbar at the top, and select &amp;ldquo;Output Options&amp;hellip;&amp;rdquo;&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/rmd_image22.png" alt="Image showing dropdown list from the gear icon, with output options highlighted">&lt;/p>
&lt;p>A window will open where you can do things like change the theme of your HTML document. If you go to &amp;ldquo;Apply theme&amp;rdquo; and open the dropdown menu, you&amp;rsquo;ll see a list of themes to choose from. Changing the theme changes things like the fonts and colors in the document. Play around with the themes to see what you prefer; just remember to press &amp;ldquo;Knit&amp;rdquo; to apply the change.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/rmd_image23.png" alt="window showing different options for changing the look of your final R Markdown document, like theme and table of contents">&lt;/p>
&lt;p>If you&amp;rsquo;re interested in HTML and CSS, you can also apply your own CSS file to change the style of the document. Again, we&amp;rsquo;re going to keep it simple in this blog post, so I&amp;rsquo;ll leave CSS for you to explore on your own. If you have any cool style sheets or themes for your R Markdown documents, please share them in the comments below!&lt;/p>
&lt;p>In the same way that you can change the theme of your document, you can also change the syntax highlighting. That changes how your code looks when it&amp;rsquo;s embedded in the document. For example, the image below shows the &amp;ldquo;zenburn&amp;rdquo; option.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/rmd_image24.png" alt="Image showing embedded code with a brown background and green commented text, different from the usual black text on light grey background">&lt;/p>
&lt;p>Now it&amp;rsquo;s time for what I think is the most useful addition. The &amp;ldquo;Output Options&amp;rdquo; window also allows you to include an interactive, clickable table of contents for your document. This is especially useful for larger documents with multiple sections. The table of contents is built from the headings you use, with smaller headings nested under larger ones.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/rmd_image26.png" alt="Image showing that a table of contents has been added, where each level is a link to the section. ">&lt;/p>
&lt;p>You&amp;rsquo;ll notice that headings of levels 4, 5, and 6 aren&amp;rsquo;t included in the table of contents; by default, only the first three levels are shown. You can change this in the &amp;ldquo;Output Options&amp;rdquo; window, where it says &amp;ldquo;depth of headers for table of contents&amp;rdquo;. If you set the depth to 6, the table of contents will include headings all the way down to level 6.&lt;/p>
&lt;p>Once you make these changes, you&amp;rsquo;ll notice that they have also been added to the header section at the top of your document. This means that once you&amp;rsquo;re familiar with the themes, you can type this information into the header yourself. I showed you how to do it via &amp;ldquo;Output Options&amp;rdquo; because we didn&amp;rsquo;t yet know what the different themes were called or what our options were.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/rmd_image25.png" alt="Text has been added to the output section of the header, showing that my theme is called readable, and my highlight is called zenburn">&lt;/p>
&lt;p>One quick pointer for creating an R Markdown document is to end it with the code &lt;code>sessionInfo()&lt;/code>. This shows information about your current R session, including the version of R you&amp;rsquo;re using, your operating system, and the packages you have loaded. This matters because packages and software get updated over time, and parts of your code might not work the same way with future versions. If you know which versions were used when the code was originally run, there are ways to install those older versions of R and the associated packages, or you can at least pin down where an error stems from (and how to fix it in the code). In essence, including your session info helps ensure reproducibility in the future.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#268bd2">sessionInfo&lt;/span>()
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## R version 4.2.2 (2022-10-31)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Big Sur ... 10.16
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] bookdown_0.31 digest_0.6.31 R6_2.5.1 lifecycle_1.0.3
## [5] jsonlite_1.8.4 magrittr_2.0.3 evaluate_0.19 highr_0.9
## [9] blogdown_1.16 stringi_1.7.8 cachem_1.0.6 rlang_1.0.6
## [13] cli_3.4.1 rstudioapi_0.14 jquerylib_0.1.4 bslib_0.4.2
## [17] vctrs_0.5.1 rmarkdown_2.19.1 tools_4.2.2 stringr_1.5.0
## [21] glue_1.6.2 xfun_0.35 yaml_2.3.6 fastmap_1.1.0
## [25] compiler_4.2.2 htmltools_0.5.4 knitr_1.41 sass_0.4.4
&lt;/code>&lt;/pre>&lt;p>And that&amp;rsquo;s it for our basic R Markdown tutorial! You learned how to create an R Markdown document, how to apply different types of text formats, how to embed code &lt;code>inline&lt;/code> or in code chunks, and how to stylize your final R Markdown document. Our next blog post will be about how to use R Markdown to learn R, so keep your eyes peeled for a Part Two.&lt;/p>
&lt;p>Have any cool R Markdown documents you&amp;rsquo;ve created? Share links in the comments below! 👇&lt;/p>
&lt;br>
&lt;hr>
&lt;center>If you liked this post and want to learn more, then check out my online course on the complete basics of R for ecology:
&lt;br>
&lt;br>
&lt;ul class="cta-group">
&lt;li>
&lt;a href="https://www.rforecology.com?utm_source=blog&amp;amp;utm_medium=bottom_button&amp;amp;utm_campaign=rforecology_blog/" target="_blank" rel="noopener" class="btn btn-primary px-3 py-3">&lt;strong>Start learning now&lt;/strong>&lt;/a>
&lt;/li>
&lt;/ul>
&lt;p>Also be sure to check out &lt;strong>&lt;a href="https://www.r-bloggers.com/" target="_blank" rel="noopener">R-bloggers&lt;/a>&lt;/strong> for other great tutorials on learning R.&lt;/p>
&lt;/center>
&lt;script defer src="https://cdn.commento.io/js/commento.js">&lt;/script>
&lt;div id="commento">&lt;/div></description></item><item><title>Introduction to missing data (NAs) in R</title><link>https://www.rforecology.com/post/introduction-to-missing-data-in-r/</link><pubDate>Tue, 01 Feb 2022 09:45:39 -0500</pubDate><guid>https://www.rforecology.com/post/introduction-to-missing-data-in-r/</guid><description>&lt;p>As many of us know, science is not a perfect process. Maybe you can&amp;rsquo;t get out in the field on a certain day. Maybe you can only sample a portion of what needs to get done. Or maybe you&amp;rsquo;re downloading public data sets and they aren&amp;rsquo;t lining up perfectly. All of these can result in missing data, which can be a real pain when it comes time for analysis.&lt;/p>
&lt;p>Another common source of missing data, especially when recording species abundance data in community ecology, is when you forget to write a &amp;lsquo;0&amp;rsquo; and instead leave the entry blank. In the moment you might know that blank entries mean zero, but give it just a few weeks and you&amp;rsquo;ll be scratching your head! In those cases it&amp;rsquo;s often best to label those entries as unknown or missing.&lt;/p>
&lt;p>In this tutorial, I&amp;rsquo;m going to explain what exactly an &lt;code>NA&lt;/code> value is, how you can find &lt;code>NA&lt;/code>s in your data, and how you can remove them.&lt;/p>
&lt;img src="https://www.rforecology.com/nas_image1.png" alt="Image of person looking stressed by NA values coming from laptop" style="width:300px;"/>
&lt;h3 id="what-does-it-mean-to-have-nas-in-my-data">What does it mean to have NAs in my data?&lt;/h3>
&lt;p>&lt;code>NA&lt;/code>s represent missing values in R. You&amp;rsquo;ll often see them if you import data from Excel and the spreadsheet has some empty cells. When you load the data into R, empty cells in numeric columns will be populated with &lt;code>NA&lt;/code>s.&lt;/p>
&lt;div class="alert alert-note">
&lt;div>
Note: missing data points, or those where you don&amp;rsquo;t actually know what the true value should be, are marked as &lt;code>NA&lt;/code> (which stands for &amp;lsquo;Not Available&amp;rsquo;) in R. In fact, RStudio highlights &lt;code>NA&lt;/code> in a different color when you type it in your code, because it&amp;rsquo;s a special value that R already recognizes.
&lt;/div>
&lt;/div>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Read in an example data set with NAs&lt;/span>
ex &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">read.csv&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;example_data.csv&amp;#34;&lt;/span>)
&lt;span style="color:#586e75"># View data&lt;/span>
ex
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>##   example data set
## 1       1    2   4
## 2      NA    2   4
## 3      16    1   4
## 4       2   NA   5
## 5       3    1  NA
## 6       6    7   8
&lt;/code>&lt;/pre>&lt;p>&lt;em>&lt;a href="https://www.rforecology.com/data/example_data.csv">&lt;strong>Click here to download the &lt;code>example_data.csv&lt;/code> file&lt;/strong> &lt;/a>if you want to follow along.&lt;/em>&lt;/p>
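&lt;p>If you&amp;rsquo;d like to see where these &lt;code>NA&lt;/code>s come from without downloading anything, here&amp;rsquo;s a sketch using a small made-up table passed to &lt;code>read.csv()&lt;/code> through its &lt;code>text&lt;/code> argument. In R 4.0 or later, blank cells in a numeric column are read in as &lt;code>NA&lt;/code>, but blank cells in a text column become empty strings ("") unless you add "" to the &lt;code>na.strings&lt;/code> argument.&lt;/p>

```r
# Made-up data for illustration: a count column and an observer column,
# each with one blank cell
csv_text = "site,count,observer
A,3,Ana
B,,Ben
C,5,"

ex_demo = read.csv(text = csv_text)
ex_demo
is.na(ex_demo$count)      # FALSE TRUE FALSE: the blank numeric cell became NA
ex_demo$observer          # "Ana" "Ben" "": the blank text cell became an empty string

# Treat empty strings as missing values too
ex_demo2 = read.csv(text = csv_text, na.strings = c("", "NA"))
is.na(ex_demo2$observer)  # FALSE FALSE TRUE
```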
&lt;p>&lt;code>NA&lt;/code>s cannot be treated like other types of data (e.g., strings, numeric values). For example, math and logical comparisons involving an &lt;code>NA&lt;/code> generally just return &lt;code>NA&lt;/code>. In the following examples, every position in the vector that holds an &lt;code>NA&lt;/code> returns &lt;code>NA&lt;/code> again, no matter what operation is performed. We also get &lt;code>NA&lt;/code> if we use a summary function such as &lt;code>sum()&lt;/code> on the vector, because R can&amp;rsquo;t add an unknown value to the known ones.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Create a vector with NAs&lt;/span>
v &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">1.2&lt;/span>, &lt;span style="color:#2aa198">4.5&lt;/span>, &lt;span style="color:#cb4b16">NA&lt;/span>, &lt;span style="color:#2aa198">8.9&lt;/span>, &lt;span style="color:#cb4b16">NA&lt;/span>)
&lt;span style="color:#586e75"># Can we do math with NAs?&lt;/span>
v &lt;span style="color:#719e07">+&lt;/span> &lt;span style="color:#2aa198">1&lt;/span>
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] 2.2 5.5 NA 9.9 NA
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#268bd2">sum&lt;/span>(v)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] NA
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Can we perform logical comparisons?&lt;/span>
v &lt;span style="color:#719e07">&amp;lt;&lt;/span> &lt;span style="color:#2aa198">7&lt;/span>
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] TRUE TRUE NA FALSE NA
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">v &lt;span style="color:#719e07">==&lt;/span> &lt;span style="color:#2aa198">4.5&lt;/span>
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] FALSE TRUE NA FALSE NA
&lt;/code>&lt;/pre>&lt;p>And the reason, of course, is simple&amp;hellip; What&amp;rsquo;s the answer to &lt;code>5 + 'some unknown number'&lt;/code>?&lt;/p>
&lt;p>Have you figured it out yet?&lt;/p>
&lt;p>The answer is &lt;code>'some unknown number'&lt;/code>! 😄&lt;/p>
&lt;p>Thus: &lt;code>5 + NA = NA&lt;/code>&lt;/p>
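&lt;p>The same &amp;lsquo;unknown number&amp;rsquo; reasoning also explains the few cases where R &lt;em>can&lt;/em> give a definite answer: when the result would be the same no matter what the missing value turned out to be. A quick sketch you can paste into the console:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># The missing value can't change these answers, so R returns them
NA &amp;amp; FALSE   # FALSE: anything AND FALSE is FALSE
NA | TRUE    # TRUE: anything OR TRUE is TRUE
NA^0         # 1: any number to the power of 0 is 1

# Here the unknown value matters, so the result stays NA
NA &amp;amp; TRUE    # NA
NA + 5       # NA
&lt;/code>&lt;/pre>&lt;/div>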
&lt;h3 id="how-can-i-detect-nas-in-my-data">How can I detect NAs in my data?&lt;/h3>
&lt;p>So how can we see if we have &lt;code>NA&lt;/code>s in our data? We normally use &lt;code>==&lt;/code> to check whether one value is equal to another. Let&amp;rsquo;s see if that works on our vector. We know that there are &lt;code>NA&lt;/code>s in the 3rd and 5th positions of our vector.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Create a vector with NAs&lt;/span>
v &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">1.2&lt;/span>, &lt;span style="color:#2aa198">4.5&lt;/span>, &lt;span style="color:#cb4b16">NA&lt;/span>, &lt;span style="color:#2aa198">8.9&lt;/span>, &lt;span style="color:#cb4b16">NA&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>So theoretically, &lt;code>v == NA&lt;/code> should return &lt;code>FALSE FALSE TRUE FALSE TRUE&lt;/code>.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Are there any NAs in our vector?&lt;/span>
v &lt;span style="color:#719e07">==&lt;/span> &lt;span style="color:#cb4b16">NA&lt;/span>
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] NA NA NA NA NA
&lt;/code>&lt;/pre>&lt;p>But this code just gives us &lt;code>NA&lt;/code>s. Unfortunately, comparison operators like &lt;code>==&lt;/code> don&amp;rsquo;t work with &lt;code>NA&lt;/code>s either.&lt;/p>
&lt;p>Just as with math operations, &lt;code>NA&lt;/code> is only a placeholder for &lt;code>'I don't know the real value'&lt;/code>, so asking whether &lt;code>NA == NA&lt;/code> is the same as asking whether &lt;code>'some unknown number' == 'some unknown number'&lt;/code>, which clearly has no known answer.&lt;/p>
&lt;p>Luckily, R gives us a special function to detect &lt;code>NA&lt;/code>s: &lt;code>is.na()&lt;/code>. In fact, if you type &lt;code>my_vector == NA&lt;/code> in RStudio, its code diagnostics will flag the line and suggest using &lt;code>is.na()&lt;/code> instead.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/nas_image2.png" alt="Image showing warning sign from R when typing vector equals sign equals sign NA">&lt;/p>
&lt;p>&lt;code>is.na()&lt;/code> works on individual values, vectors, lists, and data frames. It returns &lt;code>TRUE&lt;/code> for each element that is &lt;code>NA&lt;/code> and &lt;code>FALSE&lt;/code> otherwise; for a data frame, the result is a logical matrix with the same rows and columns.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Which values in my vector are NA?&lt;/span>
&lt;span style="color:#268bd2">is.na&lt;/span>(v)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] FALSE FALSE TRUE FALSE TRUE
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Which values in my data frame are NA?&lt;/span>
&lt;span style="color:#268bd2">is.na&lt;/span>(ex)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## example data set
## [1,] FALSE FALSE FALSE
## [2,] TRUE FALSE FALSE
## [3,] FALSE FALSE FALSE
## [4,] FALSE TRUE FALSE
## [5,] FALSE FALSE TRUE
## [6,] FALSE FALSE FALSE
&lt;/code>&lt;/pre>&lt;p>You can also combine &lt;code>is.na()&lt;/code> with &lt;code>sum()&lt;/code> and &lt;code>which()&lt;/code> to figure out how many &lt;code>NA&lt;/code>s you have and where they&amp;rsquo;re located.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># How many NAs in my data frame?&lt;/span>
&lt;span style="color:#268bd2">sum&lt;/span>(&lt;span style="color:#268bd2">is.na&lt;/span>(ex))
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] 3
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Which row contains an NA in the &amp;#39;data&amp;#39; column?&lt;/span>
&lt;span style="color:#268bd2">which&lt;/span>(&lt;span style="color:#268bd2">is.na&lt;/span>(ex&lt;span style="color:#719e07">$&lt;/span>data))
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] 4
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Which vector positions contain NAs?&lt;/span>
&lt;span style="color:#268bd2">which&lt;/span>(&lt;span style="color:#268bd2">is.na&lt;/span>(v))
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] 3 5
&lt;/code>&lt;/pre>&lt;div class="alert alert-note">
&lt;div>
Note: &lt;code>sum(is.na(ex))&lt;/code> works because &lt;code>is.na()&lt;/code> returns &lt;code>TRUE&lt;/code>/&lt;code>FALSE&lt;/code> values, and math operations on logical values treat &lt;code>TRUE&lt;/code> as 1 and &lt;code>FALSE&lt;/code> as 0.
&lt;/div>
&lt;/div>
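&lt;p>Two other handy checks are &lt;code>colSums(is.na(...))&lt;/code>, which counts the &lt;code>NA&lt;/code>s in each column, and &lt;code>complete.cases()&lt;/code>, which flags the rows that have no &lt;code>NA&lt;/code>s at all. This sketch rebuilds &lt;code>ex&lt;/code> from the values printed above so it runs without the CSV file. It assumes those printed values match the file.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Rebuild the example data from the printed values (assumed to match the file)
ex &amp;lt;- data.frame(example = c(1, NA, 16, 2, 3, 6),
                 data    = c(2, 2, 1, NA, 1, 7),
                 set     = c(4, 4, 4, 5, NA, 8))

# Is there at least one NA anywhere?
any(is.na(ex))              # TRUE

# How many NAs in each column?
colSums(is.na(ex))          # example 1, data 1, set 1

# Which rows are complete (contain no NAs)?
which(complete.cases(ex))   # 1 3 6
&lt;/code>&lt;/pre>&lt;/div>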
&lt;h3 id="how-do-i-remove-nas-from-my-data">How do I remove NAs from my data?&lt;/h3>
&lt;p>Now that we know we have &lt;code>NA&lt;/code>s in our data&amp;hellip; how do we get rid of them?&lt;/p>
&lt;p>Many summary functions, such as &lt;code>sum()&lt;/code>, &lt;code>mean()&lt;/code>, and &lt;code>max()&lt;/code>, have a built-in argument, &lt;code>na.rm&lt;/code>, that defaults to &lt;code>FALSE&lt;/code>; setting it to &lt;code>TRUE&lt;/code> removes &lt;code>NA&lt;/code>s before the calculation is done. If you remember the example from earlier, just running &lt;code>sum(v)&lt;/code> returned &lt;code>NA&lt;/code>. Adding &lt;code>na.rm = TRUE&lt;/code> fixes this:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Sum across vector v&lt;/span>
&lt;span style="color:#268bd2">sum&lt;/span>(v, na.rm &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#cb4b16">TRUE&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] 14.6
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Take the mean of our vector v&lt;/span>
&lt;span style="color:#268bd2">mean&lt;/span>(v, na.rm &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#cb4b16">TRUE&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] 4.866667
&lt;/code>&lt;/pre>&lt;div class="alert alert-note">
&lt;div>
Note that the decision to remove or replace missing values, rather than leave them as-is, is both a technical and a philosophical question and should be addressed on a case-by-case basis. There are statistical methods (imputation) designed to fill in missing values while limiting bias in the outcome of analyses (e.g., in multivariate ordination analyses). Many statistical functions in R, such as &lt;code>lm()&lt;/code>, drop rows containing &lt;code>NA&lt;/code> values automatically by default, but in other cases it makes more sense to remove them manually. Either way, this goes beyond the scope of this post, but it is an important point to keep in mind.
&lt;/div>
&lt;/div>
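&lt;p>As an illustration of those defaults, here is a sketch using made-up vectors &lt;code>x&lt;/code> and &lt;code>y&lt;/code>. &lt;code>lm()&lt;/code> silently drops incomplete rows under R&amp;rsquo;s default &lt;code>na.action&lt;/code> option (&lt;code>na.omit&lt;/code>), while &lt;code>cor()&lt;/code> returns &lt;code>NA&lt;/code> unless you tell it how to handle missing values. Either way, checking how many observations a model actually used is a good habit.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Made-up data with one NA in each vector
x &amp;lt;- c(1, 2, 3, NA, 5)
y &amp;lt;- c(2.1, 3.9, 6.2, 8.1, NA)

# lm() drops incomplete rows under the default na.action option (na.omit)
fit &amp;lt;- lm(y ~ x)
nobs(fit)                        # 3: rows 4 and 5 were dropped

# cor() keeps NAs unless you tell it otherwise
cor(x, y)                        # NA
cor(x, y, use = "complete.obs")  # uses rows 1 to 3 only
&lt;/code>&lt;/pre>&lt;/div>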
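&lt;p>If you want to remove all observations containing &lt;code>NA&lt;/code>s, you can also use the &lt;code>na.omit()&lt;/code> function. Keep in mind that for a data frame, &lt;code>na.omit()&lt;/code> drops the entire row whenever &lt;em>any&lt;/em> of its columns is &lt;code>NA&lt;/code>, so you also lose the values that were recorded in that row.&lt;/p>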
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># remove NAs from our data frame&lt;/span>
&lt;span style="color:#268bd2">na.omit&lt;/span>(ex)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## example data set
## 1 1 2 4
## 3 16 1 4
## 6 6 7 8
&lt;/code>&lt;/pre>&lt;p>Something else you might want to do is replace those &lt;code>NA&lt;/code>s with another value. Maybe you want to replace missing values with 0 (You&amp;rsquo;re 200% sure those missing values were supposed to be 0s?? 😄), or maybe you want to replace those missing values with the mean of your data to approximate what those values would be (that can be especially useful for multivariate analyses). You can subset your vector or data frame to the places where &lt;code>is.na()&lt;/code> is true, and set those equal to a new value.&lt;/p>
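&lt;p>For a data frame, each column&amp;rsquo;s own mean is usually a better fill-in value than one overall mean. Below is a minimal sketch of column-by-column mean replacement with &lt;code>lapply()&lt;/code>. It rebuilds &lt;code>ex&lt;/code> from the printed values and writes the result to a new, hypothetically named &lt;code>ex_mean&lt;/code>, so the original keeps its &lt;code>NA&lt;/code>s.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Rebuild the example data from the printed values (assumed to match the file)
ex &amp;lt;- data.frame(example = c(1, NA, 16, 2, 3, 6),
                 data    = c(2, 2, 1, NA, 1, 7),
                 set     = c(4, 4, 4, 5, NA, 8))

# Work on a copy so ex keeps its NAs
ex_mean &amp;lt;- ex

# For each column, replace its NAs with that column's mean
ex_mean[] &amp;lt;- lapply(ex_mean, function(col) {
  col[is.na(col)] &amp;lt;- mean(col, na.rm = TRUE)
  col
})

ex_mean
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>Assigning to &lt;code>ex_mean[]&lt;/code> rather than &lt;code>ex_mean&lt;/code> keeps the result a data frame instead of turning it into a plain list.&lt;/p>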
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Replace NAs in data frame with 0&lt;/span>
ex&lt;span style="color:#268bd2">[is.na&lt;/span>(ex)] &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#2aa198">0&lt;/span>
&lt;span style="color:#586e75"># View data frame&lt;/span>
ex
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## example data set
## 1 1 2 4
## 2 0 2 4
## 3 16 1 4
## 4 2 0 5
## 5 3 1 0
## 6 6 7 8
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Replace NAs in vector with the mean&lt;/span>
v&lt;span style="color:#268bd2">[is.na&lt;/span>(v)] &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">mean&lt;/span>(v, na.rm &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#cb4b16">TRUE&lt;/span>)
&lt;span style="color:#586e75"># View vector&lt;/span>
v
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] 1.200000 4.500000 4.866667 8.900000 4.866667
&lt;/code>&lt;/pre>&lt;p>Awesome! Now you know how to find &lt;code>NA&lt;/code>s in your data, run calculations without letting &lt;code>NA&lt;/code>s get in the way, and remove or replace &lt;code>NA&lt;/code>s in your data for further analysis. Soon these functions will come to you &lt;code>NA&lt;/code>turally&amp;hellip;haha. I hope you found this tutorial helpful. Happy coding!&lt;/p>
&lt;p>P.S. I&amp;rsquo;d recommend listening to &lt;a href="https://youtu.be/IoyvvEWHodk" target="_blank" rel="noopener">this song&lt;/a> to put you in the &lt;code>NA&lt;/code>-removing mood!&lt;/p>
&lt;div class="embed-youtube">
&lt;iframe width="560" height="315" src="https://www.youtube.com/embed/IoyvvEWHodk" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen>&lt;/iframe>
&lt;/div>
&lt;br>
&lt;hr>
&lt;center>If you liked this post and want to learn more, then check out our online course on the complete basics of R for ecology:
&lt;br>
&lt;br>
&lt;ul class="cta-group">
&lt;li>
&lt;a href="https://coaching.rforecology.com?utm_source=blog&amp;amp;utm_medium=bottom_button&amp;amp;utm_campaign=rforecology_blog/" target="_blank" rel="noopener" class="btn btn-primary px-3 py-3">&lt;strong>Start learning now&lt;/strong>&lt;/a>
&lt;/li>
&lt;/ul>
&lt;p>Also be sure to check out &lt;strong>&lt;a href="https://www.r-bloggers.com/" target="_blank" rel="noopener">R-bloggers&lt;/a>&lt;/strong> for other great tutorials on learning R&lt;/p>
&lt;/center>
&lt;script defer src="https://cdn.commento.io/js/commento.js">&lt;/script>
&lt;div id="commento">&lt;/div></description></item><item><title>How to join tables in R</title><link>https://www.rforecology.com/post/how-to-join-tables-in-r/</link><pubDate>Wed, 26 Jan 2022 11:45:39 -0500</pubDate><guid>https://www.rforecology.com/post/how-to-join-tables-in-r/</guid><description>&lt;p>In this blog post, I&amp;rsquo;m going to talk about joining data tables together. Joining tables is incredibly useful when you have to download several data files on a common set of subjects and then aggregate them into a larger, singular data set.&lt;/p>
&lt;p>This is pretty common with spatial data. For example, you might have one table that contains geographic information on parcels of land like census tracts, each with its own ID. You can then find separate demographic or economic data tables online that link up with the geographic data through the census tract ID.&lt;/p>
&lt;p>Another common example: you collected community survey data from a set of plots, and you also have environmental data from those same plots saved in a separate spreadsheet.&lt;/p>
&lt;p>These kinds of situations would call for you to &lt;em>merge&lt;/em>, or &lt;em>join&lt;/em>, your two data tables together. In this tutorial, I&amp;rsquo;m going to introduce you to different types of joins, and I&amp;rsquo;ll show you how to perform joins both in base R and using the &lt;code>dplyr&lt;/code> package.&lt;/p>
&lt;img src="https://www.rforecology.com/joins_image0.png" alt="Image showing tables joining together on top of a venn diagram" style="width:400px;"/>
&lt;h2 id="joining-data-in-base-r">Joining data in base R&lt;/h2>
&lt;p>We&amp;rsquo;re going to start with a basic data set. These data contain 6 different students and the distance of their morning commute to school, in miles.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Create a data frame with information on where students live&lt;/span>
&lt;span style="color:#268bd2">set.seed&lt;/span>(&lt;span style="color:#2aa198">123&lt;/span>)
student_residence &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">data.frame&lt;/span>(student &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">seq&lt;/span>(&lt;span style="color:#2aa198">1&lt;/span>, &lt;span style="color:#2aa198">6&lt;/span>),
distance &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">runif&lt;/span>(&lt;span style="color:#2aa198">6&lt;/span>, &lt;span style="color:#2aa198">3&lt;/span>, &lt;span style="color:#2aa198">10&lt;/span>))
&lt;span style="color:#586e75"># Look at the data&lt;/span>
&lt;span style="color:#268bd2">head&lt;/span>(student_residence)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## student distance
## 1 1 5.013043
## 2 2 8.518136
## 3 3 5.862838
## 4 4 9.181122
## 5 5 9.583271
## 6 6 3.318895
&lt;/code>&lt;/pre>&lt;p>The &lt;code>runif()&lt;/code> function draws random numbers from a uniform distribution between a minimum and maximum value that you specify. I asked &lt;code>runif()&lt;/code> to generate 6 random numbers between 3 and 10. The &lt;code>set.seed()&lt;/code> function makes the random output reproducible: each time you run this code with the same seed number, you get the same numbers. Use &lt;code>set.seed(123)&lt;/code> if you&amp;rsquo;d like to follow along with the same numbers I have here.&lt;/p>
&lt;p>Students at this school were also surveyed to find out what method of transportation they use to get to school in the morning. This survey was offered to several students, but not everyone responded (looks like only students 1, 3, 5, and 7 responded). Note that in this scenario we somehow don&amp;rsquo;t have data on commute distance for student 7.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Create another data frame with information on how students get to school&lt;/span>
student_transport &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">data.frame&lt;/span>(student &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">seq&lt;/span>(&lt;span style="color:#2aa198">1&lt;/span>, &lt;span style="color:#2aa198">7&lt;/span>, by &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">2&lt;/span>),
transport &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;Bus&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;Carpool&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;Walk&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;Bus&amp;#34;&lt;/span>))
&lt;span style="color:#586e75"># Look at the data&lt;/span>
&lt;span style="color:#268bd2">head&lt;/span>(student_transport)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## student transport
## 1 1 Bus
## 2 3 Carpool
## 3 5 Walk
## 4 7 Bus
&lt;/code>&lt;/pre>&lt;p>Let&amp;rsquo;s say we want to look at both student transportation methods and morning commute distance so we can create a better bus schedule. It&amp;rsquo;s tough to do that when transportation method and commute distance are in different data sets, so we want to join them together.&lt;/p>
&lt;div class="alert alert-note">
&lt;div>
&lt;strong>Note: I&amp;rsquo;m using the terms &amp;lsquo;table&amp;rsquo; and &amp;lsquo;data frame&amp;rsquo; interchangeably here.&lt;/strong>
&lt;/div>
&lt;/div>
&lt;p>Importantly, to join two different tables together, you need to make sure you have a column in common between both data sets. This common column is called a &amp;ldquo;key&amp;rdquo;, and it should provide a unique identifier for every row. In the case of our data, the &amp;ldquo;student&amp;rdquo; column is our key, and it provides a unique number for each student.&lt;/p>
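&lt;p>One quick way to confirm that a column can serve as a key is &lt;code>anyDuplicated()&lt;/code>, which returns 0 when no value repeats (and otherwise the position of the first repeat). A short sketch that recreates the two tables from above:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Recreate the two tables from above
set.seed(123)
student_residence &amp;lt;- data.frame(student = seq(1, 6),
                                distance = runif(6, 3, 10))
student_transport &amp;lt;- data.frame(student = seq(1, 7, by = 2),
                                transport = c("Bus", "Carpool", "Walk", "Bus"))

# 0 means every student ID appears only once
anyDuplicated(student_residence$student)  # 0
anyDuplicated(student_transport$student)  # 0
&lt;/code>&lt;/pre>&lt;/div>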
&lt;p>To join our data, we can use the &lt;code>merge()&lt;/code> function in base R. &lt;code>merge()&lt;/code> will first accept two data frames as arguments, and then the name of the column that the two data frames have in common, like so: &lt;code>merge(x = dataframe1, y = dataframe2, by = &amp;quot;column name&amp;quot;)&lt;/code>. With our data, this would look like:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Merge data frames together&lt;/span>
students &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">merge&lt;/span>(x &lt;span style="color:#719e07">=&lt;/span> student_residence, y &lt;span style="color:#719e07">=&lt;/span> student_transport, by &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;student&amp;#34;&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>If we compare the values for student 1 in the new and old data sets, the values are the same. Great! Looks like the merge worked.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Compare the data to see if the merge worked&lt;/span>
&lt;span style="color:#268bd2">head&lt;/span>(students)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## student distance transport
## 1 1 5.013043 Bus
## 2 3 5.862838 Carpool
## 3 5 9.583271 Walk
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">student_residence[1, &lt;span style="color:#2aa198">2&lt;/span>]
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] 5.013043
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">student_transport[1, &lt;span style="color:#2aa198">2&lt;/span>]
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] &amp;quot;Bus&amp;quot;
&lt;/code>&lt;/pre>&lt;p>But what if the common columns that we want to merge by don&amp;rsquo;t have the same name? Let&amp;rsquo;s change the name of the &amp;ldquo;student&amp;rdquo; column in &lt;code>student_transport&lt;/code> to &amp;ldquo;studentID&amp;rdquo; instead.&lt;/p>
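&lt;p>One way to do the renaming in base R is to assign a new value to the matching entry of &lt;code>names()&lt;/code>. In this sketch the table is recreated first so that the snippet runs on its own:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Recreate the transport table from above
student_transport &amp;lt;- data.frame(student = seq(1, 7, by = 2),
                                transport = c("Bus", "Carpool", "Walk", "Bus"))

# Rename the "student" column to "studentID"
names(student_transport)[names(student_transport) == "student"] &amp;lt;- "studentID"
&lt;/code>&lt;/pre>&lt;/div>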
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># View data&lt;/span>
&lt;span style="color:#268bd2">head&lt;/span>(student_transport)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## studentID transport
## 1 1 Bus
## 2 3 Carpool
## 3 5 Walk
## 4 7 Bus
&lt;/code>&lt;/pre>&lt;p>If this is the case, we can still use the &lt;code>merge()&lt;/code> function with the names of the two data frames, but instead of one &amp;ldquo;by&amp;rdquo; argument, we&amp;rsquo;re going to use two, &lt;code>by.x&lt;/code> and &lt;code>by.y&lt;/code>, like so: &lt;code>merge(x = dataframe1, y = dataframe2, by.x = &amp;quot;dataframe1 column&amp;quot;, by.y = &amp;quot;dataframe2 column&amp;quot;)&lt;/code>.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Try the merge again&lt;/span>
students2 &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">merge&lt;/span>(x &lt;span style="color:#719e07">=&lt;/span> student_residence, y &lt;span style="color:#719e07">=&lt;/span> student_transport, by.x &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;student&amp;#34;&lt;/span>, by.y &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;studentID&amp;#34;&lt;/span>)
&lt;span style="color:#586e75"># Compare this new data set to the old one&lt;/span>
&lt;span style="color:#268bd2">head&lt;/span>(students2)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## student distance transport
## 1 1 5.013043 Bus
## 2 3 5.862838 Carpool
## 3 5 9.583271 Walk
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#268bd2">head&lt;/span>(students)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## student distance transport
## 1 1 5.013043 Bus
## 2 3 5.862838 Carpool
## 3 5 9.583271 Walk
&lt;/code>&lt;/pre>&lt;p>The data sets look the same, so we know both methods worked.&lt;/p>
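&lt;p>The join examples in the rest of this post use &lt;code>by = &amp;quot;student&amp;quot;&lt;/code> again, so if you renamed the key column to follow along, switch it back first. For example (the table is recreated in its renamed form so this runs on its own):&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Transport table as it looks after the rename
student_transport &amp;lt;- data.frame(studentID = seq(1, 7, by = 2),
                                transport = c("Bus", "Carpool", "Walk", "Bus"))

# The key is the first column; give it its original name back
names(student_transport)[1] &amp;lt;- "student"
names(student_transport)  # "student" "transport"
&lt;/code>&lt;/pre>&lt;/div>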
&lt;h2 id="types-of-joins">Types of Joins&lt;/h2>
&lt;h3 id="inner-join">Inner join&lt;/h3>
&lt;p>You probably noticed that the join we just performed returned only three rows. That&amp;rsquo;s because, by default, &lt;code>merge()&lt;/code> performs something called an &amp;ldquo;inner join&amp;rdquo;, which returns only the rows whose key appears in both data frames. If you were to visualize this type of join, it would look something like this:&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/joins_image1.png" alt="Image demonstrating what an inner join looks like as the intersection between two data frames">&lt;/p>
&lt;h3 id="left-join">Left join&lt;/h3>
&lt;p>There are also &amp;ldquo;left&amp;rdquo; joins and &amp;ldquo;right&amp;rdquo; joins. A left join returns all rows from the left data frame and any matching rows from the right data frame. In the &lt;code>merge()&lt;/code> function, the &amp;ldquo;left&amp;rdquo; data frame is the x data frame, or the one you name first. The &amp;ldquo;right&amp;rdquo; data frame is the y data frame, or the one you list second. We can tell &lt;code>merge()&lt;/code> that we want to keep all rows from the &amp;ldquo;left&amp;rdquo; data frame by adding the argument &lt;code>all.x = TRUE&lt;/code>. If we&amp;rsquo;re more interested in where students live, we&amp;rsquo;ll want to keep all the rows from &lt;code>student_residence&lt;/code>. Let&amp;rsquo;s go ahead and do that:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Perform a left join&lt;/span>
&lt;span style="color:#268bd2">merge&lt;/span>(x &lt;span style="color:#719e07">=&lt;/span> student_residence, y &lt;span style="color:#719e07">=&lt;/span> student_transport, by &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;student&amp;#34;&lt;/span>, all.x &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">T&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## student distance transport
## 1 1 5.013043 Bus
## 2 2 8.518136 &amp;lt;NA&amp;gt;
## 3 3 5.862838 Carpool
## 4 4 9.181122 &amp;lt;NA&amp;gt;
## 5 5 9.583271 Walk
## 6 6 3.318895 &amp;lt;NA&amp;gt;
&lt;/code>&lt;/pre>&lt;p>We can see that indeed, all the rows from &lt;code>student_residence&lt;/code> have been kept. Since &lt;code>student_transport&lt;/code> was missing some of the student records, there are NAs in the table where the join operation couldn&amp;rsquo;t find a match for the student. The image below visualizes what a left join would look like.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/joins_image2.png" alt="Image demonstrating what a left join looks like as both the left side of a venn diagram and the intersection.">&lt;/p>
&lt;h3 id="right-join">Right join&lt;/h3>
&lt;p>A right join is the mirror image of a left join: it keeps every row from the right (y) data frame plus any matching rows from the left (x) data frame. Instead of specifying &lt;code>all.x&lt;/code>, we&amp;rsquo;ll use the argument &lt;code>all.y = TRUE&lt;/code>. If we&amp;rsquo;re more interested in student transportation methods, we&amp;rsquo;ll want to keep all the rows from &lt;code>student_transport&lt;/code>.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Perform a right join&lt;/span>
&lt;span style="color:#268bd2">merge&lt;/span>(x &lt;span style="color:#719e07">=&lt;/span> student_residence, y &lt;span style="color:#719e07">=&lt;/span> student_transport, by &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;student&amp;#34;&lt;/span>, all.y &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">T&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## student distance transport
## 1 1 5.013043 Bus
## 2 3 5.862838 Carpool
## 3 5 9.583271 Walk
## 4 7 NA Bus
&lt;/code>&lt;/pre>&lt;p>Now, we have all the rows from &lt;code>student_transport&lt;/code>. Again, there&amp;rsquo;s an NA where the join operation couldn&amp;rsquo;t find a match for the student in the other data frame. The image below visualizes what a right join does.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/joins_image3.png" alt="Image demonstrating what a right join looks like as both the right side of a venn diagram and the intersection.">&lt;/p>
&lt;h3 id="full-join">Full join&lt;/h3>
&lt;p>The last type of join we&amp;rsquo;ll cover is called a &amp;ldquo;full join&amp;rdquo; (or &amp;ldquo;full outer join&amp;rdquo;), which includes &lt;em>all&lt;/em> the rows from both data frames, whether or not they match with one another. We can specify this by setting both the &lt;code>all.x&lt;/code> and &lt;code>all.y&lt;/code> arguments to &lt;code>TRUE&lt;/code>.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Perform a full join&lt;/span>
&lt;span style="color:#268bd2">merge&lt;/span>(x &lt;span style="color:#719e07">=&lt;/span> student_residence, y &lt;span style="color:#719e07">=&lt;/span> student_transport, by &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;student&amp;#34;&lt;/span>, all.x &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">T&lt;/span>, all.y &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">T&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## student distance transport
## 1 1 5.013043 Bus
## 2 2 8.518136 &amp;lt;NA&amp;gt;
## 3 3 5.862838 Carpool
## 4 4 9.181122 &amp;lt;NA&amp;gt;
## 5 5 9.583271 Walk
## 6 6 3.318895 &amp;lt;NA&amp;gt;
## 7 7 NA Bus
&lt;/code>&lt;/pre>&lt;p>&lt;img src="https://www.rforecology.com/joins_image4.png" alt="Image demonstrating what a full join looks like, with all rows included.">&lt;/p>
&lt;h2 id="joining-data-using-the-dplyr-package">Joining data using the dplyr package&lt;/h2>
&lt;p>I just demonstrated how to join tables in base R, but many of you are probably also familiar with the &lt;code>dplyr&lt;/code> package. &lt;code>dplyr&lt;/code> provides a convenient way to perform the different types of joins using the functions &lt;code>inner_join()&lt;/code>, &lt;code>left_join()&lt;/code>, &lt;code>right_join()&lt;/code>, and &lt;code>full_join()&lt;/code>. All of these functions take the form &lt;code>XXX_join(dataframe1, dataframe2, by = &amp;quot;column name&amp;quot;)&lt;/code>, and you don&amp;rsquo;t need to add anything like &lt;code>all.x&lt;/code> or &lt;code>all.y&lt;/code>, because the type of join is built into each function. I&amp;rsquo;ll quickly demonstrate how to use these functions below:&lt;/p>
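&lt;p>First, a note on key names. The &lt;code>dplyr&lt;/code> counterpart to &lt;code>merge()&lt;/code>&amp;rsquo;s &lt;code>by.x&lt;/code>/&lt;code>by.y&lt;/code> arguments is a named vector in &lt;code>by&lt;/code>, written as &lt;code>c(&amp;quot;left column&amp;quot; = &amp;quot;right column&amp;quot;)&lt;/code>. Recent versions of &lt;code>dplyr&lt;/code> also offer &lt;code>join_by()&lt;/code> for this. The sketch below assumes &lt;code>dplyr&lt;/code> is installed and uses a hypothetical copy of the transport table (&lt;code>transport_ids&lt;/code>) whose key is named &lt;code>studentID&lt;/code>.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">library(dplyr)

# Recreate the residence table from above
set.seed(123)
student_residence &amp;lt;- data.frame(student = seq(1, 6),
                                distance = runif(6, 3, 10))

# Hypothetical transport table whose key column is called "studentID"
transport_ids &amp;lt;- data.frame(studentID = seq(1, 7, by = 2),
                            transport = c("Bus", "Carpool", "Walk", "Bus"))

# Name = key column in the first table, value = key column in the second
res &amp;lt;- left_join(student_residence, transport_ids, by = c("student" = "studentID"))
res
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>The joined table keeps the first table&amp;rsquo;s key name (&lt;code>student&lt;/code>), just as &lt;code>merge()&lt;/code> did with &lt;code>by.x&lt;/code> and &lt;code>by.y&lt;/code>.&lt;/p>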
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Load package&lt;/span>
&lt;span style="color:#268bd2">library&lt;/span>(dplyr)
&lt;/code>&lt;/pre>&lt;/div>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Inner join&lt;/span>
&lt;span style="color:#268bd2">inner_join&lt;/span>(student_residence, student_transport, by &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;student&amp;#34;&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>##   student distance transport
## 1       1 5.013043       Bus
## 2       3 5.862838   Carpool
## 3       5 9.583271      Walk
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Left join&lt;/span>
&lt;span style="color:#268bd2">left_join&lt;/span>(student_residence, student_transport, by &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;student&amp;#34;&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>##   student distance transport
## 1       1 5.013043       Bus
## 2       2 8.518136      &amp;lt;NA&amp;gt;
## 3       3 5.862838   Carpool
## 4       4 9.181122      &amp;lt;NA&amp;gt;
## 5       5 9.583271      Walk
## 6       6 3.318895      &amp;lt;NA&amp;gt;
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Right join&lt;/span>
&lt;span style="color:#268bd2">right_join&lt;/span>(student_residence, student_transport, by &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;student&amp;#34;&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>##   student distance transport
## 1       1 5.013043       Bus
## 2       3 5.862838   Carpool
## 3       5 9.583271      Walk
## 4       7       NA       Bus
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Full join&lt;/span>
&lt;span style="color:#268bd2">full_join&lt;/span>(student_residence, student_transport, by &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;student&amp;#34;&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>##   student distance transport
## 1       1 5.013043       Bus
## 2       2 8.518136      &amp;lt;NA&amp;gt;
## 3       3 5.862838   Carpool
## 4       4 9.181122      &amp;lt;NA&amp;gt;
## 5       5 9.583271      Walk
## 6       6 3.318895      &amp;lt;NA&amp;gt;
## 7       7       NA       Bus
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Inner join but if your data frames have different column names&lt;/span>
&lt;span style="color:#268bd2">colnames&lt;/span>(student_transport)[1] &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#2aa198">&amp;#34;studentID&amp;#34;&lt;/span>
&lt;span style="color:#268bd2">inner_join&lt;/span>(student_residence, student_transport, by &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;student&amp;#34;&lt;/span> &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">&amp;#34;studentID&amp;#34;&lt;/span>))
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>##   student distance transport
## 1       1 5.013043       Bus
## 2       3 5.862838   Carpool
## 3       5 9.583271      Walk
&lt;/code>&lt;/pre>&lt;p>These results should match the ones produced above with the &lt;code>merge()&lt;/code> function.
And now you know how to perform each type of join, depending on which rows you need to keep!&lt;/p>
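&lt;p>Base R can handle key columns with different names too. In &lt;code>merge()&lt;/code> you name the key separately for each table with &lt;code>by.x&lt;/code> and &lt;code>by.y&lt;/code>. Here is a minimal sketch that rebuilds the example tables from the printed output above and names the key in &lt;code>student_transport&lt;/code> &lt;code>studentID&lt;/code>, as in the &lt;code>dplyr&lt;/code> example:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Recreate the example tables, with a differently named key in the second one
student_residence &amp;lt;- data.frame(
  student  = 1:6,
  distance = c(5.013043, 8.518136, 5.862838, 9.181122, 9.583271, 3.318895)
)
student_transport &amp;lt;- data.frame(
  studentID = c(1L, 3L, 5L, 7L),
  transport = c("Bus", "Carpool", "Walk", "Bus")
)

# Inner join: match "student" in x against "studentID" in y
inner_base &amp;lt;- merge(student_residence, student_transport,
                    by.x = "student", by.y = "studentID")
inner_base
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>This should keep students 1, 3 and 5, the same rows as the &lt;code>inner_join()&lt;/code> above. The key column takes its name from the first data frame (&lt;code>student&lt;/code>).&lt;/p>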
&lt;p>I hope this tutorial was helpful! Let us know what other tutorials you&amp;rsquo;d like to see in the comments below. 👇&lt;/p>
&lt;br>
&lt;hr>
&lt;center>If you liked this post and want to learn more, then check out our online course on the complete basics of R for ecology:
&lt;br>
&lt;br>
&lt;ul class="cta-group">
&lt;li>
&lt;a href="https://coaching.rforecology.com?utm_source=blog&amp;amp;utm_medium=bottom_button&amp;amp;utm_campaign=rforecology_blog/" target="_blank" rel="noopener" class="btn btn-primary px-3 py-3">&lt;strong>Start learning now&lt;/strong>&lt;/a>
&lt;/li>
&lt;/ul>
&lt;p>Also be sure to check out &lt;strong>&lt;a href="https://www.r-bloggers.com/" target="_blank" rel="noopener">R-bloggers&lt;/a>&lt;/strong> for other great tutorials on learning R&lt;/p>
&lt;/center>
&lt;script defer src="https://cdn.commento.io/js/commento.js">&lt;/script>
&lt;div id="commento">&lt;/div></description></item><item><title>The Basics of R (in Spanish!)</title><link>https://www.rforecology.com/post/the-basics-of-r-in-spanish/</link><pubDate>Wed, 19 Jan 2022 09:45:39 -0500</pubDate><guid>https://www.rforecology.com/post/the-basics-of-r-in-spanish/</guid><description>&lt;p>&lt;em>&lt;strong>(¿Quieres más detalles sobre el curso en español? Desplázate hacia abajo.)&lt;/strong>&lt;/em>&lt;/p>
&lt;p>Hello everyone! This blog post is a bit different from my usual posts: I have a very exciting announcement about an upcoming course launch.&lt;/p>
&lt;p>Part of my vision with R for Ecology is to make
&lt;i class="fab fa-r-project pr-1 fa-fw">&lt;/i> as accessible as possible to as many people as possible—especially ecologists and other scientists. Understanding how to work with, organize, visualize, and analyze data is &lt;em>essential&lt;/em> for doing good science. Either way, I&amp;rsquo;m very fortunate to have partnered with a fantastic biologist and ecologist from Argentina named &lt;a href="https://www.researchgate.net/profile/Joaquin-Cochero" target="_blank" rel="noopener">Joaquin Cochero&lt;/a> who has done an outstanding job translating my entire &lt;a href="https://www.rforecology.com" target="_blank" rel="noopener">Basics of R (for ecologists)&lt;/a> course into Spanish!&lt;/p>
&lt;figure>
&lt;img style='vertical-align:left;' src='https://www.rforecology.com/joaquin_cochero.png/' width=40%>
&lt;figcaption>Dr. Joaquin Cochero (Spanish-speaking R guru)&lt;/figcaption>
&lt;/figure>
&lt;p>Having worked in the Galapagos, Ecuador, for the last four years, I&amp;rsquo;ve made many Spanish-speaking friends and colleagues who have long asked me to make my course available in their native language. I&amp;rsquo;m very excited to say that it&amp;rsquo;s finally almost here.&lt;/p>
&lt;p>Without further ado, and for those who aren&amp;rsquo;t familiar with my original course in English, here is some more information about the course (in Spanish, of course 😄). If you are interested in enrolling, &lt;a href="https://rforecology.ac-page.com/lo-esencial-de-r" target="_blank" rel="noopener">there&amp;rsquo;s a link at the bottom of this post&lt;/a> to pre-register before enrollment opens.&lt;/p>
&lt;br>
&lt;h1 id="lo-esencial-de-r-para-ecólogos">&lt;strong>Lo esencial de R (para ecólogos)&lt;/strong>&lt;/h1>
&lt;div class="embed-youtube">
&lt;iframe width="560" height="300" src="https://www.youtube.com/embed/sVC1DS9xyfs" title="YouTube video player" frameborder="0" allow="accelerometer; autoplay; clipboard-write; encrypted-media; gyroscope; picture-in-picture" allowfullscreen>&lt;/iframe>
&lt;/div>
&lt;p>Este es &lt;em>el curso&lt;/em> que me gustaría haber tenido cuando empecé como estudiante de posgrado. Uno puede pasar mucho tiempo diseñando experimentos y recogiendo datos y luego no tener ni idea de cómo explorar o visualizar esos datos.&lt;/p>
&lt;p>La curva de aprendizaje de R &lt;em>no tiene por qué&lt;/em> ser tan difícil y larga. En este curso, he seleccionado cuidadosamente los temas y funciones clave que te ayudarán a dominar los fundamentos y a superar rápidamente la curva con confianza, incluso si eres un completo principiante.&lt;/p>
&lt;p>En realidad, con sólo unas pocas funciones y métodos en R se puede hacer al menos el 80% de toda la manipulación y visualización de datos que necesitarás hacer en ecología. Este curso se centra en esos conceptos clave para que tu experiencia de aprendizaje sea lo más eficiente posible.&lt;/p>
&lt;h3 id="un-breve-resumen-del-plan-de-estudios">Un breve resumen del plan de estudios:&lt;/h3>
&lt;ol>
&lt;li>&lt;strong>Bienvenida al curso:&lt;/strong> Bienvenida y descarga del material del curso.&lt;/li>
&lt;li>&lt;strong>Introducción a R y RStudio:&lt;/strong> Instalación y lo más básico.&lt;/li>
&lt;li>&lt;strong>Vectores y marcos de datos:&lt;/strong> Todo lo esencial sobre cómo empezar a trabajar con números en R cuando están en forma de conjuntos de datos.&lt;/li>
&lt;li>&lt;strong>Carga de datos:&lt;/strong> Cómo cargar y acceder a tus propios datos en R y RStudio.&lt;/li>
&lt;li>&lt;strong>Visualización básica de datos:&lt;/strong> Los tipos de gráficos más comunes y cómo crearlos para visualizar tus datos.&lt;/li>
&lt;li>&lt;strong>Manejo básico de datos:&lt;/strong> Aprenda todo lo esencial para organizar sus conjuntos de datos y prepararlos para su visualización o análisis, incluyendo cómo utilizar el paquete &amp;lsquo;dplyr&amp;rsquo; para una limpieza y organización de datos potente y eficiente.&lt;/li>
&lt;li>&lt;strong>Manejo avanzado de datos:&lt;/strong> Amplíe su caja de herramientas con técnicas adicionales para hacer casi cualquier cosa con sus datos: desde unir diferentes conjuntos de datos hasta tratar los datos faltantes y trabajar con formatos de fecha.&lt;/li>
&lt;li>&lt;strong>Organización de proyectos:&lt;/strong> En esta sección final repaso cómo puedes organizar los proyectos que haces en R para conseguir un flujo de trabajo eficiente y potente.&lt;/li>
&lt;li>&lt;strong>Conclusión&lt;/strong>&lt;/li>
&lt;/ol>
&lt;h3 id="también-es-importante-saber-lo-que-este-curso-no-cubre">También es importante saber lo que este curso &lt;em>&lt;strong>NO cubre&lt;/strong>&lt;/em>:&lt;/h3>
&lt;p>Para cubrir los fundamentos de R de una manera efectiva, no puedo cubrir todo. Así que este curso &lt;strong>NO cubre&lt;/strong>:&lt;/p>
&lt;ul>
&lt;li>Análisis estadísticos o modelización de datos&lt;/li>
&lt;li>SIG o visualización espacial&lt;/li>
&lt;li>Temas avanzados de visualización como el uso de &amp;lsquo;ggplot2&amp;rsquo;&lt;/li>
&lt;/ul>
&lt;p>Creo que es relativamente fácil profundizar en estos temas una vez establecidos los fundamentos, y tengo previsto ampliarlos, junto con otros, en futuros cursos. Además, no es necesario intentar cubrir todo esto en tu primer curso (y añadiría mucho estrés a la curva de aprendizaje).&lt;/p>
&lt;h3 id="para-responder-a-algunas-preguntas-frecuentes">Para responder a algunas preguntas frecuentes:&lt;/h3>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>¿Cuándo empieza y termina el curso?&lt;/strong>
El curso es un curso en línea completamente a su ritmo, por lo que usted decide cuándo empieza y cuándo termina.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>¿Durante cuánto tiempo tendré acceso al curso?&lt;/strong>
Acceso de por vida: después de inscribirse, tendrá acceso ilimitado a este curso durante todo el tiempo que desee, en todos los dispositivos que posea.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>¿Y si no soy ecólogo? ¿Sigue siendo relevante el curso?&lt;/strong>
Sí. Aunque el curso se basa en mi propia experiencia en el uso de R para la ecología, todo el contenido del curso será aplicable y relevante para la mayoría de los demás campos de la biología, e incluso para muchos campos fuera de las ciencias. El curso utiliza conjuntos de datos ecológicos, pero los principios son en su mayoría universales.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>¿Pero qué pasa si no sé nada sobre R o estadística?&lt;/strong>
¡No pasa nada! Este curso está diseñado como el primer paso para cualquier persona interesada en aprender a usar R, y el contenido del curso no asume ningún conocimiento previo.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;br>
&lt;center>
&lt;h4 id="si-estás-interesado-en-lo-esencial-de-r-para-ecólogos-en-español-sólo-tienes-que-hacer-clic-abajo-para-preinscribirte-en-el-próximo-lanzamiento">&lt;strong>Si estás interesado en lo esencial de R (para ecólogos) (en español!), sólo tienes que hacer clic abajo para preinscribirte en el próximo lanzamiento!&lt;/strong>&lt;/h4>
&lt;br>
&lt;ul class="cta-group">
&lt;li>
&lt;a href="https://rforecology.ac-page.com/lo-esencial-de-r" target="_blank" rel="noopener" class="btn btn-primary px-3 py-3">&lt;strong>Página de preinscripción&lt;/strong>&lt;/a>
&lt;/li>
&lt;/ul>
&lt;/center>
&lt;br>
&lt;br>
&lt;script defer src="https://cdn.commento.io/js/commento.js">&lt;/script>
&lt;div id="commento">&lt;/div></description></item><item><title>Making your first plot in R</title><link>https://www.rforecology.com/post/making-your-first-plot-in-r/</link><pubDate>Wed, 05 Jan 2022 09:45:39 -0500</pubDate><guid>https://www.rforecology.com/post/making-your-first-plot-in-r/</guid><description>&lt;p>With the new year, I&amp;rsquo;m hoping more of you take up learning R, so with that I want to share a tutorial from my course on an &lt;a href="https://courses.rforecology.com/p/intro-to-dataviz-for-ecologists-prereg" target="_blank" rel="noopener">introduction to data visualization with R&lt;/a> to help get you started.&lt;/p>
&lt;p>If you are completely new to R and don&amp;rsquo;t even know where to start, check out my last post on &lt;a href="https://www.rforecology.com/post/how-to-install-r-and-rstudio/" target="_blank" rel="noopener">installing R and RStudio here.&lt;/a>&lt;/p>
&lt;p>In this tutorial I&amp;rsquo;ll teach you how to create a scatterplot using the &lt;code>base&lt;/code> R package, which includes all the basic functions and is already installed in R (no need to use any additional packages).&lt;/p>
&lt;p>You can also follow along with this blog post in the video tutorial that is part of my course if you click on the thumbnail below:
&lt;a href="https://www.youtube.com/watch?v=EL05E_T5ajs" target="_blank" rel="noopener">&lt;img src="https://www.rforecology.com/firstplot_image1.png" alt="Video thumbnail for how to make your first plot">&lt;/a>&lt;/p>
&lt;p>To start, we&amp;rsquo;ll use a data set that&amp;rsquo;s built into R, loading it with the &lt;code>data()&lt;/code> function:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Load data&lt;/span>
&lt;span style="color:#268bd2">data&lt;/span>(PlantGrowth)
&lt;span style="color:#586e75"># Look at the beginning and ending 4 rows of data&lt;/span>
&lt;span style="color:#268bd2">head&lt;/span>(PlantGrowth, &lt;span style="color:#2aa198">4&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>##   weight group
## 1   4.17  ctrl
## 2   5.58  ctrl
## 3   5.18  ctrl
## 4   6.11  ctrl
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#268bd2">tail&lt;/span>(PlantGrowth, &lt;span style="color:#2aa198">4&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>##    weight group
## 27   4.92  trt2
## 28   6.15  trt2
## 29   5.80  trt2
## 30   5.26  trt2
&lt;/code>&lt;/pre>&lt;p>It looks like we have 30 rows of data and two columns. One column is called &amp;ldquo;weight&amp;rdquo;, which represents the dry biomass of each plant in grams. The other column is called &amp;ldquo;group&amp;rdquo;, and describes the experimental treatment that each plant is given.&lt;/p>
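&lt;p>Reading the row count off the end of &lt;code>tail()&lt;/code> works, but you can also ask R directly. Here is a quick check that uses only base R functions:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Load the built-in data set
data(PlantGrowth)

# Number of rows and columns (should be 30 and 2)
dim(PlantGrowth)

# Column names
names(PlantGrowth)

# "group" is stored as a factor; these are its treatment levels
levels(PlantGrowth$group)
&lt;/code>&lt;/pre>&lt;/div>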
&lt;p>We can also use &lt;code>table()&lt;/code> to count the plants in each treatment group (there are ten in each). Note that I used &lt;code>$&lt;/code> after the name of the data set to refer to the &amp;lsquo;group&amp;rsquo; column:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># View number of rows per treatment group&lt;/span>
&lt;span style="color:#268bd2">table&lt;/span>(PlantGrowth&lt;span style="color:#719e07">$&lt;/span>group)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>##
## ctrl trt1 trt2
##   10   10   10
&lt;/code>&lt;/pre>&lt;p>Let&amp;rsquo;s add another column to this data set that describes the amount of water that each plant has received throughout its life (in liters). You can just copy and paste these numbers from the code here:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Add a new column&lt;/span>
PlantGrowth&lt;span style="color:#719e07">$&lt;/span>water &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">3.063&lt;/span>, &lt;span style="color:#2aa198">3.558&lt;/span>, &lt;span style="color:#2aa198">2.233&lt;/span>, &lt;span style="color:#2aa198">3.147&lt;/span>, &lt;span style="color:#2aa198">2.379&lt;/span>, &lt;span style="color:#2aa198">2.106&lt;/span>, &lt;span style="color:#2aa198">2.384&lt;/span>, &lt;span style="color:#2aa198">2.444&lt;/span>, &lt;span style="color:#2aa198">2.492&lt;/span>, &lt;span style="color:#2aa198">3.292&lt;/span>,
&lt;span style="color:#2aa198">2.732&lt;/span>, &lt;span style="color:#2aa198">2.153&lt;/span>, &lt;span style="color:#2aa198">2.660&lt;/span>, &lt;span style="color:#2aa198">1.938&lt;/span>, &lt;span style="color:#2aa198">3.583&lt;/span>, &lt;span style="color:#2aa198">1.817&lt;/span>, &lt;span style="color:#2aa198">3.494&lt;/span>, &lt;span style="color:#2aa198">2.559&lt;/span>, &lt;span style="color:#2aa198">1.530&lt;/span>, &lt;span style="color:#2aa198">2.372&lt;/span>,
&lt;span style="color:#2aa198">3.176&lt;/span>, &lt;span style="color:#2aa198">2.611&lt;/span>, &lt;span style="color:#2aa198">3.262&lt;/span>, &lt;span style="color:#2aa198">2.947&lt;/span>, &lt;span style="color:#2aa198">2.523&lt;/span>, &lt;span style="color:#2aa198">2.152&lt;/span>, &lt;span style="color:#2aa198">2.771&lt;/span>, &lt;span style="color:#2aa198">2.878&lt;/span>, &lt;span style="color:#2aa198">2.263&lt;/span>, &lt;span style="color:#2aa198">2.518&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>And now if we view our data, we can see that the new column was added.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># View first few rows of data &lt;/span>
&lt;span style="color:#268bd2">head&lt;/span>(PlantGrowth)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>##   weight group water
## 1   4.17  ctrl 3.063
## 2   5.58  ctrl 3.558
## 3   5.18  ctrl 2.233
## 4   6.11  ctrl 3.147
## 5   4.50  ctrl 2.379
## 6   4.61  ctrl 2.106
&lt;/code>&lt;/pre>&lt;p>For our first plot, let&amp;rsquo;s create a scatterplot to see how plant weight varies with the amount of water that the plant has received.&lt;/p>
&lt;p>To do this, we&amp;rsquo;re going to use the &lt;code>plot()&lt;/code> function, where you can assign variables to the X and Y axes. Since we want to see how weight varies as a function of water, we&amp;rsquo;ll put weight on the Y axis and water on the X axis. Remember that we use the dollar sign &lt;code>$&lt;/code> to reference a specific column in a data set.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Our first plot!&lt;/span>
&lt;span style="color:#268bd2">plot&lt;/span>(x &lt;span style="color:#719e07">=&lt;/span> PlantGrowth&lt;span style="color:#719e07">$&lt;/span>water, y &lt;span style="color:#719e07">=&lt;/span> PlantGrowth&lt;span style="color:#719e07">$&lt;/span>weight)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;img src="https://www.rforecology.com/post/making-your-first-plot-in-r/index_files/figure-html/unnamed-chunk-5-1.png" width="672" />&lt;/p>
&lt;p>And that&amp;rsquo;s our first plot! You can make the plot smaller or larger by dragging the edges of the plot viewing pane to resize it.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/Resizing_R_window.gif" alt="Gif showing someone moving the R plot viewing window around to change its proportions.">&lt;/p>
&lt;p>There&amp;rsquo;s also another way to use the &lt;code>plot()&lt;/code> function. This method is generally considered better practice, because the same syntax carries over to many other plotting and analysis functions in R (for example, &lt;code>boxplot()&lt;/code> and &lt;code>lm()&lt;/code>).&lt;/p>
&lt;p>As we said before, we visualize relationships between the X and Y axes by viewing the Y variable &amp;ldquo;as a function of&amp;rdquo; X. If we&amp;rsquo;re talking in terms of experimental design, the Y axis is the dependent variable (the variable you measure), and the X axis is the independent variable (the variable you control or want to examine the effect of).&lt;/p>
&lt;p>In R, the shorthand for &amp;ldquo;as a function of&amp;rdquo; is the &lt;code>~&lt;/code> symbol, called the tilde. On a US-layout keyboard, the tilde is on the key just below Escape, and you have to hold Shift to type it (other keyboard layouts may put it elsewhere).&lt;/p>
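&lt;p>It can help to see what the tilde actually produces. On its own, &lt;code>weight ~ water&lt;/code> doesn&amp;rsquo;t compute anything. It creates a &lt;em>formula&lt;/em> object that records which variable goes on the left (Y) and which goes on the right (X), and functions like &lt;code>plot()&lt;/code> interpret it later. A small base R sketch (the name &lt;code>f&lt;/code> is just for illustration):&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Create a formula; weight and water don't need to exist yet
f &amp;lt;- weight ~ water

# The object has class "formula"
class(f)

# All variable names in the formula, left-hand side first
all.vars(f)

# The left-hand side (Y) and right-hand side (X) as separate pieces
f[[2]]
f[[3]]
&lt;/code>&lt;/pre>&lt;/div>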
&lt;p>So if we use this with the &lt;code>plot()&lt;/code> function, we would just write:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Plotting plant weight as a function of the amount of water it received&lt;/span>
&lt;span style="color:#268bd2">plot&lt;/span>(PlantGrowth&lt;span style="color:#719e07">$&lt;/span>weight &lt;span style="color:#719e07">~&lt;/span> PlantGrowth&lt;span style="color:#719e07">$&lt;/span>water)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;img src="https://www.rforecology.com/post/making-your-first-plot-in-r/index_files/figure-html/unnamed-chunk-6-1.png" width="672" />&lt;/p>
&lt;p>In plain English, we are plotting plant weight as a function of the amount of water it has received. This plot looks exactly the same as the plot that we made earlier, as it should.&lt;/p>
&lt;p>We can also make this code simpler by adding another argument to the function. If we specify the data that we want to use, we can just use the column names directly instead of typing out the whole phrase &lt;code>PlantGrowth$water&lt;/code>, like so:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#268bd2">plot&lt;/span>(weight &lt;span style="color:#719e07">~&lt;/span> water, data &lt;span style="color:#719e07">=&lt;/span> PlantGrowth)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;img src="https://www.rforecology.com/post/making-your-first-plot-in-r/index_files/figure-html/unnamed-chunk-7-1.png" width="672" />&lt;/p>
&lt;p>So now, the axis labels look much nicer because they just say &amp;ldquo;weight&amp;rdquo; and &amp;ldquo;water&amp;rdquo; instead of having &amp;ldquo;PlantGrowth$&amp;rdquo; in front of both words. Voilà! Now we have a basic scatterplot.&lt;/p>
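&lt;p>Behind the scenes, the &lt;code>data&lt;/code> argument tells R where to look up the names used in the formula. You can see that lookup yourself with &lt;code>model.frame()&lt;/code> from the &lt;code>stats&lt;/code> package. As far as I know, this is also what the formula method of &lt;code>plot()&lt;/code> uses internally, but treat this as a sketch rather than a description of &lt;code>plot()&lt;/code>&amp;rsquo;s exact internals. The code reloads the data and re-adds the &lt;code>water&lt;/code> column from earlier in this post so that it runs on its own:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Reload the built-in data and re-add the water column from this post
data(PlantGrowth)
PlantGrowth$water &amp;lt;- c(3.063, 3.558, 2.233, 3.147, 2.379, 2.106, 2.384, 2.444, 2.492, 3.292,
                       2.732, 2.153, 2.660, 1.938, 3.583, 1.817, 3.494, 2.559, 1.530, 2.372,
                       3.176, 2.611, 3.262, 2.947, 2.523, 2.152, 2.771, 2.878, 2.263, 2.518)

# Resolve the formula's variable names against the PlantGrowth columns
mf &amp;lt;- model.frame(weight ~ water, data = PlantGrowth)

# One column per variable, response (Y) first
names(mf)

# The columns hold the same values as PlantGrowth$weight and PlantGrowth$water
all(mf$weight == PlantGrowth$weight)
all(mf$water == PlantGrowth$water)
&lt;/code>&lt;/pre>&lt;/div>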
&lt;h3 id="in-summary-we-learned">In summary, we learned:&lt;/h3>
&lt;ol>
&lt;li>
&lt;p>How to load built-in data and add our own custom data as a new column in the data set&lt;/p>
&lt;/li>
&lt;li>
&lt;p>How to plot a simple scatterplot in base R using the &lt;code>plot()&lt;/code> function&lt;/p>
&lt;/li>
&lt;li>
&lt;p>How to use a tilde in the &lt;code>plot()&lt;/code> function to make the code neater&lt;/p>
&lt;/li>
&lt;/ol>
&lt;p>Best of luck making your first plots using your own data! I hope this tutorial was helpful.&lt;/p>
&lt;br>
&lt;hr>
&lt;center>If you liked this post and want to learn more, then check out my online course on the Introduction to Data Visualization with R (for ecologists):
&lt;br>
&lt;br>
&lt;ul class="cta-group">
&lt;li>
&lt;a href="https://coaching.rforecology.com/intro-to-dataviz-in-r-for-ecologists-enroll?utm_source=blog&amp;amp;utm_medium=bottom_button&amp;amp;utm_campaign=rforecology_blog/" target="_blank" rel="noopener" class="btn btn-primary px-3 py-3">&lt;strong>Start visualizing your data now&lt;/strong>&lt;/a>
&lt;/li>
&lt;/ul>
&lt;p>Also be sure to check out &lt;strong>&lt;a href="https://www.r-bloggers.com/" target="_blank" rel="noopener">R-bloggers&lt;/a>&lt;/strong> for other great tutorials on learning R&lt;/p>
&lt;/center>
&lt;script defer src="https://cdn.commento.io/js/commento.js">&lt;/script>
&lt;div id="commento">&lt;/div></description></item><item><title>How to install (and update!) R and RStudio</title><link>https://www.rforecology.com/post/how-to-install-r-and-rstudio/</link><pubDate>Sat, 01 Jan 2022 05:28:39 -0500</pubDate><guid>https://www.rforecology.com/post/how-to-install-r-and-rstudio/</guid><description>&lt;p>One of the first steps to learning R is to have it downloaded and installed on your computer. In this post I&amp;rsquo;ll show you how to do that and how to download and install RStudio—a key tool for using R, and how I do all my work and tutorials.&lt;/p>
&lt;p>If you want to follow along with a video tutorial, you can click on the image below where you can watch the first lesson in my full course on the &lt;a href="https://www.rforecology.com" target="_blank" rel="noopener">Basics of R (for ecologists)&lt;/a>.&lt;/p>
&lt;p>&lt;a href="https://www.youtube.com/watch?v=YKvkXKeGoa8" target="_blank" rel="noopener">&lt;img src="https://www.rforecology.com/installingr_image0.png" alt="Video thumbnail for how to install R and Rstudio">&lt;/a>&lt;/p>
&lt;p>For starters, R is a free, open-source programming language used for organizing, analyzing, and visualizing data. Its versatility comes largely from the huge number of user-created packages available for it (e.g., on &lt;a href="https://cran.r-project.org/web/packages/available_packages_by_name.html" target="_blank" rel="noopener">CRAN&lt;/a>), which provide useful functions and documentation that anyone can use. So R is the programming language itself, and it comes with an environment, or console, that can read and execute your code. You &lt;em>could&lt;/em> code in R without using RStudio, as you can see in the image below, which shows the plain R console; I just loaded some data, viewed the first few rows, and renamed the columns.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/installingr_image2.png" alt="Image showing the plain R console, with the trees dataset loaded up">&lt;/p>
&lt;p>RStudio, on the other hand, is an IDE, or Integrated Development Environment. Most people who use R also use RStudio because it provides a clean point-and-click dashboard where you can type your code, view your figures, organize your data, variables, and files, and read the help pages. The plain R console, by comparison, is very bare-bones and offers far fewer built-in tools.&lt;/p>
&lt;div class="alert alert-note">
&lt;div>
Here I&amp;rsquo;ve set the editor color theme in RStudio to Solarized Dark, which is easier on the eyes when spending a lot of time coding in R. To change the theme, go to RStudio &amp;ndash;&amp;gt; Preferences (on a Mac) or Tools &amp;ndash;&amp;gt; Global Options (on Windows) and then click the Appearance tab, where you can change the Editor theme. &lt;a href="https://youtu.be/dYOs0Qn616s" target="_blank">Also check out this tutorial where I show you how to do that plus a few other useful tweaks for setting up RStudio.&lt;/a>
&lt;/div>
&lt;/div>
&lt;p>&lt;img src="https://www.rforecology.com/installingr_image3.png" alt="Image showing the RStudio console and its greater complexity and number of tools">&lt;/p>
&lt;h3 id="if-you-are-installing-r-and-rstudio-for-the-first-time">If you are installing R and RStudio for the first time:&lt;/h3>
&lt;p>To download R, go &lt;a href="https://cran.rstudio.com/" target="_blank" rel="noopener">here.&lt;/a> Choose the download link that corresponds to your computer. I have a Mac, so I clicked that link.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/installingr_image4.png" alt="Image showing download options for R.">&lt;/p>
&lt;p>You can download RStudio &lt;a href="https://www.rstudio.com/products/rstudio/download/" target="_blank" rel="noopener">here,&lt;/a> and you want to choose &amp;ldquo;RStudio Desktop&amp;rdquo;.&lt;/p>
&lt;p>The important thing when installing R and RStudio is that you need to install R &lt;em>before&lt;/em> you install RStudio. If you do it in the reverse order, you will likely run into errors. All you&amp;rsquo;ll need to do is open the files you downloaded for R and RStudio, and the installation process should begin on its own.&lt;/p>
&lt;p>For Mac users, there&amp;rsquo;s also XQuartz, an X11 window system for macOS. You might not need it for basic coding in R, but some packages (for example, anything that relies on X11 graphics or on the tcltk package) need it. You can download XQuartz &lt;a href="https://www.xquartz.org/" target="_blank" rel="noopener">here.&lt;/a> As with R, just open the downloaded file and XQuartz should install on its own.&lt;/p>
&lt;h3 id="if-you-want-to-update-r-and-rstudio">If you want to update R and RStudio:&lt;/h3>
&lt;p>There are a few ways to check your version of R and see whether it needs to be updated. On a Mac, one way is to open the R application itself, go to the &amp;ldquo;R&amp;rdquo; menu, and click &amp;ldquo;Check for R Updates&amp;rdquo; (see image below). R will then tell you which version you&amp;rsquo;re on and whether a newer version is available to download (circled in blue).&lt;/p>
&lt;p>Alternatively, if you&amp;rsquo;re in RStudio, you can type and run &lt;code>sessionInfo()&lt;/code> in the Console. The first line it returns is the version of R that you&amp;rsquo;re using. You can then download and install the latest version of R &lt;a href="https://cran.r-project.org/bin/macosx/" target="_blank" rel="noopener">here for Mac,&lt;/a> and &lt;a href="https://cran.r-project.org/bin/windows/base/" target="_blank" rel="noopener">here for Windows&lt;/a>.&lt;/p>
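&lt;p>If you only need the version number, base R also has shorter options than the full &lt;code>sessionInfo()&lt;/code> printout. For example, this sketch prints the version string and checks whether you are on at least the R version used in my course (4.0.2, see the note at the end of this post):&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Full version string, something like "R version 4.x.y (release date)"
R.version.string

# The version as an object that can be compared
getRversion()

# TRUE if this R is version 4.0.2 or newer
getRversion() &amp;gt;= "4.0.2"
&lt;/code>&lt;/pre>&lt;/div>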
&lt;p>If you&amp;rsquo;re using a Windows computer, you may need to uninstall R to update it. You can find a quick guide for that &lt;a href="https://cran.r-project.org/bin/windows/base/rw-FAQ.html#How-do-I-UNinstall-R_003f" target="_blank" rel="noopener">here.&lt;/a> Another great option for Windows users is a package called &lt;code>installr&lt;/code> (sorry, Mac users: it&amp;rsquo;s only available for Windows). All you need to do is install &lt;code>installr&lt;/code>, load it with &lt;code>library(installr)&lt;/code>, and run &lt;code>updateR()&lt;/code>. This function checks for newer versions of R and guides you through the update process.&lt;/p>
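&lt;p>Put together, the &lt;code>installr&lt;/code> route looks roughly like the sketch below. It only runs the update on Windows (checked with &lt;code>.Platform$OS.type&lt;/code>), and it only installs &lt;code>installr&lt;/code> if the package is missing. &lt;code>updateR()&lt;/code> will typically prompt you through each step, so run it interactively.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># installr only works on Windows, so check the operating system first
if (.Platform$OS.type == "windows") {
  # Install installr only if it isn't already available
  if (!requireNamespace("installr", quietly = TRUE)) {
    install.packages("installr")
  }
  # Load the package, then check for a newer R and walk through the update
  library(installr)
  updateR()
} else {
  message("installr is Windows-only; on a Mac, download the latest R from CRAN instead.")
}
&lt;/code>&lt;/pre>&lt;/div>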
&lt;p>&lt;img src="https://www.rforecology.com/installingr_image6.png" alt="Image showing cursor hovering over the R menu and Check for R Updates. The text also shows the version of R that the user is on, and it shows that they need to update their R software.">&lt;/p>
&lt;p>To update to the latest version of RStudio, open the &amp;ldquo;Help&amp;rdquo; menu in RStudio (on a Mac, it&amp;rsquo;s in the top menu bar) and click &amp;ldquo;Check for Updates&amp;rdquo;. If a newer version is available, quit RStudio, go to the RStudio website, and download and install the latest version.
&lt;img src="https://www.rforecology.com/installingr_image5.png" alt="Image showing a cursor mousing over &amp;ldquo;Help&amp;rdquo; on the top menu bar and &amp;ldquo;Check for Updates&amp;rdquo;">
Now you should have the latest versions of R and RStudio on your computer. I hope this tutorial was helpful!&lt;/p>
&lt;div class="alert alert-note">
&lt;div>
As a quick note: my &amp;ldquo;Basics of R&amp;rdquo; course uses R version 4.0.2 and RStudio version 1.3.959. There shouldn&amp;rsquo;t be any incompatibility issues if you&amp;rsquo;re running a slightly different version, but it is usually best to stay up to date with your software!
&lt;/div>
&lt;/div>
&lt;br>
&lt;hr>
&lt;center>If you liked this post and want to learn more, then check out my online course on the complete basics of R for ecology:
&lt;br>
&lt;br>
&lt;ul class="cta-group">
&lt;li>
&lt;a href="https://coaching.rforecology.com?utm_source=blog&amp;amp;utm_medium=bottom_button&amp;amp;utm_campaign=rforecology_blog/" target="_blank" rel="noopener" class="btn btn-primary px-3 py-3">&lt;strong>Start learning now&lt;/strong>&lt;/a>
&lt;/li>
&lt;/ul>
&lt;p>Also be sure to check out &lt;strong>&lt;a href="https://www.r-bloggers.com/" target="_blank" rel="noopener">R-bloggers&lt;/a>&lt;/strong> for other great tutorials on learning R&lt;/p>
&lt;/center>
&lt;script defer src="https://cdn.commento.io/js/commento.js">&lt;/script>
&lt;div id="commento">&lt;/div></description></item><item><title>Where to ask for help when coding in R</title><link>https://www.rforecology.com/post/where-to-ask-for-help-when-coding-in-r/</link><pubDate>Fri, 03 Dec 2021 09:08:00 -0600</pubDate><guid>https://www.rforecology.com/post/where-to-ask-for-help-when-coding-in-r/</guid><description>&lt;p>When learning R, it can be tough to figure out how to apply what you&amp;rsquo;ve learned to your own data. We often learn general skills that are helpful for manipulating our data, but things aren&amp;rsquo;t always so simple when it comes to your own analysis. Sometimes, we have very specific problems that we need to address but don&amp;rsquo;t know how.&lt;/p>
&lt;p>In this blog post, I&amp;rsquo;m going to describe a few R forums that are particularly useful when you need specific help with your own project.&lt;/p>
&lt;p>As an example, let&amp;rsquo;s say that we want to replace a specific character of every word (string) in this vector:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">words &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;Apple&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;Orange&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;Banana&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;Peach&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;Nectarine&amp;#34;&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>I suspect there&amp;rsquo;s a function that can do this, but I don&amp;rsquo;t know which one or how to apply it to my particular problem.&lt;/p>
&lt;p>No worries, we&amp;rsquo;ll just turn to Google real quick. How convenient—one of the first results is someone asking a similar question on StackOverflow. Even better, there are a whole bunch of related questions that are listed underneath the main result, in case any of those might also help me out.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/troubleshootresources_Image_1.png" alt="Example google search of How to replace a character in a string in R with the second result circled with a red circle and arrow pointing to it.">&lt;/p>
&lt;p>If we click on the link, we can see the specific question that the person asked.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/troubleshootresources_Image_2.png" alt="Example StackOverflow post showing someone who wants to know how to replace specific characters within strings in a vector.">&lt;/p>
&lt;p>And if we scroll down further, we can see the answers that people have provided. The really awesome part about StackOverflow and similar forums is that you can get opinions from multiple people. There is almost always more than one way to solve a problem, and learning about the different approaches can help you think more creatively when you code. People will often comment on the answers themselves, too, generating discussion about why one method might be better than another, or how it could be improved.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/troubleshootresources_Image_3.png" alt="The first answer to the StackOverflow post, showing that it has 453 upvotes as well as other people commenting on the answer and generating further discussion.">&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/troubleshootresources_Image_4.png" alt="The second answer to the StackOverflow post, showing that it has fewer votes but provides another point of view on the topic.">&lt;/p>
&lt;p>These forums are great references because you&amp;rsquo;ll find that many people have had the same questions as you. But there are also situations where you&amp;rsquo;re analyzing your data and have a question that is VERY specific to your data or analysis. Times like these call for you to make your own detailed post!&lt;/p>
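&lt;p>When you do write your own post, it usually helps to include a small example that others can copy and run. One common base R tool for this is &lt;code>dput()&lt;/code>, which prints the code needed to recreate an object. Here is a minimal sketch, using the &lt;code>words&lt;/code> vector from above and the built-in &lt;code>trees&lt;/code> data set as stand-ins for your own data:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Recreate the example data so this snippet runs on its own
words &amp;lt;- c("Apple", "Orange", "Banana", "Peach", "Nectarine")

# dput() prints R code that rebuilds the object exactly;
# paste that output into your post so others can reproduce your data
dput(words)

# For a bigger data set, share just the first few rows
dput(head(trees, 3))

# Your R version and loaded packages are also useful to include
sessionInfo()
&lt;/code>&lt;/pre>&lt;/div>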
&lt;p>I highlighted StackOverflow in this blog post, but there are a number of other sites that serve similar purposes.&lt;/p>
&lt;h3 id="here-are-some-of-my-favorite-forum-resources">Here are some of my favorite forum resources:&lt;/h3>
&lt;h4 id="for-questions-related-specifically-to-coding-in-r">&amp;hellip;for questions related specifically to coding in R:&lt;/h4>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>&lt;a href="https://stackoverflow.com/questions/tagged/r" target="_blank">StackOverflow (questions related to R)&lt;/a>&lt;/strong>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>&lt;a href = "https://www.facebook.com/groups/rstatistical/" target = "_blank">The Facebook group &amp;ldquo;R Statistical Software&amp;rdquo;&lt;/a>&lt;/strong>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>&lt;a href = "https://rseek.org" target = "_blank">The website &amp;ldquo;Rseek.org&amp;rdquo;&lt;/a>&lt;/strong>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>&lt;a href = "https://community.rstudio.com/" target = "_blank">The RStudio Community forum&lt;/a>&lt;/strong>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h4 id="for-questions-seeking-advice-on-statistics-or-research">&amp;hellip;for questions seeking advice on statistics or research:&lt;/h4>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>&lt;a href = "https://explore.researchgate.net/display/support/Asking+questions" target = "_blank">ResearchGate&lt;/a>&lt;/strong>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>&lt;a href = "https://stats.stackexchange.com/" target = "_blank">CrossValidated, the statistical side of StackOverflow&lt;/a>&lt;/strong>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>&lt;a href = "https://gis.stackexchange.com/?fbclid=IwAR3-w59iuLi5qE7gUP4PZCG93JHVI5ReK3ujSushUhqtxqvsDXe60DYpiIo" target = "_blank">The GIS StackExchange, for advice on spatial analyses&lt;/a>&lt;/strong>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h4 id="for-any-type-of-question-related-to-r-and-data-analysis">&amp;hellip;for any type of question related to R and data analysis!&lt;/h4>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>&lt;a href = "https://www.facebook.com/groups/ecologyinr/" target = "_blank">The Facebook group &amp;ldquo;Ecology in R&amp;rdquo;&lt;/a>&lt;/strong>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>&lt;a href = "https://www.facebook.com/groups/nbvRlanguage/" target = "_blank">The Facebook group &amp;ldquo;Statistics using R Programming&amp;rdquo;&lt;/a>&lt;/strong>&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>&lt;a href = "https://www.facebook.com/groups/rforecology" target = "_blank">Our very own Facebook group for students enrolled in our courses: &amp;ldquo;The Basics of R (for ecologists)&amp;quot;&lt;/a>&lt;/strong>&lt;/p>
&lt;/li>
&lt;/ul>
&lt;p>Now go forth and add all these resources to your R toolbelt! Feel free to leave any of your favorite resources in the comments below.&lt;/p>
&lt;br>
&lt;hr>
&lt;center>If you liked this post and want to learn more, then check out my online course on the complete basics of R for ecology:
&lt;br>
&lt;br>
&lt;ul class="cta-group">
&lt;li>
&lt;a href="https://coaching.rforecology.com?utm_source=blog&amp;amp;utm_medium=bottom_button&amp;amp;utm_campaign=rforecology_blog/" target="_blank" rel="noopener" class="btn btn-primary px-3 py-3">&lt;strong>Start learning now&lt;/strong>&lt;/a>
&lt;/li>
&lt;/ul>
&lt;p>Also be sure to check out &lt;strong>&lt;a href = "https://www.r-bloggers.com/" target = "_blank">R-bloggers&lt;/a>&lt;/strong> for other great tutorials on learning R&lt;/p>
&lt;/center>
&lt;script defer src="https://cdn.commento.io/js/commento.js">&lt;/script>
&lt;div id="commento">&lt;/div></description></item><item><title>How to go from R to nice tables in Microsoft Word</title><link>https://www.rforecology.com/post/exporting-tables-from-r-to-microsoft-word/</link><pubDate>Tue, 23 Nov 2021 12:28:39 -0600</pubDate><guid>https://www.rforecology.com/post/exporting-tables-from-r-to-microsoft-word/</guid><description>&lt;script src="https://rforecology.activehosted.com/f/embed.php?id=21" type="text/javascript" charset="utf-8">&lt;/script>
&lt;p>As scientists, we often have data or results in R that we want to export to Microsoft Word for the reports or publications that we’re writing.&lt;/p>
&lt;p>In this tutorial I show you how to do just that. You can also watch this tutorial as a video if you want to follow along while I code:&lt;/p>
&lt;p>&lt;a href="https://youtu.be/_sb5uI8qTlk" target="_blank" rel="noopener">&lt;img src="https://www.rforecology.com/tables_in_word_thumb.jpg" alt="&amp;ldquo;Video thumbnail of tutorial on exporting a dataframe from R into a table in MS Word&amp;rdquo;">&lt;/a>&lt;/p>
&lt;p>The first step is to load up some data. We’re going to use the &lt;code>Orange&lt;/code> dataset that comes built into R, which describes the growth of orange trees:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Load the data&lt;/span>
&lt;span style="color:#268bd2">data&lt;/span>(Orange)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>If we view the data, we can see the following columns: “Tree”, which contains an identifier for each tree that was measured; “age”, which contains the age (in days) of the tree at the time of measurement; and “circumference”, which is the circumference of the tree trunk, measured in millimeters.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#268bd2">head&lt;/span>(Orange, &lt;span style="color:#2aa198">15&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>##    Tree  age circumference
## 1     1  118            30
## 2     1  484            58
## 3     1  664            87
## 4     1 1004           115
## 5     1 1231           120
## 6     1 1372           142
## 7     1 1582           145
## 8     2  118            33
## 9     2  484            69
## 10    2  664           111
## 11    2 1004           156
## 12    2 1231           172
## 13    2 1372           203
## 14    2 1582           203
## 15    3  118            30
&lt;/code>&lt;/pre>&lt;p>So in this dataset, there are five different trees, each of which was measured at the same seven time points (ages).&lt;/p>
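&lt;p>If you want to confirm this from R rather than by scrolling through the data, a quick base R check like the following should do it (a sketch; other approaches work just as well):&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">data(Orange)

# How many distinct trees and distinct ages are there?
length(unique(Orange$Tree))   # 5 trees
length(unique(Orange$age))    # 7 ages

# Cross-tabulate tree by age: every cell is 1,
# meaning each tree was measured once at each age
table(Orange$Tree, Orange$age)
&lt;/code>&lt;/pre>&lt;/div>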
&lt;p>Let’s say we want to summarize this dataset to see how the different age groups compare in their growth. In the script below I’ve organized the data so that now we have a table called &lt;code>Orange_summ&lt;/code>, which shows the mean and standard deviation of the tree circumferences for each age group. (To run the code below, just make sure that the &lt;code>'dplyr'&lt;/code> package is installed if not already):&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># install.packages(&amp;#34;dplyr&amp;#34;)&lt;/span>
&lt;span style="color:#268bd2">library&lt;/span>(dplyr)
Orange_summ &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">group_by&lt;/span>(Orange, &lt;span style="color:#2aa198">&amp;#34;days&amp;#34;&lt;/span>&lt;span style="color:#719e07">=&lt;/span>age) &lt;span style="color:#719e07">%&amp;gt;%&lt;/span>
&lt;span style="color:#268bd2">summarize&lt;/span>(mean_circ_mm &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">mean&lt;/span>(circumference), sd_circ_mm &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">round&lt;/span>(&lt;span style="color:#268bd2">sd&lt;/span>(circumference), &lt;span style="color:#2aa198">2&lt;/span>))
Orange_summ
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## # A tibble: 7 × 3
## days mean_circ_mm sd_circ_mm
## &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt;
## 1 118 31 1.41
## 2 484 57.8 8.17
## 3 664 93.2 17.2
## 4 1004 134. 25.9
## 5 1231 146. 29.2
## 6 1372 173. 32.8
## 7 1582 176. 33.3
&lt;/code>&lt;/pre>&lt;div class="alert alert-note">
&lt;div>
If you’re interested in learning more about how to summarize data like this, check out our full online course, &lt;a href="https://www.rforecology.com/" target="_blank">“The Basics of R (for ecologists)” here.&lt;/a>
&lt;/div>
&lt;/div>
&lt;p>Great! Now we have a summary table that we can export to Word. First, we’re going to save our table as a ‘*.csv’ file.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#268bd2">write.csv&lt;/span>(Orange_summ, &lt;span style="color:#2aa198">&amp;#34;Orange_summ.csv&amp;#34;&lt;/span>, row.names &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">F&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>What’s important to note here is that we set &lt;code>row.names&lt;/code> to False—doing this eliminates the row numbers in our .csv file, since we don’t need them.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/tablesinwordPhoto1.png" alt="a screenshot from R Studio showing the contents of the Orange_summ dataframe with the row names circled in red">&lt;/p>
&lt;p>Next, open the .csv file. As you can see below, Microsoft Excel is often the default app for opening .csv files, but we don’t want that here. Instead, right-click on the file and choose to open it in TextEdit or a similar plain-text editor.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/tablesinwordPhoto2.png" alt="a screenshot from the Mac OS Finder application showing how to use &amp;ldquo;open with&amp;rdquo; for opening the csv file in a text editor">&lt;/p>
&lt;p>It should look something like this.&lt;/p>
&lt;img src="https://www.rforecology.com/tablesinwordPhoto3.png" alt="a screenshot of the text editor showing the contents of the .csv file" style="width:500px;"/>
&lt;p>After opening the .csv file in your text editor, just copy the text and paste it into a blank Microsoft Word document.&lt;/p>
&lt;p>In Word, highlight the text, and then go to Table &amp;raquo; Convert &amp;raquo; Convert Text to Table&amp;hellip;&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/tablesinwordPhoto4.png" alt="a screenshot from MS word showing how to go to Table &amp;raquo; Convert &amp;raquo; Convert Text to Table&amp;hellip;">&lt;/p>
&lt;p>That will open a window where you should check that the number of columns is correct (three, in this case) and make sure you have chosen “Commas” in the “Separate text at” section. That’s because you saved the file as a .csv, or “comma-separated values”, file.&lt;/p>
&lt;div class="alert alert-note">
&lt;div>
If you have a Windows computer, the exact method for converting text to tables might be slightly different, but the concept is the same—you can find a tutorial for that &lt;a href="https://support.microsoft.com/en-us/office/convert-text-to-a-table-or-a-table-to-text-b5ce45db-52d5-4fe3-8e9c-e04b62f189e1#:~:text=Select%20the%20text%20that%20you,columns%20and%20rows%20you%20want" target="_blank">here.&lt;/a>
&lt;/div>
&lt;/div>
&lt;img src="https://www.rforecology.com/tablesinwordPhoto5.png" alt="a screenshot from MS word showing the settings for converting text to table" style="width:400px;"/>
&lt;p>Click “OK” and we have a table!&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/tablesinwordPhoto6.png" alt="screenshot from MS Word showing the converted table">&lt;/p>
&lt;p>Next, use the “Find and Replace” function to clean up the table by going to Edit &amp;raquo; Find &amp;raquo; Replace.
(The Mac keyboard shortcut for this is Shift + Command + H).&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/tablesinwordPhoto7.png" alt="screenshot from MS Word showing how to find &amp;ldquo;find and replace&amp;rdquo; for cleaning up the table">&lt;/p>
&lt;p>We want to get rid of all the double quotes in our table, so type a double quote (&amp;quot;) in the top box, leave the bottom box blank, and click “Replace All”. Word should find 6 replacements: the quotes around each of the three column names. With only 6 occurrences you could fix this by hand, but if your table contains a character or factor column, every value in that column will end up with double quotes around it, and that’s where this trick comes in handy.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/tablesinwordPhoto8.png" alt="screenshot from MS Word showing how to use Find and Replace">&lt;/p>
&lt;p>Looking good!&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/tablesinwordPhoto9.png" alt="Screenshot from MS Word showing a cleaner table">&lt;/p>
&lt;p>Now just rename the columns and reformat the table to make it nice and polished. Word has several border editing tools that allow you to change which borders are visible. I like to remove all borders first. Then, by putting your cursor in a table cell, you can go to Table Design &amp;raquo; Border Painter, which lets you &amp;ldquo;paint&amp;rdquo; in whichever borders you do want to add.&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/tablesinwordPhoto10.png" alt="Screenshot from MS Word showing the final formatted and clean table">&lt;/p>
&lt;p>And that’s it! You’ve just exported your first table from R into Microsoft Word.&lt;/p>
&lt;br>
&lt;hr>
&lt;center>If you liked this post and want to learn more, then check out my online course on the complete basics of R for ecology:
&lt;br>
&lt;br>
&lt;ul class="cta-group">
&lt;li>
&lt;a href="https://www.rforecology.com?utm_source=blog&amp;amp;utm_medium=bottom_button&amp;amp;utm_campaign=rforecology_blog/" target="_blank" rel="noopener" class="btn btn-primary px-3 py-3">&lt;strong>Start learning now&lt;/strong>&lt;/a>
&lt;/li>
&lt;/ul>
&lt;p>Also be sure to check out &lt;strong>&lt;a href="https://www.r-bloggers.com/" target="_blank" rel="noopener">R-bloggers&lt;/a>&lt;/strong> for other great tutorials on learning R&lt;/p>
&lt;/center>
&lt;script defer src="https://cdn.commento.io/js/commento.js">&lt;/script>
&lt;div id="commento">&lt;/div></description></item><item><title>Video tutorial on the essentials of R for ecology cheatsheet</title><link>https://www.rforecology.com/post/video-tutorial-on-the-essentials-of-r-for-ecology-cheat-sheet/</link><pubDate>Fri, 10 Sep 2021 09:30:39 -0400</pubDate><guid>https://www.rforecology.com/post/video-tutorial-on-the-essentials-of-r-for-ecology-cheat-sheet/</guid><description>&lt;p>Hey everyone! I just finished putting together a video tutorial that goes over my Essential Functions of R (for ecology) Cheatsheet. I decided to create a separate post here because some of you were asking for an easy walk-through of the functions on the cheatsheet and I think that merits its own post. For those that are ready to just download the cheatsheet and go running with it, &lt;a href="https://www.rforecology.com/post/the-essential-functions-of-r-cheatsheet/" target="_blank">here is the link to my original post on the subject.&lt;/a>&lt;/p>
&lt;p>👇 &lt;strong>&lt;em>Download the Cheatsheet here&lt;/em>&lt;/strong> 👇
&lt;br>
&lt;strong>&lt;a href="https://rforecology.activehosted.com/f/25" target="_blank">Click here to download the Essential Functions of R Cheatsheet.&lt;/a>&lt;/strong>&lt;/p>
&lt;p>The cheatsheet is still a work in progress, but for now the video goes over my first version. I thought this was also a good opportunity to go over some of the questions and suggestions I&amp;rsquo;ve received since publishing that first version. More on this toward the bottom of this post.&lt;/p>
&lt;p>But first, here is a link to the video:&lt;/p>
&lt;p>&lt;a href="https://youtu.be/dQe3Z7hRG1s" target="_blank" rel="noopener">&lt;img src="https://www.rforecology.com/cheatsheet_walkthrough_thumbnail.png" alt="Video thumbnail of tutorial on the essentials of R cheatsheet. Words say &amp;ldquo;80% of R in one hour?!"">&lt;/a>&lt;/p>
&lt;p>And here is the starting code that I use (that you can copy and paste) for following along in the tutorial:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Starting Code (contains most of the data used for this tutorial):&lt;/span>
num_vec &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">3&lt;/span>,&lt;span style="color:#2aa198">6&lt;/span>,&lt;span style="color:#2aa198">3&lt;/span>,&lt;span style="color:#2aa198">8&lt;/span>)
spp_vec &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;spp1&amp;#34;&lt;/span>,&lt;span style="color:#2aa198">&amp;#34;spp3&amp;#34;&lt;/span>,&lt;span style="color:#2aa198">&amp;#34;spp2&amp;#34;&lt;/span>,&lt;span style="color:#2aa198">&amp;#34;spp3&amp;#34;&lt;/span>)
dataframe &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">data.frame&lt;/span>(num_vec, spp_vec)
&lt;span style="color:#268bd2">data&lt;/span>(trees)
tree_data &lt;span style="color:#719e07">&amp;lt;-&lt;/span> trees
tree_data&lt;span style="color:#719e07">$&lt;/span>light &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#268bd2">rep&lt;/span>(&lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;shade&amp;#34;&lt;/span>,&lt;span style="color:#2aa198">&amp;#34;sun&amp;#34;&lt;/span>), each&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">15&lt;/span>), &lt;span style="color:#2aa198">&amp;#34;sun&amp;#34;&lt;/span>)
tree_data&lt;span style="color:#719e07">$&lt;/span>light &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">as.factor&lt;/span>(tree_data&lt;span style="color:#719e07">$&lt;/span>light)
my_matrix &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">as.matrix&lt;/span>(dataframe)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>If helpful, you can also download the entire script that I wrote out over the course of the tutorial:
&lt;b>&lt;a href="https://www.rforecology.com/uploads/cheatsheet_v1_tutorial_script.R" target="_blank">Click here to download the entire script from the video tutorial/walkthrough on the essential functions of R cheatsheet.&lt;/a>&lt;/b>&lt;/p>
&lt;p>If that&amp;rsquo;s all you came for, feel free to skip the rest of this post (though more advanced R users might want to keep reading).&lt;/p>
&lt;h3 id="some-notes-on-two-of-the-more-common-suggestions-ive-received">Some notes on two of the more common suggestions I&amp;rsquo;ve received:&lt;/h3>
&lt;blockquote>
&lt;p>Watch out with setwd() or Jenny Bryan will burn your computer down 😜 &lt;a href="https://www.tidyverse.org/blog/2017/12/workflow-vs-script/">https://www.tidyverse.org/blog/2017/12/workflow-vs-script/&lt;/a>
&lt;strong>— Eric Scott&lt;/strong>&lt;/p>
&lt;/blockquote>
&lt;p>I&amp;rsquo;ve already gotten some version of this comment several times. The idea is that &lt;code>setwd()&lt;/code> is a function that should rarely (if ever) be used. &lt;code>setwd()&lt;/code> sets the working directory: the base folder that R uses when you load your data (or save your results) with relative file paths. The problem is that &lt;code>setwd()&lt;/code> usually requires the entire file path, which only works on your own computer (and only at that moment in time!). How many of you have opened an R script with the following code:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#268bd2">setwd&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;/Users/lukanegoita/Documents/my_special_folder/another_folder/final_folder&amp;#34;&lt;/span>)
&lt;span style="color:#268bd2">read.csv&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;my_data.csv&amp;#34;&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>Only to get the error message:
&lt;code>In file(file, &amp;quot;rt&amp;quot;) : cannot open file 'my_data.csv': No such file or directory&lt;/code>. Usually, you get this error because at some point you moved the R script to a different folder or renamed some folders, and now you have no idea where that CSV is (or, in the best case, it takes you a while to find it again). Another common cause is sharing scripts: someone else&amp;rsquo;s computer will have a totally different file path than yours. To prevent this error, and as good practice for coding and sharing, it&amp;rsquo;s very important to use RStudio Projects to manage all of your scripts. Explaining how they work is beyond the scope of this post, &lt;a href="https://www.rforecology.com/post/organizing-your-r-studio-projects/" target="_blank">but you can check out my older post where I explain this in more detail (along with some links to other good articles on the subject).&lt;/a>&lt;/p>
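&lt;p>To make the difference concrete, here is a small sketch. The &lt;code>data&lt;/code> folder and &lt;code>my_data.csv&lt;/code> file are hypothetical; the idea is that inside an RStudio Project the working directory starts out as the project folder, so file paths can be written relative to it instead of relying on &lt;code>setwd()&lt;/code>:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Check where R is currently looking for files
getwd()

# Fragile: an absolute path that only exists on one computer
# setwd("/Users/lukanegoita/Documents/my_special_folder/another_folder/final_folder")

# Sturdier: a path relative to the project folder
# ("data/my_data.csv" is a hypothetical example)
data_path &amp;lt;- file.path("data", "my_data.csv")

# Check that the file exists first, for a clearer message than the error above
if (file.exists(data_path)) {
  my_data &amp;lt;- read.csv(data_path)
} else {
  message("Could not find ", data_path, " inside ", getwd())
}
&lt;/code>&lt;/pre>&lt;/div>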
&lt;p>This is all to say that the only reason I included &lt;code>setwd()&lt;/code> in this cheatsheet is that many beginners will still come across this function, usually in scripts shared by people who don&amp;rsquo;t follow the best practice of using Projects. I think this is my new thing: Don&amp;rsquo;t share scripts, share projects. Stop the spread of STWDs (Scriptually Transmitted Working Directories).&lt;/p>
&lt;p>Ok, enough on that.&lt;/p>
&lt;p>Second, I&amp;rsquo;ve gotten comments asking why I didn&amp;rsquo;t include any of the &amp;ldquo;apply&amp;rdquo; family of functions (such as &lt;code>lapply()&lt;/code>, &lt;code>tapply()&lt;/code>, &lt;code>vapply()&lt;/code>, &lt;code>sapply()&lt;/code>, and plain &lt;code>apply()&lt;/code>). It&amp;rsquo;s true that those functions crop up every once in a while, and they are no doubt a powerful set of tools for working with data. However, I have always been thoroughly confused by the many different &amp;ldquo;apply&amp;rdquo; functions and by not knowing which one to use where. Since discovering the dplyr &lt;code>group_by()&lt;/code> and &lt;code>summarize()&lt;/code> functions, I (almost) never have to use the &amp;ldquo;apply&amp;rdquo; functions. To spare others the frustration I went through, I decided to omit that family of functions and stick to the few key dplyr functions I did include. The point is that I&amp;rsquo;ve been able to do &lt;em>most&lt;/em> of my work without the &amp;ldquo;apply&amp;rdquo; functions, so I think others can too.&lt;/p>
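&lt;p>As a rough illustration (not something covered in the video), here is the same summary done both ways, using the &lt;code>tree_data&lt;/code> object from the starting code above. Both compute the mean tree height for each light level, one with &lt;code>tapply()&lt;/code> and one with &lt;code>group_by()&lt;/code> and &lt;code>summarize()&lt;/code>:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">library(dplyr)

# Rebuild tree_data from the starting code above
data(trees)
tree_data &amp;lt;- trees
tree_data$light &amp;lt;- as.factor(c(rep(c("shade", "sun"), each = 15), "sun"))

# Base R "apply" family: a named vector of mean heights per light level
height_tapply &amp;lt;- tapply(tree_data$Height, tree_data$light, mean)
height_tapply

# dplyr version: the same means, returned as a table (tibble)
height_dplyr &amp;lt;- tree_data %&amp;gt;%
  group_by(light) %&amp;gt;%
  summarize(mean_height = mean(Height))
height_dplyr
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>The numbers match; the dplyr version returns a data frame that is easy to extend with more grouping variables or more summary columns.&lt;/p>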
&lt;p>Convince me otherwise and I&amp;rsquo;ll include them in a future version of the cheatsheet 😉&lt;/p>
&lt;p>That&amp;rsquo;s it for now, but comment down below to keep the conversation going! I hope this cheatsheet evolves into the most helpful resource that it can be.&lt;/p>
&lt;br>
&lt;hr>
&lt;center>If you liked this post and want to learn more, then check out my online course on the complete basics of R for ecology:
&lt;br>
&lt;br>
&lt;ul class="cta-group">
&lt;li>
&lt;a href="https://www.rforecology.com?utm_source=blog&amp;amp;utm_medium=bottom_button&amp;amp;utm_campaign=rforecology_blog/" target="_blank" rel="noopener" class="btn btn-primary px-3 py-3">&lt;strong>Start learning now&lt;/strong>&lt;/a>
&lt;/li>
&lt;/ul>
&lt;p>Also be sure to check out &lt;strong>&lt;a href="https://www.r-bloggers.com/" target="_blank" rel="noopener">R-bloggers&lt;/a>&lt;/strong> for other great tutorials on learning R&lt;/p>
&lt;/center>
&lt;script defer src="https://cdn.commento.io/js/commento.js">&lt;/script>
&lt;div id="commento">&lt;/div></description></item><item><title>Intro to evolutionary algorithms with R for beginners (from scratch) [PART 1]</title><link>https://www.rforecology.com/post/intro-to-evolutionary-algorithms-with-r-for-beginners-from-scratch-part-1/</link><pubDate>Fri, 27 Aug 2021 00:00:00 +0000</pubDate><guid>https://www.rforecology.com/post/intro-to-evolutionary-algorithms-with-r-for-beginners-from-scratch-part-1/</guid><description>&lt;p>Evolution by natural selection is a powerful process that leads to (and continues to shape) the wonderful diversity of organisms on Earth today. It is fairly simple in its mechanics—essentially an optimization routine that allows species to evolve to an ever-changing environment:&lt;/p>
&lt;figure id="figure-basic-flowchart-of-biological-evolution-by-natural-selection">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://www.rforecology.com/evo_flow.png" alt="Flowchart showing the process of evolution by natural selection. Going from natural selection of fit individuals, to the survivors reproducing and mutations occuring, and then the new population is formed and finally arrow goes back to natural selection of fit individuals and cycle restarts." loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Basic flowchart of biological evolution by natural selection
&lt;/figcaption>&lt;/figure>
&lt;p>In this intro series of posts on the basics, I want to show you how you can use the same evolutionary optimization algorithm to &amp;lsquo;evolve&amp;rsquo; (optimize) solutions to other problems. Using evolutionary algorithms to solve problems is very powerful—just think of how many different solutions to flight have been reached through biological evolution.&lt;/p>
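&lt;p>To make the loop in the flowchart concrete, here is a toy sketch in base R. It is not the approach this series builds step by step: the fitness function, population size, mutation size, and number of generations are all arbitrary choices for illustration, and reproduction here is asexual (no crossover). Each individual is a single number &lt;code>x&lt;/code>, and the population should evolve toward the fitness peak at &lt;code>x = 3&lt;/code>:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">set.seed(42)  # make the random results reproducible

# Toy fitness function (an arbitrary example): highest when x = 3
fitness &amp;lt;- function(x) -(x - 3)^2

pop_size &amp;lt;- 20        # individuals per generation (must be even here)
n_generations &amp;lt;- 50   # how many times to run the loop
mutation_sd &amp;lt;- 0.3    # size of random mutations

# Starting population: random guesses between -10 and 10
population &amp;lt;- runif(pop_size, min = -10, max = 10)

for (gen in seq_len(n_generations)) {
  # Natural selection: rank by fitness and keep the fitter half
  ranked &amp;lt;- population[order(fitness(population), decreasing = TRUE)]
  survivors &amp;lt;- ranked[seq_len(pop_size / 2)]

  # Reproduction: each survivor leaves two offspring (asexual, no crossover)
  offspring &amp;lt;- rep(survivors, each = 2)

  # Mutation: add a small random change to every offspring
  population &amp;lt;- offspring + rnorm(pop_size, mean = 0, sd = mutation_sd)
}

# The fittest individual in the final population should be close to 3
best &amp;lt;- population[which.max(fitness(population))]
best
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>Try changing &lt;code>mutation_sd&lt;/code>: larger values let the population explore more widely, but the final individuals scatter more around the peak.&lt;/p>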
&lt;figure id="figure-bird-bat-and-insect-wings-from-httpsscholarblogsemoryeduartsbrain20200920the-biggest-mystery-in-evolution-the-origin-of-insect-flight">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://www.rforecology.com/bat-bird-insect.jpg" alt="images showing the anatomy of bat, bird, and dragonfly insect wings" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Bird, Bat, and Insect wings (from &lt;a href="https://scholarblogs.emory.edu/artsbrain/2020/09/20/the-biggest-mystery-in-evolution-the-origin-of-insect-flight/">https://scholarblogs.emory.edu/artsbrain/2020/09/20/the-biggest-mystery-in-evolution-the-origin-of-insect-flight/&lt;/a>)
&lt;/figcaption>&lt;/figure>
&lt;p>There are all sorts of really cool applications for evolutionary algorithms, including some fun ways of simulating biological evolution itself. &lt;a href="http://www.ventrella.com/" target="_blank">Jeffrey Ventrella&lt;/a> is an algorithmic artist who uses various types of coded algorithms to create some really outstanding works of art. Of particular interest to me as an ecologist and biologist was his creation of Gene Pool, an artificial life ecosystem of swimming creatures that slowly evolve through natural selection. Check out &lt;a href="http://www.swimbots.com/" target="_blank">the Swimbots website&lt;/a> to learn more about that and see his simulation in action. Full disclosure: I&amp;rsquo;ve recently been collaborating with him on some really interesting projects aimed at studying the evolution and co-existence of swimbot creatures in Gene Pool (&lt;a href="https://youtu.be/jS5dfhs-KR8" target="_blank">click here to see a video about our latest work&lt;/a>).&lt;/p>
&lt;figure id="figure-screenshot-of-the-browser-run-gene-pool-simulation-from-httpwwwswimbotscom">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://www.rforecology.com/genepool-screenshot.png" alt="Screenshot of Gene Pool simulation showing swimbot organisms on the left and toolbars for the simulation on the right" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Screenshot of the browser-run Gene Pool simulation from &lt;a href="http://www.swimbots.com">http://www.swimbots.com&lt;/a>
&lt;/figcaption>&lt;/figure>
&lt;p>For now, let&amp;rsquo;s step back and focus on something simpler than virtual swimmers or flight&amp;hellip; We&amp;rsquo;ll start with how to use evolutionary algorithms to fit a basic linear model. We&amp;rsquo;ll use the same dataset that I went over in &lt;a href="https://www.rforecology.com/post/how-to-do-simple-linear-regression-in-r/" target="_blank">my tutorial for basic linear regression&lt;/a> so that you can compare the two methods.&lt;/p>
&lt;p>The basic evolutionary algorithm we use is very similar to the biological process of evolution by natural selection, but I&amp;rsquo;ll lay it out in a bit more detail and explain each step. There are some packages and functions built for running evolutionary algorithms in R, but I want to show you how it&amp;rsquo;s done from scratch so that you can understand the mechanics more directly.&lt;/p>
&lt;figure id="figure-basic-form-of-the-evolutionary-algorithm">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://www.rforecology.com/evo_flow_algo.png" alt="flowchart showing how evolutionary algorithms begin by generating a population, then the fitness of each individual is evaluated, then the survivors are mated and mutations are applied to their offspring, and this finally creates a new generation. An arrow from their loops back to evaluating the fitness of each indivudual again" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Basic form of the evolutionary algorithm
&lt;/figcaption>&lt;/figure>
&lt;p>To start, let&amp;rsquo;s load the data we&amp;rsquo;ll be working with using the &lt;code>data()&lt;/code> function.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Load the data:&lt;/span>
&lt;span style="color:#268bd2">data&lt;/span>(trees)
&lt;span style="color:#586e75"># rename columns&lt;/span>
&lt;span style="color:#268bd2">names&lt;/span>(trees) &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;DBH_in&amp;#34;&lt;/span>,&lt;span style="color:#2aa198">&amp;#34;height_ft&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;volume_ft3&amp;#34;&lt;/span>)
&lt;span style="color:#586e75"># Show the top few entries:&lt;/span>
&lt;span style="color:#268bd2">head&lt;/span>(trees)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## DBH_in height_ft volume_ft3
## 1 8.3 70 10.3
## 2 8.6 65 10.3
## 3 8.8 63 10.2
## 4 10.5 72 16.4
## 5 10.7 81 18.8
## 6 10.8 83 19.7
&lt;/code>&lt;/pre>&lt;p>These data include measurements of the diameter at breast height in inches (DBH_in), height in feet (height_ft), and volume in cubic feet (volume_ft3) of 31 black cherry trees. I renamed the original columns (&lt;code>Girth&lt;/code>, &lt;code>Height&lt;/code>, and &lt;code>Volume&lt;/code>) to make the variables and their units clearer.&lt;/p>
&lt;p>For this tutorial (&lt;a href="https://www.rforecology.com/post/how-to-do-simple-linear-regression-in-r/" target="_blank">as with my tutorial on simple linear regression&lt;/a>), our goal is to model the association between tree height and diameter.&lt;/p>
&lt;h3 id="dbh_in--height_ft">&lt;code>DBH_in ~ height_ft&lt;/code>&lt;/h3>
&lt;p>A quick plot shows that there is probably a relationship there:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#268bd2">plot&lt;/span>(DBH_in &lt;span style="color:#719e07">~&lt;/span> height_ft, data &lt;span style="color:#719e07">=&lt;/span> trees, pch&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">16&lt;/span>, cex&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">1.5&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;img src="https://www.rforecology.com/post/intro-to-evolutionary-algorithms-with-r-for-beginners-from-scratch-part-1/index_files/figure-html/unnamed-chunk-2-1.png" width="672" />&lt;/p>
&lt;p>So our goal is to determine the coefficients of the relationship between height and DBH. In other words, we want to find the best-fit line for this regression.&lt;/p>
&lt;p>The equation for the line we want is &lt;em>&lt;strong>Y = a + b*X&lt;/strong>&lt;/em>, where Y is &lt;code>DBH_in&lt;/code>, X is &lt;code>height_ft&lt;/code>, &amp;lsquo;a&amp;rsquo; is the intercept, and &amp;lsquo;b&amp;rsquo; is the slope of the line. Our goal is to &amp;lsquo;evolve&amp;rsquo; values for the &amp;lsquo;a&amp;rsquo; and &amp;lsquo;b&amp;rsquo; coefficients that produce the best-fit line.&lt;/p>
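&lt;p>Because this is an ordinary least-squares problem, R can also solve it directly, which gives us a benchmark for what the evolved coefficients should approach. The sketch below just calls base R&amp;rsquo;s &lt;code>lm()&lt;/code>; it isn&amp;rsquo;t part of the evolutionary algorithm, only a reference point (the column renaming is repeated so the snippet runs on its own).&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Reload and rename the built-in trees data so this snippet runs on its own
data(trees)
names(trees) &amp;lt;- c("DBH_in", "height_ft", "volume_ft3")
# Closed-form least-squares fit of DBH_in = a + b*height_ft
ls_fit &amp;lt;- lm(DBH_in ~ height_ft, data = trees)
# Named vector: "(Intercept)" is a, "height_ft" is b
ref_coefs &amp;lt;- coef(ls_fit)
ref_coefs
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>With these data the intercept comes out near -6.2 and the slope near 0.26, so those are roughly the values we hope the population will evolve toward.&lt;/p>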
&lt;h2 id="1-starting-population">1) Starting Population&lt;/h2>
&lt;p>First, we&amp;rsquo;ll generate a starting population of 100 potential organisms (models), each defined by a random pair of &amp;lsquo;a&amp;rsquo; and &amp;lsquo;b&amp;rsquo; coefficients:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#268bd2">set.seed&lt;/span>(&lt;span style="color:#2aa198">123&lt;/span>) &lt;span style="color:#586e75"># to get the same results as me&lt;/span>
&lt;span style="color:#586e75"># 100 random &amp;#39;a&amp;#39; values based on a uniform distribution from -100 to 100&lt;/span>
a_coef &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">runif&lt;/span>(min&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">-100&lt;/span>, max&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">100&lt;/span>, n&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">100&lt;/span>)
&lt;span style="color:#586e75"># 100 random &amp;#39;b&amp;#39; values based on a uniform distribution from -100 to 100&lt;/span>
b_coef &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">runif&lt;/span>(min&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">-100&lt;/span>, max&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">100&lt;/span>, n&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">100&lt;/span>)
&lt;span style="color:#586e75"># pair these together into a dataframe of two columns and call this a population&lt;/span>
&lt;span style="color:#586e75"># and also add in a column for fitness so that we can keep track of the fitness&lt;/span>
&lt;span style="color:#586e75"># of each organism/model:&lt;/span>
population &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">data.frame&lt;/span>(a_coef, b_coef, fitness&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#cb4b16">NA&lt;/span>)
&lt;span style="color:#268bd2">head&lt;/span>(population)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## a_coef b_coef fitness
## 1 -42.48450 19.997792 NA
## 2 57.66103 -33.435292 NA
## 3 -18.20462 -2.277393 NA
## 4 76.60348 90.894765 NA
## 5 88.09346 -3.419521 NA
## 6 -90.88870 78.070044 NA
&lt;/code>&lt;/pre>&lt;p>Now put this code into its own function:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">gen_starting_pop &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">function&lt;/span>(){
&lt;span style="color:#586e75"># 100 random &amp;#39;a&amp;#39; values based on a uniform distribution from -100 to 100&lt;/span>
a_coef &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">runif&lt;/span>(min&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">-100&lt;/span>, max&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">100&lt;/span>, n&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">100&lt;/span>)
&lt;span style="color:#586e75"># 100 random &amp;#39;b&amp;#39; values based on a uniform distribution from -100 to 100&lt;/span>
b_coef &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">runif&lt;/span>(min&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">-100&lt;/span>, max&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">100&lt;/span>, n&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">100&lt;/span>)
&lt;span style="color:#586e75"># pair these together into a dataframe of two columns and call this a population&lt;/span>
&lt;span style="color:#586e75"># and also add in a column for fitness so that we can keep track of the fitness&lt;/span>
&lt;span style="color:#586e75"># of each organism/model:&lt;/span>
population &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">data.frame&lt;/span>(a_coef, b_coef, fitness&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#cb4b16">NA&lt;/span>)
&lt;span style="color:#268bd2">return&lt;/span>(population)
}
&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="2-function-to-evaluate-fitness">2) Function to evaluate fitness&lt;/h2>
&lt;p>Next we need to create a function that will evaluate the &amp;lsquo;fitness&amp;rsquo; of each &amp;lsquo;organism&amp;rsquo; (I&amp;rsquo;ll just refer to the organisms as &amp;lsquo;models&amp;rsquo; from here on). In this case, fitness is based on how well each model fits the data. To think in terms of biological evolution, imagine that the line the coefficients create is the &lt;em>phenotype&lt;/em>, while the coefficients themselves are the &lt;em>DNA&lt;/em>. Natural selection acts on the phenotype, which we can create from the DNA by applying the full model I described before: &lt;code>DBH_in = a + b*height_ft&lt;/code>. We can then evaluate fitness by testing how well the right side of the equation predicts the left side (DBH_in).&lt;/p>
&lt;p>So first we&amp;rsquo;ll loop through each model in the population and use it to predict DBH values. Then we subtract the real DBH values from the predicted values to get the difference for each tree. Bigger differences mean a poorer model and lower fitness. To reduce these to a single value per model, we square the differences and sum them. Squaring makes negative and positive differences count the same way (only the size of the difference matters), and the sum is an overall index of how far the predicted DBH is from the real DBH. Finally, we take the reciprocal of that sum (1/sum) so that smaller errors give higher fitness. Note that this value isn&amp;rsquo;t actually bounded between 0 and 1: it shrinks toward 0 as the fit gets worse and grows as the fit improves, and only a perfect fit (zero error, which is impossible with these data) would make it infinite.&lt;/p>
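&lt;p>For a single model, that calculation takes only a few lines. The helper below, &lt;code>model_fitness()&lt;/code>, is a hypothetical name used just for this sketch (it isn&amp;rsquo;t one of the functions we build later), and it uses R&amp;rsquo;s vector arithmetic instead of looping over trees. Plugging in the least-squares coefficients from &lt;code>lm()&lt;/code> shows roughly the highest fitness any straight line can reach with these data.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">data(trees)
names(trees) &amp;lt;- c("DBH_in", "height_ft", "volume_ft3")

# Hypothetical helper: fitness of one model = 1 / (sum of squared differences)
model_fitness &amp;lt;- function(a, b, data = trees) {
  DBH_predicted &amp;lt;- a + b * data$height_ft   # predicted DBH for every tree
  difference &amp;lt;- DBH_predicted - data$DBH_in # predicted minus observed
  1 / sum(difference^2)
}

# The least-squares line minimizes the sum of squares, so its fitness is the ceiling
best_coefs &amp;lt;- coef(lm(DBH_in ~ height_ft, data = trees))
best_fitness &amp;lt;- model_fitness(best_coefs[[1]], best_coefs[[2]])
# An arbitrary model from the starting range (-100 to 100) scores far lower
poor_fitness &amp;lt;- model_fitness(a = 50, b = -2)
c(best = best_fitness, poor = poor_fitness)
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>The best possible value works out to roughly 0.0046 (about 1/216), which is why the fitness values printed in this post are so small. The loop below applies the same calculation to every model in the population and stores each result in the &lt;code>fitness&lt;/code> column:&lt;/p>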
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># loop through each organism:&lt;/span>
&lt;span style="color:#268bd2">for&lt;/span>(i in &lt;span style="color:#2aa198">1&lt;/span>&lt;span style="color:#719e07">:&lt;/span>&lt;span style="color:#2aa198">100&lt;/span>){
&lt;span style="color:#586e75"># for each organism, calculate predicted DBH values&lt;/span>
DBH_predicted &lt;span style="color:#719e07">=&lt;/span> population[i,&lt;span style="color:#2aa198">&amp;#34;a_coef&amp;#34;&lt;/span>] &lt;span style="color:#719e07">+&lt;/span> population[i,&lt;span style="color:#2aa198">&amp;#34;b_coef&amp;#34;&lt;/span>]&lt;span style="color:#719e07">*&lt;/span>trees&lt;span style="color:#719e07">$&lt;/span>height_ft
&lt;span style="color:#586e75"># Now, subtract the real DBH values from the predicted values to get the difference:&lt;/span>
difference &lt;span style="color:#719e07">&amp;lt;-&lt;/span> DBH_predicted &lt;span style="color:#719e07">-&lt;/span> trees&lt;span style="color:#719e07">$&lt;/span>DBH_in
&lt;span style="color:#586e75"># calculate the sum of squared differences:&lt;/span>
sum_sq_diff &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">sum&lt;/span>(difference^2)
&lt;span style="color:#586e75"># make the fitness value range from 0 to 1:&lt;/span>
fitness &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#2aa198">1&lt;/span>&lt;span style="color:#719e07">/&lt;/span>sum_sq_diff
&lt;span style="color:#586e75"># finally, save the fitness values to the population dataframe:&lt;/span>
population&lt;span style="color:#719e07">$&lt;/span>fitness[i] &lt;span style="color:#719e07">&amp;lt;-&lt;/span> fitness
}
&lt;/code>&lt;/pre>&lt;/div>&lt;p>Then we need to choose survivors. Let&amp;rsquo;s say only the top 10 models with the highest fitness will survive:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># find the index of the top 10 (highest) fitness values:&lt;/span>
top_10_fit &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">order&lt;/span>(population&lt;span style="color:#719e07">$&lt;/span>fitness, decreasing &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">T&lt;/span>)[1&lt;span style="color:#719e07">:&lt;/span>&lt;span style="color:#2aa198">10&lt;/span>]
&lt;span style="color:#586e75"># then use those to index the population:&lt;/span>
survivors &lt;span style="color:#719e07">&amp;lt;-&lt;/span> population[top_10_fit,]
survivors
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## a_coef b_coef fitness
## 61 33.02304 0.4599127 1.073925e-05
## 52 -11.55999 -0.4945466 8.224736e-06
## 5 88.09346 -3.4195206 9.275710e-07
## 3 -18.20462 -2.2773933 7.663413e-07
## 35 -95.07726 4.2271452 7.017355e-07
## 86 -13.02145 -3.7420399 3.320522e-07
## 78 22.55420 5.9671372 1.497032e-07
## 66 -10.29673 6.7375891 1.342400e-07
## 96 -62.46178 -6.6934595 9.392552e-08
## 17 -50.78245 9.8569312 6.820099e-08
&lt;/code>&lt;/pre>&lt;p>Let&amp;rsquo;s also put the code that calculates fitness and picks survivors into its own function so that it&amp;rsquo;s easier to reuse in later steps:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">evaluate_fitness &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">function&lt;/span>(population){
&lt;span style="color:#586e75"># loop through each organism:&lt;/span>
&lt;span style="color:#268bd2">for&lt;/span>(i in &lt;span style="color:#2aa198">1&lt;/span>&lt;span style="color:#719e07">:&lt;/span>&lt;span style="color:#2aa198">100&lt;/span>){
&lt;span style="color:#586e75"># for each organism, calculate predicted DBH values&lt;/span>
DBH_predicted &lt;span style="color:#719e07">=&lt;/span> population[i,&lt;span style="color:#2aa198">&amp;#34;a_coef&amp;#34;&lt;/span>] &lt;span style="color:#719e07">+&lt;/span> population[i,&lt;span style="color:#2aa198">&amp;#34;b_coef&amp;#34;&lt;/span>]&lt;span style="color:#719e07">*&lt;/span>trees&lt;span style="color:#719e07">$&lt;/span>height_ft
&lt;span style="color:#586e75"># Now, subtract the real DBH values from the predicted values to get the difference:&lt;/span>
difference &lt;span style="color:#719e07">&amp;lt;-&lt;/span> DBH_predicted &lt;span style="color:#719e07">-&lt;/span> trees&lt;span style="color:#719e07">$&lt;/span>DBH_in
&lt;span style="color:#586e75"># calculate the sum of squared differences:&lt;/span>
sum_sq_diff &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">sum&lt;/span>(difference^2)
&lt;span style="color:#586e75"># make the fitness value range from 0 to 1:&lt;/span>
fitness &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#2aa198">1&lt;/span>&lt;span style="color:#719e07">/&lt;/span>sum_sq_diff
&lt;span style="color:#586e75"># finally, save the fitness values to the population dataframe:&lt;/span>
population&lt;span style="color:#719e07">$&lt;/span>fitness[i] &lt;span style="color:#719e07">&amp;lt;-&lt;/span> fitness
}
&lt;span style="color:#586e75"># find the index value the top 10 (highest) fitness values:&lt;/span>
top_10_fit &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">order&lt;/span>(population&lt;span style="color:#719e07">$&lt;/span>fitness, decreasing &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">T&lt;/span>)[1&lt;span style="color:#719e07">:&lt;/span>&lt;span style="color:#2aa198">10&lt;/span>]
&lt;span style="color:#586e75"># then use those to index the population:&lt;/span>
survivors &lt;span style="color:#719e07">&amp;lt;-&lt;/span> population[top_10_fit,]
&lt;span style="color:#586e75"># return survivors&lt;/span>
&lt;span style="color:#268bd2">return&lt;/span>(survivors)
}
&lt;/code>&lt;/pre>&lt;/div>&lt;p>Now, all we have to do is call: &lt;br>
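&lt;code>survivors &amp;lt;- evaluate_fitness(population)&lt;/code> &lt;br>
whenever we want to score a population and pick its survivors. Note that the function returns only the 10 survivors, not the whole population, so we save its output as &lt;code>survivors&lt;/code>.&lt;/p>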
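&lt;p>The loop inside &lt;code>evaluate_fitness()&lt;/code> is easy to follow, but it assumes exactly 100 models and scores them one at a time. If you want to experiment with other population sizes, here is a minimal alternative sketch that scores all models at once. &lt;code>evaluate_fitness_vec()&lt;/code> and its &lt;code>n_survivors&lt;/code> argument are hypothetical names (not used in the rest of this post), and the sketch assumes the same column names as above.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">data(trees)
names(trees) &amp;lt;- c("DBH_in", "height_ft", "volume_ft3")

# Hypothetical vectorized variant of evaluate_fitness() for any population size
evaluate_fitness_vec &amp;lt;- function(population, n_survivors = 10) {
  # predictions: one row per model, one column per tree
  # (a_coef has one value per row, so it is recycled down each column)
  predicted &amp;lt;- population$a_coef + outer(population$b_coef, trees$height_ft)
  # subtract the observed DBH of each tree from its column, then square
  sq_diff &amp;lt;- sweep(predicted, 2, trees$DBH_in)^2
  population$fitness &amp;lt;- 1 / rowSums(sq_diff)
  # keep the n_survivors models with the highest fitness
  top &amp;lt;- order(population$fitness, decreasing = TRUE)[seq_len(n_survivors)]
  population[top, ]
}

# Example: score a random population of 250 models and keep the best 10
set.seed(42)
pop &amp;lt;- data.frame(a_coef = runif(250, -100, 100),
                  b_coef = runif(250, -100, 100),
                  fitness = NA)
survivors_vec &amp;lt;- evaluate_fitness_vec(pop)
head(survivors_vec, 3)
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>Here &lt;code>outer()&lt;/code> builds a matrix with one row per model and one column per tree, and &lt;code>sweep()&lt;/code> subtracts the observed DBH from each column, so &lt;code>rowSums()&lt;/code> gives the same sum of squared differences as the loop.&lt;/p>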
&lt;h2 id="3-mate-survivors-and-mutate-dna-and-generate-the-new-population">3) Mate survivors and mutate DNA (and generate the new population)&lt;/h2>
&lt;p>Next we need to create a new population of 100 models using those survivors, making sure to add some random mutations to ensure the potential for evolution exists.&lt;/p>
&lt;p>First, generate the new population of models by cloning these survivors at random:&lt;/p>
&lt;div class="alert alert-note">
&lt;div>
Note: I&amp;rsquo;m just cloning individuals here rather than using sexual reproduction, to keep things simple for this example. However, DNA crossover can provide an important advantage for evolutionary algorithms, as it does in biology.
&lt;/div>
&lt;/div>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"> &lt;span style="color:#586e75"># First, choose the parents at randome from the 10 possible survivors:&lt;/span>
parents &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">sample&lt;/span>(&lt;span style="color:#2aa198">1&lt;/span>&lt;span style="color:#719e07">:&lt;/span>&lt;span style="color:#2aa198">10&lt;/span>, size&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">100&lt;/span>, replace&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#268bd2">T&lt;/span>)
&lt;span style="color:#586e75"># and use those parents (index values) to clone offspring:&lt;/span>
offspring &lt;span style="color:#719e07">&amp;lt;-&lt;/span> survivors[parents,]
&lt;span style="color:#586e75"># Then add mutations:&lt;/span>
&lt;span style="color:#586e75"># choose a mutation rate:&lt;/span>
mutation_rate &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#2aa198">0.6&lt;/span>
&lt;span style="color:#586e75"># total number of mutations is our population (100) * the rate, and rounded to &lt;/span>
&lt;span style="color:#586e75"># make sure the result is an integer value:&lt;/span>
total_mutations &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">round&lt;/span>(&lt;span style="color:#2aa198">100&lt;/span>&lt;span style="color:#719e07">*&lt;/span>mutation_rate)
&lt;span style="color:#586e75"># choose which models recieve mutations for a or b coefficients:&lt;/span>
a_to_mutate &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">sample&lt;/span>(x&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">1&lt;/span>&lt;span style="color:#719e07">:&lt;/span>&lt;span style="color:#2aa198">100&lt;/span>), size&lt;span style="color:#719e07">=&lt;/span>total_mutations)
b_to_mutate &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">sample&lt;/span>(x&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">1&lt;/span>&lt;span style="color:#719e07">:&lt;/span>&lt;span style="color:#2aa198">100&lt;/span>), size&lt;span style="color:#719e07">=&lt;/span>total_mutations)
&lt;span style="color:#586e75"># then generate a set of random mutations for the a and b coefficients:&lt;/span>
a_mutations &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">rnorm&lt;/span>(n &lt;span style="color:#719e07">=&lt;/span> total_mutations, mean&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">0&lt;/span>, sd&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">3&lt;/span>)
b_mutations &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">rnorm&lt;/span>(n &lt;span style="color:#719e07">=&lt;/span> total_mutations, mean&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">0&lt;/span>, sd&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">3&lt;/span>)
&lt;span style="color:#586e75"># and apply those mutations:&lt;/span>
offspring&lt;span style="color:#719e07">$&lt;/span>a_coef[a_to_mutate] &lt;span style="color:#719e07">&amp;lt;-&lt;/span> offspring&lt;span style="color:#719e07">$&lt;/span>a_coef[a_to_mutate] &lt;span style="color:#719e07">+&lt;/span> a_mutations
offspring&lt;span style="color:#719e07">$&lt;/span>b_coef[b_to_mutate] &lt;span style="color:#719e07">&amp;lt;-&lt;/span> offspring&lt;span style="color:#719e07">$&lt;/span>b_coef[b_to_mutate] &lt;span style="color:#719e07">+&lt;/span> b_mutations
&lt;span style="color:#586e75"># finally, reset the row names from 1 to 100:&lt;/span>
&lt;span style="color:#268bd2">row.names&lt;/span>(offspring) &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#2aa198">1&lt;/span>&lt;span style="color:#719e07">:&lt;/span>&lt;span style="color:#2aa198">100&lt;/span>
&lt;/code>&lt;/pre>&lt;/div>&lt;div class="alert alert-note">
&lt;div>
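Note: I&amp;rsquo;m setting the mutation rate to 0.6 (so 60 of the 100 offspring get a mutation to their &amp;lsquo;a&amp;rsquo; coefficient and a separately chosen 60 get one to &amp;lsquo;b&amp;rsquo;) and the mutation magnitude to a standard deviation of 3 (&lt;code>sd=3&lt;/code>), but you can play around with both to see what happens.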
&lt;/div>
&lt;/div>
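&lt;p>As noted above, this tutorial clones survivors instead of mixing their DNA. If you want to experiment with crossover, one common option is &lt;em>uniform crossover&lt;/em>, where each offspring inherits each coefficient from one of two randomly chosen parents. The sketch below is one way to do that, assuming the same &lt;code>survivors&lt;/code> data frame layout; &lt;code>mate_with_crossover()&lt;/code> is a hypothetical name and isn&amp;rsquo;t used in the rest of this post. You would still apply mutations to its output, as in the code above.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Hypothetical alternative to cloning: uniform crossover between two random parents
mate_with_crossover &amp;lt;- function(survivors, n_offspring = 100) {
  n_parents &amp;lt;- nrow(survivors)
  # pick two parents (by row index) for every offspring
  mom &amp;lt;- sample(seq_len(n_parents), size = n_offspring, replace = TRUE)
  dad &amp;lt;- sample(seq_len(n_parents), size = n_offspring, replace = TRUE)
  # for each coefficient, flip a coin to decide which parent passes it on
  a_from_mom &amp;lt;- sample(c(TRUE, FALSE), size = n_offspring, replace = TRUE)
  b_from_mom &amp;lt;- sample(c(TRUE, FALSE), size = n_offspring, replace = TRUE)
  data.frame(
    a_coef = ifelse(a_from_mom, survivors$a_coef[mom], survivors$a_coef[dad]),
    b_coef = ifelse(b_from_mom, survivors$b_coef[mom], survivors$b_coef[dad]),
    fitness = NA
  )
}

# Example with two toy survivors: each child gets each coefficient from one of them
toy_survivors &amp;lt;- data.frame(a_coef = c(-6, 10), b_coef = c(0.25, -1), fitness = NA)
set.seed(1)
kids &amp;lt;- mate_with_crossover(toy_survivors, n_offspring = 8)
kids
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>For the rest of this post, though, we&amp;rsquo;ll stick with simple cloning plus mutation.&lt;/p>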
&lt;p>But let&amp;rsquo;s also make this step into a function so that we can easily call it later when we are looping through each generation:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">mate_and_mutate &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">function&lt;/span>(survivors){
&lt;span style="color:#586e75"># create a series of 100 random indexes chosen from the survivors:&lt;/span>
&lt;span style="color:#586e75"># First, choose the parents at randome from the 10 possible survivors:&lt;/span>
parents &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">sample&lt;/span>(&lt;span style="color:#2aa198">1&lt;/span>&lt;span style="color:#719e07">:&lt;/span>&lt;span style="color:#2aa198">10&lt;/span>, size&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">100&lt;/span>, replace&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#268bd2">T&lt;/span>)
&lt;span style="color:#586e75"># and use those parents (index values) to clone offspring:&lt;/span>
offspring &lt;span style="color:#719e07">&amp;lt;-&lt;/span> survivors[parents,]
&lt;span style="color:#586e75"># Then add mutations:&lt;/span>
&lt;span style="color:#586e75"># choose a mutation rate:&lt;/span>
mutation_rate &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#2aa198">0.6&lt;/span>
&lt;span style="color:#586e75"># total number of mutations is our population (100) * the rate, and rounded to &lt;/span>
&lt;span style="color:#586e75"># make sure the result is an integer value:&lt;/span>
total_mutations &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">round&lt;/span>(&lt;span style="color:#2aa198">100&lt;/span>&lt;span style="color:#719e07">*&lt;/span>mutation_rate)
&lt;span style="color:#586e75"># choose which models recieve mutations for a or b coefficients:&lt;/span>
a_to_mutate &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">sample&lt;/span>(x&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">1&lt;/span>&lt;span style="color:#719e07">:&lt;/span>&lt;span style="color:#2aa198">100&lt;/span>), size&lt;span style="color:#719e07">=&lt;/span>total_mutations)
b_to_mutate &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">sample&lt;/span>(x&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">1&lt;/span>&lt;span style="color:#719e07">:&lt;/span>&lt;span style="color:#2aa198">100&lt;/span>), size&lt;span style="color:#719e07">=&lt;/span>total_mutations)
&lt;span style="color:#586e75"># then generate a set of random mutations for the a and b coefficients:&lt;/span>
a_mutations &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">rnorm&lt;/span>(n &lt;span style="color:#719e07">=&lt;/span> total_mutations, mean&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">0&lt;/span>, sd&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">3&lt;/span>)
b_mutations &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">rnorm&lt;/span>(n &lt;span style="color:#719e07">=&lt;/span> total_mutations, mean&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">0&lt;/span>, sd&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">3&lt;/span>)
&lt;span style="color:#586e75"># and apply those mutations:&lt;/span>
offspring&lt;span style="color:#719e07">$&lt;/span>a_coef[a_to_mutate] &lt;span style="color:#719e07">&amp;lt;-&lt;/span> offspring&lt;span style="color:#719e07">$&lt;/span>a_coef[a_to_mutate] &lt;span style="color:#719e07">+&lt;/span> a_mutations
offspring&lt;span style="color:#719e07">$&lt;/span>b_coef[b_to_mutate] &lt;span style="color:#719e07">&amp;lt;-&lt;/span> offspring&lt;span style="color:#719e07">$&lt;/span>b_coef[b_to_mutate] &lt;span style="color:#719e07">+&lt;/span> b_mutations
&lt;span style="color:#586e75"># finally, reset the row names from 1 to 100:&lt;/span>
&lt;span style="color:#268bd2">row.names&lt;/span>(offspring) &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#2aa198">1&lt;/span>&lt;span style="color:#719e07">:&lt;/span>&lt;span style="color:#2aa198">100&lt;/span>
&lt;span style="color:#586e75"># return the new generation of offspring:&lt;/span>
&lt;span style="color:#268bd2">return&lt;/span>(offspring)
}
&lt;/code>&lt;/pre>&lt;/div>&lt;p>So that was one generation, great! Now we need to repeat the cycle for many more generations. We&amp;rsquo;ll put everything together using a &lt;code>for&lt;/code> loop.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># First set the starting population:&lt;/span>
population &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">gen_starting_pop&lt;/span>()
&lt;span style="color:#586e75"># set how many generations you want to run this for.&lt;/span>
&lt;span style="color:#586e75"># we&amp;#39;ll start with 5 for now:&lt;/span>
generations &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#2aa198">5&lt;/span>
&lt;span style="color:#586e75"># begin the for loop:&lt;/span>
&lt;span style="color:#268bd2">for&lt;/span>(i in &lt;span style="color:#2aa198">1&lt;/span>&lt;span style="color:#719e07">:&lt;/span>generations){
&lt;span style="color:#586e75"># 1) Evaluate fitness:&lt;/span>
survivors &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">evaluate_fitness&lt;/span>(population)
&lt;span style="color:#586e75"># 2) Mate and mutate survivors to generate next generation:&lt;/span>
next_generation &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">mate_and_mutate&lt;/span>(survivors)
&lt;span style="color:#586e75"># 3) Redefine the population using the new generation:&lt;/span>
population &lt;span style="color:#719e07">&amp;lt;-&lt;/span> next_generation
}
survivors &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">evaluate_fitness&lt;/span>(population)
&lt;span style="color:#268bd2">head&lt;/span>(survivors)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## a_coef b_coef fitness
## 46 -13.872104 0.3557666 0.004382764
## 95 -14.649663 0.3592565 0.004170481
## 8 -3.179754 0.2292144 0.004046204
## 67 -5.458559 0.2292144 0.003732053
## 11 -15.186570 0.3557666 0.003467042
## 59 -12.313690 0.3557666 0.003383982
&lt;/code>&lt;/pre>&lt;p>Notice how much the fitness has improved in just a few generations: the best model now scores around 0.004, compared with about 0.00001 for the best model in the random population we evaluated earlier. We can plot how fitness changes over time by saving the best fitness value from each generation:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#268bd2">set.seed&lt;/span>(&lt;span style="color:#2aa198">1239&lt;/span>)
&lt;span style="color:#586e75"># First set the starting population:&lt;/span>
population &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">gen_starting_pop&lt;/span>()
&lt;span style="color:#586e75"># set how many generations you want to run this for.&lt;/span>
&lt;span style="color:#586e75"># Use 100 now:&lt;/span>
generations &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#2aa198">100&lt;/span>
&lt;span style="color:#586e75"># define empty variable to collect fitness values:&lt;/span>
fitness &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#cb4b16">NULL&lt;/span>
&lt;span style="color:#586e75"># begin the for loop:&lt;/span>
&lt;span style="color:#268bd2">for&lt;/span>(i in &lt;span style="color:#2aa198">1&lt;/span>&lt;span style="color:#719e07">:&lt;/span>generations){
&lt;span style="color:#586e75"># 1) Evaluate fitness:&lt;/span>
survivors &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">evaluate_fitness&lt;/span>(population)
&lt;span style="color:#586e75"># 2) Mate and mutate survivors to generate next generation:&lt;/span>
next_generation &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">mate_and_mutate&lt;/span>(survivors)
&lt;span style="color:#586e75"># 3) Redefine the population using the new generation:&lt;/span>
population &lt;span style="color:#719e07">&amp;lt;-&lt;/span> next_generation
&lt;span style="color:#586e75"># save fitness value from each generation to plot it over time&lt;/span>
fitness[i] &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">max&lt;/span>(population&lt;span style="color:#719e07">$&lt;/span>fitness)
}
&lt;/code>&lt;/pre>&lt;/div>&lt;p>And plot the results during the first 100 generations:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#268bd2">plot&lt;/span>(fitness &lt;span style="color:#719e07">~&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">1&lt;/span>&lt;span style="color:#719e07">:&lt;/span>generations), type&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">&amp;#34;l&amp;#34;&lt;/span>, lwd&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2&lt;/span>, xlab&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">&amp;#34;Generation&amp;#34;&lt;/span>, ylab&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">&amp;#34;Fitness&amp;#34;&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;img src="https://www.rforecology.com/post/intro-to-evolutionary-algorithms-with-r-for-beginners-from-scratch-part-1/index_files/figure-html/unnamed-chunk-12-1.png" width="672" />
So you can see how fitness slowly increases over time. Try this again, but run the simulation for 1000 generations (note that the loop may take a minute to run):&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/post/intro-to-evolutionary-algorithms-with-r-for-beginners-from-scratch-part-1/index_files/figure-html/unnamed-chunk-13-1.png" width="672" />
We can also visualize the evolution of the best-fit line by plotting how it changes over time. The code below runs 1000 generations and, every 50 generations, pauses to plot the scatterplot with the current best-fit line:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#268bd2">set.seed&lt;/span>(&lt;span style="color:#2aa198">1239&lt;/span>)
&lt;span style="color:#586e75"># First set the starting population:&lt;/span>
population &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">gen_starting_pop&lt;/span>()
&lt;span style="color:#586e75"># set how many generations you want to run this for.&lt;/span>
generations &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#2aa198">1000&lt;/span>
&lt;span style="color:#586e75"># define empty variable to collect fitness values:&lt;/span>
fitness &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#cb4b16">NULL&lt;/span>
&lt;span style="color:#586e75"># begin the for loop:&lt;/span>
&lt;span style="color:#268bd2">for&lt;/span>(i in &lt;span style="color:#2aa198">1&lt;/span>&lt;span style="color:#719e07">:&lt;/span>generations){
&lt;span style="color:#586e75"># 1) Evaluate fitness:&lt;/span>
survivors &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">evaluate_fitness&lt;/span>(population)
&lt;span style="color:#586e75"># 2) Mate and mutate survivors to generate next generation:&lt;/span>
next_generation &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">mate_and_mutate&lt;/span>(survivors)
&lt;span style="color:#586e75"># 3) Redefine the population using the new generation:&lt;/span>
population &lt;span style="color:#719e07">&amp;lt;-&lt;/span> next_generation
fitness[i] &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">max&lt;/span>(population&lt;span style="color:#719e07">$&lt;/span>fitness)
&lt;span style="color:#586e75">#Every 5 generations, pause the simulation and plot the&lt;/span>
&lt;span style="color:#586e75"># points with the current best-fit line:&lt;/span>
&lt;span style="color:#268bd2">if&lt;/span>(i &lt;span style="color:#719e07">%%&lt;/span> &lt;span style="color:#2aa198">50&lt;/span> &lt;span style="color:#719e07">==&lt;/span> &lt;span style="color:#2aa198">0&lt;/span>){
survivors &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">evaluate_fitness&lt;/span>(population)
&lt;span style="color:#268bd2">plot&lt;/span>(DBH_in &lt;span style="color:#719e07">~&lt;/span> height_ft, data &lt;span style="color:#719e07">=&lt;/span> trees, pch&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">16&lt;/span>, cex&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">1.5&lt;/span>)
&lt;span style="color:#268bd2">title&lt;/span>(main&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#268bd2">paste0&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;Generation: &amp;#34;&lt;/span>,i), cex.main&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2&lt;/span>)
&lt;span style="color:#268bd2">abline&lt;/span>(a&lt;span style="color:#719e07">=&lt;/span>survivors&lt;span style="color:#719e07">$&lt;/span>a_coef[1], b&lt;span style="color:#719e07">=&lt;/span>survivors&lt;span style="color:#719e07">$&lt;/span>b_coef[1], lwd&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">3&lt;/span>, col&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">&amp;#34;red&amp;#34;&lt;/span>)
&lt;span style="color:#586e75"># pause for 1 second:&lt;/span>
&lt;span style="color:#268bd2">Sys.sleep&lt;/span>(&lt;span style="color:#2aa198">.5&lt;/span>)
}
}
&lt;/code>&lt;/pre>&lt;/div>
&lt;figure id="figure-evolution-of-the-best-fit-line">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://www.rforecology.com/line_evo_gif.gif" alt="GIF showing how the best-fit line on a scatterplot regression changes over the generations" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Evolution of the best-fit line
&lt;/figcaption>&lt;/figure>
&lt;p>Now let&amp;rsquo;s extract the highest fitness model after having run the simulation for 1000 generations to see how the coefficients compare to the basic linear model &lt;code>lm()&lt;/code> output in R:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Extract the best model:&lt;/span>
top_models &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">evaluate_fitness&lt;/span>(population) &lt;span style="color:#586e75"># Be sure to run this first&lt;/span>
best_model &lt;span style="color:#719e07">&amp;lt;-&lt;/span> top_models[&lt;span style="color:#268bd2">which.max&lt;/span>(top_models&lt;span style="color:#719e07">$&lt;/span>fitness),]
&lt;span style="color:#586e75"># compare this to a basic linear model result with &amp;#39;lm()&amp;#39;:&lt;/span>
evo_model &lt;span style="color:#719e07">&amp;lt;-&lt;/span> best_model
lm_model &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">lm&lt;/span>(DBH_in &lt;span style="color:#719e07">~&lt;/span> height_ft, data&lt;span style="color:#719e07">=&lt;/span>trees)
&lt;span style="color:#586e75"># Plot the scatterplot and add the best-fit lines:&lt;/span>
&lt;span style="color:#268bd2">plot&lt;/span>(DBH_in &lt;span style="color:#719e07">~&lt;/span> height_ft, data &lt;span style="color:#719e07">=&lt;/span> trees, pch&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">16&lt;/span>, cex&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">1.5&lt;/span>)
&lt;span style="color:#586e75"># red line for our model solution&lt;/span>
&lt;span style="color:#268bd2">abline&lt;/span>(a&lt;span style="color:#719e07">=&lt;/span>evo_model&lt;span style="color:#719e07">$&lt;/span>a_coef, b&lt;span style="color:#719e07">=&lt;/span>evo_model&lt;span style="color:#719e07">$&lt;/span>b_coef, col&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">&amp;#34;red&amp;#34;&lt;/span>, lwd&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2&lt;/span>)
&lt;span style="color:#586e75"># blue line for the lm() model solution:&lt;/span>
&lt;span style="color:#268bd2">abline&lt;/span>(lm_model, col&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">&amp;#34;blue&amp;#34;&lt;/span>, lwd&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;img src="https://www.rforecology.com/post/intro-to-evolutionary-algorithms-with-r-for-beginners-from-scratch-part-1/index_files/figure-html/unnamed-chunk-15-1.png" width="672" />&lt;/p>
&lt;p>Not bad for only 1000 generations! Can barely even see the difference between the two lines.&lt;/p>
&lt;h3 id="conclusion">Conclusion:&lt;/h3>
&lt;p>&lt;strong>our model: a = -5.9599 b = 0.2528&lt;/strong>
&lt;br>
&lt;strong>lm() model: a = -6.1884 b = 0.2557&lt;/strong>&lt;/p>
&lt;p>So, those are the basics of how evolutionary algorithms work. I hope this part of the tutorial was helpful. In the next part I&amp;rsquo;m going to get a bit more advanced and show you how I used an evolutionary algorithm to create images made of letters such as the one below.&lt;/p>
&lt;figure id="figure-evolving-an-image-made-of-only-the-letter-r">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://www.rforecology.com/image%20to%20Rs.jpg" alt="On the left is a black and white portrait and on the right is the image re-created using only the letter &amp;#39;R&amp;#39;" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Evolving an image made of only the letter R&amp;hellip;
&lt;/figcaption>&lt;/figure>
&lt;br>
&lt;hr>
&lt;center>If you liked this post and want to learn more, then check out my online course on the complete basics of R for ecology:
&lt;br>
&lt;br>
&lt;ul class="cta-group">
&lt;li>
&lt;a href="https://coaching.rforecology.com?utm_source=blog&amp;amp;utm_medium=bottom_button&amp;amp;utm_campaign=rforecology_blog/" target="_blank" rel="noopener" class="btn btn-primary px-3 py-3">&lt;strong>Start learning R now&lt;/strong>&lt;/a>
&lt;/li>
&lt;/ul>
&lt;p>Also be sure to check out &lt;strong>&lt;a href="https://www.r-bloggers.com/" target="_blank" rel="noopener">R-bloggers&lt;/a>&lt;/strong> for other great tutorials on learning R&lt;/p>
&lt;/center>
&lt;div id="commento">&lt;/div></description></item><item><title>Five fun things you can do with R (Vol. 1)</title><link>https://www.rforecology.com/post/five-fun-things-you-can-do-with-r/</link><pubDate>Mon, 16 Aug 2021 00:00:00 +0000</pubDate><guid>https://www.rforecology.com/post/five-fun-things-you-can-do-with-r/</guid><description>&lt;p>I&amp;rsquo;ve been having fun with R through some side projects lately. One of which involves trying to use some machine learning and evolutionary algorithms to teach my computer to draw&amp;hellip; (I hope to share a post on that project before too long). But that got me thinking&amp;hellip; so I decided to start a series of posts on fun projects you can do with R that I&amp;rsquo;ve either written myself or found online to re-share. Thanks in advance to all the authors of these articles. Special thanks to Ryan Timpe for the inspiration for this post&lt;sup>1&lt;/sup> (also see his site: &lt;a href="http://www.ryantimpe.com/)">http://www.ryantimpe.com/)&lt;/a>.&lt;/p>
&lt;p>Yes, full disclosure I&amp;rsquo;m a bit of a geek when it comes to R (if you couldn&amp;rsquo;t guess already), but if you are just starting out maybe some of the ideas below will spark your interest about the possibilities. If you&amp;rsquo;re a more advanced R user, then maybe take a shot at completing some of these projects yourself.&lt;/p>
&lt;p>So here&amp;rsquo;s the first installment:&lt;/p>
&lt;figure id="figure-christmas-card-by-greta-gasparac">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://www.rforecology.com/christmas_tree_graph_Greta_Gasparac.png" alt="Christmas Card by Greta Gasparac" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Christmas Card by Greta Gasparac
&lt;/figcaption>&lt;/figure>
&lt;ol>
&lt;li>Make holiday or birthday cards. Yes, this might be the epitome of the geekiest gift you might have for someone, how cool would it be to share your love for R with your friends and family.
Greta Gasparac: &lt;a href="https://towardsdatascience.com/christmas-cards-81e7e1cce21c">https://towardsdatascience.com/christmas-cards-81e7e1cce21c&lt;/a>&lt;/li>
&lt;/ol>
&lt;figure id="figure-google-search-history-barplot-by-saul-buentello">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://www.rforecology.com/google_history_Saul_Buentello.png" alt="Google Search history barplot by Saul Buentello" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Google Search history barplot by Saul Buentello
&lt;/figcaption>&lt;/figure>
&lt;ol start="2">
&lt;li>Analyze your personal Google search history. What kind of cool patterns can you find and learn about yourself?
Saúl Buentello: &lt;a href="https://towardsdatascience.com/explore-your-activity-on-google-with-r-how-to-analyze-and-visualize-your-search-history-1fb74e5fb2b6">https://towardsdatascience.com/explore-your-activity-on-google-with-r-how-to-analyze-and-visualize-your-search-history-1fb74e5fb2b6&lt;/a>&lt;/li>
&lt;/ol>
&lt;figure id="figure-example-datasaur-tweet-by-ryan-timpe">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://www.rforecology.com/dino_tweat_Ryan_Timpe.png" alt="Example Datasaur tweet by Ryan Timpe" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Example Datasaur tweet by Ryan Timpe
&lt;/figcaption>&lt;/figure>
&lt;ol start="3">
&lt;li>Build a twitterbot that &amp;lsquo;creates&amp;rsquo; dinasaurs. Yes. Exactly as that sounds.
Ryan Timpe: &lt;a href="http://www.ryantimpe.com/post/datasaurs1/">http://www.ryantimpe.com/post/datasaurs1/&lt;/a>&lt;/li>
&lt;/ol>
&lt;figure id="figure-example-kerasaur-phylogeny-by-ryan-timpe">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://www.rforecology.com/kerasaur_tree_Ryan_Timpe.png" alt="Example Kerasaur phylogeny by Ryan Timpe" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Example Kerasaur phylogeny by Ryan Timpe
&lt;/figcaption>&lt;/figure>
&lt;ol start="4">
&lt;li>Use deep learning to create new dinasaur names. This is connected to the previous idea, but really cool! There are many ways you could apply this to other topics and themes.
Ryan Timpe: &lt;a href="https://www.ryantimpe.com/post/kerasaurs1/">https://www.ryantimpe.com/post/kerasaurs1/&lt;/a>&lt;/li>
&lt;/ol>
&lt;figure id="figure-example-amazon-book-purchase-history">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://www.rforecology.com/amazon-books-plot-1.png" alt="Example Amazon book purchase history" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Example Amazon book purchase history
&lt;/figcaption>&lt;/figure>
&lt;ol start="5">
&lt;li>Analyze your Amazon shopping history. Here&amp;rsquo;s an older post of mine about how you can visualize your Amazon purchase history and maybe even draw some insights about yourself. WARNING: You might prefer not to see how much money you&amp;rsquo;ve been spending on all those purchases&amp;hellip; 😆
&lt;a href="https://lukaneg.github.io/personal-scrape.html">https://lukaneg.github.io/personal-scrape.html&lt;/a>&lt;/li>
&lt;/ol>
&lt;p>&lt;em>Have you come across or written a post about doing something fun with R? let me know! I&amp;rsquo;ll try to share it in an upcoming post. Just comment down below.&lt;/em>&lt;/p>
&lt;p>Footnote 1: &lt;a href="https://www.youtube.com/watch?v=oOG-aXP_ICI" target="_blank" rel="noopener">Check out Ryan Timpe&amp;rsquo;s presentation about how side projects are a great way to learn on your own terms, practice, and have fun while doing so!&lt;/a>&lt;/p>
&lt;br>
&lt;hr>
&lt;center>If you liked this post and want to learn more, then check out my online course on the complete basics of R for ecology:
&lt;br>
&lt;br>
&lt;ul class="cta-group">
&lt;li>
&lt;a href="https://coaching.rforecology.com?utm_source=blog&amp;amp;utm_medium=bottom_button&amp;amp;utm_campaign=rforecology_blog/" target="_blank" rel="noopener" class="btn btn-primary px-3 py-3">&lt;strong>Start learning R now&lt;/strong>&lt;/a>
&lt;/li>
&lt;/ul>
&lt;p>Also be sure to check out &lt;strong>&lt;a href="https://www.r-bloggers.com/" target="_blank" rel="noopener">R-bloggers&lt;/a>&lt;/strong> for other great tutorials on learning R&lt;/p>
&lt;/center>
&lt;div id="commento">&lt;/div></description></item><item><title>How to actually make a quality scatterplot in R</title><link>https://www.rforecology.com/post/how-to-make-a-quality-scatterplot-in-r/</link><pubDate>Fri, 06 Aug 2021 00:00:00 +0000</pubDate><guid>https://www.rforecology.com/post/how-to-make-a-quality-scatterplot-in-r/</guid><description>&lt;!-- This ensures that the code chunks have a scroll bar: -->
&lt;style>
pre code, pre, code {
&lt;!-- white-space: pre !important; -->
overflow-x: scroll !important;
word-break: keep-all !important;
word-wrap: initial !important;
}
&lt;/style>
&lt;p>Scatterplots are one of the most common types of data visualizations you will encounter as a biologist. They present the relationship between two continuous variables. We might take them for granted by their simplicity, but we shouldn&amp;rsquo;t assume the seeming intuition with which we can see and comprehend these figures. They are a powerful tool, but one that I believe merits a bit more attention. &lt;a href="https://www.newyorker.com/magazine/2021/06/21/when-graphs-are-a-matter-of-life-and-death?utm_medium=email&amp;utm_source=topic+optin&amp;utm_campaign=awareness&amp;utm_content=20210705+data+ai+nl&amp;mkt_tok=MTA3LUZNUy0wNzAAAAF-Fszu4ib3aHqZXqJnQUwI9VINARyUTs8vmVY6e63amgjIVyNFRzWvCNnR24405lDHuIenlJR7l9elIrAXu4NplaNjsacasvi-7SxVG45q223Jz6kz" target="_blank">Check out this really cool article from the New Yorker about &amp;lsquo;When graphs are a matter of life and death&amp;rsquo;&lt;/a> for more history on the subject.&lt;/p>
&lt;p>All through my grad school years and beyond, I&amp;rsquo;ve repeatedly come across scatterplots that almost defeat the purpose of helping us easily understand the relationship between two variables. Here&amp;rsquo;s a typical example of the type of plot I&amp;rsquo;ve seen one-too-many times:&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/post/how-to-make-a-quality-scatterplot-in-r/index_files/figure-html/unnamed-chunk-1-1.png" width="1152" />&lt;/p>
&lt;p>There are several issues here, but without elaborating, here are the same data after a few visual tweaks:
&lt;img src="https://www.rforecology.com/post/how-to-make-a-quality-scatterplot-in-r/index_files/figure-html/unnamed-chunk-2-1.png" width="672" />
Much more striking and easy to read, no?&lt;/p>
&lt;p>In other words, while the data may be accurate, the actual visual design of scatterplots is often overlooked and unattended. Unlike a statistical test, the goal of data visualizations is subjective—&lt;em>to help a viewer understand a particular relationship or story&lt;/em>. For that reason, it is important that we take a subjective, and dare I say aesthetic, approach towards ensuring scatterplots (and all other plot types, really) are visually appealing and easy to understand on a quick glance.&lt;/p>
&lt;p>I have a hunch that the main reason plots such as the first one above are so common is simply due to a lack of knowing how to easily customize plots in R. Unfortunately, even ggplot2—which is commended for the ease with which one can make good quality visualizations—is not so pretty right out of the box.&lt;/p>
&lt;p>Hence this blog post ;)&lt;/p>
&lt;p>Here is a simple tutorial on how to re-create the nice version of the plot above using the &amp;lsquo;base&amp;rsquo; R package. The key is just to include a few additional parameters and functions. In the future I may update this post with how to do this using ggplot2.&lt;/p>
&lt;p>First, let&amp;rsquo;s load the data. In this case we are using the built-in dataset on air quality measurements in New York from May through September in 1973:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Load the built-in data:&lt;/span>
&lt;span style="color:#268bd2">data&lt;/span>(airquality)
&lt;span style="color:#268bd2">help&lt;/span>(airquality)
&lt;span style="color:#268bd2">head&lt;/span>(airquality)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## Ozone Solar.R Wind Temp Month Day
## 1 41 190 7.4 67 5 1
## 2 36 118 8.0 72 5 2
## 3 12 149 12.6 74 5 3
## 4 18 313 11.5 62 5 4
## 5 NA NA 14.3 56 5 5
## 6 28 NA 14.9 66 5 6
&lt;/code>&lt;/pre>&lt;p>For this tutorial we are only interested in ozone concentration, which is measured in parts per million (ppm), and wind speed, which is measured in miles per hour (MPH). To get this info, I just ran the &amp;lsquo;help(airquality)&amp;rsquo; function to pull up a description of these data.&lt;/p>
&lt;p>Next, let&amp;rsquo;s start with the plot created using the &amp;lsquo;plot()&amp;rsquo; function right out of the box:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#268bd2">plot&lt;/span>(Ozone &lt;span style="color:#719e07">~&lt;/span> Wind, data&lt;span style="color:#719e07">=&lt;/span>airquality)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;img src="https://www.rforecology.com/post/how-to-make-a-quality-scatterplot-in-r/index_files/figure-html/unnamed-chunk-4-1.png" width="672" />
Whichever variable you want on the y-axis goes to the left of the tilde &amp;lsquo;~&amp;rsquo;, and the x-axis variable goes on the right. The neat thing about this notation is that you can directly use the column names of the data you&amp;rsquo;d like to plot. In this case we set the &amp;lsquo;data&amp;rsquo; argument to our data frame &amp;lsquo;airquality&amp;rsquo;, and &amp;lsquo;Ozone&amp;rsquo; and &amp;lsquo;Wind&amp;rsquo; are column names taken straight from that data frame.&lt;/p>
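&lt;p>To check exactly which data a formula like &lt;code>Ozone ~ Wind&lt;/code> picks up, you can use &lt;code>model.frame()&lt;/code>. It does the same formula-to-columns lookup against a data frame. With R&amp;rsquo;s default &lt;code>na.action&lt;/code> option (&lt;code>na.omit&lt;/code>), rows missing either variable are dropped, so fewer rows come back than &lt;code>airquality&lt;/code> contains:&lt;/p>

```r
data(airquality)
# Resolve the formula against the data frame: response (Ozone) first, then predictor (Wind)
mf = model.frame(Ozone ~ Wind, data = airquality)
head(mf, 3)
nrow(airquality)  # 153 rows in total
nrow(mf)          # 116 rows with both Ozone and Wind present
```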
&lt;p>Next, remove all the axes and tick marks from the plot so that we can start with a clean slate:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#268bd2">plot&lt;/span>(Ozone &lt;span style="color:#719e07">~&lt;/span> Wind, data&lt;span style="color:#719e07">=&lt;/span>airquality, xaxt&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">&amp;#34;n&amp;#34;&lt;/span>, yaxt&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">&amp;#34;n&amp;#34;&lt;/span>, ylab&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">&amp;#34;&amp;#34;&lt;/span>, xlab&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">&amp;#34;&amp;#34;&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;img src="https://www.rforecology.com/post/how-to-make-a-quality-scatterplot-in-r/index_files/figure-html/unnamed-chunk-5-1.png" width="672" />
Then we&amp;rsquo;ll add back new axes that are fully customizable using the &amp;lsquo;axis()&amp;rsquo; function:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#268bd2">plot&lt;/span>(Ozone &lt;span style="color:#719e07">~&lt;/span> Wind, data&lt;span style="color:#719e07">=&lt;/span>airquality, xaxt&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">&amp;#34;n&amp;#34;&lt;/span>, yaxt&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">&amp;#34;n&amp;#34;&lt;/span>, ylab&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">&amp;#34;&amp;#34;&lt;/span>, xlab&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">&amp;#34;&amp;#34;&lt;/span>)
&lt;span style="color:#586e75"># add the wind speed axis:&lt;/span>
&lt;span style="color:#268bd2">axis&lt;/span>(side&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">1&lt;/span>, at&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#268bd2">seq&lt;/span>(&lt;span style="color:#2aa198">0&lt;/span>, &lt;span style="color:#268bd2">max&lt;/span>(airquality&lt;span style="color:#719e07">$&lt;/span>Wind, na.rm&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#268bd2">T&lt;/span>), &lt;span style="color:#2aa198">2&lt;/span>), padj&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">-0.8&lt;/span>)
&lt;span style="color:#586e75"># add the Ozone axis:&lt;/span>
&lt;span style="color:#268bd2">axis&lt;/span>(side&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2&lt;/span>, at&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#268bd2">seq&lt;/span>(&lt;span style="color:#2aa198">0&lt;/span>, &lt;span style="color:#268bd2">max&lt;/span>(airquality&lt;span style="color:#719e07">$&lt;/span>Ozone, na.rm&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#268bd2">T&lt;/span>), &lt;span style="color:#2aa198">20&lt;/span>), hadj&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">0.8&lt;/span>, las&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;img src="https://www.rforecology.com/post/how-to-make-a-quality-scatterplot-in-r/index_files/figure-html/unnamed-chunk-6-1.png" width="672" />
The &amp;lsquo;side&amp;rsquo; argument indicates which side of the graph we are adding the axis to. Sides 1, 2, 3, and 4 are the bottom, left, top, and right, respectively. Then, &amp;lsquo;at&amp;rsquo; tells the function where to put the tick marks. For example, here is how we build that argument for the wind axis:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># This finds the maximum value for wind:&lt;/span>
&lt;span style="color:#268bd2">max&lt;/span>(airquality&lt;span style="color:#719e07">$&lt;/span>Wind, na.rm&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#268bd2">T&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] 20.7
&lt;/code>&lt;/pre>&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Then use that in the &amp;#39;seq()&amp;#39; function to create the sequence of places for the tickmarks:&lt;/span>
&lt;span style="color:#268bd2">seq&lt;/span>(from &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">0&lt;/span>, to &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#268bd2">max&lt;/span>(airquality&lt;span style="color:#719e07">$&lt;/span>Wind, na.rm&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#268bd2">T&lt;/span>), by &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">2&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## [1] 0 2 4 6 8 10 12 14 16 18 20
&lt;/code>&lt;/pre>&lt;p>The &amp;lsquo;padj&amp;rsquo; and &amp;lsquo;hadj&amp;rsquo; (perpendicular adjust and horizontal adjust) arguments are used to nudge the axis tickmark label text so that it lines up more neatly. Play around with those values to see what you get. Finally, the &amp;lsquo;las&amp;rsquo; argument when set to 2, turns the y axis tick marks horizontally so that they are more easily readable and all fit on the axis neatly.&lt;/p>
&lt;p>Next, add in the axis name labels using the &amp;lsquo;mtext&amp;rsquo; function. It&amp;rsquo;s always important to add units to these labels, which I did. The &amp;lsquo;line&amp;rsquo; argument is how far from the edge of the plot you want the label to appear. &amp;lsquo;cex&amp;rsquo; affects the size of the text, and finally, &amp;lsquo;font&amp;rsquo; is used to make the text bold. You can set &amp;lsquo;font&amp;rsquo; to 1, 2, 3, or 4, for normal, bold, italic, or italic + bold respectively. Play around with all those parameters to see how it changes the figure.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#268bd2">plot&lt;/span>(Ozone &lt;span style="color:#719e07">~&lt;/span> Wind, data&lt;span style="color:#719e07">=&lt;/span>airquality, xaxt&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">&amp;#34;n&amp;#34;&lt;/span>, yaxt&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">&amp;#34;n&amp;#34;&lt;/span>, ylab&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">&amp;#34;&amp;#34;&lt;/span>, xlab&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">&amp;#34;&amp;#34;&lt;/span>)
&lt;span style="color:#586e75"># add the wind speed axis:&lt;/span>
&lt;span style="color:#268bd2">axis&lt;/span>(side&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">1&lt;/span>, at&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#268bd2">seq&lt;/span>(&lt;span style="color:#2aa198">0&lt;/span>, &lt;span style="color:#268bd2">max&lt;/span>(airquality&lt;span style="color:#719e07">$&lt;/span>Wind, na.rm&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#268bd2">T&lt;/span>), &lt;span style="color:#2aa198">2&lt;/span>), padj&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">-0.8&lt;/span>)
&lt;span style="color:#586e75"># add the Ozone axis:&lt;/span>
&lt;span style="color:#268bd2">axis&lt;/span>(side&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2&lt;/span>, at&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#268bd2">seq&lt;/span>(&lt;span style="color:#2aa198">0&lt;/span>, &lt;span style="color:#268bd2">max&lt;/span>(airquality&lt;span style="color:#719e07">$&lt;/span>Ozone, na.rm&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#268bd2">T&lt;/span>), &lt;span style="color:#2aa198">20&lt;/span>), hadj&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">0.8&lt;/span>, las&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2&lt;/span>)
&lt;span style="color:#586e75"># add in the labels for each axis:&lt;/span>
&lt;span style="color:#268bd2">mtext&lt;/span>(side&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">1&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;Wind Speed (mph)&amp;#34;&lt;/span>, line&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2.8&lt;/span>, cex&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">1.5&lt;/span>, font&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2&lt;/span>)
&lt;span style="color:#268bd2">mtext&lt;/span>(side&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;Ozone Concentration (ppb)&amp;#34;&lt;/span>, line&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2.8&lt;/span>, cex&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">1.4&lt;/span>, font&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;img src="https://www.rforecology.com/post/how-to-make-a-quality-scatterplot-in-r/index_files/figure-html/unnamed-chunk-8-1.png" width="672" />
Now that all the elements are in place, let&amp;rsquo;s adjust the points. I&amp;rsquo;m not a fan of the default open-circle points; to me they don&amp;rsquo;t give the data the emphasis I want to see in the figure (feel free to disagree!). Instead, I prefer filled points, using the &amp;lsquo;pch = 16&amp;rsquo; argument, made a bit bigger with the &amp;lsquo;cex&amp;rsquo; argument (both in the &amp;lsquo;plot()&amp;rsquo; function):&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#268bd2">plot&lt;/span>(Ozone &lt;span style="color:#719e07">~&lt;/span> Wind, data&lt;span style="color:#719e07">=&lt;/span>airquality, xaxt&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">&amp;#34;n&amp;#34;&lt;/span>, yaxt&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">&amp;#34;n&amp;#34;&lt;/span>, ylab&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">&amp;#34;&amp;#34;&lt;/span>, xlab&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">&amp;#34;&amp;#34;&lt;/span>, pch&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">16&lt;/span>, cex&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">1.5&lt;/span>)
&lt;span style="color:#586e75"># add the wind speed axis:&lt;/span>
&lt;span style="color:#268bd2">axis&lt;/span>(side&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">1&lt;/span>, at&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#268bd2">seq&lt;/span>(&lt;span style="color:#2aa198">0&lt;/span>, &lt;span style="color:#268bd2">max&lt;/span>(airquality&lt;span style="color:#719e07">$&lt;/span>Wind, na.rm&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#268bd2">T&lt;/span>), &lt;span style="color:#2aa198">2&lt;/span>), padj&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">-0.8&lt;/span>)
&lt;span style="color:#586e75"># add the Ozone axis:&lt;/span>
&lt;span style="color:#268bd2">axis&lt;/span>(side&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2&lt;/span>, at&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#268bd2">seq&lt;/span>(&lt;span style="color:#2aa198">0&lt;/span>, &lt;span style="color:#268bd2">max&lt;/span>(airquality&lt;span style="color:#719e07">$&lt;/span>Ozone, na.rm&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#268bd2">T&lt;/span>), &lt;span style="color:#2aa198">20&lt;/span>), hadj&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">0.8&lt;/span>, las&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2&lt;/span>)
&lt;span style="color:#586e75"># add in the labels for each axis:&lt;/span>
&lt;span style="color:#268bd2">mtext&lt;/span>(side&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">1&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;Wind Speed (mph)&amp;#34;&lt;/span>, line&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2.8&lt;/span>, cex&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">1.5&lt;/span>, font&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2&lt;/span>)
&lt;span style="color:#268bd2">mtext&lt;/span>(side&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;Ozone Concentration (ppb)&amp;#34;&lt;/span>, line&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2.8&lt;/span>, cex&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">1.4&lt;/span>, font&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;img src="https://www.rforecology.com/post/how-to-make-a-quality-scatterplot-in-r/index_files/figure-html/unnamed-chunk-9-1.png" width="672" />
I think that looks a lot better. The only problem is that many points overlap, so you lose the ability to see where they cluster. To fix that, we can make the point color semi-transparent. This is very easy to do with the &amp;lsquo;ggplot2&amp;rsquo; package, but we can also do it in base R. I made a function that makes creating transparent colors a bit easier:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75">### transparent colors function&lt;/span>
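t_col &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">function&lt;/span>(color, opacity &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">0.5&lt;/span>) {
  &lt;span style="color:#586e75"># col2rgb() gives the red, green, and blue values (0 to 255) of the color;&lt;/span>
  &lt;span style="color:#586e75"># only the first color is used if you pass several&lt;/span>
  rgb.val &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">col2rgb&lt;/span>(color)
  &lt;span style="color:#586e75"># rebuild the color as a hex code with an alpha (opacity) channel on the same 0 to 255 scale&lt;/span>
  t.col &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">rgb&lt;/span>(rgb.val[1], rgb.val[2], rgb.val[3], maxColorValue &lt;span style="color:#719e07">=&lt;/span> &lt;span style="color:#2aa198">255&lt;/span>, alpha &lt;span style="color:#719e07">=&lt;/span> opacity &lt;span style="color:#719e07">*&lt;/span> &lt;span style="color:#2aa198">255&lt;/span>)
  &lt;span style="color:#586e75"># return the color code without printing it to the console&lt;/span>
  &lt;span style="color:#268bd2">invisible&lt;/span>(t.col)
}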
&lt;/code>&lt;/pre>&lt;/div>&lt;p>This function takes two arguments: &amp;lsquo;color&amp;rsquo;, the color you want to make transparent, and &amp;lsquo;opacity&amp;rsquo;, which ranges from 0 (fully transparent) to 1 (no transparency at all). Base R&amp;rsquo;s &amp;lsquo;adjustcolor()&amp;rsquo; function can do the same job, for example &amp;lsquo;adjustcolor(&amp;#34;black&amp;#34;, alpha.f = 0.6)&amp;rsquo;. Adding our function to the plot looks like this:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#268bd2">plot&lt;/span>(Ozone &lt;span style="color:#719e07">~&lt;/span> Wind, data&lt;span style="color:#719e07">=&lt;/span>airquality, xaxt&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">&amp;#34;n&amp;#34;&lt;/span>, yaxt&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">&amp;#34;n&amp;#34;&lt;/span>, ylab&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">&amp;#34;&amp;#34;&lt;/span>, xlab&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">&amp;#34;&amp;#34;&lt;/span>, pch&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">16&lt;/span>, cex&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">1.5&lt;/span>,
col&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#268bd2">t_col&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;black&amp;#34;&lt;/span>,&lt;span style="color:#2aa198">0.6&lt;/span>))
&lt;span style="color:#586e75"># add the wind speed axis:&lt;/span>
&lt;span style="color:#268bd2">axis&lt;/span>(side&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">1&lt;/span>, at&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#268bd2">seq&lt;/span>(&lt;span style="color:#2aa198">0&lt;/span>, &lt;span style="color:#268bd2">max&lt;/span>(airquality&lt;span style="color:#719e07">$&lt;/span>Wind, na.rm&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#268bd2">T&lt;/span>), &lt;span style="color:#2aa198">2&lt;/span>), padj&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">-0.8&lt;/span>)
&lt;span style="color:#586e75"># add the Ozone axis:&lt;/span>
&lt;span style="color:#268bd2">axis&lt;/span>(side&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2&lt;/span>, at&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#268bd2">seq&lt;/span>(&lt;span style="color:#2aa198">0&lt;/span>, &lt;span style="color:#268bd2">max&lt;/span>(airquality&lt;span style="color:#719e07">$&lt;/span>Ozone, na.rm&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#268bd2">T&lt;/span>), &lt;span style="color:#2aa198">20&lt;/span>), hadj&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">0.8&lt;/span>, las&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2&lt;/span>)
&lt;span style="color:#586e75"># add in the labels for each axis:&lt;/span>
&lt;span style="color:#268bd2">mtext&lt;/span>(side&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">1&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;Wind Speed (mph)&amp;#34;&lt;/span>, line&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2.8&lt;/span>, cex&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">1.5&lt;/span>, font&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2&lt;/span>)
&lt;span style="color:#268bd2">mtext&lt;/span>(side&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;Ozone Concentration (ppb)&amp;#34;&lt;/span>, line&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2.8&lt;/span>, cex&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">1.4&lt;/span>, font&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;img src="https://www.rforecology.com/post/how-to-make-a-quality-scatterplot-in-r/index_files/figure-html/unnamed-chunk-11-1.png" width="672" />
Almost done! I like to leave a bit of white space between the outermost points and the edges of the plot, so that no points sit right on the border (avoiding &amp;ldquo;edge effects&amp;rdquo;) and you can figuratively &amp;ldquo;stand back&amp;rdquo; when looking at all of the data. There&amp;rsquo;s also no reason wind speed shouldn&amp;rsquo;t start at zero, since the lowest values are already close to it. To add that spacing and extend the axes, we change the axis limits using the &amp;lsquo;xlim&amp;rsquo; and &amp;lsquo;ylim&amp;rsquo; arguments in the &amp;lsquo;plot()&amp;rsquo; function. Each takes a vector of two values giving the minimum and maximum extent of that axis:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#268bd2">plot&lt;/span>(Ozone &lt;span style="color:#719e07">~&lt;/span> Wind, data&lt;span style="color:#719e07">=&lt;/span>airquality, xaxt&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">&amp;#34;n&amp;#34;&lt;/span>, yaxt&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">&amp;#34;n&amp;#34;&lt;/span>, ylab&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">&amp;#34;&amp;#34;&lt;/span>, xlab&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">&amp;#34;&amp;#34;&lt;/span>, pch&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">16&lt;/span>, cex&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">1.5&lt;/span>,
col&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#268bd2">t_col&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;black&amp;#34;&lt;/span>,&lt;span style="color:#2aa198">0.6&lt;/span>), ylim&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">0&lt;/span>,&lt;span style="color:#2aa198">185&lt;/span>), xlim&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">0&lt;/span>,&lt;span style="color:#2aa198">22&lt;/span>))
&lt;span style="color:#586e75"># add the wind speed axis:&lt;/span>
&lt;span style="color:#268bd2">axis&lt;/span>(side&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">1&lt;/span>, at&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#268bd2">seq&lt;/span>(&lt;span style="color:#2aa198">0&lt;/span>, &lt;span style="color:#268bd2">max&lt;/span>(airquality&lt;span style="color:#719e07">$&lt;/span>Wind, na.rm&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#268bd2">T&lt;/span>), &lt;span style="color:#2aa198">2&lt;/span>), padj&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">-0.8&lt;/span>)
&lt;span style="color:#586e75"># add the Ozone axis:&lt;/span>
&lt;span style="color:#268bd2">axis&lt;/span>(side&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2&lt;/span>, at&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#268bd2">seq&lt;/span>(&lt;span style="color:#2aa198">0&lt;/span>, &lt;span style="color:#268bd2">max&lt;/span>(airquality&lt;span style="color:#719e07">$&lt;/span>Ozone, na.rm&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#268bd2">T&lt;/span>), &lt;span style="color:#2aa198">20&lt;/span>), hadj&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">0.8&lt;/span>, las&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2&lt;/span>)
&lt;span style="color:#586e75"># add in the labels for each axis:&lt;/span>
&lt;span style="color:#268bd2">mtext&lt;/span>(side&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">1&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;Wind Speed (mph)&amp;#34;&lt;/span>, line&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2.8&lt;/span>, cex&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">1.5&lt;/span>, font&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2&lt;/span>)
&lt;span style="color:#268bd2">mtext&lt;/span>(side&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;Ozone Concentration (ppb)&amp;#34;&lt;/span>, line&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2.8&lt;/span>, cex&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">1.4&lt;/span>, font&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;img src="https://www.rforecology.com/post/how-to-make-a-quality-scatterplot-in-r/index_files/figure-html/unnamed-chunk-12-1.png" width="672" />&lt;/p>
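&lt;p>The limits above (22 and 185) are typed in by hand. If you would rather have R work them out from the data, here is a minimal sketch, assuming you want each axis to end at the next multiple of its tick spacing (2 mph for wind speed, 20 ppb for ozone). The helper &amp;lsquo;round_up()&amp;rsquo; is a hypothetical name made up for this example, and base R&amp;rsquo;s &amp;lsquo;adjustcolor()&amp;rsquo; stands in for &amp;lsquo;t_col()&amp;rsquo; so that the block runs on its own.&lt;/p>

```r
# Sketch: derive axis limits from the data rather than hard-coding them.
# Assumption: each axis should end at the next multiple of its tick spacing.
# round_up() is a hypothetical helper written for this example.
round_up = function(x, step) ceiling(x / step) * step

wind_max = round_up(max(airquality$Wind, na.rm = TRUE), 2)     # tick every 2 mph
ozone_max = round_up(max(airquality$Ozone, na.rm = TRUE), 20)  # tick every 20 ppb

# adjustcolor() (base R, grDevices) is used here in place of t_col()
plot(Ozone ~ Wind, data = airquality, xaxt = "n", yaxt = "n",
     xlab = "", ylab = "", pch = 16, cex = 1.5,
     col = adjustcolor("black", alpha.f = 0.6),
     xlim = c(0, wind_max), ylim = c(0, ozone_max))
axis(side = 1, at = seq(0, wind_max, 2), padj = -0.8)
axis(side = 2, at = seq(0, ozone_max, 20), hadj = 0.8, las = 2)
mtext("Wind Speed (mph)", side = 1, line = 2.8, cex = 1.5, font = 2)
mtext("Ozone Concentration (ppb)", side = 2, line = 2.8, cex = 1.4, font = 2)
```

&lt;p>With this approach the limits and tick marks update automatically if the data change. Keep in mind that base graphics already pad each axis by 4% beyond the limits you supply (the default &amp;lsquo;xaxs = &amp;#34;r&amp;#34;&amp;rsquo; and &amp;lsquo;yaxs = &amp;#34;r&amp;#34;&amp;rsquo; settings of &amp;lsquo;par()&amp;rsquo;), so the white space you see will be a little larger than the gap between the data and the limits.&lt;/p>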
&lt;p>We&amp;rsquo;ll finish by using the &amp;lsquo;par()&amp;rsquo; function to set the margins of the plot, because I don&amp;rsquo;t like how close the y-axis label is to the edge of the figure. The &amp;lsquo;mar&amp;rsquo; argument sets the margins around the plotting region, measured in lines of text (the default is &amp;lsquo;c(5, 4, 4, 2) + 0.1&amp;rsquo;). Play around with the numbers until you get something that looks good. The four values in the vector represent the four sides in the same order as the &amp;lsquo;side&amp;rsquo; argument used for the axes: bottom, left, top, and right. Note that &amp;lsquo;par()&amp;rsquo; has to be called before &amp;lsquo;plot()&amp;rsquo;.&lt;/p>
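&lt;p>One thing worth knowing about &amp;lsquo;par()&amp;rsquo;: settings such as &amp;lsquo;mar&amp;rsquo; stay in effect for every later plot on the same graphics device. &amp;lsquo;par()&amp;rsquo; returns the previous values of whatever it changes, so a common pattern is to store them and restore them once the figure is drawn. Here is a minimal sketch; the name &amp;lsquo;old_par&amp;rsquo; is just an illustrative choice.&lt;/p>

```r
# par() returns the old values of the settings it changes
old_par = par(mar = c(5, 5, 2, 2))  # bottom, left, top, right (lines of text)

plot(Ozone ~ Wind, data = airquality, pch = 16)

# put the previous margins back so later plots are unaffected
par(old_par)
```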
&lt;p>Here is the plot again with a background color so that you can see what I mean:
&lt;img src="https://www.rforecology.com/post/how-to-make-a-quality-scatterplot-in-r/index_files/figure-html/unnamed-chunk-13-1.png" width="672" />&lt;/p>
&lt;p>And after adding &amp;lsquo;par(mar=c(5,5,2,2))&amp;rsquo;:
&lt;img src="https://www.rforecology.com/post/how-to-make-a-quality-scatterplot-in-r/index_files/figure-html/unnamed-chunk-14-1.png" width="672" />
That looks good! So here is the final code:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#268bd2">par&lt;/span>(mar&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">5&lt;/span>,&lt;span style="color:#2aa198">5&lt;/span>,&lt;span style="color:#2aa198">2&lt;/span>,&lt;span style="color:#2aa198">2&lt;/span>))
&lt;span style="color:#268bd2">plot&lt;/span>(Ozone &lt;span style="color:#719e07">~&lt;/span> Wind, data&lt;span style="color:#719e07">=&lt;/span>airquality, xaxt&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">&amp;#34;n&amp;#34;&lt;/span>, yaxt&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">&amp;#34;n&amp;#34;&lt;/span>, ylab&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">&amp;#34;&amp;#34;&lt;/span>, xlab&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">&amp;#34;&amp;#34;&lt;/span>, pch&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">16&lt;/span>, cex&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">1.5&lt;/span>,
ylim&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">0&lt;/span>,&lt;span style="color:#2aa198">185&lt;/span>), xlim&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">0&lt;/span>,&lt;span style="color:#2aa198">22&lt;/span>), col&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#268bd2">t_col&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;black&amp;#34;&lt;/span>,&lt;span style="color:#2aa198">0.6&lt;/span>))
&lt;span style="color:#268bd2">mtext&lt;/span>(side&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">1&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;Wind Speed (mph)&amp;#34;&lt;/span>, line&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2.8&lt;/span>, cex&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">1.5&lt;/span>, font&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2&lt;/span>)
&lt;span style="color:#268bd2">axis&lt;/span>(side&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">1&lt;/span>, at&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#268bd2">seq&lt;/span>(&lt;span style="color:#2aa198">0&lt;/span>, &lt;span style="color:#268bd2">max&lt;/span>(airquality&lt;span style="color:#719e07">$&lt;/span>Wind)&lt;span style="color:#2aa198">+2&lt;/span>, &lt;span style="color:#2aa198">2&lt;/span>), padj&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">-0.8&lt;/span>)
&lt;span style="color:#268bd2">mtext&lt;/span>(side&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;Ozone Concentration (ppb)&amp;#34;&lt;/span>, line&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2.8&lt;/span>, cex&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">1.4&lt;/span>, font&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2&lt;/span>)
&lt;span style="color:#268bd2">axis&lt;/span>(side&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2&lt;/span>, at&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#268bd2">seq&lt;/span>(&lt;span style="color:#2aa198">0&lt;/span>, &lt;span style="color:#2aa198">185&lt;/span>, &lt;span style="color:#2aa198">20&lt;/span>), hadj&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">0.8&lt;/span>, las&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>Finally (and maybe most importantly), when you save your figure by clicking &amp;lsquo;Export&amp;rsquo; above the &amp;lsquo;Plots&amp;rsquo; pane in RStudio, you&amp;rsquo;ll have the option to resize the figure and preview how it looks at different dimensions. Don&amp;rsquo;t skip this step: text and point sizes stay the same when the figure is resized, so the width and height you choose determine whether all the elements look in proportion. Play around with the sizing and you&amp;rsquo;ll see what I mean. &lt;strong>This is what you are going for:&lt;/strong>&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/post/how-to-make-a-quality-scatterplot-in-r/index_files/figure-html/unnamed-chunk-16-1.png" width="672" />
&lt;em>&lt;strong>Not this:&lt;/strong>&lt;/em>
&lt;img src="https://www.rforecology.com/post/how-to-make-a-quality-scatterplot-in-r/index_files/figure-html/unnamed-chunk-17-1.png" width="1152" />&lt;/p>
&lt;p>Alternatively, if you want the image size to be recorded in the code, open a &amp;lsquo;quartz()&amp;rsquo; window (on a Mac) or a &amp;lsquo;windows()&amp;rsquo; window (on a PC). Set &amp;lsquo;width&amp;rsquo; and &amp;lsquo;height&amp;rsquo; in those functions to the desired size in inches, and run that function before the code that creates the plot. This opens an external graphics window sized to your specifications, and you can then use the File menu at the top of your screen to save the figure as a PDF.&lt;/p>
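&lt;p>If you switch between operating systems, base R&amp;rsquo;s &amp;lsquo;dev.new()&amp;rsquo; opens a new window using the platform&amp;rsquo;s default device (for example &amp;lsquo;quartz()&amp;rsquo; on a Mac or &amp;lsquo;windows()&amp;rsquo; on Windows) and passes &amp;lsquo;width&amp;rsquo; and &amp;lsquo;height&amp;rsquo; on to it. The sketch below assumes the 7 by 4.5 inch size used in the PDF example further down; inside RStudio, &amp;lsquo;noRStudioGD = TRUE&amp;rsquo; is needed because the RStudio graphics device does not accept size arguments.&lt;/p>

```r
# Open an external graphics window 7 inches wide and 4.5 inches tall.
# dev.new() uses the platform's default device and forwards width/height to it;
# noRStudioGD = TRUE skips the RStudio device, which does not accept these sizes.
dev.new(width = 7, height = 4.5, noRStudioGD = TRUE)

par(mar = c(5, 5, 2, 2))
plot(Ozone ~ Wind, data = airquality, pch = 16)
# ...add the axis() and mtext() calls from the final code here,
# then save the window from the File menu.
```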
&lt;p>You can also skip the window and save the figure directly to a file with a file-based graphics device such as &amp;lsquo;pdf()&amp;rsquo;. Here is the final code for how to do this:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#268bd2">pdf&lt;/span>(file&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">&amp;#34;my_scatterplot.pdf&amp;#34;&lt;/span>,width&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">7&lt;/span>,height&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">4.5&lt;/span>)
&lt;span style="color:#268bd2">par&lt;/span>(mar&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">5&lt;/span>,&lt;span style="color:#2aa198">5&lt;/span>,&lt;span style="color:#2aa198">2&lt;/span>,&lt;span style="color:#2aa198">2&lt;/span>))
&lt;span style="color:#268bd2">plot&lt;/span>(Ozone &lt;span style="color:#719e07">~&lt;/span> Wind, data&lt;span style="color:#719e07">=&lt;/span>airquality, xaxt&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">&amp;#34;n&amp;#34;&lt;/span>, yaxt&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">&amp;#34;n&amp;#34;&lt;/span>, ylab&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">&amp;#34;&amp;#34;&lt;/span>, xlab&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">&amp;#34;&amp;#34;&lt;/span>, pch&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">16&lt;/span>, cex&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">1.5&lt;/span>,
ylim&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">0&lt;/span>,&lt;span style="color:#2aa198">185&lt;/span>), xlim&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">0&lt;/span>,&lt;span style="color:#2aa198">22&lt;/span>), col&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#268bd2">t_col&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;black&amp;#34;&lt;/span>,&lt;span style="color:#2aa198">0.6&lt;/span>))
&lt;span style="color:#268bd2">mtext&lt;/span>(side&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">1&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;Wind Speed (mph)&amp;#34;&lt;/span>, line&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2.8&lt;/span>, cex&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">1.5&lt;/span>, font&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2&lt;/span>)
&lt;span style="color:#268bd2">axis&lt;/span>(side&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">1&lt;/span>, at&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#268bd2">seq&lt;/span>(&lt;span style="color:#2aa198">0&lt;/span>, &lt;span style="color:#268bd2">max&lt;/span>(airquality&lt;span style="color:#719e07">$&lt;/span>Wind)&lt;span style="color:#2aa198">+2&lt;/span>, &lt;span style="color:#2aa198">2&lt;/span>), padj&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">-0.8&lt;/span>)
&lt;span style="color:#268bd2">mtext&lt;/span>(side&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;Ozone Concentration (ppb)&amp;#34;&lt;/span>, line&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2.8&lt;/span>, cex&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">1.4&lt;/span>, font&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2&lt;/span>)
&lt;span style="color:#268bd2">axis&lt;/span>(side&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2&lt;/span>, at&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#268bd2">seq&lt;/span>(&lt;span style="color:#2aa198">0&lt;/span>, &lt;span style="color:#2aa198">185&lt;/span>, &lt;span style="color:#2aa198">20&lt;/span>), hadj&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">0.8&lt;/span>, las&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">2&lt;/span>)
&lt;span style="color:#268bd2">dev.off&lt;/span>()
&lt;/code>&lt;/pre>&lt;/div>&lt;p>Simply use the &amp;lsquo;pdf()&amp;rsquo; function to set the name of the file (including the directory, if you don&amp;rsquo;t want it saved in your working directory) and to specify the width and height in inches. You can play around with those measurements until you find something that works. Then run the code that creates the plot. Finally, run &amp;lsquo;dev.off()&amp;rsquo; to close the graphics device; the PDF is not completely written until the device is closed.&lt;/p>
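&lt;p>If you save figures often, one option is to wrap these steps in a function so the device is always closed, even if an error stops the plotting code halfway (otherwise later plots can end up going to the unfinished file). The sketch below does that with &amp;lsquo;on.exit()&amp;rsquo;, which runs &amp;lsquo;dev.off()&amp;rsquo; whenever the function exits. The name &amp;lsquo;save_scatter_pdf()&amp;rsquo; is hypothetical, the output path is only an example, and base R&amp;rsquo;s &amp;lsquo;adjustcolor()&amp;rsquo; stands in for &amp;lsquo;t_col()&amp;rsquo; to keep the block self-contained.&lt;/p>

```r
# Hypothetical wrapper: open a PDF device, draw the scatterplot, and make sure
# the device is closed when the function exits, even after an error.
save_scatter_pdf = function(file, width = 7, height = 4.5) {
  pdf(file = file, width = width, height = height)  # size in inches
  on.exit(dev.off())  # runs when the function returns or fails

  par(mar = c(5, 5, 2, 2))
  plot(Ozone ~ Wind, data = airquality, xaxt = "n", yaxt = "n",
       xlab = "", ylab = "", pch = 16, cex = 1.5,
       col = adjustcolor("black", alpha.f = 0.6),
       xlim = c(0, 22), ylim = c(0, 185))
  axis(side = 1, at = seq(0, 22, 2), padj = -0.8)
  axis(side = 2, at = seq(0, 180, 20), hadj = 0.8, las = 2)
  mtext("Wind Speed (mph)", side = 1, line = 2.8, cex = 1.5, font = 2)
  mtext("Ozone Concentration (ppb)", side = 2, line = 2.8, cex = 1.4, font = 2)
  invisible(file)
}

# example path; swap in your own file name and folder
out = save_scatter_pdf(file.path(tempdir(), "my_scatterplot.pdf"))
```

&lt;p>Because the &amp;lsquo;par()&amp;rsquo; change happens on the PDF device, it disappears when that device is closed and does not affect your on-screen plots.&lt;/p>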
&lt;p>I recommend saving your figures as PDFs to retain maximum quality: PDF is a vector format, so the figure stays sharp however far you zoom in. &lt;a href="https://blog.revolutionanalytics.com/2009/01/10-tips-for-making-your-r-graphics-look-their-best.html" target="_blank">Check out this other excellent blog post by David Smith with more details about how and why to save your figures in particular formats.&lt;/a>&lt;/p>
&lt;p>Well done! That&amp;rsquo;s it for now. Do you think this is easier to do with ggplot? I&amp;rsquo;ll follow up with an update or post on that as well.&lt;/p>
&lt;p>&lt;em>If you liked this article, let me know what you might want to see next in the comments down below.&lt;/em>&lt;/p>
&lt;br>
&lt;hr>
&lt;center>If you liked this post and want to learn more, then check out my online course on an introduction to data visualization with R for ecology:
&lt;br>
&lt;br>
&lt;ul class="cta-group">
&lt;li>
&lt;a href="https://coaching.rforecology.com/intro-to-dataviz-in-r-for-ecologists-enroll?utm_source=blog&amp;amp;utm_medium=bottom_button&amp;amp;utm_campaign=rforecology_blog/" target="_blank" rel="noopener" class="btn btn-primary px-3 py-3">&lt;strong>Start Visualizing Data with R Now&lt;/strong>&lt;/a>
&lt;/li>
&lt;/ul>
&lt;p>Also be sure to check out &lt;strong>&lt;a href="https://www.r-bloggers.com/" target="_blank" rel="noopener">R-bloggers&lt;/a>&lt;/strong> for other great tutorials on learning R&lt;/p>
&lt;/center>
</description></item><item><title>The myth of the R learning curve</title><link>https://www.rforecology.com/post/myth-of-the-r-learning-curve/</link><pubDate>Mon, 26 Jul 2021 00:00:00 +0000</pubDate><guid>https://www.rforecology.com/post/myth-of-the-r-learning-curve/</guid><description>&lt;p>I think that the &amp;ldquo;difficult&amp;rdquo; R learning curve is a myth.&lt;/p>
&lt;p>That&amp;rsquo;s because what people call the &amp;ldquo;R learning curve&amp;rdquo; is actually the combination of several disparate skill sets that are often taught as one conglomerated curriculum.&lt;/p>
&lt;p>Let me explain.&lt;/p>
&lt;p>Most university biology courses that teach R don&amp;rsquo;t just teach R. The goal of those classes is to help students learn how to plot and analyze their own data (and eventually use those skills for actual research).&lt;/p>
&lt;p>So, is the course teaching data analysis and statistics, or R? Usually its goal is to teach all of those things, and that&amp;rsquo;s where the problem is.&lt;/p>
&lt;p>I&amp;rsquo;ve worked one-on-one with many undergraduate and graduate students who are learning R, and it&amp;rsquo;s become clear to me that there is a particularly deep chasm between learning R and learning statistics. R is a programming language, but data analysis and statistics per se are mostly math. R is just a tool for doing statistics: statistics and data analyses can be carried out with tools ranging from a calculator to Microsoft Excel. While R remains one of the best tools, nothing intrinsically requires that R be taught at the same time as statistics. In fact, that&amp;rsquo;s my point.&lt;/p>
&lt;p>One of the main reasons R appears to have a difficult learning curve is simply that learning it is often confounded with learning statistics at the same time. One of my goals with the courses I teach is to separate statistics and R. If I&amp;rsquo;m going to teach a course on R, it is just about R. Once you have a solid handle on that, we can move on to using R for learning statistics. But you need to know how to use the right tools first. That&amp;rsquo;s why I created &lt;a href="https://www.rforecology.com/" target="_blank">my course on the basics of R for ecologists.&lt;/a> It doesn&amp;rsquo;t cover any stats or data analysis, and that&amp;rsquo;s intentional.&lt;/p>
&lt;p>I want to outline one more reason why R appears to have a difficult learning curve.&lt;/p>
&lt;p>Many of the mainstream R courses (such as the university courses I mentioned above) tend to mistake &amp;ldquo;learning R&amp;rdquo; for &amp;ldquo;learning &lt;em>everything&lt;/em> in R.&amp;rdquo; The professors who teach these courses usually have many years of experience and have thus accumulated a very large tool shed of packages, functions, and operations for R&lt;a href="https://www.popularmechanics.com/home/tools/g25617366/weird-tools/" target="_blank"> (take a look at this Popular Mechanics post about some of the weirdest actual hardware tools)&lt;/a>. This then becomes the standard for what should be taught, and the course turns into cramming 10 years of experience with R into one semester. Not only is this too much to teach in such a short time, but it also takes the focus away from what is actually most important for simply plotting and analyzing data.&lt;/p>
&lt;p>To be fair, I must say there are a lot of great professors out there that do recognize this issue and carefully focus on the most important functions and operations when teaching R, but those seem to be uncommon.&lt;/p>
&lt;p>In a recent post, I shared a cheat sheet on the most common but important functions when using R for ecology. &lt;a href="https://www.rforecology.com/post/the-essential-functions-of-r-cheatsheet/" target="_blank">(Click here to see the post and download the cheat sheet).&lt;/a> My goal there was to share the most common functions that also provide the most bang for the buck. &lt;em>In other words, the majority of all the code you will ever write in R comes down to just a handful of functions.&lt;/em>&lt;/p>
&lt;p>So why don&amp;rsquo;t most R courses start by focusing on those few functions first? Maybe for the same reasons that traditional language classes focus too much on grammar and syntax than just speaking? &lt;a href="https://blog.weareteacherfinder.com/blog/traditional-language-learning-methods-fail/" target="_blank">(Check out the &lt;em>Natural Order Hypothesis&lt;/em> about learning new languages.).&lt;/a>&lt;/p>
&lt;p>To wrap this all up and summarize my point, I think that there are two primary reasons that there appears to be a difficult R learning curve and why so many students do end up having a truly difficult time with R.&lt;/p>
&lt;blockquote>
&lt;p>&lt;strong>First&lt;/strong>, teaching R is often confounded with teaching statistics. Pick one (preferably R first), and then move on to the other.&lt;/p>
&lt;/blockquote>
&lt;blockquote>
&lt;p>&lt;strong>Second&lt;/strong>, start by only teaching the most essential and important functions first. Don&amp;rsquo;t overwhelm students with all the functions they might ever need to know. And if you know two ways to do the same thing? Just pick one.&lt;/p>
&lt;/blockquote>
&lt;p>So, what do you think about this topic? What are your Stork Beak Pliers in R?&lt;/p>
&lt;figure id="figure-stork-beak-pliers-an-example-of-an-uncommon-tool-that-has-a-specific-purpose-but-not-necessary-for-beginners-to-learn">
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="https://hips.hearstapps.com/vader-prod.s3.amazonaws.com/1545152869-2-1545152857.jpg?crop=1xw:1xh;center,top&amp;amp;resize=768:*" alt="Stork Beak Pliers: an example of an uncommon tool that has a specific purpose but not necessary for beginners to learn." loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;figcaption>
Stork Beak Pliers: an example of an uncommon tool that has a specific purpose but not necessary for beginners to learn.
&lt;/figcaption>&lt;/figure>
&lt;br>
&lt;hr>
&lt;center>If you liked this post and want to learn more, then check out my online course on the complete basics of R for ecology:
&lt;br>
&lt;br>
&lt;ul class="cta-group">
&lt;li>
&lt;a href="https://coaching.rforecology.com?utm_source=blog&amp;amp;utm_medium=bottom_button&amp;amp;utm_campaign=rforecology_blog/" target="_blank" rel="noopener" class="btn btn-primary px-3 py-3">&lt;strong>Start learning now&lt;/strong>&lt;/a>
&lt;/li>
&lt;/ul>
&lt;p>Also be sure to check out &lt;a href="https://www.r-bloggers.com/" target="_blank">&lt;strong>R-bloggers&lt;/strong>&lt;/a> for other great tutorials on learning R&lt;/p>
&lt;/center>
&lt;script defer src="https://cdn.commento.io/js/commento.js">&lt;/script>
&lt;div id="commento">&lt;/div></description></item><item><title>The essential functions of R cheatsheet</title><link>https://www.rforecology.com/post/the-essential-functions-of-r-cheatsheet/</link><pubDate>Mon, 19 Jul 2021 00:00:00 +0000</pubDate><guid>https://www.rforecology.com/post/the-essential-functions-of-r-cheatsheet/</guid><description>&lt;p>👇 &lt;strong>&lt;em>&lt;em>Download link is at the bottom of the post&lt;/em>&lt;/em>&lt;/strong> 👇&lt;/p>
&lt;p>Something I quickly learned as an ecologist using R is that, out of the thousands of functions available in R and its packages, only a handful were ones I used frequently throughout my code.&lt;/p>
&lt;p>I&amp;rsquo;m also learning to speak Spanish right now, and I&amp;rsquo;ve found that for learning a new language it is a good idea to start by focusing on the most common words, since only those few words account for a significant proportion of everything you&amp;rsquo;ll ever need to say.&lt;/p>
&lt;p>Anyone familiar with &lt;a href="https://tim.blog/" target="_blank" rel="noopener">Tim Ferriss&lt;/a> probably knows about the 80-20 rule (&lt;a href="https://en.wikipedia.org/wiki/Pareto_principle" target="_blank" rel="noopener">Pareto&amp;rsquo;s principle&lt;/a>) that he&amp;rsquo;s made popular throughout his books and podcasts. The rule simply states that 80% of results come from 20% of the work.&lt;/p>
&lt;p>Applied to learning a language, this means that learning only a small proportion of words (20%) will let you say a large proportion (80%) of what you&amp;rsquo;d ever need to say. These percentages won&amp;rsquo;t be exactly the same for every application, but hopefully you get the point.&lt;/p>
&lt;p>Now back to R! That&amp;rsquo;s exactly what I did with all the functions I use in R: I identified the &amp;ldquo;20%&amp;rdquo; of the functions I use in ecology that give me the most results. In other words, if you learn these functions (51 functions, to be exact), you will be well on your way to doing almost anything you need to do with your data. And if something is missing, it will be easy to learn when you need it.&lt;/p>
&lt;p>So here is my version 1.0 of a cheat sheet on the essential functions of R (for ecology). Please enjoy and share! Notice a typo? Let me know in the comments below.&lt;/p>
&lt;div class="_form_25">&lt;/div>&lt;script src="https://rforecology.activehosted.com/f/embed.php?id=25" type="text/javascript" charset="utf-8">&lt;/script>
&lt;p>&lt;b>&lt;a href="https://www.rforecology.com/uploads/The_essential_R_Cheatsheet_v1_0.pdf" target="_blank">Click here to download the Essentials of R cheatsheet v1.0 (PDF)&lt;/a>&lt;/b>&lt;/p>
&lt;br>
&lt;hr>
&lt;center>If you liked this post and want to learn more, then check out my online course on the complete basics of R for ecology:
&lt;br>
&lt;br>
&lt;ul class="cta-group">
&lt;li>
&lt;a href="https://coaching.rforecology.com/the-basics-of-r-for-ecologists-enroll?utm_source=blog&amp;amp;utm_medium=bottom_button&amp;amp;utm_campaign=rforecology_blog/" target="_blank" rel="noopener" class="btn btn-primary px-3 py-3">&lt;strong>Start learning now&lt;/strong>&lt;/a>
&lt;/li>
&lt;/ul>
&lt;p>Also be sure to check out &lt;strong>&lt;a href="https://www.r-bloggers.com/" target="_blank" rel="noopener">R-bloggers&lt;/a>&lt;/strong> for other great tutorials on learning R&lt;/p>
&lt;/center>
&lt;script defer src="https://cdn.commento.io/js/commento.js">&lt;/script>
&lt;div id="commento">&lt;/div></description></item><item><title>How to do a simple linear regression in R</title><link>https://www.rforecology.com/post/how-to-do-simple-linear-regression-in-r/</link><pubDate>Fri, 09 Jul 2021 00:00:00 +0000</pubDate><guid>https://www.rforecology.com/post/how-to-do-simple-linear-regression-in-r/</guid><description>&lt;p>In this tutorial I show you how to do a simple linear regression in R that models the relationship between two numeric variables.&lt;/p>
&lt;p>Check out this tutorial on YouTube if you&amp;rsquo;d prefer to follow along while I do the coding:&lt;/p>
&lt;p>&lt;a href="https://youtu.be/iOItuj6q6lg" target="_blank" rel="noopener">&lt;img src="https://www.rforecology.com/youtube_thumb.png" alt="Video thumbnail of tutorial on linear regression">&lt;/a>&lt;/p>
&lt;!--&lt;img src="youtube_thumb.png" alt="Video thumbnail of tutorial on linear regression">-->
&lt;p>The first step is to load some data. We&amp;rsquo;ll use the
&amp;lsquo;trees&amp;rsquo; dataset that comes built in with R:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Load the data:&lt;/span>
&lt;span style="color:#268bd2">data&lt;/span>(trees)
&lt;span style="color:#586e75"># Show the top few entries:&lt;/span>
&lt;span style="color:#268bd2">head&lt;/span>(trees)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## Girth Height Volume
## 1 8.3 70 10.3
## 2 8.6 65 10.3
## 3 8.8 63 10.2
## 4 10.5 72 16.4
## 5 10.7 81 18.8
## 6 10.8 83 19.7
&lt;/code>&lt;/pre>&lt;p>These data include measurements of the diameter, height and volume of 31 black cherry trees. Note that &amp;lsquo;Girth&amp;rsquo; is actually the diameter at breast height (DBH) in inches, &amp;lsquo;Height&amp;rsquo; is height in feet, and &amp;lsquo;Volume&amp;rsquo; is volume in cubic feet. So let&amp;rsquo;s rename those variables for clarity:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># rename columns&lt;/span>
&lt;span style="color:#268bd2">names&lt;/span>(trees) &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">c&lt;/span>(&lt;span style="color:#2aa198">&amp;#34;DBH_in&amp;#34;&lt;/span>,&lt;span style="color:#2aa198">&amp;#34;height_ft&amp;#34;&lt;/span>, &lt;span style="color:#2aa198">&amp;#34;volume_ft3&amp;#34;&lt;/span>)
&lt;span style="color:#268bd2">head&lt;/span>(trees)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## DBH_in height_ft volume_ft3
## 1 8.3 70 10.3
## 2 8.6 65 10.3
## 3 8.8 63 10.2
## 4 10.5 72 16.4
## 5 10.7 81 18.8
## 6 10.8 83 19.7
&lt;/code>&lt;/pre>&lt;p>There we go.&lt;/p>
&lt;p>Now, for the basic linear regression, let&amp;rsquo;s model how tree diameter varies with tree height.&lt;/p>
&lt;p>Let&amp;rsquo;s start by writing out what we actually want to model. We want to know how DBH varies as a function of tree height, which we can write in R&amp;rsquo;s formula notation as DBH_in ~ height_ft, with the tilde (~) read as &amp;ldquo;is a function of.&amp;rdquo;&lt;/p>
&lt;p>You can also think of this as: the Y variable (the dependent, or response, variable) is a function of the X variable (the independent, or predictor, variable). How does Y depend on, or respond to, X?&lt;/p>
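&lt;p>In R, a formula like DBH_in ~ height_ft is an object in its own right, so you can store it and reuse it. Here is a minimal sketch; my_formula is just an illustrative name:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Store the model formula; nothing is computed yet
my_formula &amp;lt;- DBH_in ~ height_ft
class(my_formula)     # "formula"
all.vars(my_formula)  # "DBH_in" "height_ft"
&lt;/code>&lt;/pre>&lt;/div>
&lt;p>You could later pass the same object to plot() or lm(), for example lm(my_formula, data = trees).&lt;/p>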
&lt;div class="alert alert-note">
&lt;div>
It&amp;rsquo;s important to note that we are not drawing any conclusions about a causal relationship between DBH and tree height; the linear regression analysis only allows us to test the association (correlation) between these two variables. This is very important to understand.
&lt;/div>
&lt;/div>
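&lt;p>Because this regression tests an association, it is closely related to a Pearson correlation. This optional check is not part of the original workflow. For a regression with a single predictor, the correlation test gives the same p-value as the height_ft slope in the summary() output later in this post. The squared correlation also equals the model&amp;rsquo;s R-squared.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Load and rename the data as above
data(trees)
names(trees) &amp;lt;- c("DBH_in", "height_ft", "volume_ft3")

# Pearson correlation between DBH and height, with a t-test of r = 0
ct &amp;lt;- cor.test(trees$DBH_in, trees$height_ft)
ct$estimate    # the correlation coefficient r
ct$estimate^2  # equals R-squared of lm(DBH_in ~ height_ft)
ct$p.value     # equals the p-value for the height_ft slope
&lt;/code>&lt;/pre>&lt;/div>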
&lt;p>Now let&amp;rsquo;s visualize the potential association between these variables by plotting the data. The neat thing is that we can call the plotting function using the same &amp;ldquo;is a function of&amp;rdquo; notation:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#268bd2">plot&lt;/span>(DBH_in &lt;span style="color:#719e07">~&lt;/span> height_ft, data &lt;span style="color:#719e07">=&lt;/span> trees, pch&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">16&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;img src="https://www.rforecology.com/post/how-to-do-simple-linear-regression-in-r/index_files/figure-html/unnamed-chunk-3-1.png" width="672" />&lt;/p>
&lt;p>I&amp;rsquo;m not a fan of the open circle points, so I added in the argument &amp;lsquo;pch = 16&amp;rsquo; to that plotting function to fill in the circles.&lt;/p>
&lt;p>So there appears to be a trend of increasing DBH with increasing height, but is that trend statistically significant?&lt;/p>
&lt;p>To test that, we will use the function lm(), which stands for linear model. The syntax is actually almost exactly the same as our plot! The only difference is that we will save the output of the model to its own object called &amp;lsquo;mod&amp;rsquo;:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Run the linear model and save it as &amp;#39;mod&amp;#39;&lt;/span>
mod &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">lm&lt;/span>(DBH_in &lt;span style="color:#719e07">~&lt;/span> height_ft, data &lt;span style="color:#719e07">=&lt;/span> trees)
&lt;span style="color:#586e75"># let&amp;#39;s view the output:&lt;/span>
mod
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>##
## Call:
## lm(formula = DBH_in ~ height_ft, data = trees)
##
## Coefficients:
## (Intercept) height_ft
## -6.1884 0.2557
&lt;/code>&lt;/pre>&lt;p>If we look at &amp;lsquo;mod&amp;rsquo; we don&amp;rsquo;t get much to work with, just the coefficient estimates for the intercept and slope. But we can run the &amp;lsquo;summary&amp;rsquo; function with &amp;lsquo;mod&amp;rsquo; to get more interesting results:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#268bd2">summary&lt;/span>(mod)
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>##
## Call:
## lm(formula = DBH_in ~ height_ft, data = trees)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.2386 -1.9205 -0.0714 2.7450 4.5384
##
## Coefficients:
## Estimate Std. Error t value Pr(&amp;gt;|t|)
## (Intercept) -6.18839 5.96020 -1.038 0.30772
## height_ft 0.25575 0.07816 3.272 0.00276 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.728 on 29 degrees of freedom
## Multiple R-squared: 0.2697, Adjusted R-squared: 0.2445
## F-statistic: 10.71 on 1 and 29 DF, p-value: 0.002758
&lt;/code>&lt;/pre>&lt;p>Beneath &amp;lsquo;Call&amp;rsquo;, which restates the model we fit, we can see the distribution of the residuals: the minimum and maximum, the 1st and 3rd quartiles, and the median. The residuals are the differences between the observed values and the values predicted by the model, that is, the part of the variation the model does not explain.&lt;/p>
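&lt;p>You can recover those residual statistics yourself. The following sketch refits the model from above and checks that each residual is the observed DBH minus the fitted DBH. The quantiles should match the Min, 1Q, Median, 3Q and Max values in the summary.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Recreate the renamed data and the model from above
data(trees)
names(trees) &amp;lt;- c("DBH_in", "height_ft", "volume_ft3")
mod &amp;lt;- lm(DBH_in ~ height_ft, data = trees)

# Residual = observed DBH minus the DBH predicted by the fitted line
res &amp;lt;- residuals(mod)
all.equal(unname(res), trees$DBH_in - unname(fitted(mod)))  # TRUE

# Min, 1st quartile, median, 3rd quartile and max of the residuals
quantile(res)
&lt;/code>&lt;/pre>&lt;/div>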
&lt;p>But below that we have a table that gets a bit more interesting&amp;hellip;&lt;/p>
&lt;p>Remember that a basic linear model estimates two coefficients: the intercept and the slope. To model a line, we use the equation &lt;strong>&lt;em>Y = a + bX&lt;/em>&lt;/strong>, where &lt;strong>&lt;em>a&lt;/em>&lt;/strong> is the intercept and &lt;strong>&lt;em>b&lt;/em>&lt;/strong> is the slope, and the goal of the regression analysis is to estimate &lt;strong>&lt;em>a&lt;/em>&lt;/strong> and &lt;strong>&lt;em>b&lt;/em>&lt;/strong>.&lt;/p>
&lt;p>The first column (Estimate) gives the estimate for each coefficient. Next comes the standard error of each estimate (Std. Error). Then comes the test statistic (t value), which is the estimate divided by its standard error. Finally there is the p-value (Pr(&amp;gt;|t|)), which tests the null hypothesis that the true value of that coefficient is zero. In our case, the p-value for the slope (height_ft) is 0.00276. Because that is well below 0.05, you can say that the association between DBH_in and height_ft is statistically significant.&lt;/p>
&lt;p>To make these results a bit easier to read, the model summary also marks each p-value with asterisks that indicate its significance level. The &amp;lsquo;Signif. codes&amp;rsquo; legend beneath the table explains them.&lt;/p>
&lt;p>Continuing down the summary, we see the residual standard error, and then the multiple R-squared, or simply &lt;em>R&lt;sup>2&lt;/sup>&lt;/em>. This can be thought of as the proportion of the variance in the response (DBH) that is explained by the model, which is about 27% here. You can ignore the adjusted R-squared for now if you are just starting out.&lt;/p>
&lt;p>Finally, we have the F-statistic and its p-value, which test whether all of the slope coefficients (every coefficient except the intercept) are zero. With only one predictor, this is the same test as the one for the height_ft slope. That is why its p-value (0.002758) matches the slope&amp;rsquo;s p-value (0.00276), apart from rounding.&lt;/p>
&lt;p>Next, we&amp;rsquo;ll add the fitted line from this model to our plot.&lt;/p>
&lt;p>All you have to do is first run the same plot function as before, and then run the &amp;lsquo;abline&amp;rsquo; function with the model as its argument:&lt;/p>
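&lt;p>For a simple regression, the least-squares estimates of &lt;strong>&lt;em>a&lt;/em>&lt;/strong> and &lt;strong>&lt;em>b&lt;/em>&lt;/strong> have a closed form. The slope is the covariance of X and Y divided by the variance of X. The intercept places the line through the point (mean of X, mean of Y). Here is a short sketch computing both by hand and comparing them with lm():&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Load and rename the data as above
data(trees)
names(trees) &amp;lt;- c("DBH_in", "height_ft", "volume_ft3")
x &amp;lt;- trees$height_ft  # X: the predictor
y &amp;lt;- trees$DBH_in     # Y: the response

# Least-squares estimates for Y = a + bX
b &amp;lt;- cov(x, y) / var(x)     # slope
a &amp;lt;- mean(y) - b * mean(x)  # intercept: line passes through (mean(x), mean(y))
c(a = a, b = b)

# These should agree with the coefficients estimated by lm()
coef(lm(DBH_in ~ height_ft, data = trees))
&lt;/code>&lt;/pre>&lt;/div>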
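&lt;p>The last two columns of the coefficient table can be reproduced from the first two. This sketch divides each estimate by its standard error to get the t value. It then computes a two-sided p-value from a t distribution with the model&amp;rsquo;s residual degrees of freedom (31 trees minus 2 coefficients = 29):&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Recreate the renamed data and the model from above
data(trees)
names(trees) &amp;lt;- c("DBH_in", "height_ft", "volume_ft3")
mod &amp;lt;- lm(DBH_in ~ height_ft, data = trees)

# Matrix with columns Estimate, Std. Error, t value, Pr(&amp;gt;|t|)
coefs &amp;lt;- summary(mod)$coefficients

# t value = estimate / standard error
t_manual &amp;lt;- coefs[, "Estimate"] / coefs[, "Std. Error"]
# Two-sided p-value from a t distribution with 29 residual degrees of freedom
p_manual &amp;lt;- 2 * pt(abs(t_manual), df = df.residual(mod), lower.tail = FALSE)
cbind(t_manual, p_manual)
&lt;/code>&lt;/pre>&lt;/div>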
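&lt;p>To see where &lt;em>R&lt;sup>2&lt;/sup>&lt;/em> comes from, this sketch computes it as 1 minus the ratio of two sums of squares. The numerator is the unexplained variation (squared residuals). The denominator is the total variation in DBH around its mean.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Recreate the renamed data and the model from above
data(trees)
names(trees) &amp;lt;- c("DBH_in", "height_ft", "volume_ft3")
mod &amp;lt;- lm(DBH_in ~ height_ft, data = trees)

ss_res &amp;lt;- sum(residuals(mod)^2)                       # variation left unexplained
ss_tot &amp;lt;- sum((trees$DBH_in - mean(trees$DBH_in))^2)  # total variation in DBH
r2 &amp;lt;- 1 - ss_res / ss_tot
r2
summary(mod)$r.squared  # the value reported as Multiple R-squared
&lt;/code>&lt;/pre>&lt;/div>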
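&lt;p>You can check the claim about a single predictor directly. With one predictor, the F-statistic equals the square of the slope&amp;rsquo;s t value, so the two p-values are identical. Here is a short sketch using the &amp;lsquo;fstatistic&amp;rsquo; element stored in the model summary:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Recreate the renamed data and the model from above
data(trees)
names(trees) &amp;lt;- c("DBH_in", "height_ft", "volume_ft3")
mod &amp;lt;- lm(DBH_in ~ height_ft, data = trees)

fstat &amp;lt;- summary(mod)$fstatistic  # named vector: value, numdf, dendf
fstat

# Overall p-value: upper tail of the F distribution
p_overall &amp;lt;- pf(fstat["value"], fstat["numdf"], fstat["dendf"], lower.tail = FALSE)
p_overall

# With one predictor, F is the square of the slope's t value
t_slope &amp;lt;- summary(mod)$coefficients["height_ft", "t value"]
t_slope^2
&lt;/code>&lt;/pre>&lt;/div>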
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Plot the scatterplot as before:&lt;/span>
&lt;span style="color:#268bd2">plot&lt;/span>(DBH_in &lt;span style="color:#719e07">~&lt;/span> height_ft, data &lt;span style="color:#719e07">=&lt;/span> trees, pch&lt;span style="color:#719e07">=&lt;/span>&lt;span style="color:#2aa198">16&lt;/span>)
&lt;span style="color:#586e75"># And then plot the fitted line:&lt;/span>
&lt;span style="color:#268bd2">abline&lt;/span>(mod)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;img src="https://www.rforecology.com/post/how-to-do-simple-linear-regression-in-r/index_files/figure-html/unnamed-chunk-6-1.png" width="672" />&lt;/p>
&lt;p>You might also need to extract a table of the regression results from the summary output, so I&amp;rsquo;ll show you a quick trick for doing that easily using the &amp;lsquo;broom&amp;rsquo; package.&lt;/p>
&lt;p>First, make sure to install the broom package if you haven&amp;rsquo;t already (you only have to do this once per computer). Then run the &amp;lsquo;library&amp;rsquo; function to load the package; you have to do this each time you open R and start a new working session:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># Install the &amp;#39;broom&amp;#39; package:&lt;/span>
&lt;span style="color:#586e75"># install.packages(&amp;#39;broom&amp;#39;) #commented out since I already have it installed&lt;/span>
&lt;span style="color:#586e75"># Then load the package:&lt;/span>
&lt;span style="color:#268bd2">library&lt;/span>(&lt;span style="color:#2aa198">&amp;#39;broom&amp;#39;&lt;/span>)
&lt;/code>&lt;/pre>&lt;/div>&lt;p>Finally, use the &amp;lsquo;tidy&amp;rsquo; function to extract the table of results from your model:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r">&lt;span style="color:#586e75"># extract the table&lt;/span>
my_results &lt;span style="color:#719e07">&amp;lt;-&lt;/span> &lt;span style="color:#268bd2">tidy&lt;/span>(mod)
my_results
&lt;/code>&lt;/pre>&lt;/div>&lt;pre>&lt;code>## # A tibble: 2 × 5
## term estimate std.error statistic p.value
## &amp;lt;chr&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt; &amp;lt;dbl&amp;gt;
## 1 (Intercept) -6.19 5.96 -1.04 0.308
## 2 height_ft 0.256 0.0782 3.27 0.00276
&lt;/code>&lt;/pre>&lt;div class="alert alert-note">
&lt;div>
Note that the output is actually a tibble, which is better than a normal dataframe, but that&amp;rsquo;s for another lesson. ;)
&lt;/div>
&lt;/div>
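&lt;p>If you would rather not install a package, base R can produce a similar table. The coefficient matrix from summary() converts directly to a data frame. This is a sketch of one way to do it; the column names follow the summary output rather than broom&amp;rsquo;s names.&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Recreate the renamed data and the model from above
data(trees)
names(trees) &amp;lt;- c("DBH_in", "height_ft", "volume_ft3")
mod &amp;lt;- lm(DBH_in ~ height_ft, data = trees)

# Coefficient table as a plain data frame (no extra packages needed)
results_base &amp;lt;- as.data.frame(summary(mod)$coefficients)
results_base$term &amp;lt;- rownames(results_base)  # keep the term names as a column
rownames(results_base) &amp;lt;- NULL
results_base &amp;lt;- results_base[c(5, 1:4)]      # put 'term' first
results_base
&lt;/code>&lt;/pre>&lt;/div>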
&lt;p>And that&amp;rsquo;s it!
&lt;img src="https://www.rforecology.com/done_meme.jpg" alt="job well done">&lt;/p>
&lt;hr>
&lt;center>If you liked this post and want to learn more, then check out my online course on the complete basics of R for ecology:
&lt;br>
&lt;br>
&lt;ul class="cta-group">
&lt;li>
&lt;a href="https://coaching.rforecology.com?utm_source=blog&amp;amp;utm_medium=bottom_button&amp;amp;utm_campaign=rforecology_blog/" target="_blank" rel="noopener" class="btn btn-primary px-3 py-3">&lt;strong>Start learning now&lt;/strong>&lt;/a>
&lt;/li>
&lt;/ul>
&lt;p>Also be sure to check out &lt;strong>&lt;a href="https://www.r-bloggers.com/" target="_blank" rel="noopener">R-bloggers&lt;/a>&lt;/strong> for other great tutorials on learning R&lt;/p>
&lt;/center>
&lt;script defer src="https://cdn.commento.io/js/commento.js">&lt;/script>
&lt;div id="commento">&lt;/div></description></item><item><title>How to organize your analyses with R Studio Projects</title><link>https://www.rforecology.com/post/organizing-your-r-studio-projects/</link><pubDate>Mon, 05 Jul 2021 00:00:00 +0000</pubDate><guid>https://www.rforecology.com/post/organizing-your-r-studio-projects/</guid><description>&lt;script src="https://rforecology.activehosted.com/f/embed.php?id=21" type="text/javascript" charset="utf-8">&lt;/script>
&lt;p>&lt;em>Here is a post that I am sharing from my old blog to get this one started. Enjoy!&lt;/em>&lt;/p>
&lt;p>In this post I&amp;rsquo;ll go over a basic method for organizing your ecological data analysis projects in R. Why do this? Reproducing analyses is critical for good science. There is nothing worse than re-running a script after you finally get comments back from your reviewers, only to find that your results are a bit different than before. What?! Speaking from personal experience, it’s taken days of blood, sweat, and tears to figure out what was different in the data, which code I was running in the wrong order, or that I was running the wrong code altogether! Start now and get in the habit of sticking to a system for organizing your R projects.&lt;/p>
&lt;p>While there are many methods and variations on how to do this (see links at the end of the post), the scope of this current post is to offer a short and simple overview of my own method so that you can get started ASAP. Those that follow me know that I am a big fan of getting right into the code and data—that is the best way to learn. So let&amp;rsquo;s get to it.&lt;/p>
&lt;p>&lt;strong>1)&lt;/strong> Use R Studio for all your analyses. Some of you 1% hardcore coders might prefer the minimalist terminal-type interface included in the basic R download, but for everyone else, use R Studio. It’s a no-brainer. &lt;a href="https://youtu.be/YKvkXKeGoa8" target="_blank" rel="noopener">See my video tutorial here on how to install it.&lt;/a>&lt;/p>
&lt;p>&lt;strong>2)&lt;/strong> Create a new project (File &amp;gt; New Project). The directory you set here will be the folder where you store your data, scripts, and other files related to your analysis.&lt;/p>
&lt;p>&lt;strong>3)&lt;/strong> Create the folder structure inside your project folder so that it looks like this:
&lt;img src="https://www.rforecology.com/folder-structure-example.png" alt="screenshot showing an example of the folder structure">&lt;/p>
&lt;ul>
&lt;li>“data” is where you keep your data, split into two folders, “raw” and “processed”. This is self-explanatory: &amp;ldquo;raw&amp;rdquo; is where you save your data as you entered or downloaded it (usually an Excel spreadsheet file), and &amp;ldquo;processed&amp;rdquo; is where you save the CSV file that is ready to import into R.&lt;/li>
&lt;li>“output” is where you save all the figures and tables that you generate with your R scripts.&lt;/li>
&lt;li>“scripts” is where you keep all the R code files.&lt;/li>
&lt;li>Finally, “temp” is not necessary, but I’ve found it very useful. It is a folder where I can save any temporary outputs or scripts that I want to test out or explore, but that I know should not get confused with the final output of my analyses.&lt;/li>
&lt;/ul>
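&lt;p>If you prefer not to click through folder dialogs, you can create this structure from R. make_project_folders() below is a hypothetical helper written for this example, not part of R or RStudio. It is a sketch that assumes you run it from the project root:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># make_project_folders() is a hypothetical helper written for this example.
# It creates data/raw, data/processed, output, scripts and temp under 'root'.
make_project_folders &amp;lt;- function(root = ".") {
  folders &amp;lt;- file.path(root, c("data/raw", "data/processed", "output", "scripts", "temp"))
  for (f in folders) {
    # recursive = TRUE also creates the parent "data" folder when needed;
    # showWarnings = FALSE keeps it quiet if a folder already exists
    dir.create(f, recursive = TRUE, showWarnings = FALSE)
  }
  invisible(folders)
}

# Run once from the project root (RStudio sets this when the project is open):
# make_project_folders()
&lt;/code>&lt;/pre>&lt;/div>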
&lt;p>&lt;strong>4)&lt;/strong> Create your R scripts. Unless your analysis is very simple and direct, you should be using multiple scripts (pretty much always the case when your project is large enough for an entire publication). Ideally, each script should be a set of code that you can run in one go. This is not always possible, but strive for that and use a separate script for each component of the analysis. I recommend you create the following scripts right away:&lt;/p>
&lt;ol>
&lt;li>Script for loading packages and custom functions&lt;/li>
&lt;li>Script for cleaning up and preparing the data for analysis&lt;/li>
&lt;li>Script for each analysis in the project. For example, in one study you might need both a figure that presents two histograms for visualization purposes and one linear mixed effects regression to test your primary hypothesis. Each of those should have its own script.&lt;/li>
&lt;li>Name each script using the format &amp;ldquo;##_name_v#&amp;rdquo;, where ## indicates the order in which the scripts should be run, “name” is a descriptor, and “v#” is the version number. Sometimes you will want to change a script but keep the older version in case you mess something up; that is when saving a new file with an updated version number makes sense. So, all together, your first set of scripts might look like this:
&lt;ul>
&lt;li>00_packages_v1.r&lt;/li>
&lt;li>01_dataclean_v1.r&lt;/li>
&lt;li>02_HistogramFigure_v1.r&lt;/li>
&lt;li>03_LMER_v1.r&lt;/li>
&lt;/ul>&lt;/li>
&lt;/ol>
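&lt;p>One payoff of this naming scheme is that file names alone tell you the run order and the newest version of each script. latest_scripts() below is a hypothetical helper, not part of base R or RStudio. It sketches how you might pick the highest version of each numbered script, sorted by its two-digit prefix:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># latest_scripts() is a hypothetical helper written for this example.
# It keeps files named like "00_packages_v1.r" and returns, for each script,
# only its highest version, in run order (by the two-digit prefix).
latest_scripts &amp;lt;- function(files) {
  files &amp;lt;- basename(files)
  files &amp;lt;- files[grepl("^[0-9]{2}_.+_v[0-9]+\\.[rR]$", files)]
  stem &amp;lt;- sub("_v[0-9]+\\.[rR]$", "", files)                           # e.g. "01_dataclean"
  version &amp;lt;- as.integer(sub("^.+_v([0-9]+)\\.[rR]$", "\\1", files))  # e.g. 2
  ord &amp;lt;- order(stem, -version)  # run order, newest version first within each script
  files[ord][!duplicated(stem[ord])]
}

# Example: source the newest version of every script, in order
# for (f in latest_scripts(list.files("scripts"))) source(file.path("scripts", f))
&lt;/code>&lt;/pre>&lt;/div>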
&lt;p>&lt;strong>5)&lt;/strong> Start each R script with a good description of the entire project and the particular scope of that script. The more comments the better, but more on script commenting in another post. Here&amp;rsquo;s an example:&lt;/p>
&lt;p>&lt;img src="https://www.rforecology.com/script-title-example_orig.png" alt="screenshot showing an example of how to label the heading of your R scripts">&lt;/p>
&lt;p>That’s pretty much it! Each time you open the project in RStudio, the scripts you had open last time will reopen. Just make sure to run the packages and dataclean scripts before the others. RStudio Projects set the working directory to the project folder, so there is no need to include a setwd() line. Just add “data/processed/” before your file name whenever importing data, or “output/” or “temp/” whenever exporting something.&lt;/p>
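&lt;p>Here is what those relative paths might look like in practice. This sketch assumes the working directory is the project root. The built-in trees data stands in for your own data, and the file names trees_clean.csv and column_means.csv are hypothetical:&lt;/p>
&lt;div class="highlight">&lt;pre style="color:#93a1a1;background-color:#002b36;-moz-tab-size:4;-o-tab-size:4;tab-size:4">&lt;code class="language-r" data-lang="r"># Assumes the working directory is the project root (RStudio sets it when you
# open the project). File names below are hypothetical examples.
dir.create(file.path("data", "processed"), recursive = TRUE, showWarnings = FALSE)
dir.create("output", showWarnings = FALSE)

# In the data-cleaning script: save the processed data (trees stands in for yours)
write.csv(trees, file.path("data", "processed", "trees_clean.csv"), row.names = FALSE)

# In an analysis script: read it back with a relative path, no setwd() needed
my_data &amp;lt;- read.csv(file.path("data", "processed", "trees_clean.csv"))

# Export a result (here, the column means) to the output folder
write.csv(data.frame(mean = colMeans(my_data)), file.path("output", "column_means.csv"))
&lt;/code>&lt;/pre>&lt;/div>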
&lt;p>If you want some longer in-depth explanations on code management in R, check out these other excellent blog posts:&lt;/p>
&lt;ul>
&lt;li>&lt;a href="https://chrisvoncsefalvay.com/structuring-r-projects/">https://chrisvoncsefalvay.com/structuring-r-projects/&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://kkulma.github.io/2018-03-18-Prime-Hints-for-Running-a-data-project-in-R/">https://kkulma.github.io/2018-03-18-Prime-Hints-for-Running-a-data-project-in-R/&lt;/a>&lt;/li>
&lt;li>&lt;a href="https://ntguardian.wordpress.com/2018/08/02/how-should-i-organize-my-r-research-projects/">https://ntguardian.wordpress.com/2018/08/02/how-should-i-organize-my-r-research-projects/&lt;/a>&lt;/li>
&lt;/ul>
&lt;hr>
&lt;center>If you liked this post and want to learn more, then check out my online course on the complete basics of R for ecology:
&lt;br>
&lt;br>
&lt;ul class="cta-group">
&lt;li>
&lt;a href="https://www.rforecology.com?utm_source=blog&amp;amp;utm_medium=bottom_button&amp;amp;utm_campaign=rforecology_blog" target="_blank" rel="noopener" class="btn btn-primary px-3 py-3">&lt;strong>Start learning now&lt;/strong>&lt;/a>
&lt;/li>
&lt;/ul>
&lt;p>Also be sure to check out &lt;strong>&lt;a href="https://www.r-bloggers.com/" target="_blank" rel="noopener">R-bloggers&lt;/a>&lt;/strong> for other great tutorials on learning R&lt;/p>
&lt;/center>
&lt;script defer src="https://cdn.commento.io/js/commento.js">&lt;/script>
&lt;div id="commento">&lt;/div></description></item></channel></rss>