--- title: "deepdep package" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{deepdep package} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- # Introduction Package `deepdep` was created to acquire and visualize information on dependencies of R packages in a smart and convenient way. Most of its functionality is contained in two functions: `deepdep` -- to get a `data.frame` with dependencies described and `plot_dependencies` -- visualize this `data.frame`. ```r library(deepdep) ``` ## Use case Suppose you're creating an R package and you want to include graph of its dependencies to your vignette, `README.md` file on your git repository or article on your package. With `deepdep` you simply need to type one line: ```r plot_deepdep("YourPackageName") ``` But before we describe how this function works in detail, let's see what are other functionalities of the package. # Features `deepdep` package exports the following functions: * `get_available_packages`, * `get_description`, * `get_dependencies`, * `get_downloads`, * `deepdep`, * `plot_dependencies`. Those functions rely on each other and are ordered from the lowest to the highest level. We'll describe what they exactly do and how on examples. ## get_available_packages This function lists, as the name indicates, available packages. The default behaviour is listing all CRAN packages. ```r t <- get_available_packages() head(t, 20) #> A3 AATtools ABACUS abbreviate abbyyR abc abc.data #> "A3" "AATtools" "ABACUS" "abbreviate" "abbyyR" "abc" "abc.data" #> ABC.RAP abcADM ABCanalysis abclass ABCoptim ABCp2 abcrf #> "ABC.RAP" "abcADM" "ABCanalysis" "abclass" "ABCoptim" "ABCp2" "abcrf" #> abcrlda abctools abd abdiv abe abess #> "abcrlda" "abctools" "abd" "abdiv" "abe" "abess" ``` However, if you want to check if package is present in a little wider range -- on CRAN or Bioconductor repositories, you simply need to set argument `bioc = TRUE`. In this case function is simply wrapper around `BiocManager::available()` and to use it you need to have `BiocManager` package (available via CRAN) installed. ```r t <- get_available_packages(bioc = TRUE) #> 'getOption("repos")' replaces Bioconductor standard repositories, see '?repositories' for details #> #> replacement repositories: #> CRAN: https://cran.rstudio.com/ head(t, 20) #> [1] "A3" "a4" "a4Base" "a4Classif" "a4Core" "a4Preproc" "a4Reporting" #> [8] "AATtools" "ABACUS" "ABarray" "abbreviate" "abbyyR" "abc" "abc.data" #> [15] "ABC.RAP" "abcADM" "ABCanalysis" "abclass" "ABCoptim" "ABCp2" ``` Another possibility is checking what packages are installed -- you do it by adding `local = TRUE` parameter. ```r t <- get_available_packages(local = TRUE) head(t, 20) #> [1] "abind" "agricolae" "aigoracomponents" "aigoraML" "aigoraOpen" #> [6] "aigorapack" "AlgDesign" "anytime" "AsioHeaders" "askpass" #> [11] "assertthat" "attempt" "backports" "base64enc" "base64url" #> [16] "bayestestR" "BBmisc" "bdsmatrix" "BH" "bigD" ``` Result of this function is cached (for more details, see *Caching* section of this vignette). ## get_description When you know, that given package is available, you may want to obtain DESCRIPTION of this package, at least the most essential parts of it, especially dependencies. You can do it by calling: ```r get_description("DALEXtra") #> DALEXtra: Extension for 'DALEX' Package #> Maintainer: Szymon Maksymiuk #> Description: #> Provides wrapper of various machine learning models. #> In applied machine learning, there #> is a strong belief that we need to strike a balance #> between interpretability and accuracy. #> However, in field of the interpretable machine learning, #> there are more and more new ideas for explaining black-box models, #> that are implemented in 'R'. #> 'DALEXtra' creates 'DALEX' Biecek (2018) explainer for many type of models #> including those created using 'python' 'scikit-learn' and 'keras' libraries, and 'java' 'h2o' library. #> Important part of the package is Champion-Challenger analysis and innovative approach #> to model performance across subsets of test data presented in Funnel Plot. #> Depends: R DALEX #> Imports: reticulate ggplot2 #> LinkingTo: #> Suggests: auditor gbm ggrepel h2o iml ingredients lime localModel mlr mlr3 ranger recipes rmarkdown rpart stacks xgboost testthat tidymodels #> Enhances: #> Scrap date: 2022-06-14 08:08:41 ``` Again, you can pass `bioc = TRUE` if you want to check for this package in Bioconductor repository. Notice that if package is not found there, it will be searched for on CRAN. The reason behind this type of behaviour is the fact that packages present on Bioconductor are updated more often than on CRAN and not all of them are present here. Option `local = TRUE` for only installed packages is also possible. If a package is not available in a given source, the function will return `NULL` value: ```r get_description("a4") #> NULL get_description("a4", bioc = TRUE) #> a4: Automated Affymetrix Array Analysis Umbrella Package #> Maintainer: Laure Cougnaud #> Description: #> Umbrella package is available for the entire Automated Affymetrix Array Analysis suite of package. #> Depends: a4Base a4Preproc a4Classif a4Core a4Reporting #> Imports: #> LinkingTo: #> Suggests: MLP nlcv ALL Cairo Rgraphviz GOstats #> Enhances: #> Scrap date: ``` Result of this function is also cached (for more details, see *Caching* section of this vignette). ## get_downloads This package allows you obtaining information on how many times specified package was downloaded. However, it works only with CRAN packages. ```r get_downloads("ggplot2") #> x #> last_day 58986 #> last_week 588212 #> last_month 2421187 #> last_quarter 7283366 #> last_half 15698538 #> grand_total 106188432 ``` Results of this function is not cached. ## get_dependencies After parsing description file, you can now create a `data.frame` which will describe dependencies between given package and others. You do it by using this function: ```r get_dependencies("ggplot2") #> name version type last_day last_week last_month last_quarter last_half grand_total #> 1 cli Imports 37403 540894 2412409 7540831 14636052 65253097 #> 2 glue Imports 23255 299464 1227913 3316178 7392136 64049386 #> 3 gtable >= 0.1.1 Imports 12722 162527 681744 1919155 4289100 35736269 #> 4 isoband Imports 13772 178728 755858 2156434 4537600 25733208 #> 5 lifecycle > 1.0.1 Imports 33300 474632 2110034 6295444 12314841 57743668 #> 6 MASS Imports 3451 54701 240819 548525 1168085 12322995 #> 7 mgcv Imports 1466 22857 88442 283694 611885 8639283 #> 8 rlang >= 1.0.0 Imports 38603 552818 2426430 7234865 15325972 99564579 #> 9 scales >=\n1.2.0 Imports 19735 262475 1155303 3409107 6550755 46716655 #> 10 tibble Imports 21206 273333 1140332 3203317 7609105 69099588 #> 11 vctrs >= 0.5.0 Imports 36066 510423 2300925 6420051 13411712 68644002 #> 12 withr >= 2.5.0 Imports 20456 276656 1197213 3456127 6660468 45414397 ``` As with two previously described functions - `get_available_packages` and `get_description`, here you can also use `bioc = TRUE` or `local = TRUE` and again, in case the package is not available, the result will be `NULL`. Here you have another options to set. The first one is parameter `downloads` -- should number of downloads of packages be included? It uses `get_downloads` and works only with CRAN packages. Another, more important parameter is `dependency_type`. You can specify how detailed should be list of dependencies. Default value is `"strong"`, which is a shorthand for `c("Depends", "Imports", "LinkingTo")`, but you can chose any combination of those and additionally `"Suggests", "Enhances"`. ```r get_dependencies("ggplot2", downloads = FALSE, dependency_type = c("Imports", "Suggests", "Enhances")) #> name version type #> 1 cli Imports #> 2 glue Imports #> 3 gtable >= 0.1.1 Imports #> 4 isoband Imports #> 5 lifecycle > 1.0.1 Imports #> 6 MASS Imports #> 7 mgcv Imports #> 8 rlang >= 1.0.0 Imports #> 9 scales >=\n1.2.0 Imports #> 10 tibble Imports #> 11 vctrs >= 0.5.0 Imports #> 12 withr >= 2.5.0 Imports #> 13 covr Suggests #> 14 dplyr Suggests #> 15 ggplot2movies Suggests #> 16 hexbin Suggests #> 17 Hmisc Suggests #> 18 knitr Suggests #> 19 lattice Suggests #> 20 mapproj Suggests #> 21 maps Suggests #> 22 maptools Suggests #> 23 multcomp Suggests #> 24 munsell Suggests #> 25 nlme Suggests #> 26 profvis Suggests #> 27 quantreg Suggests #> 28 ragg Suggests #> 29 RColorBrewer Suggests #> 30 rgeos Suggests #> 31 rmarkdown Suggests #> 32 rpart Suggests #> 33 sf >=\n0.7-3 Suggests #> 34 svglite >= 1.2.0.9001 Suggests #> 35 testthat >= 3.1.2 Suggests #> 36 vdiffr >= 1.0.0 Suggests #> 37 xml2 Suggests #> 38 sp Enhances ``` Result of this function is not cached (at least yet). ## deepdep The main function of the package -- it is simply wrapper around `get_dependencies`, that allows you getting not only dependencies, but also dependencies of the dependencies iteratively! (Now you know, why we called it **deep**dep). Parameters are the same as in `get_dependencies`, but additionally you can specify `depth` parameter, which describes how many iterations it function should perform. If `depth` equals 1, it's simply the same as calling `get_dependencies`. ```r deepdep("ggplot2", depth = 2) #> origin name version type origin_level dest_level #> 1 ggplot2 cli Imports 0 1 #> 2 ggplot2 glue Imports 0 1 #> 3 ggplot2 gtable >= 0.1.1 Imports 0 1 #> 4 ggplot2 isoband Imports 0 1 #> 5 ggplot2 lifecycle > 1.0.1 Imports 0 1 #> 6 ggplot2 MASS Imports 0 1 #> 7 ggplot2 mgcv Imports 0 1 #> 8 ggplot2 rlang >= 1.0.0 Imports 0 1 #> 9 ggplot2 scales >=\n1.2.0 Imports 0 1 #> 10 ggplot2 tibble Imports 0 1 #> 11 ggplot2 vctrs >= 0.5.0 Imports 0 1 #> 12 ggplot2 withr >= 2.5.0 Imports 0 1 #> 13 lifecycle cli >= 3.4.0 Imports 1 1 #> 14 lifecycle glue Imports 1 1 #> 15 lifecycle rlang >= 1.0.6 Imports 1 1 #> 16 mgcv nlme >= 3.1-64 Depends 1 2 #> 17 mgcv Matrix Imports 1 2 #> 18 scales farver >= 2.0.3 Imports 1 2 #> 19 scales labeling Imports 1 2 #> 20 scales lifecycle Imports 1 1 #> 21 scales munsell >= 0.5 Imports 1 2 #> 22 scales R6 Imports 1 2 #> 23 scales RColorBrewer Imports 1 2 #> 24 scales rlang >= 1.0.0 Imports 1 1 #> 25 scales viridisLite Imports 1 2 #> 26 tibble fansi >= 0.4.0 Imports 1 2 #> 27 tibble lifecycle >= 1.0.0 Imports 1 1 #> 28 tibble magrittr Imports 1 2 #> 29 tibble pillar >= 1.7.0 Imports 1 2 #> 30 tibble pkgconfig Imports 1 2 #> 31 tibble rlang >= 1.0.2 Imports 1 1 #> 32 tibble vctrs >= 0.3.8 Imports 1 1 #> 33 vctrs cli >= 3.4.0 Imports 1 1 #> 34 vctrs glue Imports 1 1 #> 35 vctrs lifecycle >= 1.0.3 Imports 1 1 #> 36 vctrs rlang >= 1.0.6 Imports 1 1 ``` ## plot_dependencies As famous quote says, > A picture is worth more than a thousand words. That's why we have `plot_dependencies` function. It allows visualizing easily what are dependencies of specified package. The function is generic, and currently supports two types of object -- you can pass a `deepdep` object, result of the calling the `deepdep` function or just type name of the package. With the latter option you can also pass arguments to `get_dependencies` as additional parameters. ```r dd <- deepdep("tibble", 2) plot_dependencies(dd) ``` plot of chunk plot_dependencies1 ```r plot_dependencies("DT", depth = 2, dependency_type = c("Imports", "Depends", "Suggests")) ``` plot of chunk plot_dependencies1 In each of the plots you can see one package name in the centre and two circles of packages gathered around them. These are dependencies of the first and second level. Default plot type is circular, as you can see on the examples presented above. However, you can set `plot_type` parameter to `tree`. ```r plot_dependencies(dd, type = "tree") ``` plot of chunk plot_dependencies2 Not all dependencies are plotted. To increase readability, dependencies on the same level are hidden, but you can change this behaviour ```r plot_dependencies(dd, same_level = TRUE) ``` plot of chunk plot_dependencies3 You can also make use of numbers of downloads you obtained. There is an option to add labels to only certain percentage of most downloaded packages among those that are about to be plotted. This is meant to increase readability of the plot. ```r plot_dependencies("tidyverse", type = "circular", label_percentage = 0.2, depth = 3) ``` plot of chunk plot_dependencies4 Finally, returned object is a `ggplot` object, so you can easily manipulate them with syntax known from `ggplot2` package. We also use `ggraph` enhancement for plotting graphs. ```r plot_dependencies(dd) + ggplot2::scale_fill_manual(values = c("#462CF8", "#F23A90", "#AF1023")) + ggraph::scale_edge_color_manual(values = "black") #> Scale for fill is already present. #> Adding another scale for fill, which will replace the existing scale. ``` plot of chunk plot_dependencies5 # Caching As we notoriously indicated in the text, we are using caching to make everything a little bit faster. Functions that operate on the lowest level, after scrapping information from the repositories, store them in temporary files on the local machine. As a consequence, calling function the second time with the same set of parameters, should be faster. State is also refreshed every 20 minutes to make sure you don't miss any major update. To make sure that you have the most recent data, you need to call `get_available_packages` and `get_description` with parameter `reset_cache = TRUE`. ```r get_available_packages(reset_cache = TRUE) ```