ProjectTemplate
Contributions to ProjectTemplate documentation
Hi everyone,
Thank you for your ongoing hard work on ProjectTemplate.
At the end of the Getting Started guide is the following sentence:
In a future piece of documentation, we’ll describe some of the more advanced features that ProjectTemplate offers.
I would like to contribute to ProjectTemplate by writing additional documentation, but before embarking on this I wanted to check: is there a wish list anywhere of the advanced features you would like documented next?
Matt Forshaw, Lecturer in Data Science, Newcastle University
Thanks @MattForshaw for your offer. We do love having documentation contributions. There isn't a list of features that are prioritized for documentation. Could you make a list of the features you think should be added to the documentation, and then we can discuss priority? Is that fair?
@MattForshaw I use ProjectTemplate almost every day. To that end, I can do my part and contribute to the documentation as well. I can follow your lead if you have a prioritized list of features that need documentation.
I just updated some of the changes I had in mind for the documentation on the website. I think the following sections still need work:
- [ ] Expand "Introduction"
  - [ ] What is ProjectTemplate
  - [ ] How does ProjectTemplate work
  - [ ] Explain convention over configuration philosophy
  - [ ] Minimal usage example
  - [ ] Basic installation, plus link to separate page
  - [ ] What is ProjectTemplate NOT
  - [ ] Integrate "Building packages"
  - [ ] Remove separate page for building packages
- [ ] Update "Getting started":
  - [ ] Remove preference of text editor, just keep it neutral (or perhaps move it to a new section with useful tools).
  - [ ] Instruct to download the file directly into the data directory: `download.file('http://projecttemplate.net/letters.csv.bz2', 'data/letters.csv.bz2')` would be a lot clearer I think and prevents dependencies on operating systems.
  - [ ] Update `ddply` example to equivalent `dplyr` example
- [ ] Update "Mastering ProjectTemplate":
  - [ ] Instruct to download the `philapd.db` file directly
  - [ ] Combine instructions for SQL databases with those on "Supported File Formats"
  - [ ] Move to separate page "SQL databases"
  - [ ] Make better distinction between
    - [ ] general configuration flags: `type`, `user`, `password`, ...;
    - [ ] database specific flags: `class`, `classpath`, `dsn`, ...;
    - [ ] and data selection flags `table` and `query`
- [ ] Expand "Configuring", add documentation on missing configuration flags (not all are listed!)
- [ ] Remove "Updating" from website altogether, as the page is no longer relevant since version `0.3.5`.
- [ ] Clarify "Supported File Formats"
  - [ ] Combine `.bz2`, `.zip`, `.gz` variants of extensions into the main extension (more like "`.csv`: CSV files that use a comma separator (supports compressed variants)", and explain which compressed variants are accepted separately)
  - [ ] Link to new page "SQL databases"
- Further improvements:
  - [ ] Add vignettes
  - [ ] Check documentation of existing functions for typos and unclear/ambiguous sentences.
  - [ ] Check documentation for information that should move to a vignette
These are just some possible updates to the current pages that I can think of right now, but perhaps you (as new users?) are missing something altogether. Please feel free to let me know, I can add it to this list. Also, if you think something is utter nonsense to change, then also let me know, I can just as well remove it again.
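For the download item in the list above, here is a minimal sketch of what the guide could show. It uses only base R; the helper name `fetch_into_data` and its `data_dir` argument are made up for illustration and are not part of ProjectTemplate.

```r
# Hypothetical helper (not part of ProjectTemplate): download a file
# straight into the project's data/ directory, on any operating system.
fetch_into_data <- function(url, data_dir = "data") {
  # Create data/ if it does not exist yet.
  dir.create(data_dir, showWarnings = FALSE, recursive = TRUE)
  dest <- file.path(data_dir, basename(url))
  # Skip the download when the file is already present.
  if (!file.exists(dest)) {
    # mode = "wb" keeps compressed files intact on Windows.
    download.file(url, destfile = dest, mode = "wb")
  }
  dest
}

# Usage, from the project root:
# fetch_into_data('http://projecttemplate.net/letters.csv.bz2')
```

Writing binary mode explicitly is the part that matters for the "no dependency on the operating system" goal, since a `.bz2` file downloaded in text mode on Windows can end up corrupted.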
I just saw this video about creating good documentation: https://www.youtube.com/watch?v=azf6yzuJt54 We might consider restructuring our documentation that way, because it would help us create structure within the current website. The technical reference is kept pretty clean, so we might not need to add that to the website, although its contents are not easily browsable from a web browser.
Folks, any progress on this? I recommend we schedule a Skype session to create a quick plan of who does what, what's needed in the documentation, etc.
I haven't done anything about the documentation recently. Perhaps it's even easier if you just pick an item to update and mark it as done on the list. If you think you're making bigger changes that might conflict with other people's efforts, then just shout out ahead of time ;-)
We're chipping away at this slowly. Every month or so I get someone who wants to help with documentation and can point them to this list. Any help on this is greatly appreciated!
I have adopted project-template fully for my R projects. I teach it to my team at work too. (We might fork it and make customizations specific to our application). Perhaps I can put together a vignette or blog-post to show how I use it in a real-world project.
It's a great idea. For example, I'm lost with the `cache` function. I don't know where to invoke it in my projects.
@rsangole What kind of changes would you like to make that require a separate fork from the main project?
@Hugovdberg Quite a few customizations actually. I'm using this format to develop projects that might go into a more 'production' environment. So along with /src/ for the source files to call, I need additional folder structure for error & log files, intermediate calculation outputs and final outputs, plots and algorithm performance metrics. I'm standardizing these structures within my team. Furthermore, I'd like to replace all the readme markdowns with customized starter Rmd documents, which will have our logo, color scheme css etc.
@rsangole That sounds like you don't actually need to fork the project, but just need to create a custom template (using the new `create.template` function) ;-)
Ah, alright. I've yet to explore that function. I'll look at it over the weekend.
@maikol-solis Thanks for joining us. Actually, your questions about the `cache` function would be great. There should be documentation on caching. Since we are so familiar with the project, it is hard to see what is confusing.
Could you help us by commenting on what is confusing for you, so we can update the documentation there? It would be great if you could open a caching documentation issue so we can keep it in one spot.
@KentonWhite Thanks for helping us to understand the software.
In the `munge` folder I process the data and create some clean data frames, depending on my project. Here is my question: where should I call the `cache` function so that the scripts in the `munge` folder do not recreate the data frames when I run `load.project`? What would an example for the `01.A.R` file look like?
# Load data
load(...)
# Preprocessing
MyDataFrame <- ...  # some code to process the raw data
# Is it correct?
cache(MyDataFrame)
Now, if I do some analysis in the `src` folder, could I call the `cache` function to save the results?
If I want to save the analysis results, should they be saved in the `data` folder, or where?
Thanks for the help.
@maikol-solis it appears your question wasn't answered yet. Data from the `data` directory is loaded automatically based on the file extension (unless you have a less common file type), so there should be no need to load it manually in the `munge` scripts. If you have the option `cache_loaded_data` enabled, the files are cached automatically.
If you have expensive munge scripts, you might want to cache the results manually by calling `cache`. You might then also want to build a guard into the munge script to prevent it from running if the result was loaded from the cache.
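A rough sketch of such a guard, as it might look in a munge script. `clean_raw_data` and `raw_data` are placeholders invented for this example; in a real project the raw data would come from the `data` directory. Note that `cache` expects the variable name as a string. The call is wrapped in a check here only so the snippet also runs outside a ProjectTemplate project.

```r
# Stand-in for an expensive preprocessing step (hypothetical helper).
clean_raw_data <- function(raw) {
  message("Running expensive preprocessing ...")
  transform(raw, value_scaled = value / max(value))
}

# Stand-in for a data set that load.project() would have loaded from data/.
raw_data <- data.frame(id = 1:3, value = c(10, 20, 40))

# Guard: only rebuild when the variable was not already restored from cache/.
if (!exists("MyDataFrame")) {
  MyDataFrame <- clean_raw_data(raw_data)
  # Inside a ProjectTemplate project, persist the result for the next run.
  # cache() takes the variable *name* as a character string.
  if ("package:ProjectTemplate" %in% search() && dir.exists("cache")) {
    cache("MyDataFrame")
  }
}
```

On the next `load.project` run the variable is already in memory from the cache, so `exists("MyDataFrame")` is true and the expensive step is skipped.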
Usually you call `load.project` from a file in the `src` directory, after which you can do the analysis based on the preprocessed data. If you want to store results, the `graphs` directory is created by default in the `full` template. You could also create an `output` directory to store other output (which I personally do in a custom template).
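To make that concrete, a script in `src` could look roughly like the sketch below. It is written against a temporary directory and a small made-up data frame so it runs on its own. In a real project the script would start with `library(ProjectTemplate)` and `load.project()`, and `project_root` would simply be the working directory. The file names are invented for the example.

```r
# Stand-in for the project root; in a real project this is the working directory.
project_root <- file.path(tempdir(), "demo-project")
graphs_dir <- file.path(project_root, "graphs")  # created by the full template
output_dir <- file.path(project_root, "output")  # custom directory, create it yourself
for (d in c(graphs_dir, output_dir)) {
  dir.create(d, recursive = TRUE, showWarnings = FALSE)
}

# Stand-in for preprocessed data that load.project() would provide.
MyDataFrame <- data.frame(group = c("a", "a", "b"), value = c(1, 3, 5))

# Analysis step: mean value per group.
results <- aggregate(value ~ group, data = MyDataFrame, FUN = mean)

# Store tabular results in output/ and figures in graphs/.
write.csv(results, file.path(output_dir, "group_means.csv"), row.names = FALSE)
pdf(file.path(graphs_dir, "group_means.pdf"))
barplot(results$value, names.arg = results$group)
invisible(dev.off())
```

Keeping analysis results out of `data` avoids them being picked up as input data on the next `load.project` run.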
@Hugovdberg Thank you very much for the information. I was confused about how to use the `cache` function. One more thing: how is this function aware of changes in the data? I mean, if I re-run the munge scripts and re-create the clean data, do I have to call the `cache` function again, or is it aware of the change?
The cache function only writes to cache if the data in memory has changed from the data in the cache. At the moment the variable is always read from the cache, even if the original data file was changed. In that case you need to clear the variable from the cache and reload manually.
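As a sketch of that manual step, assuming a ProjectTemplate version that provides `clear.cache` and that it accepts variable names as character strings: the wrapper `refresh_cached` is a made-up name for illustration, not something the package ships.

```r
# Hypothetical helper: drop a stale variable from cache/ and from memory,
# then reload the project so it is rebuilt from the changed data file.
refresh_cached <- function(name) {
  # Remove the cached copy (assumes ProjectTemplate provides clear.cache()).
  ProjectTemplate::clear.cache(name)
  # Remove the in-memory copy so guards based on exists() trigger again.
  if (exists(name, envir = globalenv(), inherits = FALSE)) {
    rm(list = name, envir = globalenv())
  }
  # Reload the project: data files are read and munge scripts run again.
  ProjectTemplate::load.project()
}

# Usage, from the project root:
# refresh_cached("MyDataFrame")
```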