
User personas

Open okdistribute opened this issue 7 years ago • 5 comments

Share some user personas here! Describe a person who would be a user of dat.land. This person has a name, occupation, personality traits, goals, and skills. Your persona(s) should be based on a real story or user that you've encountered in the wild. This will help us understand who is using the project and build what they need.

More info here: http://www.ux-lady.com/introduction-to-user-personas/

okdistribute • Jul 07 '16 06:07

Occupation: Lab scientist
Name: Nick
Personality: Wants to solve hard problems but can only do so much without detracting from core competencies. Willing to learn new things if they help achieve main goals.
Skills: Manual file management (Finder drag and drop, naming), Jupyter Notebooks, a little scripting, a little command line, mathematics
Goals: Wants to understand how the brain works, wants to publish papers on exciting topics, wants to share data with other labs who don't have access to same equipment, wants to consume data from other labs, wants to create interesting dynamic visualizations and share them along with the PDF of their paper

A typical workflow is recording data from a microscope during an experiment. An hour of microscope data can be in the ~10-50GB range, as it is typically medium resolution lossless video. They want to keep this raw data available indefinitely, but do not necessarily need daily access to it.

From the raw data they use scripting in Jupyter notebooks to whittle the data down into more structured data, e.g. dataframes with floating point values extracted from the raw images. These derived data are much smaller and easier to store and work with.

From the intermediate data they produce visualizations, which are the size of a typical image or YouTube video. They also create a computing environment in the form of a Jupyter notebook that takes the raw data and produces each state of intermediate data with the end result being a visualization.

Ideally their entire computing pipeline, the environment and all stages of the data, would be available to outside researchers to reproduce the same analysis Nick did in the lab. Other researchers might want to fork the pipeline and use part of it as a basis for their own research, or modify the parameters of Nick's analysis to ask another question of the data.

Nick has no institutional incentive to publish any of the data or code online, as it is not required for publication. But he does it anyway because it is a personal goal. So it's important that solutions be simple and easy to use, as the threshold of how complex a tool can be is very low.

max-mapper • Jul 07 '16 17:07

Occupation: Data Scientist
Name: Erica
Personality: Curious, hard-working, independent. Wants to get the job done well. Not interested in computer science software concepts, more interested in statistics and artificial intelligence.
Skills: R, Python and other statistical software that gets the job done. S3, cloud computing, clusters. Basic sysadmin skills. Jupyter notebooks.
Goals: Wants to build an open source machine learning toolkit that anyone can use easily. Wants to improve machine learning algorithms on the toughest datasets.

Raw data is taken from the web or from clients and put into a machine learning algorithm that runs on the cloud. She uses statistical software in R/python in combination with a web application called h2o (similar to data robot) that manages and monitors the machine learning jobs as they run on the cluster. Each job outputs a dataset and often a single number or a few different numbers that will be compared.

She wants to test her algorithms on multiple datasets. Often, she has trouble finding good datasets that have the columns, collection method, and metadata that she needs. She spends a lot of time cleaning up datasets and preparing them after finding them on the web.

Once she creates a few good datasets that showcase her new algorithm, she wants to publish those datasets on the web. They can often be very large, so GitHub doesn't work very well. She currently uses a zip file on an HTTP server to distribute the datasets.

okdistribute • Jul 07 '16 18:07

(Two users that want to work together via dat.land)

Occupation: Civic Hacker
Name: Melissa
Personality: She wants to help improve her city and community using her development skills. She is not interested in spending free time navigating political channels or government websites to procure & request data. She would rather develop apps than spend time cleaning data.
Skills: Experienced developer with Node, JavaScript, scripting, command line, GitHub.
Goals: Wants to build websites and other apps that highlight an issue in her community, help residents and city officials understand the issue, and show changes over time as city data updates.

She is very good at making interesting apps once she has good data. But she is often discouraged when trying to find, access, and use city data. Data is in many places, can be hard to get online, and may require several emails to get. When she finally gets data, it is an Excel file sent over email with very little metadata.

After partnering with other developers, they create a pipeline to go from the Excel data to a database. She uses that database to make an awesome web app. The city is very interested in the app and wants to use it in their work. But the data pipeline is very fragile and data must be updated manually, so the web app becomes outdated and goes to waste. All of her code is published on GitHub but it is very hard to use because it relies entirely on hard-to-get source data.

Ideally, she would browse interesting datasets from her city in a single place and download or use them instantly. After creating a prototype app from several datasets, she would create a pipeline that automatically updates the database whenever the source data is updated. She could also start showing how the city is improving over time by comparing older data to the newest data, allowing the city officials and community to see progress towards their goals. Additionally, there would be some way to flag or comment on issues in the dataset so city officials can fix them.

She publishes her app and pipeline code publicly on GitHub, and other cities can duplicate the app with their own compatible datasets, with minimal development knowledge.


Occupation: City of Portland Employee
Name: Dante
Personality: Mid-level city official working on housing & social services. Wants to improve city transparency and community involvement. Does not have much extra time for navigating the city's data & IT systems except as he needs for his own work.
Skills: Manual file management, Excel, data analysis for City Housing & Social Services, familiarity with city data and internal access.
Goals: He wants to improve accessibility, results, and transparency of social services in the city. He works very hard to do this daily but feels he cannot extend his impact to a broader group. His department doesn't have the budget or flexibility to develop online tools that will help him extend his effectiveness.

He has tried to help civic hackers by exporting and emailing data they request. But each request for new data makes it harder for him to do his other job and much of that work is done during off hours.

Instead of emailing data, he would at least like a place he can put the data that allows him to add context and has more permanence & transparency than email.

If possible, he would like the city or a volunteer to create a method to automatically publish data to a public place whenever the internal source data is updated. Then he would not have to spend his time exporting and uploading the data.

joehand • Jul 07 '16 23:07

Occupation: Digital Media Librarian
Name: Jackie
Personality: Works for a media non-profit, is trying to help her organization navigate the sometimes frustrating transition to all-digital media. She is forward-thinking and embraces technology, but reports to people who don't have as clear an understanding of how technology can help their cause. May need help stating the case.
Skills: Library science, digital archivist, some experience with S3 and various cloud storage utils like Dropbox, etc. More oriented towards library science than IT/SysAdmin though.
Goals: Archival storage of a massive treasure trove of one-of-a-kind historical political media, mostly audio and video files.

She needs the ability to navigate the archive easily and make portions of the collection publicly available for free. Also needs to keep some (or most of it) private so the organization can produce physical copies of it when they go to do their annual fundraisers (to give away as thank-you gifts).

All of this needs to happen on a shoestring budget.


Occupation: Product Manager at a Tech Company
Name: Jean
Personality: Navigates a varied set of spreadsheets, analytics reports and internal tools to make timely decisions about the direction of a small-to-medium sized business. Has an enthusiasm for spreadsheets and streamlining business processes. Prides herself on always knowing what's going on, is a big-picture person.
Skills: Spreadsheets, some technical skills with data, some proficiency with SQL. Relies on a data analyst for deeper insights though.
Goals: Needs to make timely business decisions based on incoming feeds of data that are spread out across various tools. Frequently needs to share this data (or portions of it) with other more business-oriented (less technical) people on her team as a part of the decision-making process.

Watching/monitoring for changes in a data stream is a very important part of her job. Jean creates a lot of reports that involve month-over-month and year-over-year comparisons. She'll often export data from various sources as CSV and import it into Excel to create reports to share with colleagues.

Jean sometimes also shares this data with developers who can create useful visualizations that help make her case. Jean might use Dat.land as a place to showcase the success and trajectory of her company as well as engage a small community of customers and developers who might want to use this data as well.

laurengarcia • Jul 08 '16 23:07

@mafintosh @juliangruber Did you have any conversations in Krakow that might be good additions or updates to the personas/use cases above?

Kriesse • Sep 30 '16 17:09