Organize Data
During a project, good file organization can help in a variety of ways:
- less searching for the right file
- backups of data that reduce the risk of data loss
- well-documented work: knowing what you did, how you did it, and when you did it
- files in formats that can be used now and in the future
- easier reporting on progress to funders, team compliance with university and funder requirements
- data structured in ways that facilitate analysis and integration
Work is messy. In time you will create multiple files in various formats, multiple versions, methodologies, etc., all relating to your research. Spending a little time upfront can save a lot of time later on. But be realistic. Strike a balance between doing too much and too little. There is no single way to do it; establish a system that works for you and your collaborators.
Best Practices
File Formats
Selecting the optimal file format(s) for your data will help ensure that your data will be accessible for future use (your own, and for others). When selecting tools for your data, pay special attention to the output formats of your data. Use these best practices to reduce the chances of data loss from software or data obsolescence.
- Open, machine-readable, and non-proprietary data formats are preferable
- If data must be in a proprietary format, include a readme file with details about the software/hardware needed to open the files, and ensure that the data can easily be converted to an open, non-proprietary format
- Share multiple formats if the format used by your research community is typically proprietary (e.g., MonaLisa_v1.psd AND MonaLisa_v1.tiff)
- If compression is necessary, use a lossless format
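As a small illustration of what "lossless" means in practice, the sketch below compresses some bytes with gzip (a lossless format in the Python standard library) and checks that the restored bytes are identical to the original. The sample data is made up for the example; it is one way to verify a round trip, not a required workflow.

```python
import gzip
import hashlib


def sha256(data: bytes) -> str:
    """Return a checksum that changes if even one byte of the data changes."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical sample data standing in for a real data file.
original = b"site,temperature_c\nA,21.5\nB,19.0\n" * 100

compressed = gzip.compress(original)
restored = gzip.decompress(compressed)

# With a lossless format, the checksums before and after must match.
print(f"original: {len(original)} bytes, compressed: {len(compressed)} bytes")
print("identical after round trip:", sha256(original) == sha256(restored))
```

Recording a checksum alongside an archived file is also a simple way to confirm later that a copy or backup is still intact.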
File Naming and Organization
File Naming Conventions
- Create meaningful names relevant to content, independent of location
- Avoid very long file names
- Use underscores (this_is_the_file_name) or “camel case” (ThisIsTheFileName) for separating terms
- If you include a date, use one of these formats: YYYY_MM_DD, YYYY-MM-DD or YYYYMMDD to facilitate sorting
- To facilitate sorting, consider the potential number of files and include placeholder digits in the name (e.g., for up to one hundred files, begin with …001…)
- Avoid using spaces and special characters, such as ~ ! # & @ ( ) { } [ ] ‘ “ | % $ ; ^
- Include versioning where needed
- Be consistent
Example: Survey21_Smith_2015_06_01.txt is a text file containing the survey for participant 21, conducted by Smith on June 1, 2015.
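If you generate file names in a script, the conventions above can be applied automatically. The following is a minimal sketch, assuming Python 3.9 or later; the helper names `clean_term`, `padded`, and `make_file_name` are hypothetical and chosen only for this illustration.

```python
import re
from datetime import date


def clean_term(term: str) -> str:
    """Drop spaces and special characters, keeping ASCII letters, digits, and hyphens."""
    return re.sub(r"[^A-Za-z0-9-]", "", term)


def padded(number: int, max_files: int) -> str:
    """Zero-pad a sequence number so names sort correctly up to max_files."""
    return str(number).zfill(len(str(max_files)))


def make_file_name(terms: list[str], when: date, extension: str) -> str:
    """Join cleaned terms and a YYYY_MM_DD date with underscores."""
    cleaned = [clean_term(t) for t in terms if clean_term(t)]
    stamp = when.strftime("%Y_%m_%d")
    return "_".join(cleaned + [stamp]) + "." + extension.lstrip(".")


# Reproduces the example name used in this guide.
print(make_file_name(["Survey21", "Smith"], date(2015, 6, 1), "txt"))
# Placeholder digits for a project expected to reach one hundred files.
print(padded(1, 100))
```

Note that `clean_term` also removes non-ASCII letters, which may be stricter than you need; adjust the pattern to suit your own naming rules.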
File Version Control
Versioning helps you to easily find the version you want and not overwrite the version you need.
Depending upon practices in your field, you may version analysis/program/script files or data files themselves. Don’t forget to also version project documentation, progress reports, etc. Here are some versioning options:
Use dates: Use dates to distinguish between successive versions.
- data_20230101
- data_20230201
- data_20230301
Use version numbers: Use ordinal numbers (1, 2, 3, etc.) for major version changes and a letter for minor changes.
- data_v1
- data_v1.a
- data_v1.b
- data_v2
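The numbering scheme above can be sketched as a small helper that works out the next version label from the current one. The function name `bump` and the regular expression are assumptions made for this example, and the sketch handles names without a file extension, as in the list above.

```python
import re

# Matches names such as "data_v1" or "data_v1.b" (no file extension).
_VERSION = re.compile(r"^(?P<stem>.+)_v(?P<major>\d+)(?:\.(?P<minor>[a-z]))?$")


def bump(name: str, major: bool = False) -> str:
    """Return the next version label: a new letter for minor changes,
    or the next number (with the letter dropped) for major changes."""
    match = _VERSION.match(name)
    if match is None:
        raise ValueError(f"no version suffix found in {name!r}")
    stem = match["stem"]
    number = int(match["major"])
    letter = match["minor"]
    if major:
        return f"{stem}_v{number + 1}"
    if letter is None:
        return f"{stem}_v{number}.a"
    if letter == "z":
        raise ValueError("minor versions exhausted; start a new major version")
    return f"{stem}_v{number}.{chr(ord(letter) + 1)}"


print(bump("data_v1"))                # data_v1.a
print(bump("data_v1.a"))              # data_v1.b
print(bump("data_v1.b", major=True))  # data_v2
```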
In many cases, it is helpful to log the changes so that you can quickly assess and access the versions. It is good to document what was changed, who made the change, when the change happened, and why the change was made. Here is a basic changelog template that includes these elements and shows one simple way of documenting changes in file versions.
# fileName_Changelog
All notable changes to this project/file will be documented in this log.
## v1 YYYY-MM-DD J Doe <jdoe@example.com>
* change 1 note
* change 2 note
## v2 YYYY-MM-DD J Doe <jdoe@example.com>
* change 1 note
* change 2 note
## v3 YYYY-MM-DD J Doe <jdoe@example.com>
* change 1 note
* change 2 note
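If you keep the changelog as a text file, entries in the template's format can be generated with a few lines of code. This is a minimal sketch; the function names `format_entry` and `append_entry` and the sample values are hypothetical and used only for illustration.

```python
from datetime import date
from pathlib import Path


def format_entry(version: str, when: date, author: str, email: str,
                 notes: list[str]) -> str:
    """Build one changelog entry: a header line followed by one bullet per note."""
    header = f"## {version} {when.isoformat()} {author} <{email}>"
    bullets = [f"* {note}" for note in notes]
    return "\n".join([header, *bullets]) + "\n"


def append_entry(log_path: Path, entry: str) -> None:
    """Add the entry to the end of the changelog file, creating it if needed."""
    with log_path.open("a", encoding="utf-8") as log:
        log.write(entry)


entry = format_entry(
    "v2", date(2023, 2, 1), "J Doe", "jdoe@example.com",
    ["corrected units in column 3", "removed duplicate rows"],
)
print(entry)
```

Calling `append_entry(Path("fileName_Changelog.md"), entry)` would then add the entry to the log; the file name here simply follows the template's title.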
Adapted from Data Management: File Organization by MIT Libraries Data Management Services, which is licensed under a Creative Commons Attribution 4.0 International License.