Data Sharing Initiative

Join us and share data or models

Together with my colleagues Lee Miller, David Ostry and Kurt Thoroughman, I will be submitting a proposal to the NSF CRCNS data sharing initiative to build a database of human and monkey reaching results and models. It is a moderate-size project (the budget cannot exceed $500k). The initiative is about a joint dataset that combines experimental results (movement variables such as trajectories, and neurophysiology) with applicable models. This combination should make it useful for psychophysicists, modelers and electrophysiologists.

What is the idea?

The general question of how people and animals move, and learn to do so, is one of the most challenging questions in neuroscience, biomechanics, computer science, statistics and psychology. Scientists within the reaching community study voluntary reaching movements made by humans and non-human primates. While these movements constitute a small subset of all possible movements, they are used as a model system to study fundamental questions that apply to a wide range of motor behaviors. The reaching community studies many topics, such as learning and adaptation, the role of muscle biomechanics in the control of movement, and cue combination. The goals of these reaching studies are both to uncover fundamental principles at the system level and to identify the underlying neural substrate(s). Here we present a database project that promises to accelerate progress toward these goals.

We propose developing a joint database of experiments and models that addresses the control and learning of reaching movements. This approach benefits from the tight integration of modeling and experimentation that is a hallmark of the reaching community. However, the community is unfortunately somewhat divided between groups favoring different paradigms, focusing, for example, on equilibrium point control, internal models, uncontrolled manifolds, dynamical systems or probabilistic approaches. We believe that this project will simplify the comparison and synthesis of these approaches. It promises to significantly accelerate progress both for scientists who work experimentally and for those who use modeling techniques. For modelers, a database facilitates testing a model against multiple datasets. It also facilitates the search for neural correlates of the variables central to a given model. For experimentalists, such a database facilitates disproving models and designing experiments that efficiently distinguish between models.

Several aspects of the proposed database contribute to the expected progress. (1) Many reaching experiments are similar to one another. For instance, a good number of labs use similar virtual reality environments or robotic devices. As such, different elements in the dataset of experimental results may be meaningfully compared. (2) A rich set of models promises to generalize to many distinct reaching behaviors. Therefore, each model may apply to a range of experimental findings. (3) A large number of research labs will participate in the creation of the database and have already committed to participating through letters of support – this is a real community effort. Contributions from even more labs will be sought, and the scope of the project will widen as it continues. The project will also include other activities to make this joint database useful: workshops, summer schools and competitions. Lastly, progress in integrating modeling and experiment in the reaching community may serve as a model for other areas of computational neuroscience.

Why is this interesting and why might you care?

Why this project may be interesting for scientists who do movement experiments on humans

When designing a new experiment, it would be very useful to know what outcome each of the well-regarded models predicts. However, coding up the models, before or even after doing the experiment, is laborious. Without a coherent collection of models, calculating experimental predictions for a whole range of models is harder still. Often enough, the mathematics in published papers is not quite complete, parameters are missing, the models are hard to implement, and even after implementation it is hard to be sure that no error was made. Obtaining a prediction from a model means simulating the hypothetical experiment with that model.
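As a rough illustration of what "simulating a hypothetical experiment" across several models could look like, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the names `Model`, `no_learning`, `make_state_space` and `predict_all` are hypothetical, the experiment is reduced to a list of per-trial perturbations, and the two toy models are placeholders rather than models that would actually be in the database.

```python
from typing import Callable, Dict, List

# Assumption: a hypothetical experiment is described only by the
# perturbation applied on each trial (arbitrary units).
Perturbation = List[float]
# Assumption: a model maps a perturbation schedule to a predicted
# movement error on each trial.
Model = Callable[[Perturbation], List[float]]


def no_learning(perturbation: Perturbation) -> List[float]:
    """Toy baseline: no compensation, so the error equals the perturbation."""
    return list(perturbation)


def make_state_space(retention: float, learning_rate: float) -> Model:
    """Toy error-driven learner: x[n+1] = retention * x[n] + learning_rate * e[n]."""

    def predict(perturbation: Perturbation) -> List[float]:
        state = 0.0  # current compensation for the perturbation
        errors = []
        for p in perturbation:
            error = p - state  # uncompensated part of the perturbation
            errors.append(error)
            state = retention * state + learning_rate * error
        return errors

    return predict


def predict_all(models: Dict[str, Model], perturbation: Perturbation) -> Dict[str, List[float]]:
    """Run every model on the same hypothetical experiment."""
    return {name: model(perturbation) for name, model in models.items()}


# Hypothetical model collection and perturbation schedule.
MODELS: Dict[str, Model] = {
    "no_learning": no_learning,
    "state_space": make_state_space(retention=1.0, learning_rate=0.5),
}
schedule = [0.0, 1.0, 1.0, 1.0]
predictions = predict_all(MODELS, schedule)

for name, errors in predictions.items():
    print(name, errors)
```

The point of the sketch is the shared interface: if every model accepted the same description of an experiment, comparing their predictions for a planned design would reduce to one loop.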

Why this project may be interesting for scientists who develop models

It can be argued that, ideally, a good model should explain data from several labs, and that a good experiment should try to falsify not a single model but a good number of them. Indeed, over the last couple of years, several labs have introduced models of motor control, motor adaptation and learning that explain multiple datasets from several labs. At the moment, without a coherent dataset of experimental results, applying a model to a range of reaching experiments is very laborious. The only ways of obtaining datasets are to digitize the data from figures in published papers, which is imprecise and tedious, or to ask the lab that published a paper for the original data, which is slow and, given the workload of many professors, often impossible. The obvious challenge to overcome is how to store data from several experiments in a way that allows a model to be applied to several experimental datasets while still representing the differences in experimental methodology. A database of experimental results makes testing models much easier.
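One way to think about this storage challenge is a shared record layout with a free-form slot for methodology. The following Python sketch is only an assumption about what such a layout might look like; the class names `Trial` and `Dataset`, the field names, the units and the two example labs are all hypothetical and not part of the proposal.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Point = Tuple[float, float]


@dataclass
class Trial:
    """One reach, in units shared across labs (assumed: seconds, meters)."""
    time_s: List[float]
    hand_position_m: List[Point]
    target_m: Point


@dataclass
class Dataset:
    """Trials from one experiment plus lab-specific methodology notes."""
    lab: str
    species: str
    # Free-form key/value pairs keep methodological differences visible
    # without changing the shared Trial layout.
    methodology: Dict[str, str] = field(default_factory=dict)
    trials: List[Trial] = field(default_factory=list)


def endpoint_error(trial: Trial) -> float:
    """Distance between the final hand position and the target."""
    x, y = trial.hand_position_m[-1]
    tx, ty = trial.target_m
    return math.hypot(x - tx, y - ty)


def mean_endpoint_error(dataset: Dataset) -> float:
    """The same analysis runs on any dataset that follows the layout."""
    errors = [endpoint_error(t) for t in dataset.trials]
    return sum(errors) / len(errors)


# Two made-up datasets that differ in methodology but share the layout.
robot_lab = Dataset(
    lab="hypothetical lab A",
    species="human",
    methodology={"device": "robotic manipulandum", "sampling_rate_hz": "200"},
    trials=[Trial([0.0, 0.005], [(0.0, 0.0), (0.0, 0.10)], (0.0, 0.10))],
)
vr_lab = Dataset(
    lab="hypothetical lab B",
    species="human",
    methodology={"device": "virtual reality display", "sampling_rate_hz": "100"},
    trials=[Trial([0.0, 0.01], [(0.0, 0.0), (0.03, 0.14)], (0.0, 0.10))],
)

for ds in (robot_lab, vr_lab):
    print(ds.lab, ds.methodology["device"], mean_endpoint_error(ds))
```

Keeping the trial layout fixed and pushing differences into `methodology` is one possible design choice; a real schema would need to be agreed on by the participating labs.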

Why this project may be interesting for scientists who do electrophysiological experiments

The model part of the database could be interesting, as it would allow you to easily calculate the predictions of leading models before even doing an experiment. It would also make it easier to extract relevant variables from models (such as adaptation state, movement jerk, error probabilities, etc.) that one could regress against neural activity. At the same time, your participation would allow modelers to analyze some of your data.
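To make "extract a variable and regress it against neural activity" concrete, here is a small Python sketch under stated assumptions. The functions `jerk` and `linear_fit` are hypothetical helpers, the jerk is estimated with a plain third finite difference, the regression is ordinary least squares with one predictor, and all numbers are synthetic rather than recorded data.

```python
from typing import List, Sequence, Tuple


def jerk(position: Sequence[float], dt: float) -> List[float]:
    """Estimate jerk as the third finite difference of position over dt**3."""
    return [
        (position[i + 3] - 3 * position[i + 2] + 3 * position[i + 1] - position[i]) / dt ** 3
        for i in range(len(position) - 3)
    ]


def linear_fit(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic 1-D hand position x(t) = t**3 sampled at dt = 1 (arbitrary units).
position = [0.0, 1.0, 8.0, 27.0, 64.0, 125.0]
print(jerk(position, dt=1.0))

# Synthetic per-trial adaptation state from some model, and a made-up
# firing rate constructed to depend linearly on it.
adaptation_state = [0.0, 0.5, 0.75, 0.875]
firing_rate_hz = [10.0, 12.0, 13.0, 13.5]
slope, intercept = linear_fit(adaptation_state, firing_rate_hz)
print(slope, intercept)
```

With real recordings one would want to smooth the position signal before differentiating and to use a proper statistical package for the regression; the sketch only shows the shape of the analysis.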

What about ethics/IRB?

We asked our IRB office about the issue last year, and they provided us with a letter giving their assessment. As we will only be dealing with anonymized information, Northwestern University does not consider this to be research with human subjects, and thus there does not seem to be a problem.

What we propose to do with the requested funds

  1. Hire a programmer who receives data (via the internet, DVDs or CDs) and models (in MATLAB, C or Fortran) and formats them into a joint database.
  2. Start a set of workshops asking why and how we should share data and models.
  3. Hold summer schools to teach people to use the database, and competitions to compare models.

Who else will participate?

Last year we submitted a different version of the proposal with a somewhat narrower scope. It was criticized for being too narrow. Now we have a much wider scope and a broader set of co-PIs. Last year, 23 outstanding reaching scientists wrote letters of support.

How you could help us now (before Nov 14th)

Send us a letter of commitment. Templates are attached to this webpage.
Also, ask questions and let us know of any ideas you may have.

Examples of possible letters

Letter of support for electrophysiology lab
Letter of support for movement psychophysics lab
Letter of support for modeling lab

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License