Thursday, 12 February 2009

Open Notebook Science, Reproducibility and Exclusion

There has been a fairly active conversation about Open Notebook Science over the past few days on FriendFeed. Some of the points I have wanted to make won't fit there, so I'll post them here.

A short definition of ONS on Wikipedia: Open Notebook Science is the practice of making the entire primary record of a research project publicly available online as it is recorded.

I have repeatedly said that Open Notebook Science is probably not the best choice - at least initially - for most researchers interested in dipping their toes into Open Science. So why do I care so much about pursuing it?

It has to do with taking us past the tipping point to runaway, real-time, open, collaborative crowdsourcing in science.

There are two properties of ONS that at least set the stage for such a scenario.

1) Reproducibility: The primary purpose of a laboratory notebook is to make experiments reproducible for the researcher who recorded them. You can't improve on a process if you don't keep a detailed record of exactly what happened in a given trial. A secondary purpose is to prove what a researcher knew and did at a specific point in time. This can be useful for patent enforcement if the notebook is kept private. There are other applications, but in general the sharing of experimental details is extremely limited in science.

I believe that this is an artificial barrier still standing to a large extent because of inertia. In the past, even if they wanted to, researchers could not realistically make this information public because of the absence of a convenient publication vehicle. But that is changing right now - technology is no longer the bottleneck. Very high quality free hosted services exist now that permit sharing with very little additional effort. All we have to do is record our laboratory notebook (which has to be kept anyway) on media that are easy to share and automatically indexed on major search engines.

By definition, the notebook should have all the information necessary to reproduce the results you obtained. If it is published in close to real time, someone who doesn't know you can read the details of an experiment you did today and contribute to the advancement of your project tomorrow. Or they may use your information for their own project. As long as they also maintain an Open Notebook, knowledge can spread extremely rapidly. The efficiency of such a system between strangers is probably far greater than that of most scientific collaborations between researchers who already know each other.

2) Exclusion: If I come across an experiment you did yesterday and I want to contribute meaningfully to your project, then before executing the next experiment I will want to look at all of the related experiments in your notebook. Of particular interest will be "failed experiments", so that I can avoid repeating the same attempts. Or I may want to repeat one of your failed runs because I think that you did not properly control some parameter, or that you made a mistake in the analysis.

The point of an Open Notebook is to get the truth - the whole truth about what you did and did not do. If I don't find what I am looking for in your notebook - and you have declared it to be an Open Notebook - then I can safely assume that you have not done it and I will feel confident to invest my resources to do that next experiment.

For an example of this system at work consider the Open Notebook Science solubility challenge. If I want to contribute to the project, the first thing to do is take a look at what has not been extensively measured yet. I can do this via the web query or directly on the experiment list page. By using the web query tool I can also look for contradictory results and try to resolve them. Or I may wish to include a control in my measurements and pick a compound that has been measured reproducibly using different techniques. I probably also want to look at the minute details of exactly how previous researchers applied their technique.
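The kind of triage described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the records, field layout, function names and thresholds below are all hypothetical, invented for the example, and do not reflect the actual solubility data or the real web query tool. It also simplifies "measured reproducibly using different techniques" to "measured several times with close agreement".

```python
from collections import defaultdict

# Hypothetical records, invented for illustration only:
# (solute, solvent, solubility in mol/L, notebook page id)
MEASUREMENTS = [
    ("solute A", "methanol", 0.50, "EXP001"),
    ("solute A", "methanol", 0.52, "EXP004"),
    ("solute A", "methanol", 0.49, "EXP009"),
    ("solute B", "methanol", 0.10, "EXP002"),
    ("solute B", "methanol", 0.35, "EXP007"),
    ("solute C", "ethanol", 1.20, "EXP003"),
]


def group_by_system(measurements):
    """Collect all reported values for each (solute, solvent) pair."""
    groups = defaultdict(list)
    for solute, solvent, value, _page in measurements:
        groups[(solute, solvent)].append(value)
    return dict(groups)


def relative_spread(values):
    """Range of the values divided by their mean (assumes positive values)."""
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean


def under_measured(groups, min_count=2):
    """Systems with fewer than min_count measurements: candidates for new work."""
    return sorted(k for k, v in groups.items() if len(v) < min_count)


def contradictory(groups, tolerance=0.25):
    """Systems measured more than once whose values disagree beyond tolerance."""
    return sorted(
        k for k, v in groups.items()
        if len(v) >= 2 and relative_spread(v) > tolerance
    )


def control_candidates(groups, min_count=3, tolerance=0.25):
    """Systems measured at least min_count times with close agreement."""
    return sorted(
        k for k, v in groups.items()
        if len(v) >= min_count and relative_spread(v) <= tolerance
    )


groups = group_by_system(MEASUREMENTS)
print("Not yet well measured:", under_measured(groups))
print("Contradictory results:", contradictory(groups))
print("Possible controls:", control_candidates(groups))
```

Note that none of these three lists can be trusted unless every run, including the failed ones, has been recorded: a system only counts as "not yet measured" if the notebook is complete.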

Now what would happen if we had adopted a Partial Open Notebook Science (PONS) approach, where we delayed recording lab notebook pages until a paper was published? Or what if we used the PONS variant of only recording experiments that "worked"?

Had we done anything short of a fully Open Notebook, the project would never have gotten off the ground.
