Our data: free and open-access

Recently I was trawling a publisher’s website for a technology article (Springer, since you ask), with no-access pages and tariff barriers all around me, when a cheeky wee popup asked me:

Would you use a data collaboration website to share your research data with colleagues?
Yes, publicly
Yes, but only privately
No
I don’t have data

There was a button missing: the response that said:

I want to know which data might be shared and under which circumstances.

And the other response that would have said:

I’m not going to give this information because you’re a commercial publisher and my data is gold-dust to you and in any case I believe free and open access is best for everyone.  And yes that includes the article I’m trying to get to read.

Maybe not best practice in constructing item responses…  And of course I can understand why a publisher would want to know about this: it’s a huge potential market for them.  But here’s why we shouldn’t let them have it.

Bit of back-story first.  I have to confess I’ve always been a reference manager obsessive.  I dabbled with a couple of early ones, played around with a couple of article reference databases on Citation, Mendeley, RefWorks, then went in for EndNote in a big way.  I used EndNote for the 35 pages or so of references to Transforming Legal Education.  About a week before the publisher’s hard deadline the library got corrupted, and my research assistant and I had to spend over three days sifting through backups, proofing and reproofing references that were in mint condition BC (Before Corruption) when I should have been proofing the main text.  We made it to AD (After Demoralisation), but I’ve never used EndNote again, and never will.  But I still had the virus, and when someone suggested having a look at what the amazing folks over at Mozilla were doing, I had a keek and was hooked on Zotero.  Not least because it sounds like a Mexican revolutionary.

So when we started LETR, Zotero was our reference management tool of choice (see here for a valuable comparison table).  We Zoteristos constructed a reference library of over 2,000 items there; and some Lit Review chapters’ reference sets were linked directly to Z.  Eventually, as version control became more problematic, we abandoned dynamic linking and just used the reference manager as a static, not a dynamic, database of references (ie they were no longer linked directly to the references in Z).  The application remained rock solid, though we had to go premium as it expanded.  We then asked Gavin Maxwell, our applications developer, to design us a basic app that would render the data on the LETR website here.[1]  But the app merely represents what’s in the Zotero database structure, which is still a private library.  And because it’s the reference database for a project that ended in June 2013, that’s static, frozen in time, while the legal education literature develops around it, flows onward.  All that effort, we thought, for something as capable as a dynamic reference set to be frozen…

It was an interesting instance of the problems of data publishing in legal education, or indeed anywhere in HE.  You legal education data hounds out there, all three of you, will know the issues.  First, there’s a relative absence of data in legal education.  Second, when we do collect it, how are we best to publish it so that there is an open dataset, accessible, transparent, secure and (if we want it to be) also capable of being analysed or extended by collaboration, globally?  Yes I know, a big ask, but let’s start with the merely impossible first and tackle data access next.

Emma Nicol and I recently completed a chapter for a book on the subject of simulation in legal education.[2]  We set out wanting to write a meta-review of the research literature on simulation, in Law, that used technology, 1970-2012, in common law jurisdictions.  We defined our terms, our methodologies, our search strategies, and in the first pass through the data we trawled over 350 items.

At this point, though, we paused.  When we reviewed the items, it was clear that meta-review was out of the question.  Only one item out of the initial pass contained anything like reliable statistical data on the design and implementation of the intervention.  This should come as no surprise to anyone working in legal education.  One of the findings from LETR was a/ how patchy legal education research is, in England & Wales, b/ how data-poor it is, c/ how unorganised it is, in terms of meta- and systematic reviews (and I suspect this will apply to most common law jurisdictions).  Since there was an almost complete lack of stats for us to base our meta-review on, we re-defined our task as being to construct a systematic review of the literature.  With this as our task, with new methodology and search strategies and in successive passes through the data, we eventually arrived at a reliable dataset of 123 items.  Our work is described in the chapter, and in a little more detail in this earlier post.  A copy of the dataset was appended to our chapter, with our editors’ permission.  But we wanted to share the database with the research community, and have our commentary close to it.  Zotero can be a public library, either static or dynamic (ie users can edit it).  But our commentary was locked up in a book, and the dataset in the book was, well, just print, not anything dynamic.  Once again, frozen in time.  Admittedly this dataset had to remain static because it’s the evidence of a lengthy and detailed research process; but I’d like folk to have it to build upon, as well.

Two questions arise from this chapter: first, why is there so little of this sort of activity going on in legal education; and second, is there a way of building and curating datasets so that others can access them for free and contribute to them in the future?

How to get more and much better-quality data in legal education?  Huge issue of culture change in research practices, to put it simply, and a more difficult task than creating the research mindset that enabled the Cochrane Collaboration to come about, 20 years ago, because we don’t have a conception of our research activities as amenable to systematic review at the very least.  Which is astonishing when you pause to think about it.  There’s no reason why we shouldn’t undertake these activities.  Educators and medical educators create reviews and systematic reviews, and use them all the time.  Nor am I pinning responsibility on academics alone to produce this research.  All players have roles and responsibilities here in legal education: accreditors, regulators, funders, the profession, legal education providers, law librarians, academics, even students.

Cochrane is thinking about the future of data for the medical community.  Read the perceptive editorial on the future of Cochrane, comparing the old model Cochrane to a 747 and the new model Cochrane needing to be an Airbus A380.  See especially the CochraneTech Symposium sessions: as Wallace says, about 11.50 into the video stream, there were eleven systematic reviews published every day, and that was back in 2010.  How will they cope with the increasing volume of data in the future?  They will apply rigorous research criteria.  They will use technology and research process planning to critique the system.  In medicine the process of understanding how systematic reviews are produced and how the environment affects their quality has been going on for some time.  Moher et al reported in their review of the reporting characteristics of systematic reviews back in 2007:

A little over half (161/300 [53.7%]) of the SRs reported combining their results statistically, of which most (147/161 [91.3%]) assessed for consistency across studies. Few (53 [17.7%]) SRs reported being updates of previously completed reviews. No review had a registration number. Only half (150 [50.0%]) of the reviews used the term “systematic review” or “meta-analysis” in the title or abstract. There were large differences between Cochrane reviews and non-Cochrane reviews in the quality of reporting several characteristics.[3]

It’s an issue that we’ll begin to tackle in our centre for legal education here at ANU College of Law.

So why shouldn’t publishers do this?  Because we’re handing over our life’s work to them, for their profit not ours.  Because our knowledge should not be locked up in their gated communities but free for all, world-wide.  And because we should be constructing the means for doing this for our own communities.  And we are.  See, for example, the 9th International Digital Curation Conference, held recently in San Francisco.  There, Eleni Castro, a Research Coordinator at the Institute for Quantitative Social Science (IQSS), Harvard University, and Alex Garnett, Public Knowledge Project, Simon Fraser University, spoke about the link up between journal articles and research data, and in more detail about the link between the Public Knowledge Project and Harvard’s Dataverse Network Project.[4]  The project team worked on the problem of seamlessly managing the submission, review and publication of data associated with published articles, with the aim of increasing the replicability and the reusability of research data.  They did so by integrating two open-source systems, PKP’s Open Journal Systems and the Dataverse Network, and providing a workflow to accompany the integration.

It’s one of many such projects at the conference, and the conference is one of many such out there.  And yet, in legal education, there seems to be little if any discussion about the existence, let alone the possibilities, of these initiatives.  Time to change that.  And no, I never did get access to that article.

[1] The app seems to have got a bit confused: click on ‘Accessibility’, appropriately enough, and it sorts itself out.
[2] Maharg, P., Nicol, E. (2014).  Simulation and technology in legal education: a systematic review and future research programme.  In Grimes, R., Phillips, E., Strevens, C. (eds), Simulation and Legal Education, Ashgate Publishing, Emerging Legal Education series.
[3] Moher, D., Tetzlaff, J., Tricco, A.C., Sampson, M., Altman, D.G. (2007).  Epidemiology and reporting characteristics of systematic reviews.  PLOS Medicine, DOI: 10.1371/journal.pmed.0040078.  http://www.plosmedicine.org/article/info:doi/10.1371/journal.pmed.0040078.
[4] They define the Dataverse Network as ‘a repository for research data that takes care of long term preservation and good archival practices, while the researchers and data producers keep control of and get recognition for their data’.  See their slideset for more detail and definitions.  See this article for a description of the activities of the Dataverse, and this one for a workflow for the interface between journal and Dataverse.  They provide a nicely designed mock-up of the result.

Comments

2 responses to “Our data: free and open-access”

  1. Kristoffer Greaves

    Excellent post, Paul. I have a particular interest in this area. I recently met with three librarians at my university to discuss bibliometric strategies around the subject of scholarship of teaching and learning in practical legal training. You will know, of course, that it’s not really possible to use the bibliometric tools mainstream databases provide, because legal education barely raises a blip in citations, or indeed, scholarship of teaching and learning generally.
    Regarding citation managers – I am a “rusted-on” EndNote user, with about 2,200 refs and PDFs in the current library. Data corruption dread causes me to back up and copy the database every day. I keep a monthly backup on a cloud server. I’ve been using NVivo for literature reviews, and EndNote’s “smart groups” feature facilitates the export/import process for this. I’ve dabbled with Zotero and Mendeley, but can’t bring myself to abandon EndNote. I do think robust, “live” and shareable citation managers are a great idea though.
    When I look at John Hattie’s remarkable meta-study (and Marzano’s smaller version), I think about what a powerful resource something like that would be for legal ed.

    1. Paul Maharg

      Good points Kris. EndNote — ah, you have the touch with it, then. I think it comes down to that, albeit we were working with a much earlier version back in 2006. When we suspected corruption we went through the back-ups. But they’d got corrupted too, which pushed us back to (loose) paper-based records of what we’d been doing, and frantic reconstruction from there.

      Actually that taught me to be a lot more cautious about data archiving, on a personal basis. Because I work in quite a few places I work off a MacBook Pro and keep at least three copies of what I do: the MacBook hard disk, a separate Time Machine backup on an external hard drive and a cloud version with Mac Online Backup & Recovery, and I’m thinking that this is the absolute minimum.

      You’re right about legal education citation indices, but it’s up to us to remedy the situation, albeit the form of the system plays against us. Your citation of Hattie is exactly what I had in mind when I was thinking about the sort of foundational research that’s been done in education generally. His work is very impressive (reminds me of the meta-reviews done by Dylan Wiliam on feedback and assessment in schools, http://www.dylanwiliam.org/Dylan_Wiliams_website/Welcome.html) and yes, that sort of meta-review is exactly what we should be doing in legal education. The Cochrane Collaboration is such an achievement, over 20 years too, and an inspiration for us legal educators.