Catching Up with ZFIN
An interview with Monte Westerfield
In 2016, the NIH changed the way that model organism databases are supported, a change that had a major impact on ZFIN. We caught up with Dr. Monte Westerfield, professor at the University of Oregon Institute of Neuroscience and the director of ZFIN, to talk about ZFIN: what it does, how it works, how it is supported now, and what its future looks like.
Cecilia Moens: Thanks so much, Monte, for taking the time to chat about ZFIN. Let’s start at the beginning: What is ZFIN, and what does it provide for the zebrafish community?
Monte Westerfield: ZFIN is considered a model organism database, one of several that represent the major model organisms that are used for biomedical genetic research, including yeast, worms, flies, Xenopus, rat, mouse, and zebrafish. ZFIN was started in the early 90s when the zebrafish community had grown significantly and there were a number of laboratories doing mutant screens. Genes were being cloned and their expression was being described, and it became useful for basic research to be able to share those data. Then, of course, with the advent of genome sequencing, we had a complete zebrafish genome. ZFIN’s goal is to look at the published literature, as well as private datasets that are not necessarily published, and to curate those data into a centralized database. We then make the data readily available to all researchers everywhere, through our web interface and through machine readable ports and data reports. We have a standard set of data reports, and people can request special reports if they want a particular combination of different data types.
Cecilia: Who makes up ZFIN? How many people and where are they located?
Monte: ZFIN team members fall into three categories. The largest group is our curators. These are PhD- or master’s-level biologists and geneticists who are dedicating their lives to reading all zebrafish publications and integrating the data into the database. Then we have software developers, a small team of just two people who are helping develop new features. And then we have a system administrator who keeps the whole system up and running using cloud-based system administration. Altogether we have about a dozen people, though several people are part-time. Most are resident here in Eugene, since we are based at the University of Oregon, but we have a remote curator who lives on the East Coast and one developer who is in Germany.
Cecilia: Tell me about a day in the life of a ZFIN curator: wake up in the morning, get a cup of coffee and then what?
Monte: There's the curation work itself. We have a triage process to prioritize the top papers, the papers that we feel have the most new data, because we don’t have enough staff to curate every zebrafish research publication that has data that would be appropriate to go into ZFIN. That process generates a hierarchical set of papers that are marked for curation. Those are divided among the curators, and the highest priority and most important part of the job is to read those papers and curate the data into the database. But there are many, many interruptions during the day because of the software development that's occurring. The software developers don't have the biological expertise to know everything they should be doing, so they work with the biologists very intimately through the whole design process. So, curators will spend a significant amount of their time participating in those sorts of meetings.
Cecilia: How does the triage process work?
Monte: We have scripts that run daily, that scrape all the literature to find everything that's been written about zebrafish, then we triage. If a paper reports a study of a gene that has not been studied before, it is higher priority than if the paper is studying a gene that's already been characterized. If it's a paper that involves analysis of a gene that's well studied, but perhaps in a new context, or with a new method, that's going to be somewhat lower priority.
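Editor's sketch: the ranking rule Monte describes can be illustrated in a few lines of Python. This is not ZFIN's actual code. The `Paper` class, the `triage_score` and `triage` functions, the gene names, and the numeric scores are all hypothetical, and the lowest tier (a well-studied gene with no new context or method) is an assumption added only to complete the example.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    """Hypothetical record for one publication found by the daily scan."""
    title: str
    genes: list[str]
    new_context_or_method: bool = False


def triage_score(paper: Paper, characterized_genes: set[str]) -> int:
    """Return an illustrative priority; higher means curate sooner."""
    # A gene that has not been studied before gets the highest priority.
    if any(gene not in characterized_genes for gene in paper.genes):
        return 2
    # A well-studied gene in a new context or with a new method ranks lower.
    if paper.new_context_or_method:
        return 1
    # Assumption: everything else goes to the bottom of the list.
    return 0


def triage(papers: list[Paper], characterized_genes: set[str]) -> list[Paper]:
    """Order papers from highest to lowest priority (stable for ties)."""
    return sorted(
        papers,
        key=lambda p: triage_score(p, characterized_genes),
        reverse=True,
    )


# Example usage with made-up gene names.
known = {"geneA", "geneB"}
queue = triage(
    [
        Paper("Known gene, standard assay", ["geneA"]),
        Paper("Known gene, new method", ["geneB"], new_context_or_method=True),
        Paper("First report of a gene", ["geneC"]),
    ],
    known,
)
for paper in queue:
    print(triage_score(paper, known), paper.title)
```

In practice, as the interview makes clear, curators apply judgment that a simple score like this cannot capture.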
Cecilia: Do you consider impact factor of the journal as a criterion in your triage process?
Monte: I recently looked at the list of journals that are represented in ZFIN that we've curated over the last year or two. The largest number of publications come from journals that have high impact factors. But that isn't the driving force. It just turns out that a lot of high impact factor journals have papers that have the meaty stuff.
Cecilia: Do the curators get to exercise judgment? For example, if they look at an in situ and can’t see what the author is describing, can they just not curate it?
Monte: Another big part of the time curators spend is communicating with the authors. There isn't a paper that doesn’t leave out some important information. For example, sometimes authors don't include IDs for the genes, or the antibodies or probes, so there is missing or incorrect information. The curators actually contact authors to try to get as much information as they can and to resolve discrepancies that they uncover during the curation process. This is why machines can't really do this work. There are those who say, "Well, large language models could be 80% accurate, and that's good enough." We try to be 100% accurate, not 80%.
Cecilia: What can the readers of the News Splash do to make the lives of the ZFIN curators easier?
Monte: Contacting curators before submission of a paper is really, really helpful. Every page on ZFIN has a Contact Us button. Contact us to make sure that you're using the right nomenclature. You could even provide a list of the reagents you have used and note the IDs that you've used. The curators could do a double check. Because after it's published it can be corrected in ZFIN but of course the publication is still out there. There have been discussions about formalizing that process. Some of the other databases, WormBase for example, have a form that authors can fill out before publication.
Cecilia: What about yourself? What proportion of your own work week is dedicated to ZFIN?
Monte: It's a significant portion. I represent ZFIN at the Alliance of Genome Resources. I have several meetings a week with the Alliance. Yvonne Bradford is our in-house manager; she manages most of the day-to-day activities. And then we have many regularly scheduled meetings. We use an agile scrum-type development process. And so that requires daily meetings with all of the basic agile development for everything that we do. And it's all coordinated through Slack.
Cecilia: Let's shift to talk about how ZFIN has been supported in the past and what happened to that funding in recent years.
Monte: We were funded very, very generously by a grant from the National Human Genome Research Institute (NHGRI) starting in 2000. At our peak, we were close to 20 people altogether. Then in 2016 the administration of NHGRI made a strategic decision to decrease the funding for the individual model organism databases, and to use the funds that were freed up to fund the Alliance of Genome Resources to foster the integration of the different data resources. So, the direct money coming into ZFIN was reduced over the course of several years. Last year was the final cut and we are now at 50% of the funding that we had before the cuts started.
We have new funding from the National Institute of Environmental Health Sciences (NIEHS; see below). We also have a small grant, shared with Mouse Genome Informatics, from the Gabriella Miller Kids First Pediatric Research Program for targeted curation of high priority childhood diseases. Small grants like that can help supplement the work and bring in some additional curation that we wouldn't otherwise be able to do.
Cecilia: ZFIN has users all around the world but for 30 years has only been supported by US sources. Do you feel like it's time for other countries to help out? How can that happen?
Monte: That’s been something that has been discussed for a long time, probably at least 10 years. We already have staff that are working remotely and were trained remotely. We have the expertise now to do that. Somebody who lives in and is paid by an agency in another country could become part of our regular staff. So that's kind of a model that we're thinking about and just exploring the possibilities.
Also, ZFIN is one of the founding members of an international organization called the Global BioData Coalition that got up and running just last year. It consists of fewer than two dozen international data resources that are considered key and crucial to biomedical research worldwide. The sole function of this consortium is to develop plans for long-term sustainability. We are meeting on a fairly regular basis. We try to bring in all the players outside of the US community and identify national or private funding agencies that currently fund these large data resources. The idea is to work with them and other stakeholders to pool financial resources in a way that ensures the sustainability of the core resources.
Cecilia: Finally, let’s talk about the future of ZFIN. What are some of ZFIN’s newer initiatives?
Monte: We have several new initiatives. We’ve received additional external funding from the National Institute of Environmental Health Sciences to start supporting the toxicology research that's done with zebrafish, which is very extensive. There are a large number of labs that do environmental health and toxicology experiments using zebrafish. Because of the amount of information in the publications in that field, we haven’t been able to curate them into the database and organize them in a way that's searchable. With this new funding we've been able to hire additional staff and devote significant infrastructure development to provide the support for these toxicology data. A recent count shows that we have over 22,000 records of changes of gene expression due to chemical treatments, thousands of different chemical environments, and a number of human (environmental) disease models that have been developed using zebrafish. So that's a really exciting new set of data that we've been able to release. We have plans over the next year or two to increase the types of searches and queries that people can use across these data.
Some of the other new exciting challenges include the growth of single cell data, which a number of labs are generating. A new informatics challenge that everybody's facing is how to represent and search those data in a way that makes them usable. A lot of the development work that we're doing is in conjunction with the Alliance of Genome Resources, in particular with FlyBase and WormBase. One of our developers spends about 60 to 80% of his time on the Alliance, where he is very heavily embedded in software development. Two or three curators also participate as subject matter experts in the software development process at the Alliance.
Cecilia: Is the goal of ZFIN to somehow bring the different zebrafish single cell atlases together into a single atlas?
Monte: Yes, but actually it's bigger than that. The goal at the Alliance is to integrate these datasets into a common resource so that you can do comparative analyses across species including human. It's a big challenge. Currently we're partnering with the Chan Zuckerberg Initiative, so we have some outside experts that are helping us with this endeavor.
Cecilia: What do you feel are the major challenges for ZFIN over the next five to ten years?
Monte: I think the biggest challenge is to make decisions about what future development we should do. I mentioned single cell data, for example, and this toxicology support. There's going to be a new genome, several new genomes, actually, coming out fairly soon. Ideally, we'd like to have that genome fully annotated as the new reference genome. So those are three major projects and there are many other projects, things that we’d really like to do if we could. And whenever we talk to people or to our Scientific Advisory Board, they say, well, you could do this, and you could do that. But we have limited resources. So, a big challenge is really figuring out how big of a bite we can chew and which apple we should take it out of. It’s exciting. And when we make a commitment, like we're committed right now to the single cell data, there's the big challenge of how you do it. What other resources can you bring to bear on that problem?
Cecilia: Thanks for taking the time to talk, Monte, and more importantly thank you for all you do for the zebrafish community through your vision and dedication to ZFIN!