This is a summary of some of my recent projects. Many of these are part of my thesis work, although some are side projects outside my primary research.
PIMMS (Polymer Interactions in Multicomponent Mixtures) is a fully generalizable 2D/3D lattice-based Monte Carlo simulation engine for polymer physics. In simple terms, it allows a user to easily and rapidly perform simulations that explore how a large collection of polymer chains behaves.
While various simulation packages exist, PIMMS offers a number of advantages in the context of studying the sequence determinants of phase separation.
- A wide range of Monte Carlo moves (both single chain and multiple-chain moves) which can easily be customized to a specific system
- Sub-O(n) scaling of computational cost with chain length (depending on the moveset, scaling can be near O(1))
- Long- and short-range interactions (the former for modeling electrostatics)
- Rigid and flexible body moves (for modeling folded proteins and flexible polymers)
- New approaches for enhanced sampling using alternative-chain Markov sampling (unpublished)
- A well developed API for the easy development of new on-the-fly analysis routines
- The potential to describe non-equilibrium systems
- Automated simulated annealing to help avoid kinetic traps
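As a rough illustration of two ideas from the list above, local moves whose energy change depends only on the moved bead's neighbours (which is what keeps the per-move cost from growing with chain length) and simulated annealing, here is a minimal Metropolis sketch on a 2D lattice. It is not PIMMS code: for brevity it simulates a lattice gas of single beads rather than bonded chains, and the contact energy `EPSILON`, the linear temperature schedule and all function names are hypothetical choices made for illustration.

```python
import math
import random

# Hypothetical nearest-neighbour contact energy (negative = attractive).
EPSILON = -1.0
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def local_energy(site, occupied, box, exclude=None):
    """Contact energy of a bead at `site` with its occupied lattice neighbours.

    Only the four neighbouring sites are inspected, so the cost is O(1)
    regardless of how many beads are in the system.
    """
    x, y = site
    contacts = 0
    for dx, dy in NEIGHBOURS:
        nb = ((x + dx) % box, (y + dy) % box)  # periodic boundaries
        if nb in occupied and nb != exclude:
            contacts += 1
    return EPSILON * contacts


def metropolis_accept(delta_e, temperature, rng):
    """Metropolis criterion: accept with probability min(1, exp(-dE/T))."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


def annealed_run(occupied, box, n_steps, t_start, t_end, seed=0):
    """Single-bead translation moves under a linearly decreasing temperature.

    `occupied` is a set of (x, y) tuples and is updated in place.
    """
    rng = random.Random(seed)
    beads = list(occupied)
    for step in range(n_steps):
        # Linear annealing schedule from t_start down to t_end.
        temperature = t_start + (t_end - t_start) * step / max(1, n_steps - 1)
        i = rng.randrange(len(beads))
        old = beads[i]
        dx, dy = rng.choice(NEIGHBOURS)
        new = ((old[0] + dx) % box, (old[1] + dy) % box)
        if new in occupied:
            continue  # excluded volume: reject moves onto occupied sites
        # Energy change is computed locally; the bead's old site is ignored
        # when counting contacts at the new site, since it is being vacated.
        delta_e = (local_energy(new, occupied, box, exclude=old)
                   - local_energy(old, occupied, box))
        if metropolis_accept(delta_e, temperature, rng):
            occupied.remove(old)
            occupied.add(new)
            beads[i] = new
    return occupied
```

Usage: `annealed_run({(2, 2), (5, 5), (7, 1)}, box=10, n_steps=1000, t_start=2.0, t_end=0.1)` starts hot, where unfavourable moves are frequently accepted, and cools so the system can settle into low-energy clusters without getting stuck early. A real engine would add chain connectivity, richer move types and a total-energy bookkeeping layer on top of this pattern.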
PIMMS is primarily written in Python, with the heavy lifting done in Cython. With the exception of the interface for writing out XTC trajectory files, I've written everything from scratch. This has allowed me to build a simulation engine tailored to the exact scientific questions of interest. In collaboration with a number of experimental colleagues from institutions around the world, we're now using PIMMS to explore how the charge patterning of polyelectrolytes and polyampholytes influences phase separation in disordered proteins.
PIMMS debuted in a talk I gave at the Biophysical Society's annual meeting in Los Angeles in February 2016. Development has progressed rapidly since then, and the plan is to have a publicly available release out in summer 2017.
For more information, check out the official PIMMS website at http://www.pimms.xyz.
CTraj is an ongoing project to build a new analysis suite specifically for CAMPARI-generated trajectories. Building on the outstanding MDTraj, CTraj essentially handles any inconsistencies between CAMPARI-generated trajectories and more traditional MD-generated trajectories, offering seamless integration with MDTraj's analysis suite and allowing users to take advantage of a wide range of analysis tools.
In addition, we have developed a number of new analysis techniques (both published and unpublished), which are implemented in CTraj. While CTraj is still in development, it is distributable as a Python package. If you're interested in testing out the current development version, shoot me an email and let me know!
CIDER and localCIDER are ongoing projects to facilitate the easy and rapid computation of protein sequence properties relevant to intrinsically disordered proteins. CIDER is a webserver written using the Django framework, while localCIDER is a Python package available on the Python Package Index and installable via pip (specifically, using the command pip install localcider). localCIDER uses matplotlib for its analysis and plotting routines. localCIDER and CIDER were both developed with James Ahad, currently an MSTP student at Case Western, and
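Two of the simplest sequence properties relevant to disordered proteins are the fraction of charged residues (FCR) and the net charge per residue (NCPR). The sketch below computes them from first principles rather than through localCIDER's own API; treating K and R as +1 and D and E as -1, while ignoring histidine and terminal charges, is a simplifying assumption, and the function names are hypothetical.

```python
POSITIVE = set("KR")  # assumed positively charged residues
NEGATIVE = set("DE")  # assumed negatively charged residues


def charge_fractions(sequence):
    """Return (f_plus, f_minus): fractions of positive and negative residues."""
    seq = sequence.upper()
    if not seq:
        raise ValueError("sequence must be non-empty")
    n = len(seq)
    f_plus = sum(aa in POSITIVE for aa in seq) / n
    f_minus = sum(aa in NEGATIVE for aa in seq) / n
    return f_plus, f_minus


def fcr(sequence):
    """Fraction of charged residues: f_plus + f_minus."""
    f_plus, f_minus = charge_fractions(sequence)
    return f_plus + f_minus


def ncpr(sequence):
    """Net charge per residue: f_plus - f_minus."""
    f_plus, f_minus = charge_fractions(sequence)
    return f_plus - f_minus
```

For example, a sequence such as `"KKEEGS"` is highly charged (FCR of 4/6) but has no net charge (NCPR of 0), which is exactly the kind of distinction that matters when thinking about polyampholytes versus polyelectrolytes.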
More recently, I've been using tools to manipulate and analyze the data with a new approach to remote data access (i.e. a replacement for a REST API) that is appropriate for small to medium-sized datasets, which I call (for now) LVDA, for local version data access. For more information on the benefits of LVDA and its implementation using ProteomeScoutAPI, please see our recent publication (Holehouse & Naegle, 2015).
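Assuming, as the name suggests, that the core idea of LVDA is to download a specific, versioned snapshot of a dataset once and then answer every query from that local copy (rather than making a network request per record), the pattern can be sketched as below. This is not the ProteomeScoutAPI interface: the tab-separated file layout, the `accession` column and the `LocalDataset` class are all hypothetical.

```python
import csv


class LocalDataset:
    """Hypothetical local, versioned data-access object.

    The whole dataset is read once from a local tab-separated snapshot, and
    all subsequent queries are answered from memory with no network calls.
    """

    def __init__(self, path, version):
        # Recording the version lets an analysis state exactly which
        # snapshot it was run against.
        self.version = version
        self._records = {}
        with open(path, newline="") as handle:
            for row in csv.DictReader(handle, delimiter="\t"):
                self._records[row["accession"]] = row

    def get(self, accession, field):
        """Return one field of one record, or None if the accession is unknown."""
        record = self._records.get(accession)
        return None if record is None else record.get(field)

    def accessions(self):
        """All accessions in the snapshot, sorted for reproducible iteration."""
        return sorted(self._records)
```

Because the snapshot is a plain file with a known version, re-running the same analysis later gives the same answer even if the upstream database has changed in the meantime.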
geeneus is a Python based API which facilitates direct access to source-agnostic information from both UniProt and NCBI Protein records. In a nutshell, it lets you write programs which have direct access to a wide range of info which can be fetched using only the protein accession numbers.
geeneus was motivated by the fact that it would be extremely useful to gain unambiguous programmatic access to protein sequence information, as well as a range of metadata, but no reliable and simple way to do so existed. Using geeneus, in three lines of Python you can access sequence information, known mutations, domains, species and, importantly, full isoform sequence information.
from geeneus import Proteome

# A ProteinManager object is the functional object through which you get
# sequence information. From the perspective of the user it is
# stateless, and can be considered an API object
manager = Proteome.ProteinManager("firstname.lastname@example.org")

# The manager object can be queried to gain a variety of information
manager.get_protein_sequence("accession number")
All the networking and data management is handled behind the scenes (including a dynamic-programming approach to caching). As a result, geeneus is appropriate for both high-throughput analysis and interactive data exploration.
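The effect of that caching can be illustrated with plain memoization: each record is fetched once per accession and every later request is served from memory, which is what makes repeated lookups cheap in both batch and interactive use. The sketch below uses a stand-in fetch function; `CachedRecordFetcher` and `fetch_fn` are hypothetical, and this is not geeneus's internal implementation.

```python
class CachedRecordFetcher:
    """Memoize expensive record lookups keyed by accession number."""

    def __init__(self, fetch_fn):
        # fetch_fn would typically perform a network request to a remote
        # database; here it is any callable taking an accession string.
        self._fetch_fn = fetch_fn
        self._cache = {}
        self.misses = 0  # how many times the underlying fetch actually ran

    def get(self, accession):
        """Return the record for `accession`, fetching it only on first use."""
        if accession not in self._cache:
            self.misses += 1
            self._cache[accession] = self._fetch_fn(accession)
        return self._cache[accession]
```

For a pure function, `functools.lru_cache` gives the same behaviour with less code; an explicit dictionary just makes the cache easier to inspect, clear or persist to disk.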
Stanford's Prof. Andrew Ng was among the first to produce and release a massive open online course (MOOC), in the fall of 2011. I followed the course intently and created a set of stand-alone course notes based on the material, which I released in January 2012. These gained some coverage on Hacker News and have generated some very positive feedback, which is always nice.
As with any set of notes, there are errors and typos which many people have been kind enough to point out over the years. If you spot any please let me know!
IDP State Letter
I co-founded and edit a monthly newsletter centered on intrinsically disordered proteins (IDPs), published through the Biophysical Society IDP Subgroup. We write short summaries of papers to bring ideas surrounding IDPs to a broader audience. Past issues can be viewed here. If you're interested in becoming part of the editorial team, please drop me an email and I can provide more information.
In addition to science and software, I also play a lot of ultimate frisbee. I play mixed club with the wonderful pArchd Ultimate, and for the last few years I have run a weekly pickup game; for more details, take a look at our listing on FFinder.