# Drawing cell genealogies

When modeling stem cells, development or tumor progression, we are often interested in the divisional history of cells and their progeny. This can, for example, provide information on the state of differentiation of a cell lineage or the clonal heterogeneity in a tissue.

Morpheus provides a number of ways to track cells and record their division history. Combined with tools to visualize and analyse phylogenetic trees, such as ETE3 [Huerta-Cepas et al. 2016], this allows for in-depth analysis of cellular genealogies.

In this post, we will show:

1. How to export the divisional history from Morpheus using the CellDivision plugin.
2. How to visualize genealogical trees using the Python package ETE3, and
3. How to execute shell scripts during a Morpheus simulation using the External plugin.

# Exporting the divisional history

Let’s assume you have a CPM model set up with dividing cells. (If not, have a look at the Examples/CPM/Proliferation_2D.xml example model.) In this case, you will be familiar with the CellType plugin CellDivision. This allows you to specify the conditions under which cell division occurs.

However, you may have missed the write-log attribute of the CellDivision plugin. It lets you record and export the divisional history of the cells in your simulation.

Use the Examples/CPM/Proliferation_2D.xml model to try for yourself.

The write-log feature supports several formats to export this information, discussed below:

• CSV format
• Dot format
• Newick format

## CSV format

The CSV format writes the divisional history to a tab-delimited text file in which each line represents a cell division event and the columns represent:

• the time of cell division
• cell id of the mother cell
• cell id of one daughter cell
• cell id of the other daughter cell

An example is shown below:

```text
Time	MotherID	Daughter1ID	Daughter2ID
71	1	2	3
123	3	4	5
145	5	6	7
171	2	8	9
173	4	10	11
176	8	12	13
200	7	14	15
```


The CSV format is the most verbose format and the only one that provides the time stamps of the cell division events. However, it is neither a standardized format nor specifically designed to represent trees. Therefore, using this format to visualize or analyse the tree structure takes some additional work.
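As a rough illustration of that additional work, the sketch below rebuilds the trees from a division log and writes each one as a Newick string with quoted names for all nodes, mimicking the Morpheus Newick output described in the Newick section. It assumes the four-column layout shown above with a single header line; the function name `division_log_to_newick` is our own, not part of Morpheus.

```python
def division_log_to_newick(text):
    """Convert a division log (time, mother, daughter 1, daughter 2) into
    one Newick string per founder cell, with every node named."""
    children = {}      # mother id -> (daughter 1 id, daughter 2 id)
    daughters = set()  # every id that was born from a division
    for line in text.splitlines()[1:]:  # skip the header line
        fields = line.split()           # works for tabs or spaces
        if len(fields) < 4:
            continue                    # ignore blank or incomplete lines
        _time, mother, d1, d2 = fields[:4]
        children[mother] = (d1, d2)
        daughters.update((d1, d2))

    def to_newick(cell):
        # cells that never divided become leaves
        if cell not in children:
            return '"{}"'.format(cell)
        left, right = children[cell]
        return '({},{})"{}"'.format(to_newick(left), to_newick(right), cell)

    # founders are mothers that never appear as daughters
    founders = [m for m in children if m not in daughters]
    return [to_newick(f) + ';' for f in founders]
```

Applied to the example log above, this returns a single string, `((("12","13")"8","9")"2",(("10","11")"4",("6",("14","15")"7")"5")"3")"1";`. The recursion depth grows with the number of generations, which should be unproblematic for typical genealogies.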

## Dot format

The dot format is a well-known format for describing generic graphs, including trees, and can be visualized using programs such as GraphViz.

The dot format does not provide information on the time of cell division but simply lists the edges of the tree. For instance, if the mother cell with ID 1 had two daughter cells with IDs 2 and 3, this is represented as:

```text
1 -> 2
1 -> 3
```


A full cell division history file in dot format may look like this:

```text
digraph{
1 -> 2
1 -> 3
2 -> 4
2 -> 5
5 -> 6
5 -> 7
4 -> 8
4 -> 9
9 -> 10
9 -> 11
8 -> 12
8 -> 13
12 -> 14
12 -> 15
3 -> 16
3 -> 17
}
```

Note that the closing curly bracket } is only created when finishing the simulation. If you stop the simulation prematurely, you may need to add it yourself.

To visualize the tree, you can use the GraphViz utility dot and create a PNG image like this:

```shell
dot -Tpng cell_division.dot > cell_division.png
```


While useful, the dot format is designed for general graphs rather than specifically for trees.
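If you want to work with the dot output directly anyway, a few lines of standard-library Python suffice to turn the edge list back into a tree. The minimal sketch below (function names are our own) maps each daughter to its mother and counts how many divisions separate a cell from its founder. Because it only looks at lines of the form `mother -> daughter`, it also works on a log whose closing `}` is missing.

```python
import re

# matches edge lines such as "1 -> 2" (an optional trailing ';' is tolerated)
EDGE_RE = re.compile(r'^\s*(\d+)\s*->\s*(\d+)\s*;?\s*$')

def parse_dot_edges(text):
    """Return a {daughter id: mother id} dict from a dot division log."""
    mother_of = {}
    for line in text.splitlines():
        match = EDGE_RE.match(line)
        if match:  # 'digraph{' and '}' (if present) do not match
            mother, daughter = match.groups()
            mother_of[daughter] = mother
    return mother_of

def generation(cell, mother_of):
    """Number of divisions between a cell and its founder cell."""
    depth = 0
    while cell in mother_of:
        cell = mother_of[cell]
        depth += 1
    return depth
```

For the example file above, `generation('15', parse_dot_edges(text))` is 5 (15, 12, 8, 4, 2, 1), while the founder cell 1 has generation 0.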

Use the celltype-csv or celltype-dot formats to obtain separate divisional history files for each cell type.

## Newick format

The Newick format, also known as the New Hampshire tree format, is a way to specifically represent trees using parentheses and commas. It has been adopted by many software tools for phylogenetic analysis such as PHYLIP.

The simplest version of the Newick format uses only parentheses and commas, with a semicolon as stop symbol, to describe the tree topology, e.g. ((,),(,));. Trees are represented by a single line of text, independent of the size of the tree.

Morpheus uses a slightly more elaborate Newick variant, known in ETE3 as format 8, in which all leaves as well as the internal nodes are named, like this:

```text
((("6","7")"4",("8",("10","11")"9")"5")"2",("12","13")"3")"1";
```


Here, the root of the tree ("1") is the last name on the right. With a little puzzling, you will see that one of its daughters ("3") divided only once (into daughters "12" and "13"), while its other daughter ("2") left more progeny.
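To see how the nesting encodes the genealogy without any extra library, here is a minimal recursive-descent parser. It is a sketch, not ETE3's parser, and it assumes Morpheus-style input: every node named, names optionally in double quotes, no branch lengths, and well-formed parentheses. The names `parse_newick` and `leaves` are our own.

```python
def parse_newick(newick):
    """Parse a Newick string in which every node is named (no branch
    lengths) into nested (name, [children]) tuples."""
    s = newick.strip().rstrip(';')
    pos = 0

    def read_name():
        nonlocal pos
        start = pos
        while pos < len(s) and s[pos] not in '(),':
            pos += 1
        return s[start:pos].strip().strip('"')  # drop surrounding quotes

    def read_node():
        nonlocal pos
        children = []
        if s[pos] == '(':
            pos += 1                      # consume '('
            children.append(read_node())
            while s[pos] == ',':
                pos += 1                  # consume ','
                children.append(read_node())
            pos += 1                      # consume ')'
        return (read_name(), children)    # a node's name follows its ')'

    return read_node()

def leaves(node):
    """Names of all leaf nodes, i.e. cells that did not divide."""
    name, children = node
    if not children:
        return [name]
    return [leaf for child in children for leaf in leaves(child)]
```

For the string above, the root is `'1'` with daughters `'2'` and `'3'`, and `leaves()` returns the seven cells that never divided: 6, 7, 8, 10, 11, 12 and 13.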

We will stick with the Newick format and visualize it using the Python package ETE3.

Newick format files are only produced at the end of a simulation.

# Visualize trees using ETE3

ETE3 (Environment for Tree Exploration, v3) is a Python framework for the analysis and visualization of trees. It specializes in phylogenomic analysis, but many of its tools can be used for any kind of tree, including cellular genealogies. It has a clear Python API and features advanced tree visualization tools. Have a look at the ETE3 gallery for examples and code, and read the paper by Huerta-Cepas et al. 2016 for more information on its capabilities.

With this framework, we can take the Newick tree format as above and print its structure in text format and export a simple visualization as follows:

```python
from ete3 import Tree

# read the Newick string written by Morpheus
with open('cell_division_newick_cells_0.txt', 'r') as file:
    s = file.read()

t = Tree(s, format=8)           # read Newick format 8
print(t)                        # print tree as txt
t.render('cell_genealogy.png')  # export png image
```


This results in the following text output:

```text
         /-"6"
      /-|
     |   \-"7"
   /-|
  |  |   /-"8"
  |   \-|
  |     |   /-"10"
--|      \-|
  |         \-"11"
  |
  |   /-"12"
   \-|
      \-"13"
```


and the following image:

Thus, ETE3 gives you a quick tree visualization in a few lines of Python code. However, it also supports advanced tree visualization, as shown below. Moreover, it provides a number of tools and metrics to compare tree topologies, such as the Robinson-Foulds symmetric difference, simply by calling tree1.compare(tree2). Here, however, we focus on visualization.

With a bit more styling work, we can generate a much nicer visualization.

Suppose we have a simulation like the one below, with a growing population of cells in which mutations occur randomly during division. Clonal populations emerge as daughter cells inherit the mutations, here indicated by different colors.

We export the divisional history using the CellDivision plugin with the write-log option set to the Newick format. This results in a text file called cell_division_newick_cells.txt:

```text
(((("34",(("62",(("108",(("140","141")"136",(((("214",("260",("348",("386",(("448",((("510",(("570",((("662","663")"658",((("1108",(("1170","1171")"1124","1125")"1109")"704",("738","739")"705")"686",("692","693")"687")"659")"582","583")"571")"512","513")"511")"502","503")"472","473")"449")"442",(("578","579")"576", ...
```


We also export the clone number c of each cell at the end of the simulation using the Logger plugin, resulting in logger.csv:

```text
"time"	"cell.id"	"c"
3000	9	50
3000	10	50
3000	11	50
3000	15	50
3000	16	50
3000	19	164
3000	22	164
3000	28	50
3000	31	154
3000	34	154
...
```
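Before plotting, it can be useful to check how large each clone is. Here is a minimal standard-library sketch, assuming the tab-delimited layout shown above and a file that holds a single time point, as in this example. The function name `clone_sizes` is our own.

```python
import csv
from collections import Counter

def clone_sizes(lines):
    """Count the number of cells per clone in a Logger output file."""
    # the quoted header "time" "cell.id" "c" is unquoted by the csv module
    reader = csv.DictReader(lines, delimiter='\t')
    # clone numbers may be written as 50 or 50.0, so go through float
    return Counter(int(float(row['c'])) for row in reader)
```

Passing an open file, e.g. `clone_sizes(open('logger.csv')).most_common(5)`, lists the five largest clones. For the ten rows shown above, clone 50 has six cells, and clones 164 and 154 have two each.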


Next, we write a python script to do the following:

• read logger.csv into a pandas dataframe
• read cell_division_newick_cells.txt using ETE3 as before
• color-code the leaf nodes by their clone number from the dataframe using NodeStyle
• style the tree with TreeStyle to have a circular layout
• export the visualization in SVG image format

```python
import os, glob
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib import colors
from ete3 import Tree, TreeStyle, NodeStyle

data_folder = "path/to/folder"

## clone number per cell, exported by the Logger plugin (tab-delimited)
df = pd.read_csv(os.path.join(data_folder, "logger.csv"), sep="\t")

## newick files (there may be multiple if initializing with >1 cell)
fns = glob.glob(os.path.join(data_folder, "*newick*.txt"))

def value_to_hex_color(value, vmin=0, vmax=255):
    '''convert number into hex color code'''
    norm = colors.Normalize(vmin=vmin, vmax=vmax)
    c = plt.cm.gnuplot(norm(value))  # use same colormap as in simulation
    return colors.rgb2hex(c)

for fn in fns:
    with open(fn, 'r') as file:
        s = file.read()
    s = s.replace('"', '')  # strip quotes so node names equal cell ids
    t = Tree(s, format=8)

    # set node style: background color
    for cellid, clone in zip(df['cell.id'], df['c']):
        # get node(s) with name 'cellid'; cells of other founders
        # are not in this tree, so the loop body is simply skipped
        for node in t.search_nodes(name=str(cellid)):
            style = NodeStyle()
            style["bgcolor"] = value_to_hex_color(clone)
            node.set_style(style)

    # set tree layout and style
    ts = TreeStyle()
    ts.show_leaf_name = False
    ts.show_scale = False
    ts.mode = "c"  # circular layout
    ts.arc_start = -180-45  # 0 degrees = 3 o'clock
    ts.arc_span = 270

    # export as SVG next to the Newick file (fn already contains data_folder)
    outfile = fn + '.svg'
    t.render(outfile, w=1200, units='px', tree_style=ts)
    print('Saved {}'.format(outfile))
```


Executing this Python script yields an SVG image of our lineage tree, drawn with a circular layout, in which the background color of the nodes indicates the different clones:

Because we export the clone number and use the same colormap, the color coding corresponds to the colors in the simulation:

This combination allows you, e.g., to correlate the tree structure with the spatial locations of the different clones.

If you’re comfortable running Python scripts for post-hoc analysis after the simulation, you can stop reading here. But if you’re curious how to execute a Python script from within a Morpheus simulation, read on.

# Executing external scripts

Now, wouldn’t it be nice to be able to execute scripts automatically after a simulation? This would save you the time and effort of starting, e.g., a Python session and running your script after every simulation.

For this situation, Morpheus provides a useful Analysis plugin called External. This plugin, based on the tiny-process-library, lets you execute shell scripts during or after a simulation.

For instance, here we specify the following shell script, which first activates the conda environment my_environment and then executes the Python script newick_visualization.py:

```shell
source activate my_environment
python /path/to/script/newick_visualization.py
```


To customize the environment, you can override environment variables such as PYTHONPATH. Here, we use it to point to the correct Python installation in the Anaconda folder:

Shell scripts can be executed in the same thread as the simulation or, using the detach option, as a separate background process. For detached processes, you can set a timeout (in seconds) after which the process is killed, to prevent a script from hanging a simulation, e.g. during a parameter sweep.

Another pro tip: use % (percent sign) in the Command to pass global symbols as arguments to the script. For instance, the substring %time will be replaced with the current simulation time.
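On the script side, the substituted value simply arrives as a command-line argument. For example, if the Command were `python /path/to/script/newick_visualization.py %time` (a hypothetical setup), the script could use that argument to give each exported image a time-stamped name. A minimal sketch; the helper `output_name` and the file-name pattern are our own.

```python
import sys

def output_name(argv, prefix='cell_genealogy'):
    """Build an output file name from the time passed as first argument."""
    # argv[1] holds the value substituted for %time (assumed Command setup);
    # fall back to 'end' when the script is run by hand without arguments
    time = argv[1] if len(argv) > 1 else 'end'
    return '{}_t{}.svg'.format(prefix, time)

if __name__ == '__main__':
    print(output_name(sys.argv))
```

Called with the argument 3000, it returns `cell_genealogy_t3000.svg`, so repeated runs, e.g. in a parameter sweep or at several time points, do not overwrite each other's images.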

# Conclusion

Let’s wrap up.

We saw that you can export the divisional history from a Morpheus simulation in a variety of formats using the write-log option of the CellDivision plugin.

Using the Newick format, you can generate advanced visualizations of the lineage trees with the Python package ETE3. The same package also provides tools to compare tree topologies, but that is a topic for another post.

Last but not least, we saw how you can use the External plugin to execute e.g. Python scripts without ever leaving Morpheus.