2. Simulation Workflow
[2]:
import pandas as pd
import os
import cometspy as c
from scarcc.preparation.find_directory import find_directory
from scarcc.preparation.metabolic_model import BasicModel
data_directory = find_directory('Data', os.path.abspath(''))
model_directory = find_directory('models', os.path.abspath(''))
sim_chamber_directory = find_directory('SimChamber', os.path.abspath(''))
E0, S0, all_components = BasicModel(flux_weighting=True, model_directory=model_directory).load_ES_models()
alpha_table = pd.read_csv(os.path.join(data_directory, 'alpha_table_m1.csv'), index_col=0)
p = c.params()
p.set_param("defaultKm", 0.00001) # M
p.set_param("defaultVmax", 10) #mmol/gDw/hr
p.set_param("maxCycles", 30)
# p.set_param("maxCycles", 150)
p.set_param("timeStep", 1)
p.set_param('writeFluxLog', True)
[3]:
S0.medium
smedium = S0.medium
smedium['EX_bulk_ac_e'] = 2
S0.medium = smedium
[3]:
{'EX_ca2_e': 10,
'EX_cl_e': 10,
'EX_cobalt2_e': 10,
'EX_cu2_e': 10,
'EX_bulk_ac_e': 3,
'EX_mg2_e': 10,
'EX_fe2_e': 10,
'EX_fe3_e': 10,
'EX_mn2_e': 10,
'EX_mobd_e': 10,
'EX_nh4_e': 10,
'EX_ni2_e': 10,
'EX_o2_e': 10,
'EX_pi_e': 10,
'EX_zn2_e': 10,
'EX_so4_e': 10,
'EX_k_e': 10}
[8]:
S0.medium
smedium = S0.medium
smedium['EX_bulk_ac_e'] = 0
smedium['EX_gal_e'] = 2
S0.medium = smedium
[9]:
current_gene = 'folP'
Sgn = S0.slim_optimize()
from scarcc.preparation.perturbation import alter_Sij
with S0:
alter_Sij(S0, alphas=22.8, genes=current_gene, ko=False)
print(S0.slim_optimize()/Sgn)
0.7634558054931522
[7]:
current_gene = 'folP'
Sn = S0.slim_optimize()
from scarcc.preparation.perturbation import alter_Sij
with S0:
alter_Sij(S0, alphas=22.8, genes=current_gene, ko=False)
print(S0.slim_optimize()/Sn)
0.7879863087652861
After configuring basic parameters, we can start to simulate the combined antibiotics. The following code will simulate the combined antibiotics for all the possible combinations of the folA and folP inhibitors. The simulation results will be saved in the directory specified by the variable sim_chamber_directory.
2.1. Multiprocessing to derive results
run_sim_workflow is a function that accept list of gene combinations and map each to perform simulation. Then takes the simulation output data to generate the classification of gene combination effect, concatenating all results and save to excel spreadsheet.
df_containers stored all the data frames and break down of it is summarized in the last section
This example provides a minimum arguments being passed to the function
method_list = [‘m1]
as suffix to the alpha_table and all the output data frames
supply either
SG_listorDG_listor bothsimulation_kwargsare passed to the classSimulateCombinedAntibioticsdirectly, will be discussed in the section “Breakdown of steps”
Prerequisite:
alpha_table correspond to chosen method list stored in data_directory, or run get_alpha_biomass_df before calling run_sim_workflow
Failed due to insufficient memory
If multiprocessing module fails, it could be the reason of insufficient memory being allocated OR amount of processes being set for too high. Please try setting max_cpus to lower values.
[ ]:
from scarcc.sim_engine.simulation_workflow import run_sim_workflow
SG_list = ['folA', 'folP'] # Optional, Example: ['folA', 'folP', 'folC']
DG_list = [['folA', 'folP']] # Optional, Example: [['folA', 'folP'], ['folC', 'folP'], ['folA', 'folC']]
method_list = ['m1']
simulation_kwargs = {
'E0': E0,
'S0': S0,
'base': sim_chamber_directory,
'p': p}
df_container = run_sim_workflow(method_list=method_list, data_directory=data_directory,
DG_list=DG_list, **simulation_kwargs)
reading results from existing files
biomass, growth_rate, normalized_growth_rate, drug_response_classification, flux data derived from m1 were saved in d:\scarccpy\Data
[7]:
from scarcc.sim_engine.simulation_workflow import run_sim_workflow
SG_list = ['folA', 'folP'] # Optional, Example: ['folA', 'folP', 'folC']
DG_list = [['folA', 'folP']] # Optional, Example: [['folA', 'folP'], ['folC', 'folP'], ['folA', 'folC']]
method_list = ['m1']
simulation_kwargs = {
'E0': E0,
'S0': S0,
'base': sim_chamber_directory,
'p': p}
df_container = run_sim_workflow(method_list=method_list, data_directory=data_directory,
DG_list=DG_list, **simulation_kwargs)
reading results from existing files
biomass, growth_rate, normalized_growth_rate, drug_response_classification, flux data derived from m1 were saved in d:\scarccpy\Data\flux_analysis_m1.csv
Customization of simulation parameters
custom_simulation_kwargsare passed to the classSimulateCombinedAntibiotics
Note : For a same set of single gene target that have run before, if “BM_SG_m1.csv” and “flux_SG_m1.csv” already exist in Data directory, and want to test new set of gene combinations, user can omit the SG_list argument. The program will use the available data that already exist.
[6]:
from scarcc.sim_engine.simulation_workflow import run_sim_workflow
custom_simulation_kwargs = {
'E0': E0,
'S0': S0,
'base': sim_chamber_directory, # directory to run each simulation in multiprocess
'p': p,
'checker_suffix': None, # checkerboard suffix for excel spreadsheet
'ko': False, # knockout reactions if reaction is encoded by target gene
# layout configuration parameters
'co': True, # Run coculture simulation
'mono': True, # Run monoculture simulation
'mono_E': True, # Run E0 monoculture simulation
'mono_S': True, # Run S0 monoculture simulation
'Smono_carbon_source': 'bulk_ac_e', # Carbon source for S0 monoculture
'carbon_source': None, # Dictionary for specification of carbon source for each culture
# comets model specifications
'initial_pop': 1e-8, # Initial population for simulation
'obj_style': 'MAX_OBJECTIVE_MIN_TOTAL', # pfba optimization
'SG_list': ['folA', 'folP'], # List of gene targets to simulate
}
df_container = run_sim_workflow(method_list=['m1'], data_directory=data_directory,
DG_list=[['folA', 'folP']], max_cpus=12, **custom_simulation_kwargs)
biomass, growth_rate, normalized_growth_rate, drug_response_classification, flux data derived from m1 were saved in d:\scarccpy\Data\flux_analysis_m1.csv
2.2. Detailed break down of steps
pseudo code for simulation procedure
with E0, S0:
alter_Sij(...)
set_comets_models(...)
set_layout_object(...)
sim_culture(...)
Note: To prepare genes to perform simulation, see Prepare Genes and Prepare Alpha
Warning
Modification of stoichiometry should be within context manager (with E0, S0:) such that in each run, the metabolic model restores to unperturbed state
2.2.1. Step 1: Stiochiometric Scaling
Reduction of biomass yield under perturbation (see Stoichiometry Scaling)
Note : If both target gene in gene combination leads to the inhibition of the same reaction, the inhibition effect to that particular reaction is set to the maximum one rather than scaled twice
[10]:
# Outside context manager ONLY for illustration of workflow, use with E0,S0: during implementation
from scarcc.preparation.perturbation import alter_Sij
current_gene = 'folA'
alpha_table.columns = [col.replace('_alpha', '') for col in alpha_table.columns]
alter_Sij(E0, alphas=alpha_table.loc[current_gene, 'E0'], genes=current_gene, ko=False)
alter_Sij(S0, alphas=alpha_table.loc[current_gene, 'S0'], genes=current_gene, ko=False)
[10]:
['DHFR', 'DHFR_v1']
2.2.2. Step 2: Construction layout object for cultures with perturbed metabolic models
[12]:
from scarcc.sim_engine.simulation_configuration import LayoutConfig
# E0, S0 are perturbed
base_nutrient = [
"ca2_e", "cl_e", "cobalt2_e", "cu2_e","fe2_e", "fe3_e", "k_e", "mg2_e",
"mn2_e", "mobd_e", "ni2_e", "o2_e", "pi_e", "so4_e", "zn2_e", "nh4_e"]
layout_kwargs = {
'E0': E0, 'S0': S0, 'carbon_source_val': .1, 'base_nutrients': base_nutrient,
'nutrients_val': 100, 'Smono_carbon_source': 'bulk_ac_e'
}
layout_config = LayoutConfig(**layout_kwargs)
layout_config.set_comets_model(E0, S0)
co_layout, E0_layout, S0_layout = layout_config.set_layout_object()
Or pass different carbon sources inside a dictionary for each cultures with ‘co’, ‘mono_E’, ‘mono_S’ as keys
[25]:
layout_kwargs['carbon_source'] = carbon_source = {'co': 'lcts_e', 'mono_E': 'lcts_e', 'mono_S': 'bulk_ac_e'}
layout_config = LayoutConfig(**layout_kwargs)
layout_config.set_comets_model(E0, S0)
co_layout, E0_layout, S0_layout = layout_config.set_layout_object()
2.2.3. Step 3: Run simulation with layout
[13]:
sim = c.comets(co_layout, p)
sim.run()
Running COMETS simulation ...
Done!
Extract biomass and flux data frames from COMETS sim object
[40]:
print('biomass: \n', sim.total_biomass.head(),
'\n E0 fluxes:\n', sim.fluxes_by_species['E0'].head(),
'\n S0 fluxes:\n', sim.fluxes_by_species['S0'].head())
biomass:
cycle E0 S0
0 0 1.000000e-08 1.000000e-08
1 1 1.000000e-08 1.000000e-08
2 2 1.000000e-08 1.000000e-08
3 3 1.000000e-08 1.014568e-08
4 4 1.047327e-08 1.014568e-08
E0 fluxes:
cycle x y CYTDK2 XPPT ... MOCOS BMOGDS2 FESD2s OCTNLL DHFR_v1
0 5 1 1 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0
2 10 1 1 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0
4 15 1 1 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0
6 20 1 1 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0
8 25 1 1 0.0 0.0 ... 0.0 0.0 0.0 0.0 0.0
[5 rows x 2716 columns]
S0 fluxes:
cycle x y 3HAD180 ... OA4L_ST OA5L_ST OA4VL_ST DHFR_v1
1 5 1 1 0.000092 ... 0.0 1.990201e-05 0.0 0.0
3 10 1 1 -0.008731 ... 0.0 5.364746e-07 0.0 0.0
5 15 1 1 -0.000136 ... 0.0 6.088823e-06 0.0 0.0
7 20 1 1 -0.000117 ... 0.0 5.213432e-06 0.0 0.0
9 25 1 1 -0.000151 ... 0.0 6.747233e-06 0.0 0.0
[5 rows x 2549 columns]
2.3. Integration of all steps
2.3.1. Streamlined Process for Simulation Workflow
The implementations illustrated above are integrated in the class SimulateCombinedAntibiotics
takes a target gene or a tuple of gene combination as input
performs a simulation, and returns the biomass and flux results
[3]:
from scarcc.sim_engine.simulation_workflow import SimulateCombinedAntibiotics
alpha_table = pd.read_csv(os.path.join(data_directory, 'alpha_table_m1.csv'), index_col=0)
alpha_table.columns = [col.replace('_alpha', '') for col in alpha_table.columns]
cas = SimulateCombinedAntibiotics(current_gene=['folP','folA'], E0=E0, S0=S0, alpha_table=alpha_table,
p=p, mono=True, base=sim_chamber_directory)
biomass_df, flux_df = cas.get_BM_df()
Running COMETS simulation ...
Done!
Running COMETS simulation ...
Done!
Running COMETS simulation ...
Done!
[20]:
from scarcc.sim_engine.output import MethodDataFiller
df_container = {}
df_container['checkerboard', 'SG'] = {'biomass': SG_biomass_df,
'flux': SG_flux_df}
df_container['checkerboard', 'DG'] = {'biomass': DG_biomass_df,
'flux': DG_flux_df}
mdf = MethodDataFiller(df_container, data_directory=data_directory)
mdf.fill_container()
mdf.write_to_csv()
C:\Users\wongt\AppData\Roaming\Python\Python311\site-packages\scipy\optimize\_minpack_py.py:906: OptimizeWarning: Covariance of the parameters could not be estimated
warnings.warn('Covariance of the parameters could not be estimated',
biomass, growth_rate, normalized_growth_rate, drug_response_classification, flux data derived from checkerboard were saved in d:\scarccpy\Data\flux_analysis_checkerboard.csv
2.4. Display of data frames inside df_container
All the result data frames can be queried with tuple key: (method_chosen, XG)
where method chosen is any method in the method_list and XG is either ‘SG’ or ‘DG’
[5]:
df_container[('m1', 'DG')].keys()
[5]:
dict_keys(['biomass', 'flux', 'growth_rate', 'normalized_growth_rate', 'drug_response_classification'])
“growth_rate” and “drug_response_classification” are generated in class
MethodDataFillerTo import, run:
from scarcc.sim_engine.output import MethodDataFillerInitialize a df_container and pass to MethodDataFiller:
pass a df_container with keys “(method_chosen, XG)” for each XG, provide a dictionary with keys “biomass” and “flux” and the values as the corresponding data frame
Example:
df_container = {}
df_container['m1', 'SG'] = {'biomass': SG_biomass_df,
'flux': SG_flux_df}
df_container['m1', 'DG'] = {'biomass': DG_biomass_df,
'flux': DG_flux_df}
MethodDataFiller(df_container)
mdf.fill_container()
mdf.write_to_csv()
SG data frames
[30]:
pd.set_option('display.max_columns', 3)
[print(k, ': \n', df.head(), '\n') for k, df in df_container[('m1', 'SG')].items()]
biomass :
E0_Normal_coculture S0_Normal_coculture E0_Normal_monoculture \
cycle
0 1.000000e-08 1.000000e-08 1.000000e-08
1 1.000000e-08 1.000000e-08 1.754153e-08
2 1.000000e-08 1.000000e-08 3.077053e-08
3 1.000000e-08 1.029133e-08 5.397622e-08
4 1.094643e-08 1.029133e-08 9.468255e-08
S0_Normal_monoculture E0_folA_coculture S0_folA_coculture \
cycle
0 1.000000e-08 1.000000e-08 1.000000e-08
1 1.077695e-08 1.000000e-08 1.000000e-08
2 1.161426e-08 1.000000e-08 1.000000e-08
3 1.251663e-08 1.000000e-08 1.014568e-08
4 1.348911e-08 1.047327e-08 1.014568e-08
E0_folA_monoculture S0_folA_monoculture E0_folP_coculture \
cycle
0 1.000000e-08 1.000000e-08 1.000000e-08
1 1.332445e-08 1.041132e-08 1.000000e-08
2 1.775410e-08 1.083955e-08 1.000000e-08
3 2.365636e-08 1.128541e-08 1.000000e-08
4 3.152081e-08 1.174959e-08 1.045757e-08
S0_folP_coculture E0_folP_monoculture S0_folP_monoculture
cycle
0 1.000000e-08 1.000000e-08 1.000000e-08
1 1.000000e-08 1.304388e-08 1.041559e-08
2 1.000000e-08 1.701428e-08 1.084844e-08
3 1.014085e-08 2.219323e-08 1.129929e-08
4 1.014085e-08 2.894858e-08 1.176887e-08
flux :
Unnamed: 0 cycle x y CYTDK2 XPPT \
Gene_inhibition Species culture
AB E0 coculture 0 25 1 1 0 0
S0 coculture 1 25 1 1 0 0
HXPRT NDPK5 SHK3Dr NDPK6 ... \
Gene_inhibition Species culture ...
AB E0 coculture 0 0 0.042396 0 ...
S0 coculture 0 0 0.099704 0 ...
pe2_ST pg2_ST clpn2_ST \
Gene_inhibition Species culture
AB E0 coculture NaN NaN NaN
S0 coculture 8.020000e-07 2.200000e-07 7.330000e-09
pa2_ST ps2_ST 12dgr2_ST \
Gene_inhibition Species culture
AB E0 coculture NaN NaN NaN
S0 coculture 2.090000e-09 2.090000e-09 9.840000e-08
OAL_ST OA4L_ST OA5L_ST OA4VL_ST
Gene_inhibition Species culture
AB E0 coculture NaN NaN NaN NaN
S0 coculture 0.0 0.0 0.0 0.0
[2 rows x 3101 columns]
growth_rate :
E0_coculture S0_coculture E0_monoculture S0_monoculture
Gene_inhibition
Normal 0.074819 0.040716 0.547661 0.074824
folA 0.021745 0.009762 0.287016 0.040308
folP 0.021060 0.009690 0.265734 0.040718
normalized_growth_rate :
E0_coculture S0_coculture E0_monoculture S0_monoculture
Gene_inhibition
Normal 1.000000 1.000000 1.000000 1.000000
folA 0.290640 0.239757 0.524075 0.538707
folP 0.281477 0.237998 0.485216 0.544185
[30]:
[None, None, None, None]
DG data frames
[8]:
[print(k, ': \n', df.head(), '\n') for k, df in df_container[('m1', 'DG')].items()]
biomass :
E0_folA.folP_coculture S0_folA.folP_coculture \
cycle
0 1.000000e-08 1.000000e-08
1 1.000000e-08 1.000000e-08
2 1.000000e-08 1.000000e-08
3 1.000000e-08 1.000954e-08
4 1.000217e-08 1.000954e-08
E0_folA.folP_monoculture S0_folA.folP_monoculture
cycle
0 1.000000e-08 1.000000e-08
1 1.000217e-08 1.003103e-08
2 1.000434e-08 1.006215e-08
3 1.000650e-08 1.009337e-08
4 1.000867e-08 1.012469e-08
flux :
cycle x y CYTDK2 XPPT HXPRT NDPK5 \
Gene_inhibition Species culture
folA.folP E0 coculture 25 1 1 0.0 0.0 0.0 0.0
S0 coculture 25 1 1 0.0 0.0 0.0 0.0
E0 monoculture 25 1 1 0.0 0.0 0.0 0.0
S0 monoculture 25 1 1 0.0 0.0 0.0 0.0
SHK3Dr NDPK6 NDPK8 ... \
Gene_inhibition Species culture ...
folA.folP E0 coculture 0.042396 0.0 0.0 ...
S0 coculture 0.099704 0.0 0.0 ...
E0 monoculture 0.042396 0.0 0.0 ...
S0 monoculture 0.295397 0.0 0.0 ...
pe2_ST pg2_ST clpn2_ST \
Gene_inhibition Species culture
folA.folP E0 coculture NaN NaN NaN
S0 coculture 8.016775e-07 2.199246e-07 7.330820e-09
E0 monoculture NaN NaN NaN
S0 monoculture 2.375154e-06 6.515772e-07 2.171924e-08
pa2_ST ps2_ST 12dgr2_ST \
Gene_inhibition Species culture
folA.folP E0 coculture NaN NaN NaN
S0 coculture 2.094520e-09 2.094520e-09 9.844244e-08
E0 monoculture NaN NaN NaN
S0 monoculture 6.205497e-09 6.205497e-09 2.916584e-07
OAL_ST OA4L_ST OA5L_ST OA4VL_ST
Gene_inhibition Species culture
folA.folP E0 coculture NaN NaN NaN NaN
S0 coculture 0.000000 0.0 0.000000 0.0
E0 monoculture NaN NaN NaN NaN
S0 monoculture 0.000003 0.0 0.000004 0.0
[4 rows x 3100 columns]
growth_rate :
E0_coculture S0_coculture E0_monoculture S0_monoculture \
Gene_inhibition
Normal 0.074819 0.040716 0.547661 0.074824
folA.folP 0.000000 0.000000 0.000000 0.000000
First_gene E0_monoculture_First_gene \
Gene_inhibition
Normal Normal 0.547661
folA.folP folA 0.287016
S0_monoculture_First_gene Second_gene \
Gene_inhibition
Normal 0.074824 Normal
folA.folP 0.040308 folP
E0_coculture_Second_gene S0_coculture_Second_gene \
Gene_inhibition
Normal 0.074819 0.040716
folA.folP 0.021060 0.009690
E0_monoculture_Second_gene S0_monoculture_Second_gene
Gene_inhibition
Normal 0.547661 0.074824
folA.folP 0.265734 0.040718
normalized_growth_rate :
E0_coculture S0_coculture \
Gene_inhibition
Normal 1.0 1.0
folA.folP 0.0 0.0
Predicted_additive_effect_E0_coculture \
Gene_inhibition
Normal 1.000000
folA.folP 0.081808
Predicted_additive_effect_S0_coculture E0_monoculture \
Gene_inhibition
Normal 1.000000 1.0
folA.folP 0.057062 0.0
S0_monoculture Predicted_additive_effect_E0_monoculture \
Gene_inhibition
Normal 1.0 1.00000
folA.folP 0.0 0.25429
Predicted_additive_effect_S0_monoculture First_gene \
Gene_inhibition
Normal 1.000000 Normal
folA.folP 0.293156 folA
E0_monoculture_First_gene ... E0_monoculture_Second_gene \
Gene_inhibition ...
Normal 1.000000 ... 1.000000
folA.folP 0.524075 ... 0.485216
S0_monoculture_Second_gene po_diff_E0_coculture \
Gene_inhibition
Normal 1.000000 0.000000
folA.folP 0.544185 -0.081808
po_diff_S0_coculture po_diff_E0_monoculture \
Gene_inhibition
Normal 0.000000 0.00000
folA.folP -0.057062 -0.25429
po_diff_S0_monoculture Drug_comb_effect_E0_coculture \
Gene_inhibition
Normal 0.000000 Additive
folA.folP -0.293156 Antagonistic
Drug_comb_effect_S0_coculture \
Gene_inhibition
Normal Additive
folA.folP Antagonistic
Drug_comb_effect_E0_monoculture \
Gene_inhibition
Normal Additive
folA.folP Antagonistic
Drug_comb_effect_S0_monoculture
Gene_inhibition
Normal Additive
folA.folP Antagonistic
[2 rows x 24 columns]
drug_response_classification :
Drug_comb_effect_E0_coculture Drug_comb_effect_E0_monoculture \
Gene_inhibition
Normal Additive Additive
folA.folP Antagonistic Antagonistic
Drug_comb_effect_S0_coculture Drug_comb_effect_S0_monoculture
Gene_inhibition
Normal Additive Additive
folA.folP Antagonistic Antagonistic
[8]:
[None, None, None, None, None]