Thank you for answering our questionnaire! Based on your answers, we have provided our team's recommendations to help you properly set up Makya for your project.
Your answers
- I want to modify one or both extremities of my molecule, keeping the scaffold fixed.
- I want to stay close to a reference molecule or dataset
- Yes, I have my own data
Given the requirements of your project, we recommend setting up a Fragment Growing generator coupled with QSAR models.
1. Selection of the generator
The Fragment Growing generator generates compounds by proposing new branches to grow from a molecular fragment. It is purely chemistry-driven, so a good understanding of organic chemistry is required. The exit vectors (reactive centers where the chemistry will take place) can be defined by the user or automatically calculated by Makya; the generator then searches for commercial building blocks that can react at this position (or these positions).
For example, given a building block whose reaction center (exit vector) is a boronic acid function, the Fragment Growing generator proposes novel compounds by attaching new branches at this center while keeping the rest of the fragment intact.
2. Construction of a QSAR model
Makya QSARs are classification models trained on the data of your choice. For each individual objective that you want to model (activity, ADMET properties, etc.), the Makya QSAR module automatically tests different combinations of molecular representations, parameters, and so on, and selects the best-performing one. QSARs can be used to guide a generator (so that the generated molecules are optimized on the QSAR objectives), as we recommend here, but also to score any molecule of interest.
3. Step by Step setup
Step 1: upload your project data in Makya
- To create your dataset, follow the steps described in the documentation: Datasets.
- Requirements: your dataset must contain a SMILES column, and it should not contain any column that is entirely empty or that has no column title.
- Note: Makya automatically cleans the dataset. For more information, see the documentation.
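The requirements above can be checked locally before uploading. The following is a minimal sketch, assuming the dataset is a comma-separated file whose first row holds the column titles and whose SMILES column is named exactly "SMILES" (both are assumptions, not Makya specifications). The function name check_dataset is hypothetical and is not part of Makya.

```python
import csv
import io


def check_dataset(csv_text, smiles_column="SMILES"):
    """Return a list of problems found in a CSV dataset (empty list = OK)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return ["file is empty"]
    header, data = rows[0], rows[1:]
    problems = []

    # Every column needs a title.
    for index, title in enumerate(header):
        if not title.strip():
            problems.append(f"column {index + 1} has no title")

    # A SMILES column must be present (the exact name is an assumption).
    if smiles_column not in [title.strip() for title in header]:
        problems.append(f"no '{smiles_column}' column")

    # No column may be entirely empty.
    for index, title in enumerate(header):
        values = [row[index] for row in data if index < len(row)]
        if not any(value.strip() for value in values):
            label = title.strip() or "untitled"
            problems.append(f"column {index + 1} ({label}) has no values")

    return problems


if __name__ == "__main__":
    example = "SMILES,,notes\nCCO,1,\nCCN,2,\n"
    for problem in check_dataset(example):
        print(problem)
```

This only covers the requirements listed above; it does not reproduce the cleaning that Makya performs on upload.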
Step 2: create a new QSAR trained on your project data
- To create your QSAR, follow the steps described in the documentation: Creating a new Predictor by defining a Target Product Profile (TPP).
- Train your QSAR by clicking on the Run button in the QSAR tab.
- Once the training is complete, validate the performance of your model using the scores provided in Makya. This step is crucial, as you do not want poorly performing models to guide molecule generation (for more information, see the documentation).
Step 3: set up your Fragment Growing generator
A description of the Fragment Growing generator and of the setting-up steps is provided in the documentation: Fragment Growing generator. You can also find examples in our use-cases: for example, Growing around a fragment using a 3D reference molecule.
- Create a new Fragment Growing generator in the Generation tab of your project. The generator set-up page appears.
- In the Exit Vectors tab, enter the molecular fragment you want to grow from.
- The fragment should not contain any charged atoms, as protonation is performed directly inside Makya.
- It is important to input fragments that are suitable building blocks for chemical synthesis (for example, brominated or chlorinated fragments, or molecules with an OH to form an ester), as the Fragment Growing generator is a chemistry-based generator trained on chemical reactions.
- After having entered your fragment, you can either specify the exit vectors from which the generator should grow new molecules, or let the algorithm determine appropriate exit vectors.
- To specify exit vector(s), click on Set and enter the atom IDs.
- Make sure to select all the atoms that will be involved in the reaction.
- In the Chemical Space tab, select the dataset of your project.
- The similarity to this chemical space will be an element of the overall fitness function that will be optimized during molecule generation. It ensures that you will stay close to your project molecules.
- In the QSAR tab, select the QSAR that you trained in Step 2.
- The objectives that you select will be an element of the overall fitness function that will be optimized during molecule generation.
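If your fragment is written as a SMILES string, the "no charged atoms" requirement from the Exit Vectors step can be pre-checked with a simple text scan. This is a rough sketch rather than a chemistry toolkit: it relies on the SMILES convention that formal charges are written inside bracket atoms (for example [NH3+] or [O-]), and it does not validate the SMILES itself. The function name charged_atoms is hypothetical and is not part of Makya.

```python
import re

# In SMILES, formal charges only appear inside bracket atoms such as [NH3+].
BRACKET_ATOM = re.compile(r"\[([^\[\]]*)\]")


def charged_atoms(smiles):
    """Return the bracket atoms of a SMILES string that carry a + or - sign."""
    return [
        match.group(0)
        for match in BRACKET_ATOM.finditer(smiles)
        if "+" in match.group(1) or "-" in match.group(1)
    ]


if __name__ == "__main__":
    for fragment in ["OB(O)c1ccccc1", "C[NH3+]", "[O-][N+](=O)c1ccccc1"]:
        found = charged_atoms(fragment)
        status = "charged atoms: " + ", ".join(found) if found else "no charged atoms"
        print(fragment, "->", status)
```

A fragment flagged here should be redrawn in its neutral form before it is entered in Makya. Single bonds written with "-" outside brackets are not flagged, since only the text inside brackets is inspected.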
These are the minimal steps needed to fit the requirements of your project. If you want to add more constraints on the generation, you can do so during the set-up of your generator. For example, you can add substructure constraints (forcing or preventing the presence of specific substructures) either on the building blocks that will be chosen as the new molecular branches, or directly on the generated molecules. The first option will drastically accelerate the generation by reducing the size of the catalog of building blocks explored by the algorithm: thus, we recommend using it whenever appropriate.
Step 4: run the generator and analyze your results
- To run your generator, go back to the Generator tab and click on Run.
- You can see the first generated molecules while the generation is still running. Check that there is no error in your set-up and that the generated molecules conform to the requirements of your project.
Once you have enough molecules, you can use the Parallel Coordinates to filter the molecules based on scores such as the QSAR predictions. For more information on the visualization, analysis and export of your results, check the documentation: Visualisation and Analysis of Generated Molecules.
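Conceptually, filtering with parallel coordinates amounts to keeping the molecules whose scores fall inside a chosen range on every selected axis. The sketch below illustrates that idea on a small in-memory list; the score names ("activity_probability", "similarity"), the values, and the function filter_by_ranges are all made up for illustration and do not reflect Makya's export format or API.

```python
def filter_by_ranges(molecules, ranges):
    """Keep molecules whose every listed score lies within its [low, high] range."""
    kept = []
    for molecule in molecules:
        if all(low <= molecule[name] <= high for name, (low, high) in ranges.items()):
            kept.append(molecule)
    return kept


if __name__ == "__main__":
    # Illustrative values only, not real generation results.
    molecules = [
        {"smiles": "CCO", "activity_probability": 0.91, "similarity": 0.62},
        {"smiles": "CCN", "activity_probability": 0.40, "similarity": 0.80},
        {"smiles": "CCC", "activity_probability": 0.85, "similarity": 0.20},
    ]
    ranges = {"activity_probability": (0.8, 1.0), "similarity": (0.5, 1.0)}
    for molecule in filter_by_ranges(molecules, ranges):
        print(molecule["smiles"])
```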
For any questions, contact your Application scientist.