Out-of-the-Box GenAI for Medicinal Chemistry

AI in science is crossing a threshold. With the launch of tools like Claude for Life Sciences, the notion that assistants should do more than generate text is going mainstream: AI in research today needs to use scientific tools, interact with real data, and help move projects forward. Tool use isn’t a novelty anymore; it’s quickly becoming the baseline.

But an important distinction is emerging: different scientific domains need different assistants.

Claude is becoming a strong partner for biology-forward workflows—protocol drafting, bioinformatics, regulatory documentation, and connecting to platforms like Benchling or PubMed.

Medicinal chemistry is a different problem space.
It doesn’t live in documents. It lives in structures, pockets, properties, conformers, and models that compute. That’s why general assistants (Claude, ChatGPT, Gemini) are still limited to reading and summarizing (or, in the case of Claude for Life Sciences, passing text-based data between tools).

Balto is built for chemistry.
It’s the domain-specific, tool-using assistant for molecular design, docking, property prediction, and structural analysis—the “Claude for chemistry,” powered not by document connectors but by molecular simulations.

This post walks through what current generative AI can (and can’t) do for chemists, and where chemistry-aware assistants like Balto meaningfully change the workflow—from reading to extracting, modeling, interpreting, and deciding.

Generative AI for Reading and Knowledge Synthesis in Chemistry

General-purpose AI excels at summarizing. Models like Claude or ChatGPT handle biological literature and protocols well, and Claude’s new scientific connectors broaden that further. But none of these tools can extract structures from PDFs and convert them to usable chemical objects, which matters the moment your work depends on molecules rather than prose.

Summarization from these models is strong, but there are limits, particularly for queries where an accurate answer depends on gathering and manipulating the right set of underlying data. These models aren’t built for chemistry. They won’t flag the difference between a small mistake and a major flaw in a synthesis route, and they can’t pull chemical structures out of a PDF and turn them into something you can actually work with. So while they help with surface-level understanding, they stop short of giving you material you can use in the lab.

A query of "Find known PI3K inhibitors with activity better than 100 nM in pubChem" in ChatGPT 5

So how does a chemistry-specific genAI tool differ when it comes to reading and knowledge synthesis? 

Instead of just saying what a paper is about, the right tool can pull out molecules, convert them to SMILES, and connect them to modeling or docking. That means the jump from “reading” to “doing” is immediate. Additionally, parts of data curation and knowledge gathering that require "action" can occur directly within a single chat and interface.
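To make the hand-off from "reading" to "doing" concrete, here is a minimal sketch of a cheap gate an extraction pipeline might apply before passing a SMILES string to modeling. It is an illustration only, not Balto's method: `smiles_looks_well_formed` is a hypothetical helper that checks just three syntactic properties (balanced branches, closed bracket atoms, paired ring-closure labels). It does not check valence or aromaticity; a real pipeline would parse the string with a cheminformatics toolkit.

```python
DIGITS = "0123456789"

def smiles_looks_well_formed(smiles: str) -> bool:
    """Heuristic syntax check for a SMILES string (hypothetical helper).

    Verifies that parentheses balance, bracket atoms close, and every
    ring-closure label appears an even number of times. This is NOT a
    full SMILES parser and says nothing about chemical validity.
    """
    depth = 0              # current branch nesting level
    in_bracket = False     # True while inside a [...] atom
    open_rings = set()     # ring-closure labels opened but not yet closed
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if in_bracket:
            # Digits inside brackets are isotopes, H counts, or charges.
            if ch == "]":
                in_bracket = False
            elif ch == "[":
                return False
        elif ch == "[":
            in_bracket = True
        elif ch == "]":
            return False
        elif ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
        elif ch == "%":
            # Two-digit ring-closure label, e.g. %12
            label = smiles[i + 1:i + 3]
            if len(label) != 2 or any(c not in DIGITS for c in label):
                return False
            open_rings ^= {int(label)}
            i += 2
        elif ch in DIGITS:
            open_rings ^= {int(ch)}
        i += 1
    return bool(smiles) and depth == 0 and not in_bracket and not open_rings

for candidate in ["c1ccccc1", "c1ccccc", "CC(=O)O", "CC(=O"]:
    print(candidate, smiles_looks_well_formed(candidate))
```

A check like this only catches truncated or garbled strings, which are a common failure when structures are recovered from images or tables; anything that passes still needs a real parser before docking.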

Balto takes a simple request and directly queries the database in question, returning both a summary and the resulting data files, which can be manipulated with other chemistry-specific tools.

Reading and Knowledge Synthesis: General vs. Chemistry-specific

AI Comparison Table
Task | General-purpose AI (ChatGPT, Claude, Gemini) | Chemistry-specific AI assistant
Summarize papers | Yes: can turn dense text into plain summaries | Yes: plus highlights chemistry-relevant details
Extract chemical structures from PDFs | No | Yes: can convert images/tables into usable molecules (e.g. SMILES)
Interrogate domain-specific knowledge | Ability to interact with a wall of text and ask for clarification | Domain-specific visualizations and underlying data
Connect to downstream tools | No: outputs stay as text | Yes: molecules can be sent straight to modeling or docking
Practical use for chemists | Good for quick overviews | Bridges reading and doing actual work

Integrating Tools and Agentic Workflows in GenAI for Chemistry

Some AI tools can now connect to external apps or even run short workflows on their own. Coding copilots like GitHub Copilot or ChatGPT with a Python sandbox can fetch data, clean it up, and run analyses. More advanced “agentic” setups can chain multiple steps together without constant prompting and are often paired with tool use.

Claude’s new scientific connectors—Benchling, BioRender, PubMed, and others—push this further for biology and document-heavy scientific research. They make it easier to coordinate literature, protocols, and experimental records through a single interface.

But chemistry needs something different. Chemists don’t just move text between systems; they move structures, 3D pockets, ADMET properties, and docking inputs. Those require tools that understand molecules, not just metadata. That’s why Balto bundles chemistry-specific capabilities directly into chat, no scripting or custom setup required.

In theory, you could ask a general agent to pull data from PubChem, write Python to filter results, and generate plots. But here’s the catch: most of us aren’t software engineers. Catching subtle errors or debugging generated code takes time and expertise—and even skilled coders end up validating every step. Worse, these systems aren’t tuned for chemistry, so they won’t know what filters matter in a lead optimization project or how to flag a result that looks chemically off.
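As an illustration of the kind of subtle error meant here, consider the filtering step alone. The sketch below is an assumption-laden example, not output from any of the tools discussed: the record layout, the helper names, and the four compounds are placeholders invented for the example, and the thresholds (100 nM, 450 Da) are borrowed from the two example prompts in this post. The point is that activity values arrive in mixed units, and a filter that compares raw numbers silently keeps the wrong compounds.

```python
from dataclasses import dataclass

# Conversion factors to nanomolar for the units this sketch understands.
TO_NANOMOLAR = {"pM": 1e-3, "nM": 1.0, "uM": 1e3, "mM": 1e6}

@dataclass(frozen=True)
class ActivityRecord:
    """One compound with a reported IC50 (hypothetical record layout)."""
    name: str
    mol_weight_da: float
    ic50_value: float
    ic50_unit: str

def ic50_in_nm(record: ActivityRecord) -> float:
    """Normalize the IC50 to nM, failing loudly on an unknown unit."""
    try:
        return record.ic50_value * TO_NANOMOLAR[record.ic50_unit]
    except KeyError:
        raise ValueError(
            f"unknown unit {record.ic50_unit!r} for {record.name}"
        ) from None

def select_hits(records, max_ic50_nm=100.0, max_mw_da=450.0):
    """Keep compounds more potent than max_ic50_nm and lighter than max_mw_da."""
    return [
        r for r in records
        if ic50_in_nm(r) < max_ic50_nm and r.mol_weight_da < max_mw_da
    ]

# Placeholder records for illustration only, not real measurements.
records = [
    ActivityRecord("compound_A", 412.5, 35.0, "nM"),
    ActivityRecord("compound_B", 398.4, 0.5, "uM"),   # 500 nM: not potent enough
    ActivityRecord("compound_C", 512.6, 12.0, "nM"),  # potent but too heavy
    ActivityRecord("compound_D", 305.3, 0.08, "uM"),  # 80 nM: passes
]

hits = select_hits(records)
print([r.name for r in hits])  # ['compound_A', 'compound_D']
```

A version that compared `ic50_value < 100` directly would also keep compound_B, because 0.5 looks small until the unit is taken into account. Generated code with this flaw runs without error, which is why it takes domain knowledge to catch.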

Claude Sonnet 4.5 highlighting potential pitfalls in how it would attempt to respond to "Find all FDA-approved kinase inhibitors with a molecular weight under 450 Da, then analyze their common substructures and predict which scaffold would be most promising for developing a new BTK inhibitor with improved brain penetration."

So while agentic workflows and coding copilots are a big step forward, they still keep the burden on the scientist. You’re the one who has to check outputs, fix scripts, and connect the dots.

The same prompt in Balto returns a comprehensive set of matching kinase inhibitors, supplies the underlying data, provides a human-readable trail of steps taken, and provides chemistry-aware scaffold recommendations. All from a single conversational prompt.

Integrating Tools with GenAI: General vs. Chemistry-Specific

AI Comparison Table
Task | General AI copilots / agents | Chemistry-specific AI assistant
Connect to external tools | Yes: can call APIs or run scripts | Yes: pre-connected to chemistry tools and optimized to execute together
Handle chemistry data | Only as raw text or numbers | Understands molecules, reactions, and structures
User input needed | Often requires Python or scripting | Simple instructions in plain language
Quality of outputs | Depends on user’s coding and checks | Tuned for chemical accuracy
Burden on the scientist | High: you fix and validate | Lower: assistant handles chemistry logic

Running Simulations

This is where general-purpose genAI stops. ChatGPT or Claude can explain what a molecular docking experiment is. They can even write example code for it. But no "out-of-the-box" model is set up to actually run the simulation. At best, you get instructions for tools you’ll have to set up and run yourself.
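For a sense of what "instructions for tools you’ll have to set up and run yourself" looks like in practice, here is a minimal sketch that assembles a command line for AutoDock Vina, one widely used open-source docking program. Vina is our choice for the example; the post does not say which engine any assistant uses. The file names and box coordinates are placeholders, and `build_vina_command` is a hypothetical helper. The script only builds the command; it does not run anything.

```python
import shlex

def build_vina_command(receptor, ligand, center, size, out, exhaustiveness=8):
    """Assemble an AutoDock Vina command line (hypothetical helper).

    center and size are (x, y, z) tuples in Angstroms describing the
    search box. Nothing is executed here.
    """
    cx, cy, cz = center
    sx, sy, sz = size
    return [
        "vina",
        "--receptor", receptor,
        "--ligand", ligand,
        "--center_x", str(cx),
        "--center_y", str(cy),
        "--center_z", str(cz),
        "--size_x", str(sx),
        "--size_y", str(sy),
        "--size_z", str(sz),
        "--exhaustiveness", str(exhaustiveness),
        "--out", out,
    ]

# Placeholder inputs: before this command is useful, someone still has to
# install Vina, download the structure, convert receptor and ligand to
# PDBQT, and work out where the binding-site box should sit.
cmd = build_vina_command(
    receptor="receptor.pdbqt",
    ligand="ligand.pdbqt",
    center=(0.0, 0.0, 0.0),
    size=(20.0, 20.0, 20.0),
    out="docked.pdbqt",
)
print(shlex.join(cmd))
```

Every argument in that call is a decision or a preparation step that the general-purpose model leaves to the user.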

A chemistry-specific AI assistant changes that. Instead of telling you how to run a simulation, it runs it for you. Docking, pocket finding, ADMET prediction—these are built into the system. You ask in plain language, and the assistant produces usable results.

Claude Sonnet 4.5 writes a Python script that could be used only after installing the required software, manually downloading data, and changing settings.

That’s the difference between an AI that only talks about science and one that can do science. For a medicinal chemist, it means going straight from idea to data in the same tab without writing code.

Running Molecular Simulations: General vs. Chemistry-Specific

AI Comparison Table
Task | General-purpose AI | Chemistry-specific AI assistant
Explain simulation concepts | Yes: can describe methods like docking or MD | Yes: plus tailored to chemistry use cases
Generate example code | Yes: Python scripts or workflows | Not needed: runs simulations directly
Run simulations | No | Yes: docking, pocket finding, ADMET predictions, etc.
Output format | Text or code only | Usable chemical results, ready for analysis
Scientist effort | High: set up software, validate code | Lower: focus on interpreting results

Interpreting Results

Analyzing outputs is just as important as running the job. General AI models can summarize results, but they don’t know enough chemistry to judge them correctly. They might overstate an outcome, misinterpret a graph, or simply make something up. This is the risk of hallucination — the model fills in blanks with guesses.

A chemistry-specific AI assistant is tuned differently. It doesn’t just summarize data; it can interpret it with the rules and context of the field. For example, it can highlight whether a docking score is within a realistic range, flag odd results, or point out if a conformer looks unstable. Instead of acting like a general storyteller, it behaves more like a trained colleague who knows what “normal” looks like in medicinal chemistry. Additionally, domain-specific visualizations are the bread and butter of informed decision making and collaboration in life science research, and common LLMs do not yet include them.
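A range check of the kind described above can be sketched in a few lines. This is an illustration under stated assumptions, not Balto's logic: it assumes a Vina-style score, which estimates binding free energy in kcal/mol with more negative values meaning stronger predicted binding. The two cutoffs are placeholder values chosen for the example, not field standards, and `review_docking_score` is a hypothetical helper.

```python
def review_docking_score(score_kcal_mol, plausible_min=-14.0, weak_cutoff=-5.0):
    """Label a Vina-style docking score (hypothetical helper).

    Assumes more negative means stronger predicted binding. Both
    cutoffs are illustrative assumptions, not established thresholds.
    """
    if score_kcal_mol >= 0:
        # A non-negative score usually points at a setup problem
        # (clashes, wrong box) rather than a real binding prediction.
        return "suspect: non-negative score"
    if score_kcal_mol < plausible_min:
        # Implausibly favorable for a drug-like ligand in this sketch.
        return "suspect: implausibly favorable"
    if score_kcal_mol > weak_cutoff:
        return "weak"
    return "plausible"

for score in (-8.7, -2.1, 3.4, -21.0):
    print(score, "->", review_docking_score(score))
```

The value of a domain-tuned assistant is that checks like this run without being asked for, and that the cutoffs reflect the target class and scoring function rather than fixed numbers.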

The result of two prompts to Balto, (1) "Dock Adempas to crystal ligand site of 7D9R" and (2) "yes", resulting in manipulable underlying data and industry-specific visualizations.

Interpreting Scientific Results: General vs. Chemistry-Specific

AI Comparison Table
Task | General-purpose AI | Chemistry-specific AI assistant
Summarize outputs | Yes: but may miss key details | Yes: with focus on chemical context
Identify errors or odd results | Limited: lacks domain checks | Stronger: flags unrealistic or unstable results
Handle numerical outputs (e.g. docking scores) | Often misinterprets or guesses | Understands typical ranges and significance
Risk of hallucination | High: fills gaps with guesses | Lower: grounded in chemistry and biology knowledge
Usefulness for chemists | Surface-level insight | Reliable interpretation that supports decisions

Benchmarks: Scientific Comprehension to Help You Do Work

A chemistry-aware assistant isn’t just about running simulations—it’s about understanding scientific material well enough to guide decisions. To measure this, we evaluated Balto on LitQA2 (Future House), a benchmark designed to test deep reading comprehension of real scientific papers.

Balto’s LitQA2 score: 86% pass rate (172/200)

  • PaperQA2 (Future House): 66.0% ± 1.2
  • Human experts: 67.7% ± 11.9
  • Typical LLM performance without tool use: ~25–30%

This means Balto reads and reasons about scientific literature at a level that exceeds both the reported human expert score and the published PaperQA2 result on this benchmark. We attribute this not to Balto being a general LLM, but to it being tuned specifically for chemistry and built to work with chemical structures and data.

Accuracy (percentage of questions answered correctly) by model or human on the LitQA2 benchmark. Note that the human experts had unlimited time and tool access.
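As a quick arithmetic check on the headline figure, the pass rate follows directly from the counts reported above. The standard error in the sketch below is our own back-of-the-envelope addition: it assumes each of the 200 questions is an independent pass/fail trial, and it is not necessarily computed the same way as the ± values quoted for PaperQA2 and the human experts.

```python
import math

passed, total = 172, 200  # counts reported in this post

rate = passed / total
# Normal-approximation standard error for a binomial proportion.
# Assumption: questions are independent pass/fail trials.
std_error = math.sqrt(rate * (1 - rate) / total)

print(f"pass rate {rate:.1%}, standard error {std_error:.1%}")
# pass rate 86.0%, standard error 2.5%
```

On that assumption, the gap to the reported 66.0% and 67.7% figures is several standard errors wide.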

This level of comprehension is what makes the next step possible: an assistant that not only understands your work, but can help do the work.

Acting Like a Lab Assistant

Most AI tools today stop at talking. They read, they summarize, they even suggest code. But they don’t plan an experiment with you. And they don’t carry out the steps.

A chemistry-specific AI assistant shifts from being a “chatbot” to being a lab assistant. It can plan workflows: suggest which molecules to test, flag where to focus next, and lay out the steps in order. And it doesn’t stop there — it can also do the work: extract molecules, run docking, generate conformers, and return results you can use.

This combination of planning and doing is what makes the difference. Instead of a tool that just answers questions, you have one that helps push projects forward. Then the question becomes how well that assistant provides work that is accurate, fast, and understandable. We think this is where Balto really shines.

👉 With Balto, you can:

  • Pull a compound from popular databases and run it through a docking workflow in one chat.
  • Ask for binding pockets on a target and get structured results back.
  • Move from idea to data without switching tools or writing code.

Additionally, the combination of access to underlying chemistry tools and

Acting Like a Lab Assistant: General vs. Chemistry-Specific

AI Comparison Table
Task | General-purpose AI | Chemistry-specific AI assistant (Balto)
Plan workflows | Limited: vague suggestions | Concrete steps, tailored to chemistry
Do the work | No: requires external setup | Yes: runs docking, pocket finding, ADMET prediction, and more
Combine planning + doing | Not possible | Integrated in one place
Output format | Explanations only | Usable data and next-step suggestions
Role for chemists | Still carrying the load | Supported by an AI lab partner

Different assistants for different domains

The launch of Claude for Life Sciences has recently garnered a great deal of attention: it underlines the relatively new expectation that genAI is moving beyond chat into action, with connectors and skills that make it genuinely useful for biologists, clinical teams, and regulatory scientists.

But it also reinforces a crucial reality:

Life sciences is not one domain.

Cheminformatics and molecular modeling have different needs than genomics or regulatory writing. No single assistant will do everything well.

Claude is built for biology-heavy workflows:

  • literature reviews
  • protocol and SOP generation
  • bioinformatics and data analysis via connectors
  • clinical and regulatory documentation
  • access to Benchling, BioRender, PubMed, Wiley, and 10x tools

Balto is built for chemistry-heavy workflows:

  • pulling molecular data from common databases
  • generating SMILES or 3D structures instantly
  • running docking and pocket finding in-chat
  • property prediction and ADMET
  • visualizing chemical results
  • going from idea → molecule → simulation without code

If you’re a chemist, Balto is the assistant designed for your domain—the one that understands your molecules, your tools, and your workflows.

👉 Try Balto and see how domain-specific simulation-powered AI changes the pace of medicinal chemistry.


October 1, 2025

