IndiaFOSS 2025 | Devroom – FOSS in Science#

Srihari Thyagarajan, Agriya Khetarpal

This blog post is our account of SciPy India’s participation in the FOSS in Science devroom at IndiaFOSS 2025, held in Bengaluru. Here, we describe the devroom as it happened, including the talks, discussions, and the incredible diversity of applications of free and open-source software in scientific research and practice across India.

Details#

Background#

IndiaFOSS 2025 brought together the Indian FOSS community for two days of talks, workshops, and networking. Among the various devrooms, the FOSS in Science devroom carved out a space for researchers, practitioners, and enthusiasts to discuss how open-source software is transforming scientific work — topics included data acquisition in STEM education, formalizing mathematical proofs, tracking bird populations, and building digital public infrastructure for landscape planning.

Agenda#

We had an ambitious lineup spanning the entire day; the highlights below walk through each talk.

Highlights#

Jithin B P’s presentation on KuttyPy kicked off the devroom with a practical demonstration of how a modified bootloader can transform ubiquitous Arduino boards into real-time data acquisition systems without the traditional compile-upload cycle. The approach is elegant in its simplicity: by enabling direct register-level access through a Python library, students and educators can interact with sensors and actuators in real time, making embedded systems education far more accessible and immediate. Jithin’s work has already found its way into outreach programs teaching kids embedded systems, with a curriculum that builds from binary number systems to programming basics. The upcoming STEM turtle board, with onboard motor drivers and sensors that can interact with other turtles, promises to bring Logo-like programming into the physical world in a delightfully tangible way.

Aditi Juneja’s talk on API dispatching in the Scientific Python ecosystem addressed a fundamental challenge: how do libraries like NetworkX, NumPy, and scikit-image enable users to transparently switch between different computational backends for improved performance? Her presentation walked through the evolution from if-else conditions to decorator-based dispatching, showcasing how NetworkX supports both type-based and name-based dispatching with fallback mechanisms, while NumPy relies on type-based dispatching through `__array_function__`, and scikit-image uses entry points for greater flexibility. The talk clarified what’s often hidden plumbing in scientific software, making it clear how these design choices affect both library developers and end users.
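To make type-based dispatching concrete, here is a minimal toy array-like class hooking into NumPy's `__array_function__` protocol, so that `np.sum` dispatches to its own implementation. The class and its behaviour are our own illustration (in the spirit of the NEP 18 examples), not code from the talk.

```python
import numpy as np

class DiagonalArray:
    """A toy array-like type representing value * identity(n)."""

    def __init__(self, n, value):
        self._n = n
        self._value = value

    def __array_function__(self, func, types, args, kwargs):
        # NumPy calls this hook instead of its own implementation
        # whenever a DiagonalArray appears among the arguments.
        if func is np.sum:
            return self._n * self._value  # sum of an n x n diagonal matrix
        return NotImplemented  # let NumPy raise for unsupported functions

d = DiagonalArray(3, 2.0)
total = np.sum(d)  # dispatches to our implementation, no dense array built
```

The same mechanism is how GPU or sparse array libraries intercept NumPy functions without NumPy knowing about them in advance.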

Pradyot Ranjan’s presentation on the Array API Standard made a convincing argument for why the widespread use of array libraries — NumPy, CuPy, PyTorch, JAX, and others — requires a standardized API specification. The standard aims to provide consistent APIs and array operations across different Python libraries, enabling backend-agnostic code that can run on CPUs, GPUs, or specialized hardware without modification. This is the kind of infrastructure work that doesn’t make headlines but makes everything else possible: when consumer libraries like scikit-learn can write against the Array API instead of assuming NumPy, the entire ecosystem becomes more interoperable and sustainable.

Sanket Verma, a long-time maintainer of Zarr and a Technical and Administrative Board member of NumFOCUS, presented what’s new with Zarr, the cloud-optimized storage format for N-dimensional arrays. Created initially by Alistair Miles in 2016 for handling massive genomic datasets from malaria research, Zarr’s approach of dividing large arrays into compressed chunks that can be stored and loaded selectively has proven transformative for scientific computing at scale. The recently released Zarr Python 3 brings a leaner specification, improved handling of high-latency storage, and extensibility through entry point mechanisms for custom stores and codecs. With institutions using Zarr to store datasets approaching petabytes in the cloud, Sanket’s update on the active community — weekly meetings, office hours, and a welcoming environment for new contributors — reinforced that this is infrastructure built for the long haul.
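To illustrate the core idea behind Zarr's design (not its actual API), the sketch below splits an array into fixed-size chunks, compresses each one independently, and decompresses only the chunk that is actually read — the property that makes selective access to huge cloud-hosted arrays cheap.

```python
import zlib
import numpy as np

CHUNK = 1_000  # elements per chunk (illustrative choice)

def store_chunks(arr):
    # Each chunk is compressed independently, so any one of them
    # can later be fetched and decoded without touching the others.
    return [zlib.compress(arr[i:i + CHUNK].tobytes())
            for i in range(0, len(arr), CHUNK)]

def load_chunk(chunks, i, dtype):
    # Selective read: decompress only chunk i, leave the rest untouched.
    return np.frombuffer(zlib.decompress(chunks[i]), dtype=dtype)

data = np.arange(5_000, dtype=np.int64)
chunks = store_chunks(data)
piece = load_chunk(chunks, 2, np.int64)  # elements 2000..2999 only
```

Zarr generalizes this to N dimensions, pluggable codecs, and stores ranging from local directories to object storage.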

Pradeep Koulgi’s talk on assessing the State of India’s Birds using FOSS demonstrated the power of citizen science data at scale. This initiative involves a partnership of 14 organizations that rely on millions of observations contributed by birdwatchers to the eBird platform. Together, they create trend analyses and distribution maps for over 940 bird species, all of which are published publicly and free of charge. Their MYNA web tool enables users to generate localized regional reports, transforming this vast dataset into actionable insights for conservation efforts. What stood out most was Pradeep’s openness about the challenges they face. As the data corpus grows exponentially, the process of computing and analyzing it becomes increasingly difficult without expertise in FOSS tools and scalable infrastructure. Their journey, moving from struggles with computational bottlenecks to developing scalable, cost-effective solutions with FOSS, is a narrative that many scientific projects can relate to.

Arjun Verma’s presentation on collaborative CAD & GIS in JupyterLab showcased Jupyter CAD and Jupyter GIS as browser-native applications that bring parametric 3D modeling and full-featured GIS capabilities directly into the notebook environment. Built on Open Cascade and OpenLayers compiled to WebAssembly, these tools support everything from boolean operations and fillets in CAD to advanced symbology and time sliders in GIS — all while maintaining collaborative editing and Python API integration. The vision here is clear: reproducible, shareable workflows where code, visualization, and analysis live together, whether you’re analyzing climate patterns, planning land use, or teaching spatial concepts.

After a much-needed break, Dr. Aaditeshwar Seth presented the CoRE stack, a digital public infrastructure initiative aimed at helping communities make informed decisions about their landscapes. Rather than starting with technology and looking for problems, the CoRE stack begins by understanding what communities are trying to solve and using machine learning, satellite data, and hydrological modeling to make marginalized groups more visible and empowered. The stack’s layered approach — data collection, modeling, analytics — generates indicators for landscape planning using open data from satellites and government sources. Tools like “Know Your Landscape” for area-level planning and “Commons Connect” for community volunteers to geotag interventions demonstrate a community-based approach to landscape stewardship. Their goal of supporting 25,000 landscape stewards across India, potentially funded through innovative models like carbon credits, shows ambitious thinking about scale and sustainability.

Sagnik Saha’s talk on formalizing mathematics and scientific computing with Lean, an open-source theorem prover, introduced a world where mathematical proofs are not just written but verified by software. Formalization ensures reproducibility and streamlines peer review by making mathematical logic programmable and checkable. Lean’s dependent type system and interactive theorem proving have already been used to verify major projects like the Liquid Tensor Experiment and formalize entire textbooks on analysis. The community is active and growing globally, but as Sagnik noted, India has fewer than ten contributors currently — a gap that represents both a challenge and an opportunity for the community.
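As a taste of what formalization looks like, here is a tiny Lean 4 snippet (our own toy example, not one from the talk) in which the proof is checked by the system rather than by a referee; `Nat.add_comm` ships with Lean's core library.

```lean
-- A machine-checked proof: Lean rejects the file if the proof is wrong.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```

The same mechanism scales from one-liners like this to research-level mathematics, with the proof assistant guaranteeing that every step is justified.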

Kriyanshi Shah’s presentation on empowering open science with scalable interactive computing environments tackled a problem familiar to anyone who’s tried to onboard new researchers to scientific computing: installation is hard, especially in air-gapped systems without internet access. Her solution, built on JupyterHub and Kubernetes, provides scalable, secure, domain-customizable computing environments with Docker containers preloaded with scientific libraries. Each user gets a ready-to-go environment with minimal technical overhead, mounted with shared satellite data volumes. What started as infrastructure for researchers is now expanding to enable data sharing and interaction with custom satellite datasets across more scientific domains.

Agriya Khetarpal’s presentation on interactive in-browser FOSS tools for science communication and evidence-based environmental policy argued that science communication often fails not because scientists aren’t trying, but because of structural barriers — pedagogical, sociopolitical, and monetary. Using the example of air pollution in New Delhi, where stubble burning has decreased by 71% over five years, yet pollution remains severe due to vehicular emissions and industry, Agriya demonstrated how Pyodide and JupyterLite enable data analysis and visualization entirely in the browser. These tools lower barriers to entry, allowing journalists, policymakers, and citizens to engage with data interactively without requiring high-performance computing infrastructure. When science communication is reproducible, interactive, and accessible, we move closer to evidence-based policy that serves everyone, especially vulnerable communities disproportionately affected by misdirected resources.

Aftab’s talk on instrumenting science with DevOps tools confronted an uncomfortable truth: over 50% of published scientific findings cannot be reproduced. The reproducibility crisis isn’t just embarrassing; it’s expensive and undermines trust in science. His solution — research ops, or the application of DevOps practices to scientific research — uses tools like Git for versioning manuscripts and datasets, Docker for packaging environments, and Kubernetes for orchestrating computational workflows. CI/CD pipelines automate repetitive tasks, improving efficiency and reducing human error. While implementing these practices comes with cultural and technical challenges (scientists learning new tools, integrating with legacy systems), the potential to improve collaboration, reproducibility, and the reliability of scientific findings makes it worthwhile.

Alosh Denny’s presentation on making LLMs transparent for science addressed the black-box problem of large language models: we see the output but not the reasoning. Using open-source tools like TransformerLens and Prisma 2, Alosh demonstrated how to peek into LLMs’ internal workings — tracing the exact path of how they process questions, visualizing attention heat maps showing word relevance, and even examining neurons in multimodal models to understand how they identify images. For scientific applications, where understanding how a model arrives at conclusions is as important as the conclusions themselves, these interpretability tools are essential for building trust and identifying failure modes.
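To make "attention heat map" concrete, the sketch below computes scaled dot-product attention weights from query and key matrices: the matrix of token-to-token relevance scores that interpretability tools render as a heat map. This is the standard textbook formula, not code from TransformerLens or Prisma.

```python
import numpy as np

def attention_weights(Q, K):
    """softmax(Q K^T / sqrt(d)): each row is one token's attention
    distribution over all tokens, and sums to 1."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    # Numerically stable row-wise softmax.
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))  # 4 tokens, head dimension 8
K = rng.normal(size=(4, 8))
W = attention_weights(Q, K)  # 4 x 4 matrix, ready to plot as a heat map
```

Interpretability libraries extract exactly these matrices (per layer and per head) from a trained model and visualize them, which is what makes attention patterns inspectable.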

Jigyasu Krishnan closed out the technical program with his talk on benchmarking time series models with sktime. sktime provides a unified, community-governed framework with consistent APIs across all models, the largest model zoo in the ecosystem, and dataset loaders for popular benchmarks. The upcoming “collections” feature — sets of predefined models, datasets, metrics, and cross-validation strategies — will make reproducing and extending studies even easier. It’s the kind of infrastructure that makes science more efficient and reproducible by reducing friction in the experimental process.
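The benefit of a consistent fit/predict API is that benchmarking reduces to a simple cross-product loop over models and datasets. The sketch below is our own minimal illustration of that pattern, with a naive last-value forecaster as the baseline model; it mimics the shape of such an interface but is not the sktime API.

```python
import numpy as np

class NaiveForecaster:
    """Baseline: forecast the last observed value (illustrative only)."""
    def fit(self, y):
        self.last_ = y[-1]
        return self
    def predict(self, horizon):
        return np.full(horizon, self.last_)

def mae(y_true, y_pred):
    # Mean absolute error between the held-out series and the forecast.
    return float(np.mean(np.abs(y_true - y_pred)))

def benchmark(models, datasets, metric):
    # Every model runs on every dataset under the same protocol:
    # this is the loop a benchmarking framework automates and extends
    # with cross-validation strategies and result storage.
    results = {}
    for mname, model in models.items():
        for dname, (y_train, y_test) in datasets.items():
            y_pred = model.fit(y_train).predict(len(y_test))
            results[(mname, dname)] = metric(y_test, y_pred)
    return results

y = np.arange(10.0)
scores = benchmark({"naive": NaiveForecaster()},
                   {"toy": (y[:8], y[8:])},
                   mae)
```

A "collection" in the sense of the upcoming feature would pin down the `models`, `datasets`, `metric`, and splitting strategy together, so a published study can be rerun or extended verbatim.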

Closing remarks#

After a full day of talks spanning hardware hacking, infrastructure design, data analysis, policy, and everything in between, the devroom concluded with announcements about the SciPy India community. We shared information about our first community call in July, our upcoming plans for more community calls and workshops, and our hopes for an in-person conference later this year or early next year. We encouraged attendees to join our Zulip chat, check out our open-source repositories, and fill out the feedback form with their thoughts on the community and preferences for future events.

Livestream#

The full FOSS in Science devroom was livestreamed on YouTube for those who couldn’t attend in person. The recording remains available for viewing at your convenience, with timestamps for individual talks.

Videos#

Individual talk recordings are now available in the FOSS in Science devroom playlist, with a link to each presentation in schedule order.

In a nutshell#

Thank you to everyone who spoke, attended, asked questions, and contributed to making the FOSS in Science devroom at IndiaFOSS 2025 a success. To the speakers who travelled to Bengaluru and prepared thoughtful presentations showcasing their work: your contributions made the day what it was. To FOSS United for organizing IndiaFOSS and providing us with the platform and infrastructure to host the devroom: we’re grateful for your continued support of the FOSS community in India.

We look forward to seeing you at future community events, whether online or in person! To stay updated, join our Zulip chat and follow us on our social channels. And to you, the reader, thank you for taking the time to read about our day in Bengaluru. We hope it inspires you to explore, contribute, and connect with the scientific FOSS community.