February 18, 2022

The use of open-source software for data spaces

In recent years, the work of IDSA has shifted substantially towards open-source software. Co-creation has always been one of the most important drivers.

Sebastian Steinbuss

This week I stumbled upon a report on the state of open source in 2022 [1]. It definitely has a broader perspective than the IDSA work, but it also underlines the discussion we had at the IDSA Winter Days and the presentation by Sebastian Opriel from sovity.

I can recommend the report and I have found some insights that I would like to share:

General context and scope of the report

The report represents the North American market, and a lot of the feedback came from the tech sector. It includes a good mix of company sizes and feedback from different perspectives, from developers to CTOs.

It is no surprise that companies in general tend to use open-source technologies and that OSS is integrated into the whole technology stack. Of course, some key functionalities are based on open-source software, e.g. programming languages and container technology.

The top concerns and top reasons to use open-source software

Let’s start with the concerns. A good sign is that ‘I don’t have any reservations’ is among the top answers. On the other hand, a lack of internal skills and reservations about licenses are important and relevant concerns. Missing real-time support is another major concern. From my perspective, nothing on the list is unsolvable.

The reasons companies give for using open-source software held a slight surprise for me, so let me start with the positive one: access to innovations and the latest technologies. I really had to think about this, but obviously OSS is an innovation driver in companies. If that’s the case, I am really glad that our members build OSS components that can bring data innovation directly into companies. So, if you are looking for such innovations based on open source, you might want to investigate the IDSA ecosystem, visit the Eclipse Dataspace Connector (EDC), or follow the IDSA OSS Graduation Scheme. But back to the study: cost reduction due to the absence of license costs is another major driver. This could be considered from another perspective, too. Not only in connection with the recent Log4j vulnerability, people have been discussing that OSS contributors should be paid for their work. Finally, everything else on the list is about modernization, functional improvements, alternatives, and avoiding vendor lock-in.

Looking at the top concerns and top benefits in this report, I would say there is still room for improvement in understanding and supporting OSS, not only using things for free. This is at least partly reflected in the section on sponsorship.

And data spaces?

Let’s build data spaces on the shoulders of giants and use open-source software. The section on data (and of course, I jumped there directly from the table of contents) had a sobering effect on me. At least the list includes NoSQL databases and data processing tools like Apache Kafka and Spark. That is neither surprising nor of real interest from a data space perspective, but if you read across the whole report, you will find core technologies that you would consider for building data spaces and for starting to use data for your business.

Conclusions

This report is really interesting, but not surprising. My key takeaway is that if we want to innovate the way companies work with data, we have to provide open-source software. From the IDSA perspective, and also looking at the DSBA with BDVA, Gaia-X (including gxfs.de), and FIWARE, I think we are on track.

[1] https://www.openlogic.com/resources/2022-open-source-report

Author: Sebastian Steinbuss