Today, I once again encountered a common development issue. In our Big Fat Java Codebase™️, there’s a crucial HTTP endpoint that accepts a large and often changing model as a payload. Updating this model to stay in sync with our Python codebase has always been a tedious task.
So far, no one has taken on the herculean task of rewriting this entire model in Python. Instead, we’ve been selectively porting over smaller subparts as needed. While this ostrich strategy has served us well for a while, it became untenable when I found myself facing the task of rewriting most of the model into Python. That’s when I decided to automate the process.
How to do it?
Unfortunately, a quick Google search revealed that I might be the pioneer in figuring out how to achieve this. This inspired me to write this post - to potentially save you some time. With 7 billion people on Earth, I’m certain that someone else will face this problem in the future, and there are certainly better ways to spend your time than trying to solve it from scratch.
Given that Pydantic is a solid, well-considered library, and that Java’s support for reflection is robust, I chose to use these tools to my advantage. The idea is to employ reflection to extract the model metadata from Java, and then compile it into a string that can be saved as a single Python file. This file can then be imported and used with minor adjustments.
In order to achieve this, I implemented support for the following:
- Java primitive types
- Set, List, Map collections
- Nested classes
- Enums (compiled to literals)
- Subclasses of interfaces and abstract classes (I compile them to Union)
- Recursive types (I use
__future__.annotations
to support forward references) - Using names from
SerializedName
annotation instead of field names - Optional annotations for fields that can be null
MetadataModel
is used because Pydantic versions prior to 2.0 have a well-known issue with parsing Union
types.
If you don’t use the smart_union
feature, it tries to parse classes in the order they’re defined. This means if you have a class with a field of type Union[ClassA, ClassB]
and ClassA
is defined after ClassB
, and ClassB
is more specific, Pydantic will forcefully parse anything as ClassA
. This problem was finally resolved in Pydantic 2.0.
Final code
These code snippets still need to be incorporated into your own codebase, but at least the hard part is already done 🎉