Published 08 Oct, 2022

Java - How to avoid join 'explosions' in JPA with Hibernate when fields from multiple tables are needed?

Category Java
Modified: Nov 30, 2022

Let's say I have the following method in a JpaRepository:

@EntityGraph(value = "Subject.allJoins", type = EntityGraphType.FETCH)
@Query("select s from Subject s" + FIND_QUERY_WHERE)
Page<Subject> findInProject(@Param("projectId") UUID projectId, <additional params>);

As you can see, I already use an EntityGraph with the joins I need. The SQL query that Hibernate generates is as follows (with the WHERE clause mostly omitted):

select
    subject0_.id,
    <all kinds of fields including some duplicates>
from
    subject subject0_
    left outer join project project1_ on subject0_.project_id = project1_.id
    left outer join subject_property_value properties2_ on subject0_.id = properties2_.subject_id
    left outer join property_value propertyva3_ on properties2_.property_value_id = propertyva3_.id
    left outer join ingestion_id_mapping ingestedme4_ on subject0_.id = ingestedme4_.subject_id
where
    subject0_.project_id = '123'
order by
    subject0_.name asc

Because each of these joins multiplies the number of rows, the result set explodes into hundreds of thousands of rows even though the total number of subjects is just a few hundred.

Note that I will be making a projection, which already avoids selecting some of the fields, but the joins are still needed.

What can I do to optimize this?

Note that I actually need all the data immediately, to serialize it to a client, so just leaving it to Hibernate by fetching the entity and calling the getter for each association takes even longer than this.

My current idea is to run the query multiple times, with the same WHERE clause but a separate join each time, and then merge the results into a single object. It's not the end of the world if a subsequent query reads more or fewer rows because rows were added to or deleted from the original table: I can just take the smallest common subset of subject ids and build the result from that.
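That merge step could be sketched as follows. This is a minimal sketch under assumptions: `SubjectDto` and the map shapes are hypothetical stand-ins, and in practice each map would be populated by one of the per-join queries.

```java
import java.util.*;
import java.util.stream.*;

// Hypothetical DTO; the field names are illustrative, not from the question.
record SubjectDto(UUID id, List<String> properties, List<String> mappings) {}

final class SubjectMerger {
    // Merge the results of separate per-association queries into one DTO per
    // subject. Only ids present in every partial result survive, which copes
    // with rows added or deleted between the queries.
    static List<SubjectDto> merge(Map<UUID, List<String>> propsBySubject,
                                  Map<UUID, List<String>> mappingsBySubject) {
        Set<UUID> ids = new HashSet<>(propsBySubject.keySet());
        ids.retainAll(mappingsBySubject.keySet()); // smallest common subset
        return ids.stream()
                .map(id -> new SubjectDto(id,
                        propsBySubject.get(id),
                        mappingsBySubject.get(id)))
                .collect(Collectors.toList());
    }
}
```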

But is there something smarter and/or simpler to do than that?

Answers



I'll take the example of a football Club that has a Country, a Stadium and a list of Players.

Your first query should only be used to filter the rows you want from the database. At this point you can also fetch the 1:1 relationships, but not the 1:n ones. So, in my example, the first query should:

  • filter all the clubs matching the criteria
  • fetch all 1:1 relationships (country & stadium for each club).

Then, you can make a dedicated query for each sublist. Still in my example, you would select every player whose club is in a list you provide as a parameter of the query (that list being the result of your first query). The query would be something like:

 String jpql = "select p from Player p where p.club in :clubs";

Doing this, you can also provide an entity graph to load attributes of the players. That works well when you use pagination (the result of the first query will not be too large).
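The in-memory assembly after that second query can be sketched with plain records standing in for the JPA entities. `Club`, `Player`, and `playersByClub` are illustrative names, not from the answer:

```java
import java.util.*;
import java.util.stream.*;

// Plain records standing in for the Club and Player entities in this example.
record Club(long id, String name) {}
record Player(long clubId, String name) {}

final class ClubAssembler {
    // After "select p from Player p where p.club in :clubs" returns all
    // players for the clubs found by the first query, group them back by
    // club id in memory.
    static Map<Long, List<Player>> playersByClub(List<Player> players) {
        return players.stream()
                .collect(Collectors.groupingBy(Player::clubId));
    }
}
```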

This approach is well described by Vlad Mihalcea: The best way to fix the Hibernate MultipleBagFetchException

I strongly advise you to take a look.



This is a perfect use case for Blaze-Persistence Entity Views.

I created the library to allow easy mapping between JPA models and custom models defined as interfaces or abstract classes, something like Spring Data Projections on steroids. The idea is that you define your target structure (domain model) the way you like and map attributes (getters) via JPQL expressions to the entity model. Since the attribute name is used as the default mapping, you mostly don't need explicit mappings, as 80% of the use cases are DTOs that are a subset of the entity model.

The interesting part is that you can specify the fetch strategy that should be used. A sample model could look like the following:

@EntityView(Subject.class)
public interface SubjectView {
    @IdMapping
    Integer getId();

    ProjectView getProject();

    @Mapping(fetch = SUBSELECT)
    Set<PropertyValueView> getProperties();

    Set<IngestionMappingView> getMappings();
}

@EntityView(Project.class)
public interface ProjectView {
    @IdMapping
    Integer getId();
    String getName();
}

@EntityView(PropertyValue.class)
public interface PropertyValueView {
    @IdMapping
    Integer getId();
    String getName();
}

@EntityView(IngestionMapping.class)
public interface IngestionMappingView {
    @IdMapping
    Integer getId();
    String getName();
}

Querying is a matter of applying the entity view to a query, the simplest being just a query by id.

SubjectView p = entityViewManager.find(entityManager, SubjectView.class, id);

The Spring Data integration allows you to use it almost like Spring Data Projections (see https://persistence.blazebit.com/documentation/entity-view/manual/en_US/index.html#spring-data-features), i.e. you can have a repository similar to the following:

@Repository
public interface SubjectRepository extends Repository<Subject, UUID> {
    Page<SubjectView> findByProjectId(@Param("projectId") UUID projectId, <additional params>);
}

You can read more about the supported fetch strategies in the entity-view documentation. I would generally recommend the MULTISET fetch strategy if possible, as it usually provides the best performance.
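As a hedged sketch, switching the collections in the view above to MULTISET would only require changing the fetch attribute on the mappings (assuming, as with SUBSELECT above, a static import of the fetch strategy constants; the view name here is illustrative):

```java
@EntityView(Subject.class)
public interface SubjectMultisetView {
    @IdMapping
    Integer getId();

    ProjectView getProject();

    // Both collections fetched with the MULTISET strategy instead of joins.
    @Mapping(fetch = MULTISET)
    Set<PropertyValueView> getProperties();

    @Mapping(fetch = MULTISET)
    Set<IngestionMappingView> getMappings();
}
```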



The problem is that the fetch join joins every related entity/table at once, multiplying the rows. Instead, you should only fetch-join entities with a 1:1 relation. The other entities are then fetched when they are accessed for the first time. This results in one row per Subject, plus one select with n rows for every entity not in the initial select.

If those extra selects take too long, try adding the entities with the least number of records to the initial select.
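Applied to the question's repository, this could look like the following sketch. The `attributePaths` value is an assumption about the entity's field name; only the to-one association is fetch-joined, and the collections stay lazy:

```java
// Fetch-join only the to-one project association; the collections load
// lazily, one extra select each, instead of multiplying rows in one query.
@EntityGraph(attributePaths = {"project"})
@Query("select s from Subject s" + FIND_QUERY_WHERE)
Page<Subject> findInProject(@Param("projectId") UUID projectId, <additional params>);
```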