Published 29 Oct, 2022

Java - Which data structure to use to store millions of objects in a multithreaded environment (scalability and performance)?

Category Java
Modified : Nov 27, 2022

My requirement: I want to perform frequent operations on millions of objects in a multithreaded environment, keeping concurrency and scalability in mind. I need the data structure best suited to this.

For example :

import java.util.concurrent.CopyOnWriteArrayList;

// Interface methods are implicitly public, so the modifier is omitted.
public interface CarDetails {
    CopyOnWriteArrayList<Car> getAllCars();
    Car getMostSoldCars(int carModel);
    void addNewCarDetails(Car car);
    void oldCardDetails(Car car);
}

Initially I thought of using the concurrent collections (CopyOnWriteArrayList), since they perform better than externally synchronizing the list (e.g. Collections.synchronizedList(list)).

Issue with CopyOnWriteArrayList: storing millions of objects in memory and performing frequent operations on them hurts performance, because CopyOnWriteArrayList copies the entire backing array on every update, and doing that for millions of objects is expensive. It works well when reads vastly outnumber writes, but I am looking for good performance with frequent updates on a large number of objects.
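The copy-on-write behaviour described above can be observed directly. The following is a minimal sketch, not taken from the question: the class name CopyOnWriteDemo and its method are hypothetical. It relies only on the documented snapshot semantics of CopyOnWriteArrayList iterators: an iterator created before a write keeps walking the old array, because the write installs a fresh copy.

```java
import java.util.Iterator;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical demo class; not part of the original question.
final class CopyOnWriteDemo {

    // Returns how many elements an iterator created BEFORE a write can see.
    static int elementsSeenByOldIterator() {
        CopyOnWriteArrayList<String> cars = new CopyOnWriteArrayList<>();
        cars.add("A");
        cars.add("B");

        // The iterator pins the backing array as it exists right now.
        Iterator<String> snapshot = cars.iterator();

        // This write copies the whole backing array and swaps in the copy.
        cars.add("C");

        int seen = 0;
        while (snapshot.hasNext()) {
            snapshot.next();
            seen++;
        }
        return seen; // the old iterator still walks the pre-write array
    }

    public static void main(String[] args) {
        System.out.println("old iterator saw: " + elementsSeenByOldIterator());
        // The list itself now has 3 elements; the old snapshot has 2.
    }
}
```

With millions of elements, each such write would copy millions of references, which is the cost the question is worried about.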

Issue with Collections.synchronizedList(list): synchronizing the list takes a single lock on the entire object for every operation, so all threads contend for that one lock, which is another performance problem.

Could anyone suggest which collection API is suitable for this kind of requirement (concurrency, scalability, millions of objects, good performance on frequent operations)?

Thanks in advance !!!

Answers


25

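ConcurrentLinkedQueue is non-blocking (lock-free: no thread ever holds a lock, although it is not strictly wait-free, since an individual operation may have to retry under contention) and doesn't perform any copying.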

If you want to maintain a set instead of a list, then you can have multiple threads add objects to the ConcurrentLinkedQueue and have a single thread poll the queue and add the objects to an unsynchronized HashMap; this may be more efficient than using a ConcurrentHashMap. However, this assumes that you can tolerate a slight delay between an object being added and the object showing up in the set. It also means the HashMap is only safe to access from that single thread.
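A rough sketch of that hand-off, under some assumptions: the class QueueToSetSketch is hypothetical, the "objects" are plain Integer ids rather than Car instances, and a HashSet stands in for the HashMap since only set membership is needed here. Producers only ever touch the queue; the calling thread is the single consumer and the only thread that touches the HashSet.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch; names and the Integer payload are illustrative only.
final class QueueToSetSketch {

    // Starts `producers` threads that each enqueue `itemsPerProducer` unique ids,
    // drains them on the calling thread, and returns the size of the resulting set.
    static int run(int producers, int itemsPerProducer) {
        ConcurrentLinkedQueue<Integer> queue = new ConcurrentLinkedQueue<>();

        Thread[] workers = new Thread[producers];
        for (int p = 0; p < producers; p++) {
            final int base = p * itemsPerProducer;
            workers[p] = new Thread(() -> {
                for (int i = 0; i < itemsPerProducer; i++) {
                    queue.add(base + i); // non-blocking, no copying
                }
            });
            workers[p].start();
        }

        // Single consumer: this is the only thread that reads or writes the set.
        Set<Integer> seen = new HashSet<>();
        int expected = producers * itemsPerProducer;
        while (seen.size() < expected) {
            Integer id = queue.poll(); // null when the queue is momentarily empty
            if (id == null) {
                Thread.yield();        // nothing to do yet; let producers run
            } else {
                seen.add(id);
            }
        }

        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        System.out.println("ids collected: " + run(4, 1000));
    }
}
```

A real consumer would run indefinitely rather than stop at a known count; the fixed count is only there to keep the sketch terminating.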


38

I think the best data structure for performance would be a HashMap: it offers O(1) average-time lookup by key, while searching an ArrayList takes O(N).

On the concurrency side, I would probably go with ConcurrentSkipListMap or ConcurrentHashMap, depending on your needs.

I go into more details on the difference between the two here: Thread safe way to copy a map
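As a rough illustration of the two options: ConcurrentHashMap gives unordered, expected O(1) access, while ConcurrentSkipListMap keeps its keys sorted at O(log n) per operation. The sketch below is an assumption-laden example, not the asker's design: the class ConcurrentMapSketch, the idea of counting sales per integer car model, and the `i % 3` model ids are all hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical sketch; the sales-per-model scenario is illustrative only.
final class ConcurrentMapSketch {

    // Several threads record sales concurrently; merge() updates each key atomically.
    static ConcurrentHashMap<Integer, Long> countSales(int threads, int salesPerThread) {
        ConcurrentHashMap<Integer, Long> salesByModel = new ConcurrentHashMap<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < salesPerThread; i++) {
                    int model = i % 3; // three made-up model ids: 0, 1, 2
                    salesByModel.merge(model, 1L, Long::sum);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve interrupt status
            }
        }
        return salesByModel;
    }

    // If sorted keys or range queries matter, a skip list map keeps keys ordered.
    static int lowestModel(ConcurrentHashMap<Integer, Long> sales) {
        ConcurrentSkipListMap<Integer, Long> sorted = new ConcurrentSkipListMap<>(sales);
        return sorted.firstKey();
    }

    public static void main(String[] args) {
        ConcurrentHashMap<Integer, Long> sales = countSales(4, 300);
        System.out.println("sales per model: " + sales);
        System.out.println("lowest model id: " + lowestModel(sales));
    }
}
```

Neither map locks the whole structure or copies it on write, which is what makes them a better fit here than a synchronized list or CopyOnWriteArrayList.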