How to index a large number of images?



I am extracting images from graphics-heavy magazines in high volume. Because this operation is I/O-heavy, I have created a microservice (running on multiple AWS EC2 instances) to process the PDFs.

Both applications share a session and a shared directory (Amazon EFS).

The microservice exposes a RESTful API that receives a document (PDF) ID and figures out the location of the PDF from the common configuration, so there has been no need for a DB so far.
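To make the setup concrete, here is a minimal sketch of how a PDF location could be derived from the document ID alone. The mount point, the directory layout, and the names `SHARED_ROOT`, `pdf_path_for`, and `images_dir_for` are assumptions for illustration, not the actual configuration.

```ruby
require "pathname"

# Hypothetical shared mount point; both apps would read it from the common config.
SHARED_ROOT = Pathname.new(ENV.fetch("SHARED_ROOT", "/mnt/efs/documents"))

# Derive the PDF location from the document id alone (assumed layout: <root>/<id>/source.pdf).
def pdf_path_for(document_id, root: SHARED_ROOT)
  id = Integer(document_id.to_s, 10) # raises ArgumentError for ids such as "../etc"
  root.join(id.to_s, "source.pdf")
end

# Directory where the images extracted from one page would be written (assumed layout).
def images_dir_for(document_id, page_number, root: SHARED_ROOT)
  id = Integer(document_id.to_s, 10)
  page = Integer(page_number.to_s, 10)
  root.join(id.to_s, "pages", page.to_s)
end
```

Parsing the ID as an integer before building the path keeps a malformed ID from escaping the shared directory.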

Now I need to display all extracted images in another frontend Rails app. I also have to track which image belongs to which page of which PDF.
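Roughly, the information I need to track per image looks like the sketch below. It is plain Ruby rather than an ActiveRecord model, and `ImageRecord`, its fields, and `index_by_page` are hypothetical names used only for illustration.

```ruby
# Hypothetical record: one entry per extracted image.
ImageRecord = Struct.new(:document_id, :page_number, :path, keyword_init: true)

# Group records so images can be looked up by [document_id, page_number].
def index_by_page(records)
  records.group_by { |r| [r.document_id, r.page_number] }
end

records = [
  ImageRecord.new(document_id: 1, page_number: 2, path: "1/pages/2/img-0.png"),
  ImageRecord.new(document_id: 1, page_number: 2, path: "1/pages/2/img-1.png"),
  ImageRecord.new(document_id: 1, page_number: 5, path: "1/pages/5/img-0.png")
]

index = index_by_page(records)
# index[[1, 2]] holds the two images extracted from page 2 of document 1.
```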

So is it a good idea to share a DB between the two Rails applications when I only need to interact with one model (Image)? Any other suggestions for the architectural design?


Could you have the API handle it? Then you wouldn’t need two connections to the DB.


To reduce load on the frontend application, I set up a separate application (the microservice) to handle I/O-heavy operations initiated directly from the frontend UI (it uses the shared session to authenticate each request).

So I cannot expose an API from the frontend application for the microservice to insert images through.