An Efficient Similarity Digests Database Lookup - A Logarithmic Divide & Conquer Approach

Frank Breitinger, Christian Rathgeb, Harald Baier

Abstract


Investigating seized devices within digital forensics represents a challenging task due to the increasing amount of data. Common procedures utilize automated file identification, which reduces the amount of data an investigator has to examine manually. In the past years the research field of approximate matching arises to detect similar data. However, if n denotes the number of similarity digests in a database, then the lookup for a single similarity digest is of complexity of O(n). This paper presents a concept to extend existing approximate matching algorithms, which reduces the lookup complexity from O(n) to O(log(n)). Our proposed approach is based on the well-known divide and conquer paradigm and builds a Bloom filter-based tree data structure in order to enable an efficient lookup of similarity digests. Further, it is demonstrated that the presented technique is highly scalable operating a trade-off between storage requirements and computational efficiency. We perform a theoretical assessment based on recently published results and reasonable magnitudes of input data, and show that the complexity reduction achieved by the proposed technique yields a 220-fold acceleration of look-up costs.

Keywords


digital forensics; hashing; approximate matching; Bloom filter; mrsh-v2; sdhash; indexing

Full Text:

PDF

References


Bloom, B. H. (1970). Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13 , 422--426.

Breitinger, F., & Baier, H. (2013). Similarity preserving hashing: Eligible properties and a new algorithm mrsh-v2. In M. Rogers & K. Seigfried-Spellar (Eds.), Digital forensics and cyber crime (Vol. 114, pp. 167--182). Springer Berlin Heidelberg. Retrieved from http://dx.doi.org/10.1007/ 978-3-642-39891-9 11 doi: 10.1007/978-3-642-39891-9 11

Breitinger, F., Baier, H., & White, D. (2014). On the database lookup problem of approximate matching. 1st Digital Forensics Research Conference EU (DFRWS-EU'14).

Breitinger, F., Liu, H., Winter, C., Baier, H., Rybalchenko, A., & Steinebach, M. (2013, Sept). Towards a process model for hash functions in digital forensics. 5th International Conference on Digital Forensics & Cyber Crime.

Breitinger, F., Stivaktakis, G., & Baier, H. (2013, August). FRASH: A framework to test algorithms of similarity hashing. In 13th Digital Forensics Research Conference (DFRWS'13). Monterey.

Breitinger, F., Stivaktakis, G., & Roussev, V. (2013, Sept). Evaluating detection error trade-offs for bytewise approximate matching algorithms. 5th ICST Conference on Digital Forensics & Cyber

Crime (ICDF2C).

Broder, A., & Mitzenmacher, M. (2005). Network Applications of Bloom Filters: A Survey. Internet Mathematics, 1 (4), 485--509.

Gallagher, P., & Director, A. (1995). Secure Hash Standard (SHS) (Tech. Rep.). National Institute of Standards and Technologies, Federal Information Processing Standards Publication 180-1.

Kornblum, J. (2006, September). Identifying almost identical files using context triggered piecewise hashing. Digital Investigation, 3 , 91--97. Retrieved from http://dx.doi.org/10.1016/j.diin.2006.06.015 doi:

1016/j.diin.2006.06.015

Mullin, J. (1990, may). Optimal semijoins for distributed database systems. IEEE Transactions on Software Engineering, 16 (5), 558 -560. doi: 10.1109/32.52778

NIST Information Technology Laboratory. (2013). National Software Reference Library. http://www.nsrl.nist.gov)

Noll, L. C. (1994--2012). Fnv hash. http://www.isthe.com/chongo/tech/comp/fnv/index.html.

Roussev, V. (2010). Data fingerprinting with similarity digests. In K.-P. Chow & S. Shenoi (Eds.), Advances in digital forensics vi (Vol. 337, pp. 207--226). Springer Berlin Heidelberg. Retrieved from http://dx.doi.org/10.1007/ 978-3-642-15506-2 15 doi: 10.1007/978-3-642-15506-2n 15

Roussev, V. (2011, August). An evaluation of forensic similarity hashes. Digital Investigation, 8 , 34--41. Retrieved from http://dx.doi.org/10.1016/j.diin.2011.05.005 doi:

1016/j.diin.2011.05.005

Roussev, V. (2012). Managing terabyte-scale investigations with similarity digests. In

G. Peterson & S. Shenoi (Eds.), Advances in digital forensics viii (Vol. 383, pp. 19--34). Springer Berlin Heidelberg. Retrieved from http://dx.doi.org/10.1007/978-3-642-33962-2 2 doi: 10.1007/978-3-642-33962-2 2

Roussev, V., III, G. G. R., & Marziale, L. 2007, September). Multi-resolution similarity hashing. Digital Investigation, 4, 105--113. doi: 10.1016/j.diin.2007.06.011

Winter, C., Schneider, M., & Yannikos, Y. (2013). F2s2: Fast forensic similarity search through indexing piecewise hashsignatures. Digital Investigation, 0 , -. (Article in Press - no journal issue assigned by now) doi: http://dx.doi.org/ 10.1016/j.diin.2013.08.003


Refbacks

  • There are currently no refbacks.


Copyright (c) 2014 Journal of Digital Forensics, Security and Law



Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.

(c) 2006-2015 Association of Digital Forensics, Security and Law