lomas.generator submodule

class lomas.generator.GeneratorCommon(ip_id_dict, cdf_size)

class lomas.generator.GeneratorLomas(ip_id_dict, ordered_ippair, cdf_iat, cdf_size)

__init__(ip_id_dict, ordered_ippair, cdf_iat, cdf_size)

Trains a model on historical traffic data and uses the trained model to generate new synthetic traffic data.

Parameters:

ip_id_dict (dict) -- key=index of an IP, value=(anonymized) IP address
ordered_ippair (list) -- ordered IP pairs (each IP is represented by its index)
cdf_iat (dict) -- key=percentile, value=value of the inter-arrival time (IAT) CDF at that percentile
cdf_size (dict) -- key=percentile, value=value of the flow size CDF at that percentile
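
As an illustration of the input shapes described above, the following minimal sketch builds a percentile-keyed CDF dictionary from raw samples. The helper name `percentile_cdf`, the nearest-rank method, the 10-percent step, and the tuple form of `ordered_ippair` are all assumptions made for this example; the actual inputs are presumably produced by the package's own preprocessing.

```python
def percentile_cdf(samples, percentiles=range(0, 101, 10)):
    """Return {percentile: value} using the nearest-rank method (hypothetical helper)."""
    ordered = sorted(samples)
    n = len(ordered)
    cdf = {}
    for p in percentiles:
        # Integer ceiling of p * n / 100, clamped so that p = 0 maps to the minimum.
        rank = max(1, (p * n + 99) // 100)
        cdf[p] = ordered[rank - 1]
    return cdf


# Hypothetical inputs in the shapes the parameter list describes.
ip_id_dict = {0: "10.0.0.1", 1: "10.0.0.2", 2: "10.0.0.3"}
ordered_ippair = [(0, 1), (0, 2), (1, 2)]  # assumed: (source index, destination index)
flow_sizes = [40, 60, 1500, 1500, 52, 576, 1500, 40, 1200, 64]
cdf_size = percentile_cdf(flow_sizes)
```

The same helper could be applied to a list of inter-arrival times to obtain a dictionary shaped like `cdf_iat`.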

generate(time_limit, time_unit)

Generates new synthetic traffic data.

Parameters:

time_limit (int) -- controls how many flows are generated (such that [number of flows] * [average IAT] <= [time_limit])
time_unit (int) -- time unit of time_limit

Returns:

self.trace_syn (synthetic trace)

Return type:

pandas.DataFrame
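
The constraint on time_limit can be read as a simple budget calculation. The sketch below is not the library's code: the function name `max_flow_count` is hypothetical, and it assumes that time_unit is a multiplier converting time_limit into the unit in which inter-arrival times are measured.

```python
def max_flow_count(time_limit, time_unit, avg_iat):
    """Largest n such that n * avg_iat <= time_limit * time_unit (hypothetical helper)."""
    if avg_iat <= 0:
        raise ValueError("avg_iat must be positive")
    budget = time_limit * time_unit  # assumed: total time expressed in IAT units
    return int(budget // avg_iat)


# Example: a limit of 10 units of 60 seconds each, with a 0.25 s average inter-arrival time.
n_flows = max_flow_count(time_limit=10, time_unit=60, avg_iat=0.25)
```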

initialize(trace_input)

Collects the flow data between each source-destination pair and stores it as a two-dimensional array.

Parameters:

trace_input (pandas.DataFrame) -- can be accessed using lomas.Preprocessor.trace_input

Returns:

self.arr_flow_type, self.dictionary, self.corpus

Return type:

2D list, 1D list, 2D list
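
One way to picture the three return values is as a topic-model corpus in which each source-destination pair is a document and each flow is a word. The sketch below is an assumption about that layout, not the library's implementation: it takes plain tuples instead of a DataFrame, and the function name `build_corpus`, the token format, and the bag-of-words encoding are all hypothetical.

```python
from collections import Counter, defaultdict


def build_corpus(flows):
    """flows: iterable of (src_idx, dst_idx, size_tag, iat_tag) tuples (assumed layout)."""
    docs = defaultdict(list)
    for src, dst, size_tag, iat_tag in flows:
        # One "word" per flow: the joint (size, IAT) label of that flow.
        docs[(src, dst)].append(f"s{size_tag}_t{iat_tag}")

    pairs = sorted(docs)
    arr_flow_type = [docs[pair] for pair in pairs]  # 2D list: one row per IP pair
    dictionary = sorted({word for doc in arr_flow_type for word in doc})  # 1D list
    word_id = {word: i for i, word in enumerate(dictionary)}

    corpus = []  # 2D list: per document, sorted (word id, count) entries
    for doc in arr_flow_type:
        counts = Counter(word_id[word] for word in doc)
        corpus.append(sorted(counts.items()))
    return arr_flow_type, dictionary, corpus


flows = [(0, 1, 2, 0), (0, 1, 2, 0), (0, 1, 5, 1), (1, 2, 2, 0)]
arr_flow_type, dictionary, corpus = build_corpus(flows)
```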

sampling_helper(cdf, tag)

Helper function that maps a discretized flow-size or inter-arrival-time label back to a real value.

Parameters:

Returns:

continuous value of size or iat

Return type:

float
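
Since the parameters of this helper are not documented here, the following is only a sketch of one plausible mapping. It assumes that tag is the index of a bin bounded by two consecutive percentile keys of cdf, and that the continuous value is drawn uniformly inside that bin; the function name `sample_from_bin` and the `rng` argument are hypothetical.

```python
import random


def sample_from_bin(cdf, tag, rng=random):
    """Draw a value uniformly between the CDF values bounding bin `tag` (assumed semantics)."""
    keys = sorted(cdf)
    if not 0 <= tag < len(keys) - 1:
        raise IndexError("tag must index a bin between two consecutive percentiles")
    low, high = cdf[keys[tag]], cdf[keys[tag + 1]]
    return rng.uniform(low, high)


cdf = {0: 40, 50: 64, 100: 1500}
value = sample_from_bin(cdf, tag=1, rng=random.Random(0))  # falls between 64 and 1500
```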

sampling_value(doc_idx)

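Samples from the latent-space probability distribution matrices, drawing a joint value of flow size and inter-arrival time according to those distributions.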
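
A common way to sample from a pair of document-topic and topic-word matrices is to draw a topic for the document first and then draw a word from that topic. The sketch below shows this two-stage draw with nested lists; it assumes the matrices correspond to doc_topics and topic_terms as returned by train, and the function name `sample_word` is hypothetical.

```python
import random


def sample_word(doc_topics, topic_terms, doc_idx, rng=random):
    """Two-stage draw: topic from the document's row, then word from the topic's row."""
    topics = range(len(topic_terms))
    topic = rng.choices(topics, weights=doc_topics[doc_idx], k=1)[0]
    words = range(len(topic_terms[topic]))
    return rng.choices(words, weights=topic_terms[topic], k=1)[0]


# Degenerate distributions make the outcome deterministic for illustration.
doc_topics = [[1.0, 0.0], [0.0, 1.0]]
topic_terms = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
word_for_doc0 = sample_word(doc_topics, topic_terms, doc_idx=0)
word_for_doc1 = sample_word(doc_topics, topic_terms, doc_idx=1)
```

Under the assumption that each word encodes a joint (size, IAT) label, the drawn word index would then be decoded into the two labels and mapped back to continuous values.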

train(num_topics=25, chunksize=2000, passes=20, iterations=400, eval_every=None)

Trains the model.

Parameters:

num_topics (int) -- dimension of the latent space
chunksize (int) -- number of documents processed at a time
passes (int) -- number of epochs
iterations (int) -- how often a particular loop is repeated over each document
eval_every (int) -- how often to evaluate model perplexity; the default of None skips evaluation, which takes too much time

Returns:

self.doc_topics (document-topic distribution), self.topic_terms (topic-word distribution)

Return type:

2D np.array, 2D np.array